Health and homelessness in Scotland: research

Study exploring the relationship between homelessness and health.


Annex D: The H2H Personal Identifiable Data, the Matching Process and Creating the Controls.

H2H: Personal identifiable data of homelessness applications

To conduct the study across Scotland, it was first necessary to obtain personal identifiable information (first name, last name, date of birth, gender and postcode) for people who had made homelessness applications (HL1) or sought assistance on housing-related issues (PREVENT1). This information is not submitted to the Scottish Government, as it is not required for monitoring homelessness legislation at the national level. However, it is held on the same IT systems that Local Authorities use to generate the HL1 returns, because they need it for case management purposes.

To carry out this study, Local Authorities were invited to submit personal identifiable information to the National Records of Scotland (NRS) Indexing Service for all people with HL1 homeless applications and all people with PREVENT1 returns, in accordance with the H2H data specification [68]. Because of uncertainty about the quality of the data and how much data each Local Authority would submit, all HL1 and PREVENT1 records were requested.

The H2H data specification was created specifically for this data linkage exercise. The exercise also required Data Processing Agreements to be signed with each Local Authority to enable data delivery and subsequent data processing. The 32 local-authority-specific H2H datasets were then combined into a single dataset.

The H2H dataset contains the following information:

  • Local Authority Code
  • PREVENT1 Approach Reference Number
  • HL1 Application Reference Number
  • First Name
  • Middle Name
  • Last Name
  • Date of Birth
  • Gender
  • Postcode of current address or last settled address

The H2H records could be linked back to the HL1 and PREVENT1 datasets through the application and approach reference numbers, which are common to both. A person could appear multiple times within the H2H dataset if they were included in more than one homelessness application or PREVENT1 approach.
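Linking H2H records back to the case datasets amounts to a join on the shared reference numbers. The minimal sketch below is illustrative only: the field names (`la_code`, `hl1_ref`, `prevent1_ref`) are hypothetical, and it assumes, without confirmation from the H2H data specification, that reference numbers are unique only within a Local Authority, so each join key pairs the LA code with the reference.

```python
def link_to_cases(h2h_records, hl1_refs, prevent1_refs):
    """Return (source, la_code, ref) for each case an H2H record links to.

    h2h_records: list of dicts with hypothetical keys la_code, hl1_ref, prevent1_ref
    hl1_refs, prevent1_refs: sets of (la_code, ref) pairs present in each case dataset
    """
    links = []
    for rec in h2h_records:
        # Assumption: reference numbers are only unique within a Local Authority,
        # so the join key pairs the LA code with the reference.
        hl1 = rec.get("hl1_ref")
        if hl1 and (rec["la_code"], hl1) in hl1_refs:
            links.append(("HL1", rec["la_code"], hl1))
        prevent1 = rec.get("prevent1_ref")
        if prevent1 and (rec["la_code"], prevent1) in prevent1_refs:
            links.append(("PREVENT1", rec["la_code"], prevent1))
    return links


# Hypothetical example: one person with an HL1 application and a PREVENT1 approach
records = [
    {"la_code": "LA01", "hl1_ref": "A1", "prevent1_ref": None, "last_name": "Smith"},
    {"la_code": "LA01", "hl1_ref": None, "prevent1_ref": "P7", "last_name": "Smith"},
]
links = link_to_cases(records, hl1_refs={("LA01", "A1")}, prevent1_refs={("LA01", "P7")})
```

Both rows describe the same person, yet nothing in the H2H rows themselves marks them as one individual; they can only be recognised as duplicates once each has been assigned a CHI number.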

It is important to note that the H2H data contains only personal identifiable information from HL1 and PREVENT1 cases; it contains no payload data relating to those cases. As such, it is not possible to tell from the H2H data alone which people were assessed as statutory homeless and which were not.

Matching health and homelessness data

To create a link between homelessness data and health data, a separation of function approach was used. This approach ensures that no single party or individual has access to all of the data. A trusted third party (in this case the Indexing Service at the National Records of Scotland [69]) performs the matching using only the personal identifiable information required for matching. Importantly, the party performing the matching does not have access to the payload data. Following the matching exercise, the matched results are recombined with the payload data and the personal identifiable data is removed. The resulting dataset is then accessed by the analysis party in a separate, secure environment.
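The separation of function principle can be illustrated with a toy data flow. In this sketch, all names are hypothetical and exact matching on the full identifier tuple stands in for the Indexing Service's real matching rules. Each record is split under an arbitrary link key; the matching party sees only identifiers, the analysis party sees only payload, and the recombined output carries an index number instead of identifiers. It simplifies the real arrangement, in which the payload sits in separate HL1 and PREVENT1 datasets rather than alongside the identifiers.

```python
ID_FIELDS = ("first_name", "last_name", "dob", "sex", "postcode")


def split_records(records):
    """Split each record into an identifier part and a payload part sharing a link key."""
    to_matcher, to_analyst = [], []
    for i, rec in enumerate(records):
        key = f"K{i:06d}"  # hypothetical arbitrary key; carries no personal information
        to_matcher.append({"key": key, **{f: rec[f] for f in ID_FIELDS}})
        to_analyst.append({"key": key, **{f: v for f, v in rec.items() if f not in ID_FIELDS}})
    return to_matcher, to_analyst


def match(identifiers, spine_lookup):
    """Trusted third party: sees identifiers only and returns link key -> index number."""
    result = {}
    for rec in identifiers:
        ident = tuple(rec[f] for f in ID_FIELDS)
        if ident in spine_lookup:  # exact match stands in for the real matching rules
            result[rec["key"]] = spine_lookup[ident]
    return result


def recombine(payload, key_to_index):
    """Attach index numbers to payload; unmatched records and link keys are dropped."""
    return [
        {"index": key_to_index[p["key"]], **{f: v for f, v in p.items() if f != "key"}}
        for p in payload
        if p["key"] in key_to_index
    ]


records = [
    {"first_name": "Anna", "last_name": "Smith", "dob": "1980-05-01", "sex": "F",
     "postcode": "EH1 1AA", "case_type": "HL1"},
    {"first_name": "Zed", "last_name": "Nobody", "dob": "1990-01-01", "sex": "M",
     "postcode": "KY1 1AA", "case_type": "PREVENT1"},
]
spine_lookup = {("Anna", "Smith", "1980-05-01", "F", "EH1 1AA"): 1001}

to_matcher, to_analyst = split_records(records)
analysis_data = recombine(to_analyst, match(to_matcher, spine_lookup))
```

The analysis dataset ends up holding only the index number and case details, while the matching step never receives `case_type`.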

To achieve this, the H2H data, which contains nothing beyond the personal identifiable information and the HL1 and PREVENT1 reference numbers, was submitted to the NRS Indexing Service. Using the personal identifiable information, each person in the H2H dataset was matched to the Research Indexing Spine (RIS) (Table 2.2). The RIS is a population spine compiled by NRS from GP registrations; for this study, registrations as at June 2016 were used as a snapshot of the Scottish population.

All health datasets in Scotland contain the Community Health Index (CHI) number, a variable used to trace an individual's use of various health services. The RIS does not contain the CHI number. However, the NRS Indexing Service has access to a separate lookup table which links the people on the RIS to their CHI number. Once the H2H data was linked to the RIS, this lookup table was used to obtain the CHI number for each person.
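Obtaining the CHI number is a second lookup chained onto the RIS match. A minimal sketch with hypothetical record references, RIS identifiers and placeholder CHI values; in the sketch, records whose RIS entry has no CHI counterpart simply drop out, mirroring the small loss between RIS matches and CHI matches in Table D.1.

```python
def assign_chi(h2h_to_ris, ris_to_chi):
    """Chain two lookups: H2H record reference -> RIS identifier -> CHI number.

    Records whose RIS identifier has no entry in the CHI lookup are left out.
    """
    return {
        ref: ris_to_chi[ris_id]
        for ref, ris_id in h2h_to_ris.items()
        if ris_id in ris_to_chi
    }


# Hypothetical identifiers for illustration only
h2h_to_ris = {"R1": "S1", "R2": "S2", "R3": "S1"}
ris_to_chi = {"S1": "CHI_0001"}
chi_by_record = assign_chi(h2h_to_ris, ris_to_chi)
```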

Table D.1: Results from National Records of Scotland's Indexing Service

| Measure | Count | % |
|---|---|---|
| Number of H2H input records | 1,031,841 | |
| Number of input records with valid LA code | 1,031,824 | |
| Number of matches to Research Indexing Spine | 973,578 | 94.4% |

These records were matched to the Research Indexing Spine as follows:

| Step | Match criteria | Count |
|---|---|---|
| 0 | Exact matches on Forename, Surname, DOB, Sex & postcode | 423,385 |
| 1 | Exact matches on Forename & Surname Initials, DOB, Sex & postcode | 32,470 |
| 2 | Exact matches on DOB, Sex & postcode | 5,864 |
| 3 | Exact matches on Forename, Surname, DOB, Sex & 2-character postcode | 223,355 |
| 4 | Exact matches on Forename, Surname, DOB & Sex | 224,529 |
| 5 | Exact matches on first 4 characters of both Forename & Surname, Year of Birth & Sex | 55,373 |
| 6 | Exact matches on first 4 characters of both Forename & Surname, Month & Day of Birth & Sex | 7,754 |
| 7 | Exact matches on first 4 characters of both Second Forename & Surname, Year of Birth & Sex | 332 |
| 8 | Exact matches on first 4 characters of both Third Forename & Surname, Year of Birth & Sex | 501 |
| 9 | Exact matches on first 4 characters of both Forename & Alternative Surname, Month & Day of Birth & Sex | 15 |
| Total | | 973,578 |

| Measure | Count | % |
|---|---|---|
| Number of matches to CHI lookup | 969,667 | 99.6% |
| Number of unique persons (CHI numbers) amongst matched homeless records | 564,501 | |
| Age-sex matched controls: number of CHI numbers from SIMD1 cohort | 563,207 | 99.8% |
| Age-sex matched controls: number of CHI numbers from SIMD5 cohort | 564,501 | 100.0% |
| Total index numbers provided for health data | 1,692,209 | |

Percentages are relative to the preceding stage: RIS matches as a share of H2H input records, CHI matches as a share of RIS matches, and controls as a share of unique H2H persons.
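The steps in Table D.1 describe a deterministic cascade of progressively looser match keys. The step counts sum exactly to the total number of RIS matches, which suggests each record is counted once, at the first step where it matches. The sketch below implements steps 0 to 6 under stated assumptions: names and postcodes are normalised by upper-casing and removing spaces, a "2-character postcode" is taken as the first two characters of the normalised postcode, and a match is accepted only when the key identifies exactly one spine record. NRS's actual normalisation and tie-handling rules are not described here. Steps 7 to 9 (second and third forenames, alternative surnames) would need extra name fields, so they are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    forename: str
    surname: str
    dob: date
    sex: str
    postcode: str


def norm(text: str) -> str:
    # Assumed normalisation: upper case, spaces removed
    return text.upper().replace(" ", "")


# (step, key function) pairs mirroring steps 0-6 of Table D.1
STEPS = [
    (0, lambda p: (norm(p.forename), norm(p.surname), p.dob, p.sex, norm(p.postcode))),
    (1, lambda p: (norm(p.forename)[:1], norm(p.surname)[:1], p.dob, p.sex, norm(p.postcode))),
    (2, lambda p: (p.dob, p.sex, norm(p.postcode))),
    (3, lambda p: (norm(p.forename), norm(p.surname), p.dob, p.sex, norm(p.postcode)[:2])),
    (4, lambda p: (norm(p.forename), norm(p.surname), p.dob, p.sex)),
    (5, lambda p: (norm(p.forename)[:4], norm(p.surname)[:4], p.dob.year, p.sex)),
    (6, lambda p: (norm(p.forename)[:4], norm(p.surname)[:4], p.dob.month, p.dob.day, p.sex)),
]


def cascade_match(inputs, spine):
    """inputs, spine: dicts of id -> Person. Returns input id -> (spine id, step)."""
    matched = {}
    for step, key in STEPS:
        index = defaultdict(list)
        for spine_id, person in spine.items():
            index[key(person)].append(spine_id)
        for ref, person in inputs.items():
            if ref in matched:
                continue  # already matched at a stricter step
            candidates = index.get(key(person), [])
            if len(candidates) == 1:  # assumed rule: accept only unambiguous matches
                matched[ref] = (candidates[0], step)
    return matched


# Hypothetical example data
spine = {
    "S1": Person("Anna", "Smith", date(1980, 5, 1), "F", "EH1 1AA"),
    "S2": Person("John", "Brown", date(1975, 2, 3), "M", "G2 3BB"),
}
inputs = {
    "R1": Person("Anna", "Smith", date(1980, 5, 1), "F", "EH1 1AA"),
    "R2": Person("John", "Brown", date(1975, 2, 3), "M", "G41 9ZZ"),
    "R3": Person("Johnny", "Brown", date(1975, 2, 3), "M", "AB1 2CD"),
    "R4": Person("Zed", "Nobody", date(1990, 1, 1), "M", "KY1 1AA"),
}
result = cascade_match(inputs, spine)
```

In the example, R2 falls through to step 4 because its postcode differs from the spine's, and R3 ("Johnny") is only picked up at step 5 because steps 0 to 4 require the full forename. R3 and R2 both match S2, and that kind of repeat is resolved later by de-duplicating on the CHI number.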

Table 2.3 shows the results from the linking exercise, provided by NRS's Indexing Service. In total, over 1 million records were received in the H2H dataset, containing identifiable information about people who made homelessness applications (HL1) or sought assistance on housing-related issues (PREVENT1). Of these, just over 970,000 were matched to the Research Indexing Spine: a 94.4% match rate. Almost all of these (99.6%) were successfully linked to the CHI lookup and assigned their CHI number.
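The headline rates can be recomputed from the counts in Table D.1. Doing so also confirms that the per-step counts add up to the total number of RIS matches and that the three groups sum to the total number of index numbers provided for health data.

```python
# Counts from Table D.1
input_records = 1_031_841
ris_matches = 973_578
chi_matches = 969_667
unique_persons = 564_501
simd1_controls = 563_207
simd5_controls = 564_501
step_counts = [423_385, 32_470, 5_864, 223_355, 224_529, 55_373, 7_754, 332, 501, 15]


def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100 * part / whole, 1)


assert sum(step_counts) == ris_matches  # steps 0-9 account for every RIS match
print(pct(ris_matches, input_records))      # RIS match rate: 94.4
print(pct(chi_matches, ris_matches))        # CHI lookup rate among RIS matches: 99.6
print(pct(simd1_controls, unique_persons))  # SIMD1 controls vs H2H persons: 99.8
print(unique_persons - simd1_controls)      # SIMD1 shortfall: 1294
print(unique_persons + simd1_controls + simd5_controls)  # total index numbers: 1692209
```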

As noted above, the H2H dataset contains duplicate individuals by design: a person appears more than once if they were included in more than one homelessness application or PREVENT1 approach. Using the CHI number obtained from the CHI lookup, the dataset was then de-duplicated, identifying 564,501 unique individuals.
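De-duplication amounts to grouping records by CHI number. A minimal sketch with hypothetical record references and placeholder CHI values; the number of groups is the number of unique individuals.

```python
from collections import defaultdict


def group_by_person(chi_by_record):
    """Group H2H record references under the CHI number each was assigned."""
    records_by_chi = defaultdict(list)
    for ref, chi in chi_by_record.items():
        records_by_chi[chi].append(ref)
    return dict(records_by_chi)


# Hypothetical example: R1 and R3 are two applications by the same person
chi_by_record = {"R1": "CHI_0001", "R2": "CHI_0002", "R3": "CHI_0001"}
people = group_by_person(chi_by_record)
unique_persons = len(people)
```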

Creating control groups

To measure and understand the impact of health activity on homelessness, and of homelessness on health activity, it is necessary to create control groups to compare with the homeless group.

The Indexing Service created two control groups for the study by matching the H2H individuals (564,501 unique individuals) to individuals on the RIS by age (assumed age at 31st March 2015 [70]) and sex. The first control group contained only individuals living in the 20% most deprived areas of Scotland (SIMD1), and the second only individuals living in the 20% least deprived areas (SIMD5). Area deprivation was determined using the Scottish Index of Multiple Deprivation (SIMD) 2012, based on the postcodes on the Research Indexing Spine at June 2016.
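Matching on age requires a single reference date. A minimal sketch computing age in completed years at 31st March 2015 and forming the (age, sex) cell used for matching. The study's exact "assumed age" convention is given in note [70] and is not reproduced here, so the completed-years rule and the `stratum` helper are assumptions.

```python
from datetime import date

# Reference date from the text; "completed years" is an assumed convention,
# not necessarily the study's own "assumed age" rule (see note [70]).
REFERENCE_DATE = date(2015, 3, 31)


def age_at(dob: date, ref: date = REFERENCE_DATE) -> int:
    """Age in completed years on the reference date."""
    had_birthday = (ref.month, ref.day) >= (dob.month, dob.day)
    return ref.year - dob.year - (0 if had_birthday else 1)


def stratum(dob: date, sex: str) -> tuple:
    """Hypothetical helper: the (age, sex) cell used to pair individuals with controls."""
    return (age_at(dob), sex)
```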

Note: an individual cannot be in both the H2H dataset and one of the two control groups. Therefore, homeless individuals living in SIMD1 and SIMD5 areas are not available for selection as controls. In effect, the sampling frame of potential controls is the true underlying SIMD1 and SIMD5 populations minus all H2H individuals. The impact of this on the study is explored in Section 2.9.1.

As a result, just under 1.7 million people were selected from the Scottish population for use in the study (Table 2.3, 'Age-sex Matched Controls'). Each belongs to one of the following three groups:

  • 564,501 unique individuals sourced from H2H
  • 563,207 unique individuals from the 20% most deprived (SIMD1) areas, matched to the H2H group on age and sex. Because the H2H group was so large, there were not enough people on the RIS in SIMD1 areas with the same age and sex breakdown to create a complete control group, leaving it 1,294 people short.
  • 564,501 unique individuals from the 20% least deprived (SIMD5) areas, with the same age and sex distribution as the H2H group.
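The selection of controls can be sketched as stratified random sampling: for each (age, sex) stratum, draw as many people from the chosen SIMD quintile as there are H2H individuals in that stratum, excluding everyone in H2H from the sampling frame. This is an illustration under assumptions (1:1 matching per stratum, sampling without replacement, hypothetical data structures), not the Indexing Service's actual procedure. It shows how a shortfall like the SIMD1 one arises when a stratum runs out of candidates.

```python
import random
from collections import Counter, defaultdict


def select_controls(h2h, population, quintile, seed=0):
    """Draw age-sex matched controls from one SIMD quintile.

    h2h: dict CHI -> (age, sex) for the homeless group
    population: dict CHI -> (age, sex, simd_quintile) for everyone on the spine
    Returns a list of control CHI numbers (may be short if a stratum runs out).
    """
    rng = random.Random(seed)
    needed = Counter(h2h.values())  # how many controls each (age, sex) stratum needs

    # Sampling frame: people in the chosen quintile, minus everyone in H2H
    pool = defaultdict(list)
    for chi, (age, sex, q) in population.items():
        if q == quintile and chi not in h2h:
            pool[(age, sex)].append(chi)

    controls = []
    for stratum, n in needed.items():
        candidates = sorted(pool[stratum])
        # Without replacement; take everyone available if the stratum is too small
        controls.extend(rng.sample(candidates, min(n, len(candidates))))
    return controls


# Hypothetical example
h2h = {"A": (30, "F"), "B": (30, "F"), "C": (45, "M")}
population = {
    "A": (30, "F", 1),  # homeless person living in SIMD1: excluded from the frame
    "P1": (30, "F", 1),
    "P2": (30, "F", 1),
    "P3": (30, "F", 5),
    "P4": (45, "M", 5),
    "P5": (30, "F", 5),
}
simd1 = select_controls(h2h, population, quintile=1)  # no 45-year-old men available
simd5 = select_controls(h2h, population, quintile=5)
```

In the example, the SIMD1 group comes out one person short because no 45-year-old man is available in that quintile, while the SIMD5 group is complete.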

It is important to note that the matched controls were selected randomly on the basis of age and sex only. No other factors were controlled for between the three groups. Factors beyond age and sex are known to influence a person's level of health activity, for example economic activity and household structure. These are likely to differ between individuals in the homeless group and those in the 20% most and least deprived quintiles. As this study does not control for these factors, the results may contain biases attributable to them.

The dataset containing these individuals was then transferred by the NRS Indexing Service to the National Services Scotland National Safe Haven, a secure environment located at the Farr Institute, Scotland. Here, the study's analysis team accessed the de-identified data and conducted analyses.
