Chapter 2: Study Methodology
This chapter explains how the study was prepared and conducted. It introduces the study's research questions, the homelessness and health activity datasets, the creation of the analysis cohorts, the data limitations and their implications for the findings, and the study's approach to answering the research questions.
2.1 Study design
2.1.1 The study's health and homelessness framework
This study aims to better understand the relationship between health and homelessness. In the course of the analysis, the following research questions emerged:
1. How does health prior to the first homelessness assessment influence homelessness?
2. Does the point at which someone becomes homeless have an impact on one's health? Is a crisis with a health component involved?
3. How does homelessness influence health?
4. Is there a relationship between health, homelessness, and area-based deprivation?
These correspond to the following statements:
1. Health-related issues may cause situations leading to homelessness.
2. Some short-term crises could be linked to both health activity and homeless episodes.
3. Homelessness may cause health problems.
4. Some other factors (e.g. income) affect both health and the likelihood of becoming homeless.
The relationship between health and homelessness can be illustrated by comparing health activity with the date of homelessness assessment. There are three distinct time periods: the period before the homelessness assessment ($t_A$), the period at and around the homelessness assessment ($t_B$), and the period after the homelessness assessment ($t_C$). The homelessness assessment is the point at which an individual applies to a Local Authority and is assessed as statutory homeless. During each period, individuals have an underlying state of health and generate measurable health-related activity. Each of the four statements above would result in a different health activity profile across the three time periods (Figure 2.1).
Figure 2.1: Profiles of health activity for homeless and non-homeless people. Each numbered plot shows the activity profile relating to the corresponding research questions and statements above.
In (1) it may be that underlying health problems (which would clearly impact on health activity) could impact on, for example, employment and relationships. In turn, these could impact on the likelihood of becoming homeless. This would be observed as an increase in health activity over time prior to homelessness assessment among the people that become homeless.
In (2), it may be that a short-term health-related crisis results in homelessness: temporary health-related problems could lead to people finding themselves homeless. Given that some of the health activity explored in the study is related to drugs or alcohol, this seems plausible.
In (3) we might imagine that the stress of homelessness could cause health problems, which would then result in increased health activity after homelessness assessment. The length of time before health activity returns to the level it was prior to assessment would indicate how long the health impacts typically last.
In (4) it may be that there is a wider range of underlying factors that affect both health and homelessness. In this case we would expect to see a higher health activity level for homeless people in all three time periods.
It is likely that all of these effects are present to some degree (Figure 2.2). However, the overall shape of the trend would indicate which effects have the greatest influence.
Figure 2.2: Cumulative health activity, by effect of statements 1–4 corresponding to each research question
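As an illustration of the framework above, the sketch below assigns a health activity record to one of the three periods relative to a person's homelessness assessment date. The chapter does not define how wide the period "at and around" the assessment ($t_B$) is; the 30-day window (`window_days`) and the function name `assign_period` are hypothetical choices made only for illustration.

```python
from datetime import date


def assign_period(event_date: date, assessment_date: date, window_days: int = 30) -> str:
    """Return 'A' (before), 'B' (at/around) or 'C' (after) the homelessness assessment.

    window_days is a hypothetical half-width for the period around the assessment.
    """
    offset = (event_date - assessment_date).days
    if offset < -window_days:
        return "A"  # t_A: well before the assessment
    if offset <= window_days:
        return "B"  # t_B: within the window around the assessment
    return "C"      # t_C: well after the assessment


assessment = date(2010, 6, 1)
events = [date(2009, 11, 20), date(2010, 6, 10), date(2011, 2, 3)]
periods = [assign_period(e, assessment) for e in events]
print(periods)  # ['A', 'B', 'C']
```

Counting records per person in each period, separately for homeless people and their controls, would give profiles of the kind sketched in Figure 2.1.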
2.1.2 Study design options
Several study designs can be used to examine the relationship between an exposure and an outcome. These include:
Randomised Controlled Trials (RCT)
An RCT involves selecting a group of participants and randomly allocating some of them to receive an intervention (the 'treatment group'). Those not assigned to the treatment group form the 'control group'. RCTs measure and compare the outcomes of participants who receive the intervention and those who do not.
Cohort studies
Cohort studies begin with a group of people. An example might be an investigation into the relationship between smoking and health. The people in the cohort are grouped by whether or not they smoke, and the whole cohort is followed over time to see how their health is affected. Comparing the health of smokers and non-smokers provides evidence of how smoking affects health.
Case Control studies
A case-control study begins with the selection of cases (people with an outcome) and controls (people without the outcome). An example of this might be people with a particular kind of cancer (the cases). The controls would be a sample of people who had not developed the particular cancer. Factors which are a potential cause of the cancer would then be determined for both cases and controls, e.g. diet, level of physical activity, where they live. How these factors differ between the cases and their controls is then calculated.
An RCT, regarded as one of the best designs for analysing the relationship between exposure and outcome, is not possible for this study, as one cannot intervene to make people homeless, or to change their health, at random.
2.1.3 Chosen study designs
To answer the research questions, the study adopts both a Cohort and Case Control design.
A Case Control design is used to understand whether health, as measured by health activity, influences homelessness. The case cohort is constructed from the available homelessness data in Scotland and includes only people assessed as homeless; that is, everyone in the case cohort exhibits the outcome of homelessness. A control cohort of equal size, consisting of people who have not been assessed as homeless, is drawn from the population of Scotland and matched to the cases on age and sex. The study then analyses whether health activity is a potential cause of homelessness by measuring differences in health activity between the cohorts.
A Cohort design is used to understand whether homelessness influences health, as measured by health activity. In contrast to the Case Control design, homelessness is no longer the outcome but the potential causal variable, and health, as measured by health activity, is the outcome. As before, case and control cohorts are constructed according to whether or not people have been assessed as homeless. The study analyses whether homelessness is a potential cause of further health activity.
To address the final research question, on area-based deprivation, the study creates two control cohorts for each design: one drawn from people in areas with high area-based deprivation, and the other from people in areas with low area-based deprivation. In this way, the study can compare deprivation at the individual level (using homelessness as a proxy) with area-based measures of deprivation.
2.2 Homelessness data
This section introduces the source of homelessness data used in the study, what alternatives were available, and how representative the data is of homelessness in Scotland. It is from this data that the people are chosen to form the case cohort used in both the Case Control and Cohort designs. This homelessness data is used as the outcome variable in the Case Control design, and as the explanatory variable in the Cohort design.
2.2.1 HL1 returns
When someone is homeless in Scotland, they may apply to their Local Authority for assistance under section 28 of the Housing (Scotland) Act 1987. When this happens, an HL1 return is completed and submitted to the Scottish Government Housing Statistics branch. As such, the Scottish Government HL1 dataset is the main record-level source of administrative data on homelessness in Scotland. This system has been in place nationally since December 2001, although some Local Authorities began recording earlier in 2001. There are three distinct stages to each case recorded on the HL1: the application stage, the assessment stage, and the outcome stage. Returns to the Scottish Government cover complete stages; however, information from earlier stages may be updated as an application progresses. For more information, see the Scottish Homelessness Statistics webpage.
Between 4 June 2001 and 7 November 2016, there were 562,255 applications assessed by Local Authorities in Scotland as homeless or threatened with homelessness. Assessments of households as threatened with homelessness (that is, likely to become homeless within two months) accounted for 13% of all those assessed as homeless in 2002/3, falling to 6% in 2014/15.
Alternative measures of homelessness in Scotland exist and were considered (Annex A).
2.2.2 Definition of homelessness and repeat homelessness used in this study
For the purpose of this study, homelessness is defined by section 24 of the Housing (Scotland) Act 1987 (as amended). Broadly, this defines someone as homeless if they are:
- sleeping on the streets
- staying with friends or family
- staying in a hostel or bed and breakfast hotel
- living in overcrowded conditions
- at risk of violence in the home
- living in poor conditions that affect their health
- living in a house that is not suitable for them because they are sick or disabled.
Under the European Typology of Homelessness and Housing Exclusion (ETHOS), the definition of homelessness used in this research covers people classed as roofless, houseless, in insecure accommodation and in inadequate accommodation.
For the purpose of this project, a person is defined as experiencing repeat homelessness if they appear in two or more homeless applications, where these applications have been assessed as homeless or threatened with homelessness during the period of the study.
This differs from the definition of repeat homelessness used in Scottish Government homelessness statistics, which considers whether a homelessness assessment is a repeat assessment. To be classed as a repeat homelessness assessment, the applicant household must:
1. have been assessed as homeless or threatened with homelessness in both applications;
2. have had the previous case closed within 12 months of the current assessment; and
3. have the same adults and family circumstances in both applications.
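The two definitions can be contrasted in a short sketch. Everything here is hypothetical: the `Application` record, its fields (in particular `household`, a stand-in for "adults and family circumstances"), and the reading of "within 12 months" as on or after the same calendar date one year earlier are illustrative assumptions, not the HL1 data structure or the Scottish Government's implementation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Application:
    """Hypothetical, simplified view of one HL1 application."""
    ref: str
    people: FrozenSet[str]       # person identifiers on the application
    household: str               # stand-in for "adults and family circumstances"
    assessed_homeless: bool      # homeless or threatened with homelessness
    assessment_date: date
    closed_date: Optional[date] = None


def study_repeat_people(apps):
    """This study's definition: on two or more applications assessed as homeless/threatened."""
    counts = Counter(p for a in apps if a.assessed_homeless for p in a.people)
    return {p for p, n in counts.items() if n >= 2}


def one_year_before(d: date) -> date:
    # Assumption: "within 12 months" means on or after the same date one year earlier.
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # 29 February has no counterpart in the previous year
        return d.replace(year=d.year - 1, day=28)


def is_sg_repeat(current: Application, previous: Application) -> bool:
    """Scottish Government definition, applied to one pair of applications."""
    return (
        current.assessed_homeless
        and previous.assessed_homeless
        and previous.closed_date is not None
        and one_year_before(current.assessment_date) <= previous.closed_date <= current.assessment_date
        and current.household == previous.household
    )


a1 = Application("A1", frozenset({"p1"}), "single adult", True, date(2010, 1, 5), date(2010, 6, 1))
a2 = Application("A2", frozenset({"p1"}), "single adult", True, date(2011, 3, 1))
a3 = Application("A3", frozenset({"p1"}), "single adult", True, date(2012, 9, 1))

print(study_repeat_people([a1, a2, a3]))  # {'p1'}
print(is_sg_repeat(a2, a1))  # True: A1 closed within 12 months of A2's assessment
print(is_sg_repeat(a3, a2))  # False: A2 has no recorded closure date
```

In this example, person p1 experiences repeat homelessness under the study definition, while only one of the later applications would count as a repeat assessment under the Scottish Government definition.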
2.2.3 Representativeness of homelessness data in the study
The HL1 data collection records all people who apply to Scottish Local Authorities for assistance under the homelessness legislation. As such, it does not record people who may be homeless but who do not apply to their Local Authority for assistance.
The number of homelessness applications in Scotland has been influenced by policy changes. The increase in homelessness between 2000/1 and 2006/7 was, in part, a consequence of Scottish homelessness legislation that extended councils' duties to non-priority homeless households. Applications peaked at just over 60,000 during 2005/6. The priority need test was abolished on 31st December 2012, giving all unintentionally homeless households the right to settled accommodation, not just those also assessed as having a priority need. Meanwhile, the number of homelessness applications has decreased from 2011/12 onwards, mainly due to the introduction of Housing Options services, with a focus on prevention, in Scottish local authorities.
The focus on prevention has led to an increasing proportion of homeless people having more complex needs. The following support needs can be identified at the time of homelessness assessment (as recorded on the HL1 return):
- Mental health problems
- Learning disability
- Physical disability
- Medical conditions
- Drug or alcohol dependency
- Basic housing management / independent living skills
If a person has support needs, any combination of the above can be selected. Between 2007/08 and 2013/14, the proportion of households assessed as homeless (or threatened with homelessness) with one or more support needs remained stable at around 32% to 35%. However, since 2014/15 this proportion has increased, from 29% to 44% in 2016/17. This may be due to better completion of this question by Local Authorities, or to an increase in complex needs among homeless people.
The study includes records for everyone assessed as homeless or threatened with homelessness (that is, likely to become homeless within two months). The proportion assessed as threatened with homelessness is relatively small (as mentioned previously, 13% of those assessed as homeless in 2002/3, falling to 6% in 2014/15). However, some of those assessed as threatened with homelessness may not have gone on to become homeless, because their homelessness was prevented. As such, a minority of people in the study may never have become homeless.
Nonetheless, the HL1 dataset represents the most comprehensive dataset available covering homelessness in Scotland, at an individual record level.
A complete description of the homelessness data used in this study is contained in Annex B.
2.3 Health data
This section introduces the NHS health activity data used in the study. It first addresses some limitations of using activity data, then presents a summary of the activity datasets, followed by an in-depth look at each.
2.3.1 Health data limitations
Health Activity Data as a proxy for health need
Use of health services, as measured through health activity, is recognised as an imperfect proxy for health need. The Inverse Care Law is the principle that the availability of good medical or social care tends to vary inversely with the need of the population served. On this basis, the availability of services, and the ease of access to them, affects whether a person appears in these datasets.
According to Audit Scotland, findings from the Deep End project indicate that GPs working in the most deprived areas of Scotland face significant challenges in tackling health inequalities. For example, GPs in these practices reported that:
- they treat patients with higher levels of multiple health problems than GPs working in less deprived areas
- public sector budget reductions and changes to the benefits system were increasing patients' visits to GPs and having detrimental effects on patients' mental and physical health
- they are constrained by a shortage of consultation time with patients which limits the opportunity to provide appropriate treatment, advice and referral to suitable services.
These findings suggest that the Inverse Care Law may be in effect in Scotland. The extent to which it applies is not investigated in this study, but it should be kept in mind when interpreting the results.
Under-coverage of health data: Private health care
The health activity data considered in the study is sourced only from the NHS. Health activity in the private sector is not covered by the data used in the study, and this potential under-coverage could introduce bias.
If under-coverage is present, it is most likely to affect the health activity of people who are more inclined to access private health care, i.e. those in the least deprived areas. Homeless individuals are presumed unlikely to use private health care. This limitation would therefore tend to overstate health activity among homeless individuals relative to their non-homeless controls.
However, the use of private health care accounts for a very small proportion of health activity in Scotland. It is therefore assumed that the effect of this on the results is negligible.
2.3.2 Health activity data sources in the study
The study requested, and was granted approval to use, the following health activity datasets. All were provided by NHS National Services Scotland, Information Services Division, apart from the Deaths dataset, which was provided by the National Records of Scotland.
- Accident and Emergency data (A&E2)
- Inpatients and Day Cases (SMR01)
- Outpatient data (SMR00)
- Prescribing Information System (PIS)
- Mental Health Inpatient and Day Case (SMR04)
- Scottish Drug Misuse Database (SDMD) (SMR24 and SMR25a)
- National Records of Scotland – Deaths
Further details on each of these datasets are contained in Annex C.
Table 2.1 presents a summary of the health activity datasets, their data reference periods, and the number of records that are attributed to the study's analysis population.
The proportion of population health activity that these records correspond to is discussed in detail in section 2.6.2.
Table 2.1: Summary table of Health Activity Data included in the study
|Study dataset name||Data period||Records*|
|Accident and Emergency data (A&E2)||1 January 2011 to 31 December 2016||2,118,143|
|Inpatients and Day Cases (SMR01)||1 April 2002 to 31 March 2015||2,266,144|
|Outpatient data (SMR00)||1 April 2002 to 31 March 2015||9,014,864|
|Prescribing Information System (PIS)||14 January 2009 to 31 March 2015||9,488,022|
|Mental Health Inpatient and Day Case (SMR04)||1 April 2002 to 31 March 2015||100,055|
|Scottish Drug Misuse Database (SDMD) (SMR24/SMR25a)||1 April 2002 to 31 March 2015||89,281|
|National Records of Scotland – Deaths||1 April 2002 to 31 March 2015||23,718|
*Note: the number of records corresponds to those attributed to the study's analysis population, as defined in section 2.5.4 (The analysis cohorts).
2.4 Data linkage
In order to conduct the study across Scotland, it was first necessary to obtain personally identifiable information (first name, last name, date of birth, gender and postcode) for people who had made homelessness applications (HL1). This information is not submitted to the Scottish Government, as it is not required for monitoring homelessness legislation at the national level. It was therefore collected from Local Authorities via the H2H return and submitted directly to the NRS Indexing Service.
Each person in the H2H dataset was matched to the Research Indexing Spine (RIS). In total, 564,501 unique individuals were identified.
The Indexing Service created two control groups for the study by matching the H2H dataset (564,501 unique individuals) to individuals on the RIS on age (assumed age at 31st March 2015) and sex. The first control group contained only individuals living in the 20% most deprived areas of Scotland (SIMD1), and the second only individuals living in the 20% least deprived areas of Scotland (SIMD5). Area deprivation was calculated using the Scottish Index of Multiple Deprivation (SIMD) 2012, based on the postcodes on the Research Indexing Spine at June 2016.
As a result, just under 1.7 million people were selected from the Scottish population for the study (Table 2.2, 'Age-sex Matched Controls'). Each belongs to one of the following three groups:
- 564,501 unique individuals sourced from H2H
- 563,207 unique individuals from the 20% most deprived SIMD1 areas, with the same age and sex distribution as the H2H group. However, the size of the H2H group was so large that there were not enough people on the RIS in SIMD1 areas, with the same age and sex breakdown, to create a complete control group.
- 564,501 unique individuals from the 20% least deprived SIMD5 areas, with the same age and sex distribution as the H2H group.
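The matching step can be illustrated with a minimal sketch. The actual procedure used by the Indexing Service is described in Annex D; here, exact matching on age at 31st March 2015 and sex, random selection within each age-sex stratum, and sampling without replacement are all assumptions, and `match_controls` and `age_on` are hypothetical names. The sketch reproduces the shortfall seen in the SIMD1 group: when a stratum runs out of candidates, later cases in that stratum stay unmatched.

```python
import random
from collections import defaultdict
from datetime import date


def age_on(dob: date, ref: date = date(2015, 3, 31)) -> int:
    """Age in completed years at the reference date (31 March 2015 in this study)."""
    return ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))


def match_controls(cases: dict, pool: dict, seed: int = 0) -> dict:
    """Match each case to one control with the same (age, sex), without replacement.

    cases: case_id -> (age, sex); pool: person_id -> (age, sex) for one SIMD quintile.
    Cases whose stratum has run out of candidates are left unmatched.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pid, key in pool.items():
        strata[key].append(pid)
    for ids in strata.values():
        rng.shuffle(ids)  # random choice within each age-sex stratum
    matches = {}
    for case_id, key in cases.items():
        candidates = strata.get(key)
        if candidates:
            matches[case_id] = candidates.pop()
    return matches


cases = {"h1": (30, "F"), "h2": (30, "F"), "h3": (45, "M")}
simd1_pool = {"c1": (30, "F"), "c2": (45, "M")}
print(match_controls(cases, simd1_pool))  # {'h1': 'c1', 'h3': 'c2'}
```

Running the same function against a SIMD5 pool would produce the second control group; any case left unmatched in either group becomes an unbalanced matched set.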
Further information on the H2H dataset, the matching process and creating the controls is contained in Annex D.
2.5 Defining the three cohorts to be used for analysis
Not all of the 1.7 million people selected from the Scottish population were suitable for analysis. Firstly, the control groups were created using assumed age at 31st March 2015, so a control may have died before their linked homeless counterpart was assessed as homeless; these controls needed to be removed. Secondly, the groups were not all the same size and needed to be rebalanced. Lastly, as the H2H data itself does not contain HL1 payload information, it is not possible to determine from it alone which of the 564,501 unique H2H individuals were assessed as statutory homeless. This study is only interested in those assessed as statutory homeless, so people not assessed as homeless needed to be removed.
2.5.1 Removing deaths
In some cases, one of the controls from either the 20% most or 20% least deprived quintile died before the first assessment date of the homeless person to whom they were linked. By design, people associated with homelessness applications survive until the date of homeless assessment. If a homeless individual had died before their first assessment, they would never have had an assessment in the first place, and so could not have been included in the HL1 data. There was a concern that this could introduce bias when comparing health activity data between the cohorts.
To avoid this bias, it was important to ensure that everyone in the matched control groups also survived until the date of first homelessness assessment. All individuals who died before the first homelessness assessment date of their matched homeless person were removed from the study. In total, just under 6,000 individuals were removed due to early deaths (Table 2.3). In this way, the probability of death before the first assessment date is zero for both the homeless person and their controls. Consequently, in the early years of the study most people cannot, by definition, have died, and the number who could have died increases over time. This results in a large increase in the observed death rate over the course of the study.
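A minimal sketch of this rule, assuming each control's date of death (if any) and each homeless person's first assessment date are available as simple lookups; the function `remove_early_deaths` and its data structures are hypothetical. "Prior to" is read as strictly before, so a control who died on the assessment date itself is kept.

```python
from datetime import date


def remove_early_deaths(matches, first_assessment, death_date):
    """Drop controls who died before their matched case's first homelessness assessment.

    matches: case_id -> control_id
    first_assessment: case_id -> date of first homelessness assessment
    death_date: control_id -> date of death (absent if not known to have died)
    Returns (kept matches, list of removed control ids).
    """
    kept, removed = {}, []
    for case_id, control_id in matches.items():
        died = death_date.get(control_id)
        if died is not None and died < first_assessment[case_id]:
            removed.append(control_id)
        else:
            kept[case_id] = control_id
    return kept, removed


matches = {"h1": "c1", "h2": "c2"}
first_assessment = {"h1": date(2008, 5, 1), "h2": date(2012, 1, 1)}
death_date = {"c1": date(2007, 1, 1), "c2": date(2013, 1, 1)}
kept, removed = remove_early_deaths(matches, first_assessment, death_date)
print(kept, removed)  # {'h2': 'c2'} ['c1']
```

Control c2 is kept because they died after h2's first assessment; that death remains part of the follow-up, which is why deaths can only accumulate later in the study period.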
Table 2.2: Cases selected for the study
|Stage||H2H individuals||SIMD1 controls (20% most deprived)||SIMD5 controls (20% least deprived)||Total|
|Number of unique persons (CHI numbers) amongst matched homeless records||564,501||563,207||564,501||1,692,209|
|Deaths prior to assessment date||88||4,402||1,453||5,943|
|Removed unbalanced Controls||6,931||1,323||5,566||13,820|
|Removed people not assessed as homeless||121,629||121,629||121,629||364,887|
|Total cases removed||128,648||127,354||128,648||384,650|
|Total number of unique persons in study||435,853||435,853||435,853||1,307,559|
2.5.2 Further balancing
As mentioned earlier, the size of the H2H group was so large that there were not enough people on the Research Indexing Spine in SIMD1 areas with the same age and sex breakdown to create a full SIMD1 control group. In practical terms, this means that there were just over 1,000 H2H individuals successfully linked to a SIMD5 person, yet unable to link to a SIMD1 person. These can be referred to as 'unbalanced' matched-pairs.
In addition, section 2.5.1 removed almost 6,000 individuals who had died before the first assessment date of the homeless person to whom they were linked, creating another source of unbalanced matched-pairs.
Complete matched-pairs are necessary for analysis purposes. Almost 14,000 individuals were removed from the study (Table 2.3) to remove all occasions where an unbalanced matched-pair exists.
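Assuming, hypothetically, that the surviving matches are held as simple lookups from each homeless person to their control in each deprivation group, the rebalancing rule amounts to keeping only homeless people who still have a control in both groups. The function `keep_complete_sets` is an illustrative sketch, not the study's code.

```python
def keep_complete_sets(cases, simd1, simd5):
    """Keep only complete matched sets: a homeless person plus one SIMD1 and one SIMD5 control.

    cases: iterable of case ids still in the study
    simd1, simd5: case_id -> control_id for the surviving controls in each group
    """
    complete = set(cases).intersection(simd1, simd5)
    return (
        sorted(complete),
        {c: simd1[c] for c in complete},
        {c: simd5[c] for c in complete},
    )


cases = ["h1", "h2", "h3"]
simd1 = {"h1": "a1", "h3": "a3"}              # h2 had no SIMD1 control available
simd5 = {"h1": "b1", "h2": "b2", "h3": "b3"}
kept_cases, kept1, kept5 = keep_complete_sets(cases, simd1, simd5)
print(kept_cases, len(kept1), len(kept5))  # ['h1', 'h3'] 2 2
```

Because the same set of case ids indexes all three outputs, the groups are equal in size by construction.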
2.5.3 Removing those not assessed as homeless
After removing those who died prior to the first assessment and after further balancing, not all of the remaining unique individuals in the H2H dataset had been assessed as statutory homeless, or threatened with homelessness.
It was decided that this study would focus only on individuals who were assessed as statutory homeless. A further study could be designed around the differences between those who were assessed as homeless and those who were not, and using health activity to investigate this difference would align well with this study's research questions. However, this has been noted as an area for future work and is not included in this study.
By linking the HL1 homelessness data to the H2H data using the HL1 Application Reference Number, it is possible to determine each person's homelessness assessment decision (or decisions, if they appeared in multiple HL1 applications). In total, 121,629 people in the H2H dataset had never been assessed by a Local Authority as either homeless or threatened with homelessness (Table 2.4). These individuals were removed from the study, along with the same number from each of the 20% most and 20% least deprived control groups.
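The link-and-filter step might look like the sketch below, which assumes that H2H provides each person's application reference numbers and that the HL1 payload provides a decision per application. The decision labels and the function name are hypothetical; the HL1 payload's actual coding of assessment decisions is not described here.

```python
QUALIFYING_DECISIONS = {"homeless", "threatened with homelessness"}  # hypothetical labels


def assessed_homeless_people(person_apps, hl1_decisions):
    """People with at least one HL1 application assessed as homeless or threatened.

    person_apps: person_id -> list of HL1 application reference numbers (from H2H)
    hl1_decisions: application reference -> assessment decision (from the HL1 payload)
    """
    return {
        pid
        for pid, refs in person_apps.items()
        if any(hl1_decisions.get(ref) in QUALIFYING_DECISIONS for ref in refs)
    }


person_apps = {"p1": ["A1", "A2"], "p2": ["A3"], "p3": ["A9"]}
hl1_decisions = {"A1": "not homeless", "A2": "threatened with homelessness", "A3": "not homeless"}
keep = assessed_homeless_people(person_apps, hl1_decisions)
print(sorted(keep))  # ['p1']
```

Here p2 and p3 would be removed, together with their matched controls; p3's reference has no decision in the payload and is treated as not assessed as homeless.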
In total, 384,650 individuals were removed from the study due to:
- people dying before the first assessment date (5,943),
- unbalanced matched-pairs (13,820), and
- not being assessed as homeless or threatened with homelessness (364,887).
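The removal counts can be checked against Table 2.2 with a short calculation. The figures are copied from the table; the code simply confirms that each group ends with 435,853 people and that the three reasons sum to 384,650.

```python
selected = {"H2H": 564_501, "SIMD1": 563_207, "SIMD5": 564_501}
removed = {
    "deaths before assessment": {"H2H": 88, "SIMD1": 4_402, "SIMD5": 1_453},
    "unbalanced matched-pairs": {"H2H": 6_931, "SIMD1": 1_323, "SIMD5": 5_566},
    "not assessed as homeless": {"H2H": 121_629, "SIMD1": 121_629, "SIMD5": 121_629},
}

# People left in each group after all removals
remaining = {g: n - sum(r[g] for r in removed.values()) for g, n in selected.items()}
# People removed for each reason, summed across the three groups
by_reason = {reason: sum(r.values()) for reason, r in removed.items()}

print(remaining)                 # {'H2H': 435853, 'SIMD1': 435853, 'SIMD5': 435853}
print(by_reason)
print(sum(by_reason.values()))   # 384650
print(sum(remaining.values()))   # 1307559
```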
2.5.4 The analysis cohorts
After removing individuals as in Table 2.2, the study included just over 1.3 million people. These people are included in one of three specific cohorts that are defined here, and will be used throughout the study for analysis:
The Ever Homeless Cohort (EHC):
This contains 435,853 individuals included on one or more homelessness applications, where the HL1 application was assessed as either homeless, or threatened with homelessness, with an assessment date between 4 June 2001 and 7 November 2016.
The Non-homeless Most Deprived Cohort (MDC):
This contains 435,853 individuals residing in the 20% most deprived SIMD areas (recorded on the Research Indexing Spine, on the date the RIS was extracted, as residing at a postcode in the 20% most deprived datazones according to the Scottish Index of Multiple Deprivation 2012). They were matched to the EHC on age and sex, and each MDC individual was still alive at the date of first homelessness assessment of their matched individual in the EHC. As these individuals do not appear in the HL1 data provided for this study, they are assumed not to have been assessed as homeless or threatened with homelessness in the study period.
The Non-homeless Least Deprived Cohort (LDC):
This contains 435,853 individuals residing in the 20% least deprived SIMD areas (recorded on the Research Indexing Spine as residing at a postcode in the 20% least deprived datazones according to the Scottish Index of Multiple Deprivation 2012). They were matched to the EHC on age and sex, and each LDC individual was still alive at the date of first homelessness assessment of their matched individual in the EHC. As these individuals do not appear in the HL1 data provided for this study, they are assumed not to have been assessed as homeless or threatened with homelessness in the study period.
From here on, when we refer to people in the study, we explicitly mean people in all three cohorts.
2.6 Data coverage
2.6.1 Homelessness data coverage
Coverage of HL1 homelessness applications in the cohorts
In order to remove those individuals that were not assessed as statutory homeless, the H2H individuals were linked back to the HL1 payload data by the HL1 application reference number. This payload data can then be used to determine the number of HL1 assessments used in the study compared with the total number of assessments over the study period, i.e., is there under-coverage?
The 435,853 individuals in the EHC were associated with 429,078 HL1 homelessness assessments made by Local Authorities between 4 June 2001 and 7 November 2016. The numbers slightly differ as multiple individuals can be associated with a single assessment, and people can be associated with more than one assessment.
As stated, Local Authorities were invited to submit personal identifiable information for those who had made HL1 homelessness applications, via the H2H return. Table 2.4 displays the total number of HL1 homelessness assessments, made by Local Authorities between 4 June 2001 and 7 November 2016, where the people associated with the application were assessed as being homeless, or likely to become homeless within two months (threatened with homelessness). This is compared with the number of HL1 assessments in the study associated with the EHC.
Table 2.4: Total HL1 homelessness assessments made where the people associated with the application were assessed as being homeless, or likely to become homeless within two months (threatened with homelessness), by Local Authorities between 4 June 2001 and 7 November 2016, compared with the EHC.
|Local Authority||EHC HL1 assessments in the study||All HL1 Assessed as homeless*||Proportion of total HL1 Assessments assessed as homeless in the study|
|Argyll & Bute||573||7,918||7%|
|Dumfries & Galloway||12,113||13,100||92%|
|Perth & Kinross||1,854||11,819||16%|
*Source: Scottish Government Communities Analysis Division. HL1 Dataset as at 6 July 2017.
There is notable variation across the 32 Local Authorities in the proportion of homelessness assessments available for use in the study. In total, 562,255 homelessness assessments were made over the period, of which 429,078 (76%) are available for use in the study. Eighteen Local Authorities have coverage at or above the Scotland average of 76%, and seven have coverage below 50%.
Looking beneath these numbers, not all Local Authorities submitted data for the entire period between 4 June 2001 and 7 November 2016 (Table 2.5). The Local Authorities with the lowest coverage (Orkney, West Lothian, Argyll & Bute, Stirling, Perth & Kinross, South Ayrshire and East Ayrshire) submitted data for only part of the period, or for only a relatively small proportion of their cases.
Furthermore, Local Authorities were only asked to submit data up to 31st March 2015, although a number submitted data beyond this point, for 2015/16 and even into 2016/17. As most Local Authorities did not, coverage for 2015/16 and 2016/17 is much lower than for other years, which lowers coverage overall.
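The coverage figures are simple ratios of assessments in the study to all assessments. The sketch below reproduces the proportions for the three Local Authorities shown in Table 2.4 and the Scotland-wide 76%; the thresholds mirror the text, while `coverage_band` is a hypothetical helper.

```python
SCOTLAND_COVERAGE = 429_078 / 562_255  # about 0.763


def coverage_band(in_study: int, all_assessed: int) -> str:
    """Classify a Local Authority's coverage against the Scotland average and the 50% mark."""
    share = in_study / all_assessed
    if share >= SCOTLAND_COVERAGE:
        return f"{share:.0%} (at or above Scotland average)"
    if share < 0.5:
        return f"{share:.0%} (below 50%)"
    return f"{share:.0%}"


# Rows from Table 2.4: (EHC assessments in study, all assessed as homeless)
table_2_4 = {
    "Argyll & Bute": (573, 7_918),
    "Dumfries & Galloway": (12_113, 13_100),
    "Perth & Kinross": (1_854, 11_819),
}
for la, (n_study, n_all) in table_2_4.items():
    print(la, coverage_band(n_study, n_all))
```

Applied to all 32 rows of the full table, the same classification would give the counts of authorities above the average and below 50% quoted in the text.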
Table 2.5: EHC HL1 homelessness assessments provided to the study where the people associated with the application were assessed as being homeless, or likely to become homeless within two months (threatened with homelessness), by Local Authorities between 4 June 2001 and 7 November 2016, by year.
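|Local Authority||2001/02||2002/03||2003/04||2004/05||2005/06||2006/07||2007/08||2008/09||2009/10||2010/11||2011/12||2012/13||2013/14||2014/15||2015/16||2016/17||Total|
|Argyll & Bute||0%||0%||0%||0%||0%||0%||0%||0%||0%||0%||0%||1%||11%||91%||65%||7%||7%|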
|Dumfries & Galloway||82%||96%||96%||95%||95%||95%||97%||96%||96%||96%||97%||97%||97%||98%||75%||1%||92%|
|Perth & Kinross||0%||0%||0%||0%||0%||0%||17%||24%||24%||45%||42%||26%||21%||28%||5%||0%||16%|
Number of people in the assessment
Each HL1 homelessness application must contain a main applicant, and may contain one or more other people (hereafter called 'associated applicants'). As previously mentioned, the EHC contains roughly 435,000 individuals that can be attributed to roughly 429,000 HL1 homelessness assessments.
The rebalancing performed during the creation of the analysis cohorts (Table 2.3) resulted in approximately 7,000 H2H individuals being removed from the study group. Of these:
- some will be associated with assessments that were not assessed as being homeless and therefore not relevant in this study,
- some will be associated with the 429,000 EHC assessments included in the study. Assessments containing more than one person (i.e. a main applicant plus one or more associated applicants) may have had an individual removed, either because one of their controls died or because the Research Indexing Spine ran out of suitable controls in SIMD1. In these cases, the number of EHC assessments is unaffected but the number of individuals in the EHC decreases.
- some will be individuals who applied as the main applicant in a single-person application (i.e. with no associated applicants). Removing these individuals reduces both the number of EHC HL1 assessments and the number of individuals in the EHC.
Examining the number of individuals associated with each EHC HL1 assessment (Table 2.6), it is clear that not all assessments contain the correct number of individuals. Eight Local Authorities only provided one person (the main applicant), three provided up to two people (the main applicant plus one associated applicant), and the remaining 21 Local Authorities provided all the associated applicants for all of their respective HL1 assessments.
The eleven Local Authorities that did not provide all associated applicants account for only around 65,000 (15%) of all assessments attributed to the EHC. As most assessments (70%) relate to a single person, most of these assessments are also likely to involve only a main applicant.
In summary, nearly all of the 429,000 EHC HL1 assessments provided for the study contain the main applicant; the only exceptions are assessments whose main applicant was removed during rebalancing. At least 9 out of 10 assessments in the study contain the correct number of individuals.
Some under-coverage is therefore present. However, its scale is not considered significant, and it should have a negligible impact on the results.
Table 2.6: EHC HL1 Assessments by the number of people associated with the assessment, by Local Authority. Shaded cells suggest undercoverage.
|Local Authority||Assessments containing one person only||Assessments containing two people only||Assessments containing three or more people||Assessments in study|
|Argyll & Bute||77%||23%||0%||573|
|Dumfries & Galloway||95%||5%||0%||12,113|
|Perth & Kinross||47%||28%||25%||1,854|
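The undercoverage check behind Table 2.6 can be sketched as a simple rule: an authority that never reports an assessment with three or more people probably supplied at most two people per assessment. This is a minimal illustration only; the function name `provision_level` and the decision rule are assumptions, as the study does not state exactly how the shaded cells were identified.

```python
def provision_level(pct_one, pct_two, pct_three_plus):
    """Classify a Local Authority's submission from its Table 2.6 percentages.

    Illustrative rule (an assumption): the largest household size that ever
    appears is taken as the most the authority supplied per assessment.
    """
    if pct_two == 0 and pct_three_plus == 0:
        return "main applicant only"        # every assessment has exactly one person
    if pct_three_plus == 0:
        return "up to two people"           # never more than one associated applicant
    return "all associated applicants"      # larger households appear in the data


# Rows from Table 2.6: (one person %, two people %, three or more people %)
table_2_6 = {
    "Argyll & Bute": (77, 23, 0),
    "Dumfries & Galloway": (95, 5, 0),
    "Perth & Kinross": (47, 28, 25),
}

for authority, shares in table_2_6.items():
    print(f"{authority}: {provision_level(*shares)}")
```

A rule like this can misclassify a small authority that genuinely had no larger households, so it flags likely undercoverage rather than proving it.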
Duplicate homelessness records
Table 2.5 displayed the EHC HL1 homelessness assessments provided by year and Local Authority. Three Local Authorities have coverage which exceeds 100% – Clackmannanshire (104% in 2002/03), Falkirk (over 100% in 2007/8 through to 2009/10) and Scottish Borders (103% in 2008/9).
These figures that exceed 100% relate to duplicate records in their submitted data. Overall, there are around 260 duplicate homelessness cases in the study (out of roughly 429,000), so the impact of these on the results will be negligible.
2.6.2 Coverage of Health Activity data in the cohorts
The study contains just over 1.3 million people, split into three cohorts (the EHC, MDC and LDC; section 2.5.4). The number of health activity records that can be attributed to the study population was summarised (Table 2.7). To understand the coverage of health activity in this study, this section examines how these records compare with all of the health activity data in Scotland over the same period.
Table 2.7: Proportion of Scotland's health activity records in the study, by activity datasets in the study, for the period 2014/15 (Deaths for 2014).
|Health Activity Dataset||Period||Activity records in study||Activity records in Scotland||% of records in study|
|Accident and Emergency||2014/15||356,122||569,412||62.5%|
|Inpatients and Day Cases||2014/15||222,920||1,586,533||14.0%|
|Prescribing Information System||2014/15||1,843,364||101,147,994||1.8%|
|Mental Health Inpatient and Day Case||2014/15||7,421||36,542||20.3%|
|Scottish Drug Misuse Database||2014/15||6,761||14,542||46.5%|
|NRS – Deaths||2014||3,171||54,239||5.8%|
Given the number of people in the study (just over 1.3 million, approximately one-fifth of the Scottish population), one would expect the study to account for around one-fifth of all activity records in Scotland for each health activity type. The exception is the Prescribing Information System (PIS) data: the study only received PIS activity for a subset of all possible prescriptions. For the study population to account for around one-fifth of all health activity, it would need to be representative of the Scottish population. However, this is not the case, for two main reasons:
- the study contains the vast majority of homeless individuals in Scotland. Homeless individuals are likely to be among the most deprived, and it is known that there is a relationship between deprivation and health activity.
- the study has a younger age distribution than the Scottish population as a whole. This is likely to result in over-representation of health activities that are more common among younger people (e.g. A&E and SDMD), and under-representation of those that are more common among older people (e.g. deaths). The next section examines the age distribution of those in the study.
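The comparison of Table 2.7 against the expected one-fifth share can be reproduced in a few lines. This is a minimal sketch: the counts are those in Table 2.7, the 20% benchmark is the approximate figure given in the text, and the function name `share_in_study` is illustrative. PIS is left out because the study only received a subset of prescriptions.

```python
# Records in the study and in Scotland, 2014/15 (deaths: 2014), from Table 2.7.
table_2_7 = {
    "Accident and Emergency": (356_122, 569_412),
    "Inpatients and Day Cases": (222_920, 1_586_533),
    "Mental Health Inpatient and Day Case": (7_421, 36_542),
    "Scottish Drug Misuse Database (SDMD)": (6_761, 14_542),
    "NRS - Deaths": (3_171, 54_239),
}
# PIS is excluded: only a subset of prescriptions was supplied to the study.

EXPECTED_SHARE = 0.20  # approximate share if the study population were representative


def share_in_study(in_study, in_scotland):
    """Proportion of Scotland's activity records attributable to the study."""
    return in_study / in_scotland


for dataset, (in_study, in_scotland) in table_2_7.items():
    share = share_in_study(in_study, in_scotland)
    position = "above" if share > EXPECTED_SHARE else "below"
    print(f"{dataset}: {share:.1%} ({position} the ~20% benchmark)")
```

This reproduces the final column of the table and shows A&E and SDMD well above the benchmark and deaths well below it, in line with the two explanations given above.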
2.7 Cohorts: What do we know about them?
2.7.1 Age and sex distribution
As discussed, each person in the EHC has been linked to a unique person in both the LDC and the MDC, with controls selected randomly from people who have the same sex and age on the 31st of March. As a result, the age and sex distributions of all three cohorts are identical.
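The matching step can be sketched as random selection without replacement within each sex and age group. This is a minimal illustration under stated assumptions, not the Research Indexing Spine's actual procedure: the record layout (`id`, `sex`, `age`) and the function names `build_pools` and `match_controls` are hypothetical, the candidate list is assumed to exclude everyone in the EHC, and the same function would be run once against an SIMD1 pool (for the MDC) and once against an SIMD5 pool (for the LDC).

```python
import random
from collections import defaultdict


def build_pools(candidates):
    """Group candidate controls by (sex, age on 31 March)."""
    pools = defaultdict(list)
    for person in candidates:
        pools[(person["sex"], person["age"])].append(person["id"])
    return pools


def match_controls(cases, pools, rng):
    """Give each case one unique, randomly chosen control of the same sex and age.

    Cases whose (sex, age) pool is empty are returned as unmatched, mirroring
    the situation where suitable controls run out.
    """
    matches, unmatched = {}, []
    for case in cases:
        pool = pools.get((case["sex"], case["age"]), [])
        if not pool:
            unmatched.append(case["id"])
            continue
        # pop() removes the chosen control so it cannot be matched twice.
        matches[case["id"]] = pool.pop(rng.randrange(len(pool)))
    return matches, unmatched


# Toy example: three EHC members, but only two suitable SIMD1 candidates.
ehc = [
    {"id": "E1", "sex": "F", "age": 15},
    {"id": "E2", "sex": "F", "age": 15},
    {"id": "E3", "sex": "M", "age": 40},
]
simd1_candidates = [
    {"id": "M1", "sex": "F", "age": 15},
    {"id": "M2", "sex": "M", "age": 40},
]
matches, unmatched = match_controls(ehc, build_pools(simd1_candidates), random.Random(1))
print(matches, unmatched)
```

Here E2 is left unmatched because the only 15-year-old female candidate has already been used. In the study, people in this position were removed during rebalancing, which is what keeps the age and sex distributions of the cohorts identical.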
By design, the study covers a period of several years. There is no single age distribution for the cohorts; rather, the distribution changes over time as individuals enter the study group (when someone is born during the study period) and leave it (when someone dies during the study period). On 30 June 2002 there were around 1,110,000 people alive in the study, and on 30 June 2015 there were around 1,280,000, a net difference of roughly 175,000 individuals (births minus deaths). The Scotland mid-year estimate was 5,070,000 for 2002 and 5,370,000 for 2015. These numbers are used to construct population pyramids (Figure 2.3). The following points are of note about the study distributions:
- The study covers a substantial proportion of the Scottish population
- The study appears to contain a relatively larger proportion of young females compared to young males
- The study contains a very small proportion of older persons
- The study population ages between the two time periods, as expected.
Figure 2.3: The age distributions of people in the study in 2002 and 2015 compared with the Scottish population mid-year estimate
To further examine the proportion of people in the study across the different age groups, Figure 2.4 presents the population coverage of people in the study, in both time periods.
At mid-year 2002:
- The study included the majority of females aged 11–19 years in Scotland. This peaked at over 60% for females aged 15 years.
- The peak study coverage for males was for similar age groups, yet for slightly less than 50% of the Scottish population.
- For ages 25 and over, the study covers a higher proportion of males in Scotland than of females.
- The study includes a very small proportion of older males and females. From the peak coverage rate at roughly 15 years, both coverage rates decrease as age increases.
- People in the EHC accounted for 7.3% of the Scottish population.
At mid-year 2015:
- The study population in 2015 has higher coverage rates for older people compared with the 2002 distribution. However, it is still considerably skewed towards being more representative of younger people aged 10–40 years.
- Coverage of children aged 0–5 years is very low. This is because the only new individuals the study contains are those born during the study period, hence the relatively linear slope over these ages.
- People in the EHC accounted for 8.0% of the Scottish population.
Figure 2.4: The proportion of the population included in the study, by sex and single year age group, at mid-year 2002 and at mid-year 2015
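The EHC percentages follow from the matched design: because each EHC member has one MDC and one LDC control, the three cohorts are roughly the same size, so the EHC makes up roughly one-third of the study. A quick check using the rounded figures given for 30 June 2002 and 2015 (a sketch only):

```python
# Rounded figures from the text: people alive in the study, and Scotland's
# mid-year population estimate, at 30 June of each year.
study_alive = {2002: 1_110_000, 2015: 1_280_000}
scotland_mid_year = {2002: 5_070_000, 2015: 5_370_000}

for year in (2002, 2015):
    study_share = study_alive[year] / scotland_mid_year[year]
    # Matched design: the EHC is (approximately) one-third of the study population.
    ehc_share = study_alive[year] / 3 / scotland_mid_year[year]
    print(f"{year}: study {study_share:.1%} of Scotland, EHC {ehc_share:.1%}")
```

This gives 7.3% for 2002, matching the text, and about 7.9% for 2015 against the quoted 8.0%; the small gap most likely reflects the rounding of the input figures.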
2.7.2 Age and sex distribution at date of first homelessness assessment
To be the main applicant on an HL1 homelessness assessment, a person must be aged 16 years or older. Figure 2.5 illustrates the age distribution of those in the EHC at the date of their first homelessness assessment. Almost 30% of males and females in the EHC were under 16 years of age at the date of their first assessment (for many, this would be their first and only assessment). As they were too young to be main applicants, these individuals must have been associated applicants, which indicates that almost a third of the EHC are likely to have been children at the time of their first homelessness assessment.
Figure 2.5: Age distribution of individuals in the EHC at date of first homelessness assessment, by sex
2.7.2 Age and sex distribution of Repeat and Once-only homelessness
Over the study period, between 4 June 2001 and 7 November 2016, individuals in the EHC could be assessed as homeless or threatened with homelessness on multiple occasions. Of those in the study (435,853), 316,067 individuals were assessed as homeless only once, and 119,786 were assessed as homeless on two or more occasions.
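The Once-only and Repeat groups can be derived by counting each person's assessments within the study period. A minimal sketch, assuming a hypothetical input of (person, assessment date) pairs that has already been restricted to assessments of homeless or threatened with homelessness; the function name `classify_homelessness` is illustrative.

```python
from collections import Counter
from datetime import date

STUDY_START = date(2001, 6, 4)
STUDY_END = date(2016, 11, 7)


def classify_homelessness(assessments):
    """Label each person Once-only or Repeat from (person_id, assessment_date) pairs.

    Dates outside the study period are ignored.
    """
    counts = Counter(
        person for person, when in assessments if STUDY_START <= when <= STUDY_END
    )
    return {person: "Repeat" if n >= 2 else "Once-only" for person, n in counts.items()}


labels = classify_homelessness([
    ("A", date(2003, 1, 5)),
    ("A", date(2010, 3, 1)),
    ("B", date(2012, 7, 9)),
    ("C", date(2017, 2, 1)),   # after the study period, so not counted
])
print(labels)

# Consistency check on the figures quoted in the text.
assert 316_067 + 119_786 == 435_853
```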
The study regularly compares the health activity of the EHC by Once-only and Repeat homelessness. To better understand any observed differences, it is important to examine the age-sex distribution of each group (Figure 2.6).
Figure 2.6: Age distribution of Once-only and Repeat homeless individuals, by sex (male blue, female red), at 31 March 2015
The age distributions of Once-only and Repeat homeless individuals in the EHC are fairly similar at 31 March 2015. However, individuals associated with Repeat homelessness appear to be disproportionately concentrated among young adults, compared with Once-only individuals. The differences are:
- A lower proportion of Repeat males and females are aged 0–15 compared with Once-only individuals.
- A higher proportion of Repeat males are aged 21–45 compared with Once-only males.
- A higher proportion of Repeat females are aged 21–35 compared with Once-only females.
- A lower proportion of Repeat individuals are older persons (males 51+, females 41+) compared with Once-only individuals.
The peak difference between the Once-only and Repeat homeless populations occurs at ages 26 to 30 years for both males and females. In this age group, the proportion of Repeat homeless people is around 1.4 times the proportion of Once-only people.
These differences could explain a fairly significant proportion of the observed differences in health activity between the two groups, depending on the health activity dataset in question. However, if observed differences are very large (i.e. greater than 1.4 times), they are unlikely to be explained by age and sex alone, and a relationship with repeat homelessness is likely.
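The use of 1.4 as a rough threshold can be justified with a short inequality. As an illustrative assumption, suppose Repeat and Once-only individuals had the same activity rate $r(a) \ge 0$ in each age-sex band $a$, so that any difference between the groups arises only from their age-sex composition. Writing $p_R(a)$ and $p_O(a)$ for the proportions of Repeat and Once-only individuals in band $a$ (with $p_O(a) > 0$ wherever $p_R(a) > 0$, and $p_O(a)\,r(a) > 0$ in at least one band), the ratio of their overall activity rates satisfies

$$\frac{\sum_a p_R(a)\, r(a)}{\sum_a p_O(a)\, r(a)} = \frac{\sum_a \dfrac{p_R(a)}{p_O(a)}\, p_O(a)\, r(a)}{\sum_a p_O(a)\, r(a)} \le \max_a \frac{p_R(a)}{p_O(a)}.$$

That is, age-sex composition alone cannot make Repeat activity more than $\max_a p_R(a)/p_O(a)$ times Once-only activity. Taking the peak ratio of around 1.4 as this maximum, an observed ratio well above 1.4 points to something beyond age and sex.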
2.7.3 Age and sex distribution of those removed from the study
In the process of creating the EHC, almost 19,000 individuals were removed from the final study group, due to deaths prior to the assessment date and the removal of unbalanced matched pairs (Table 2.4); roughly a third of these came from the group of H2H individuals.
In-scope individuals for the study are those HL1 people assessed as statutory homeless. A few in-scope individuals will have been removed when excluding people who died before the study and during the rebalancing process. However, as this number is small compared with the size of the EHC (436,000), it should have little effect on the study.
2.8 Impact on analysis
2.8.1 Age bands
Throughout this report we analyse levels of particular types of health activity. In each of the analysis chapters we show the number of activity events for each of the EHC, MDC and LDC, broken down by sex and age band. The age bands used in the study are determined by age on the 31st of March 2015. Where a person was not alive on that date, their theoretical age on that date is used (i.e. the age they would have been had they been alive then). The bands are 0–15 years, then 5-year bands (16–20, 21–25, and so on) up to 61–65, and then 66+. Note that this differs slightly from the bands used by NRS (15–19, 20–24, and so on); however, it allows children (those under 16) to be identified separately.
It should be remembered that, as the study covers several years, age on the 31st of March 2015 will in most cases not be the same as age at the date of the activity. Despite this, the age breakdown gives a useful indication of the approximate age distribution of those using a particular service, and of differences between the cohorts for various age groups.
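The age-band assignment can be sketched as two small functions: age in completed years on 31 March 2015 (the theoretical age for anyone who died earlier), then a mapping onto the study's bands. The function names are illustrative, not taken from the study's code.

```python
from datetime import date

REFERENCE_DATE = date(2015, 3, 31)


def age_on(date_of_birth, on=REFERENCE_DATE):
    """Completed years of age on `on`.

    For someone who died before `on`, this is their theoretical age, i.e. the
    age they would have been had they still been alive.
    """
    had_birthday = (on.month, on.day) >= (date_of_birth.month, date_of_birth.day)
    return on.year - date_of_birth.year - (0 if had_birthday else 1)


def age_band(age):
    """Map an age to the study's bands: 0-15, 16-20, ..., 61-65, 66+."""
    if age <= 15:
        return "0-15"   # also covers people born after the reference date
    if age >= 66:
        return "66+"
    lower = 16 + 5 * ((age - 16) // 5)
    return f"{lower}-{lower + 4}"


print(age_band(age_on(date(1999, 4, 1))))    # 0-15 (aged 15 on 31 March 2015)
print(age_band(age_on(date(1988, 7, 20))))   # 26-30
```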
2.8.2 Main tables and "Standardisation"
Initially, the raw count of activity events is given. However, this does not allow comparison between age bands, because each band contains a different number of people. To make them comparable, we divide the number of activity events by the number of people they relate to. This "standardization" allows direct comparison both between the cohorts and between age-sex bands within each cohort. These tables also include totals across all ages for each cohort–sex combination. As each cohort has the same age–sex distribution, these totals are comparable between the cohorts (although not directly between the sexes). In this way, the totals have effectively been age standardized.
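The claim that the all-age totals are effectively age standardized can be checked with a short derivation. The notation is introduced here for illustration only: $n_{s,a}$ is the number of people of sex $s$ in age band $a$ (the same in every cohort, by matching), $E_{c,s,a}$ is the number of activity events in cohort $c$, and $r_{c,s,a} = E_{c,s,a}/n_{s,a}$ is the standardized figure for that band. The all-age total for cohort $c$ and sex $s$ is

$$R_{c,s} = \frac{\sum_a E_{c,s,a}}{\sum_a n_{s,a}} = \sum_a \frac{n_{s,a}}{N_s}\, r_{c,s,a}, \qquad N_s = \sum_a n_{s,a}.$$

$R_{c,s}$ is therefore a weighted average of the band-specific figures, with weights $n_{s,a}/N_s$ given by the age distribution of the homeless cohort. Because these weights are identical for the EHC, MDC and LDC, the totals are standardized to a common age distribution and can be compared between cohorts; because the weights differ between males and females, they cannot be compared directly between the sexes.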
It should be noted that this standardization is to the age distribution of the homeless cohort. Therefore, while comparisons can be made across the cohorts within this study, comparisons cannot be made with age-standardized figures from other publications, which will typically have been standardized to the age distribution of a general population. Standardization to the general population has not been done here, as this would place the focus on parts of the population that are highly underrepresented among the homeless population (in particular older people, see Figure 2.3). Doing so would make the results less relevant to the majority of the homeless population. Also, greatly inflating parts of the population for which little data is available risks amplifying noise in the data.
As mentioned, these standardized figures within age bands can be compared between the cohorts. To make this comparison clearer, we follow up these analyses with ratios of the EHC and MDC values to the corresponding values of the LDC.
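The standardized figures and the ratios to the LDC can be sketched as follows. The function names and data layout are hypothetical, and the counts are invented toy values chosen only to make the arithmetic easy to follow; they are not study data.

```python
def rates(events, people):
    """Activity events per person in each age band."""
    return {band: events[band] / people[band] for band in events}


def ratios_to_ldc(rate_by_cohort):
    """Divide each non-LDC cohort's band rates by the LDC rate for the same band."""
    ldc = rate_by_cohort["LDC"]
    return {
        cohort: {band: rate / ldc[band] for band, rate in cohort_rates.items()}
        for cohort, cohort_rates in rate_by_cohort.items()
        if cohort != "LDC"
    }


# Toy numbers (not study data). Matching gives every cohort the same head counts.
people = {"0-15": 1000, "16-20": 800}
events = {
    "EHC": {"0-15": 400, "16-20": 960},
    "MDC": {"0-15": 300, "16-20": 480},
    "LDC": {"0-15": 200, "16-20": 240},
}
rate_by_cohort = {cohort: rates(counts, people) for cohort, counts in events.items()}
ratios = ratios_to_ldc(rate_by_cohort)
print(ratios)
```

Because head counts are identical across cohorts within each band, a ratio of rates equals the ratio of raw event counts; dividing by the number of people matters mainly for comparing bands within a cohort.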
By design, no one in the study will have died prior to the date of their first homelessness assessment. After the first assessment date, people are exposed to the normal risk of death; however, death rates are unlikely to be the same across the three cohorts. There will therefore be a different number of deaths in each cohort, which in turn will affect the health activity comparisons. For example, if one of the cohorts has a higher death rate, then at later times this cohort will have fewer people present to potentially make use of health services. Furthermore, given that deaths are often preceded by higher than usual hospital activity, some of the increase in activity after assessment could be due to deaths after assessment, which by definition cannot happen before assessment. This effect applies to the control cohorts as well as the homeless cohort. However, we might expect it to be larger for the homeless cohort, which includes more deaths than the control cohorts, and it will therefore affect the total amount of health activity of that cohort.
To address this concern, some of the analyses were repeated with an adjusted cohort, in which each matched set (the EHC member and their MDC and LDC controls) was removed if any of its members had died at any time during the study. This adjusted study population made little difference and did not qualitatively affect the comparisons. The results presented in the study therefore include the activity of people who die after the first homelessness assessment (and of the others in their matched set). Indeed, the health activity of those who die within the study period may be particularly relevant to the findings, as these are the people who are particularly at risk.
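The sensitivity check, which removes a whole matched set whenever any member died, can be sketched as follows. Removing the complete set, rather than only the person who died, keeps the three cohorts the same size with identical age-sex distributions. The data structures and the function name `drop_sets_with_deaths` are hypothetical.

```python
def drop_sets_with_deaths(matched_sets, died):
    """Keep only matched sets in which no member died during the study.

    matched_sets maps a set identifier to an (EHC id, MDC id, LDC id) tuple;
    died is a set of person identifiers.
    """
    return {
        set_id: members
        for set_id, members in matched_sets.items()
        if not any(person in died for person in members)
    }


matched_sets = {1: ("E1", "M1", "L1"), 2: ("E2", "M2", "L2"), 3: ("E3", "M3", "L3")}
died = {"M2"}  # a control died, so the whole of set 2 is removed
kept = drop_sets_with_deaths(matched_sets, died)
print(sorted(kept))  # [1, 3]
```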
2.8.4 Geographic effects
People move between SIMD quintiles
People in the EHC are defined as people included on a homelessness application that resulted in an assessment of homeless or threatened with homelessness between 4 June 2001 and 7 November 2016. The controls in the MDC are defined as people who are not in the EHC and are recorded as residing in one of the 2012 SIMD 20% most deprived datazones as at June 2016. The LDC is defined similarly for the 20% least deprived datazones. While the people in the LDC and MDC are known to reside in these SIMD quintiles on this date, their location at other times is not known. It is likely that many people in the LDC and MDC will also have spent time during the study period residing in other SIMD quintiles. This effect will dilute the difference between the LDC and MDC.
People don't all stay in Scotland all the time
It is also likely that not all the people in these cohorts spend the whole study period in Scotland. Thus the average number of people present in each cohort at particular times within the study period is likely to be lower than the total number of people in the cohorts. This should not affect the results unless the effect is substantial and differs between the cohorts. People in the LDC may be more likely to move in and out of Scotland if they have more resources to do so and seek more specialised jobs. Conversely, the MDC may include people such as refugees, who might also move in and out of Scotland. In the absence of data on this, we assume that the effects on the results are negligible.
Controls may have been homeless
Note that, while people in the LDC and MDC are not known to have been assessed as homeless, it is not guaranteed that none of them were included on homelessness applications. Some may have made homelessness applications before 4 June 2001 or after 7 November 2016. In addition, not all the individuals included on homelessness applications were included in the dataset used in the study. Therefore, some people in the LDC and MDC may have been assessed as homeless at some time. Indeed, the homelessness data for this study only covered around three-quarters of HL1 applications which resulted in an assessment of either homeless or threatened with homelessness (Table 2.4). As a result, some activity by homeless people will be attributed to the controls rather than to the homeless population, so the EHC : MDC and EHC : LDC activity ratios are likely to be underestimates. This issue will affect the MDC more than the LDC, as more people experiencing homelessness had a last settled address in the most deprived quintile (see section 2.9.1 for more information).
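The direction of this bias can be made explicit with a simple mixture argument. As illustrative assumptions, let $r_H$ be the activity rate of people assessed as homeless (whether or not they are in the EHC), $r_C$ the rate of control-cohort members who have never been homeless, and $q$ the fraction of a control cohort who have in fact been assessed as homeless. The rate observed for that control cohort is then

$$\tilde r_C = (1-q)\, r_C + q\, r_H = r_C + q\,(r_H - r_C),$$

so that, whenever $r_H > r_C > 0$ and $0 < q < 1$,

$$\frac{r_H}{\tilde r_C} < \frac{r_H}{r_C}.$$

The larger $q$ is, the larger $\tilde r_C$ becomes and the greater the understatement, which is why the EHC : MDC ratio is expected to be affected more than the EHC : LDC ratio.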
In which Local Authorities are the controls?
Using information taken from the Scottish Government HL1 dataset, each person in the EHC is placed in the Local Authority of their most recent homelessness application. The study's controls were taken at random from the non-homeless population and, whilst matched on age and sex, were not matched by Local Authority. This was because some Local Authorities do not have any datazones classified as 20% most deprived, and matching by Local Authority would have worsened the problem the study encountered of running out of controls. Instead, the controls are assumed to be drawn at random from the remaining non-homeless populations in SIMD1 (20% most deprived datazones) and SIMD5 (20% least deprived datazones). The estimated distribution of the EHC and controls across Local Authorities is shown in Table 2.8.
Table 2.8: Estimated Local Authority for the people in each cohort
|Local Authority||EHC||MDC||LDC|
|Argyll & Bute||0.1%||0.8%||0.9%|
|Dumfries & Galloway||2.0%||1.1%||1.0%|
|Edinburgh, City of||12.9%||4.7%||19.5%|
|Perth & Kinross||0.7%||1.0%||3.1%|
The Island Authorities of Eilean Siar, Orkney and Shetland have no datazones classified as SIMD1, so none of the MDC controls are taken from these areas. Additionally, Eilean Siar has no population in SIMD5 areas and hence does not appear among the LDC (SIMD5) controls.
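How an estimated distribution like Table 2.8 can arise, and why authorities without SIMD1 datazones receive no MDC controls, can be sketched under the stated assumption that controls are drawn at random from the non-homeless population of each quintile: a random draw gives each authority an expected share proportional to its remaining population in that quintile. The function name and the population figures are hypothetical, and this simple version ignores the age-sex matching, which a fuller estimate would apply within each age-sex group.

```python
def estimated_la_shares(non_ehc_population):
    """Expected share of a control cohort drawn from each Local Authority.

    non_ehc_population maps each authority to the number of people in the
    relevant SIMD quintile who are not in the EHC. Under random selection,
    each authority's expected share is proportional to that number.
    """
    total = sum(non_ehc_population.values())
    return {la: count / total for la, count in non_ehc_population.items()}


# Hypothetical SIMD1 populations. An authority with no SIMD1 datazones has
# no one to draw from, so it receives a zero share of the MDC.
simd1_non_ehc = {"Authority A": 60_000, "Authority B": 40_000, "Island Authority": 0}
shares = estimated_la_shares(simd1_non_ehc)
print(shares)
```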
The Glasgow effect
The Glasgow effect refers to the poor health of people in Glasgow, compared with people elsewhere in Scotland, that cannot be explained by factors such as deprivation.
A different proportion of people in each cohort were estimated to be living in Glasgow (20.1% EHC, 27.5% MDC, 4.9% LDC; Table 2.8), so the effect will have a different impact on each cohort. Part of the health activity differences observed between the cohorts could be due to this. However, the proportion of people in each cohort from Glasgow is relatively small, so this should not be a large effect. The fourth research question, on the relationship between health activity, homelessness and deprivation, addresses this.
2.9 Interpretation of results
One of the complexities of this study is that homelessness is an issue which affects individual people. However, the measures of deprivation we are using are area-based. Not all people living in areas ranked within the lowest SIMD deciles will be deprived (however deprived is defined).
From SIMD, counts are available of income deprived individuals and employment deprived individuals; there is no overall count of deprived people within each area. The number of income deprived people is the count of adults and their dependants in receipt of Income Support, Employment and Support Allowance, Job Seekers Allowance, Guaranteed Pension Credits, and Child and Working Tax Credits. Similarly, the count of employment deprived individuals is the number of men aged 16–64 and women aged 16–60 who are on the claimant count or receive Incapacity Benefit, Employment and Support Allowance, or Severe Disablement Allowance. Neither of these counts matches the demographic profile of the study population.
Figure 2.7 below shows the distribution of HL1 cases assessed as homeless (or threatened with homelessness) across the SIMD 2012 quintiles, according to their last known postcode of residence. This is based on the roughly 50 per cent of records that have valid postcodes on the dataset. This is compared with the distribution of income deprived people and employment deprived people (as indicated in the SIMD 2012 dataset). The distribution of cases assessed as homeless across quintiles roughly follows the distributions of both income deprived and employment deprived people. There are likely to be overlaps between these groups; however, the extent of overlap is not estimated here.
Our study contains approximately 76% of all HL1 applications which resulted in an assessment of homeless (or threatened with homelessness). As we are missing approximately a quarter of cases, the individuals associated with these cases will be in the wider non-EHC population, from which our two control groups were sampled. Therefore, in-scope homeless individuals who were missed from the study could in fact be contained within the MDC and LDC control groups. If these missing people are distributed as in Figure 2.7, we would expect more of them to be among the MDC controls, with a smaller proportion among the LDC controls. Any ratios we construct comparing activity in the EHC with the MDC and LDC may therefore underestimate the true differences.
However, there is another force at play. The MDC and LDC cohorts were constructed by randomly selecting individuals from SIMD1 and SIMD5 areas based on the age and sex of H2H individuals. If homeless people are typically deprived, then the EHC contains a large number of people who were residing in SIMD1 and, to a lesser extent, SIMD5 areas. As they are in the EHC, these homeless individuals are no longer available for selection for the MDC and LDC cohorts. The remaining individuals in SIMD1 and SIMD5 will therefore be less deprived, on average, than individuals in SIMD1 and SIMD5 datazones more generally. By having such a large EHC cohort, we have affected the sampling frame for our controls, so that the controls are no longer representative of the true SIMD1 and SIMD5 populations. This effect is likely to have more of an impact on the EHC : MDC ratio than on the EHC : LDC ratio. As a result, the control populations drawn from SIMD1 and SIMD5 areas are not comparable with those in other studies, as homeless people have been removed from these areas. In addition, the age-sex profile of the controls means that the LDC and MDC are not representative of the populations living in these areas.
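This second effect can be illustrated with an expected-value calculation rather than a simulation. The deprivation scores and removal probabilities are invented for illustration only; the study has no record-level measure of deprived people and does not model individuals in this way. The function names are hypothetical.

```python
def expected_mean_remaining(scores, p_homeless):
    """Expected mean deprivation score of people left in the sampling frame.

    Each person with score d is removed (joins the EHC) with probability
    p_homeless(d), so they remain with weight 1 - p_homeless(d).
    """
    weights = [1 - p_homeless(d) for d in scores]
    return sum(d * w for d, w in zip(scores, weights)) / sum(weights)


def p_homeless(score):
    """Hypothetical: the more deprived someone is, the likelier they join the EHC."""
    return 0.05 * score


scores = list(range(1, 11))  # toy deprivation scores, 1 (least) to 10 (most deprived)
full_mean = sum(scores) / len(scores)
remaining_mean = expected_mean_remaining(scores, p_homeless)
print(f"whole area: {full_mean:.2f}; sampling frame after removal: {remaining_mean:.2f}")
```

With these toy values the area-wide mean is 5.50 and the remaining sampling frame averages about 4.93. Whenever the removal probability rises with deprivation, the mean of those remaining falls below the area-wide mean, which is the sense in which the remaining SIMD1 and SIMD5 populations are less deprived than these areas as a whole.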
If homelessness were the sole explanation for different rates of health activity, we would expect the ratios EHC : MDC and EHC : LDC to be very similar.
Figure 2.7: Distribution of Income Deprived People, Employment Deprived People and HL1 Homeless Applications Assessed as Homeless* across the SIMD 2012 quintiles
Notes: Assessed as homeless* (or threatened with homelessness) includes 562,447 homelessness assessments where the application was assessed as homeless or threatened with homelessness between 4 June 2001 and 7 November 2016. Of these records, 280,057 had a postcode of their last settled address which could be mapped to a SIMD quintile. This is slightly different from the number of assessments in Table 2.1 (562,255 assessments). The HL1 dataset is a live dataset where some records may have been entered in error and deleted at a later date. This information was sourced from Scottish Government Communities Analysis Division using bespoke analysis of the HL1 dataset as at 22 February 2018.
2.9.2 Null Hypothesis - Are we just measuring "deprivation" or does homelessness have an additional effect?
By comparing the health activity of the three cohorts, we seek to establish whether it differs between them. Our null hypothesis is that there is no difference in health activity between the cohorts. The main difference between the LDC and the MDC is that the MDC has more deprived people than the LDC. Not all people in the MDC are deprived, but the proportion is higher. There will be other differences (for example, more of the MDC than the LDC reside in Glasgow, so the MDC will be more affected by the Glasgow effect), but the proportion of deprived people is expected to be the dominant difference.
If homelessness is not the sole explanation for different rates of health activity amongst the cohorts, the EHC : LDC ratios are expected to be greater than the EHC : MDC ratios, purely because aspects of deprivation affect more people in the MDC than in the LDC.
Whilst area-based indicators of deprivation have been developed, no definitive measure or source of deprived people exists (either at an aggregate or record level). We are therefore unable to quantify this effect on the ratios.
However, by looking instead at the timing of health activity relative to the date when people in the EHC are first assessed as homeless, we can compare health activity before and after this date to determine whether homelessness itself has an additional effect. Under the null hypothesis of no effect, health activity on either side of this date will be the same. To measure the relationship between homelessness and health activity, temporal analysis is undertaken.
More information on the temporal analysis method is contained in Annex E.
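The idea behind the temporal analysis can be illustrated, in a much simplified form, by counting a person's activity in equal windows either side of their first assessment. This is not the method described in Annex E: the one-year window, the function name `before_after_counts` and the data layout are assumptions, and the sketch ignores the period at and around the assessment.

```python
from datetime import date, timedelta


def before_after_counts(first_assessment, event_dates, window_days=365):
    """Count events in equal-length windows before and after the first assessment.

    Events on the assessment date itself are counted in the 'after' window.
    """
    window = timedelta(days=window_days)
    before = sum(first_assessment - window <= d < first_assessment for d in event_dates)
    after = sum(first_assessment <= d < first_assessment + window for d in event_dates)
    return before, after


events = [date(2009, 11, 2), date(2010, 4, 15), date(2010, 6, 1), date(2010, 9, 30)]
print(before_after_counts(date(2010, 5, 1), events))  # (2, 2)
```

Summed over many people, the before and after counts would be expected to be similar under the null hypothesis; a consistent excess on one side would suggest that the assessment date marks a change in health activity.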
This chapter detailed the study design, the health data and homelessness data used in the study and the construction of the study population. In doing so, three analysis cohorts were created: the EHC (the Ever Homeless Cohort), and its controls the MDC (the Most Deprived Cohort) and the LDC (the Least Deprived Cohort). This enables the study to better compare the health activity of the EHC with people living in the 20% most and least deprived areas in Scotland.
In the following chapters, analysis is conducted comparing the health activity of the EHC with that of the MDC and LDC, for the health activity datasets introduced in this chapter. Comparisons between the cohorts are presented to show the relative differences in health activity. Because of the issues described above, the ratios of health activity between the cohorts (particularly their exact values) are not directly comparable with other studies. However, they are still useful in showing whether differences in health activity exist.
These lead on to the temporal analyses, which are the main focus of each chapter. These address the study's research questions, which attempt to understand how homelessness and health activity are related.