The Scottish Health Survey 2024 - Volume 2: Technical Report
This publication presents information on the methodology and fieldwork from the Scottish Health Survey 2024.
1.7 Weighting the data
1.7.1 Introduction
This section presents information on the weighting procedures applied to the survey data. Since 2012, the weighting for SHeS has been undertaken by the Scottish Government rather than by the survey contractor (as had previously been the case), but the methodology has remained largely consistent with that used for the 2008 to 2011 sweeps of the survey. The procedures for implementing the weighting methodology were developed by the Scottish Government working with the Methodology Advisory Service at the Office for National Statistics[i].
The calibration weighting was undertaken using the ReGenesees package for R, with the calibration itself carried out using a raking function.
1.7.2 Main adult weights
The main adult weight applies to analysis of questions asked of all adults. There were six steps in calculating the overall adult weights, as follows:
1) Address selection weights (w1)
The address selection weights were calculated to compensate for unequal probabilities of selection of addresses in different survey strata. For the main sample, there were 32 strata (one for each local authority).
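As a rough illustration only: if addresses are drawn with equal probability within each stratum (an assumption; the exact SHeS formula is not given in this section), the selection probability in stratum h is n_h/N_h and w1 is its inverse, N_h/n_h. The function name and the stratum counts below are hypothetical.

```python
def address_selection_weights(population_addresses, sampled_addresses):
    """Hypothetical w1 per stratum: the inverse of the selection probability.

    Assumes equal-probability selection within each stratum, so the
    probability is n_h / N_h and the weight is N_h / n_h.
    """
    weights = {}
    for stratum, n_pop in population_addresses.items():
        n_sample = sampled_addresses[stratum]
        weights[stratum] = n_pop / n_sample
    return weights


# Two hypothetical strata with different sampling fractions (not SHeS data).
w1 = address_selection_weights(
    {"stratum_a": 50_000, "stratum_b": 12_000},
    {"stratum_a": 500, "stratum_b": 300},
)
print(w1)  # {'stratum_a': 100.0, 'stratum_b': 40.0}
```

Addresses in the more heavily sampled stratum receive the smaller weight, which is what compensates for the unequal selection probabilities.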
2) Dwelling unit selection weights (w2)
The Multiple Occupancy Indicator (MOI) for the Postcode Address File (PAF) was used to ensure that, where there were multiple dwelling units at a single address point, each would have the same selection probability as an individual address. However, there are likely to have been some cases where the MOI was incorrect. The following correction was applied where this was the case:
w2 was trimmed to a minimum of 0.67 and a maximum of 3.
3) Household selection weights (w3)
Similarly to w2, in a very small number of dwelling units fieldworkers occasionally find multiple households, of which only one is selected for participation. The following correction was applied for multiple households:
w3 was trimmed to a maximum of 3.
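The section states the trimming bounds for w2 and w3 but not their untrimmed formulas. The minimal sketch below applies only the stated bounds, reading "trimmed" as capping each weight at the bound; the helper `trim` and the input values are hypothetical.

```python
def trim(weights, lower=None, upper=None):
    """Cap each weight at the given bounds (None leaves that side open)."""
    trimmed = []
    for w in weights:
        if lower is not None:
            w = max(w, lower)  # raise values below the minimum
        if upper is not None:
            w = min(w, upper)  # lower values above the maximum
        trimmed.append(w)
    return trimmed


# Hypothetical untrimmed values, not SHeS data.
w2 = trim([0.5, 1.0, 2.0, 4.0], lower=0.67, upper=3)
w3 = trim([1, 2, 5], upper=3)
print(w2)  # [0.67, 1.0, 2.0, 3]
print(w3)  # [1, 2, 3]
```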
4) Calibrated household weights (w4)
The three selection weights were combined (w1*w2*w3) before the household calibration stage, and this combined weight was applied to the survey data as the entry weight for the calibration. The calibration then adjusted the entry weights so that the weighted totals for all members of responding households matched the population totals for Health Boards, the Scotland-level age/sex totals and the population within each Scottish Index of Multiple Deprivation (SIMD) quintile. The population totals used were the National Records of Scotland (NRS) mid-2022 estimates for private households.
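The calibration step can be illustrated with a minimal pure-Python raking (iterative proportional fitting) sketch. It is not the ReGenesees implementation used for SHeS; the function name `rake`, the five-person data and the targets are hypothetical. Entry weights are the product w1*w2*w3, and each pass rescales everyone in a category so that the weighted total on that margin matches its population total, repeating until no further adjustment is needed.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-8):
    """Raking (iterative proportional fitting) of entry weights to margins.

    weights    -- list of entry weights, one per person
    categories -- {margin name: list of each person's category on that margin}
    targets    -- {margin name: {category: population total}}
    All margins should sum to the same population total.
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for margin, labels in categories.items():
            # Current weighted total for each category of this margin.
            current = {}
            for wi, label in zip(w, labels):
                current[label] = current.get(label, 0.0) + wi
            # Factor that brings each category's total to its target.
            factors = {label: targets[margin][label] / total
                       for label, total in current.items()}
            w = [wi * factors[label] for wi, label in zip(w, labels)]
            largest_adjustment = max(
                largest_adjustment,
                max(abs(f - 1.0) for f in factors.values()),
            )
        # Stop once a full pass leaves every factor (almost) equal to one.
        if largest_adjustment < tol:
            break
    return w


# Hypothetical selection weights and categories for five people.
w1 = [100.0, 100.0, 40.0, 40.0, 40.0]
w2 = [1.0, 1.0, 1.0, 2.0, 2.0]
w3 = [1.0, 1.0, 1.0, 1.0, 1.0]
entry = [a * b * c for a, b, c in zip(w1, w2, w3)]  # w1*w2*w3

sex = ["m", "m", "f", "f", "f"]
age = ["16-44", "45+", "16-44", "45+", "45+"]
calibrated = rake(
    entry,
    {"sex": sex, "age": age},
    {"sex": {"m": 40.0, "f": 60.0}, "age": {"16-44": 45.0, "45+": 55.0}},
)
```

Raking matches each margin separately rather than the full cross-classification, which is why it only needs a set of one-way population totals, such as those in Tables 6 to 8.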
5) Adult non-response weights (w5)
All adults within selected households were eligible for interview, but within responding households not all individuals completed an interview. The profiles of household members who did not complete the interview differed from those of members who did. Information on all individuals within responding households was available from the household interview. This allowed differential response within households to be modelled using logistic regression, giving each individual a probability of responding based on their profile. The logistic regression applied only to households containing more than one adult, since households consisting of only one adult either responded to both the household and individual interviews or did not respond at all.
The following variables were considered for inclusion in the model:
- Health Board
- Age/sex
- Number of adults in the household
- Employment status of household reference person
- Presence of a smoker in the household
- Marital status
- Tenure
- Urban/rural classification
- Access to a car
- Located within SIMD15 area
- Frequency of eating meals together
- SIMD quintile
After running backward and forward selection procedures for the logistic regression, the following variables were included in the final model:
- Health Board
- Age/sex
- Number of adults in the household
- Located within SIMD15 area
- Marital status
- SIMD quintile
- Access to a car
The final logistic regression model was then used to calculate the probability of response for all individuals who did respond. The adult non-response weight (w5) was then calculated as the reciprocal of this probability, that is, w5 = 1/p, where p is the modelled probability of response.
w5 was trimmed to a maximum of 4. For households containing only one adult, the non-response weight was set to one.
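A minimal sketch of the step after the model has been fitted: the modelled probabilities are taken as given (the logistic regression and variable selection are not reproduced), and the helper name and input values are hypothetical. Each respondent's weight is the reciprocal of their probability, capped at 4, with single-adult households set to one.

```python
def nonresponse_weights(p_respond, n_adults, cap=4.0):
    """Inverse-probability non-response weights for adult respondents.

    p_respond -- modelled probability of response for each respondent
    n_adults  -- number of adults in each respondent's household
    """
    weights = []
    for p, n in zip(p_respond, n_adults):
        if n == 1:
            weights.append(1.0)  # single-adult households: weight of one
        else:
            weights.append(min(1.0 / p, cap))  # reciprocal, trimmed at cap
    return weights


# Hypothetical fitted probabilities, not SHeS estimates.
w5 = nonresponse_weights([0.8, 0.5, 0.2, 0.9], [2, 3, 2, 1])
print(w5)  # [1.25, 2.0, 4.0, 1.0]
```

The same reciprocal step produces bw6 in 1.7.3 and intake24_NR in 1.7.8; only the trimming rules differ.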
6) Individual calibration and final adult weight (int24wt)
The household (w4) and non-response (w5) weights were combined (w4*w5) and applied to the survey data prior to the final stage of calibration weighting, which matched weighted totals for the survey data to the NRS 2022 mid-year population estimates for Health Boards, the age/sex distribution at Scotland level and the age/sex distribution for the Greater Glasgow and Clyde Health Board, shown in the tables below.
Table 6: 2022 Mid-year population estimates for private households in Scotland by Health Board
| Health Board | Adults | Children | Total |
|---|---:|---:|---:|
| Ayrshire & Arran | 303,040 | 58,357 | 361,397 |
| Borders | 97,549 | 18,064 | 115,613 |
| Dumfries & Galloway | 121,954 | 22,026 | 143,980 |
| Fife | 301,284 | 61,441 | 362,725 |
| Forth Valley | 246,151 | 50,106 | 296,257 |
| Grampian | 468,643 | 97,183 | 565,826 |
| Greater Glasgow & Clyde | 958,850 | 191,097 | 1,149,947 |
| Highland | 266,490 | 49,428 | 315,918 |
| Lanarkshire | 548,548 | 115,267 | 663,815 |
| Lothian | 731,459 | 146,772 | 878,231 |
| Orkney | 18,360 | 3,444 | 21,804 |
| Shetland | 18,738 | 4,112 | 22,850 |
| Tayside | 338,886 | 65,245 | 404,131 |
| Western Isles | 21,874 | 3,945 | 25,819 |
| Total | 4,441,826 | 886,487 | 5,328,313 |
Table 7: 2022 Mid-year population estimates for private households in Scotland by SIMD Quintile
| SIMD Quintile | Total population |
|---|---:|
| 1 – 20% most deprived data zones | 1,038,755 |
| 2 | 1,037,595 |
| 3 | 1,046,051 |
| 4 | 1,135,445 |
| 5 – 20% least deprived data zones | 1,070,467 |
| Total | 5,328,313 |
Table 8: 2022 Mid-year population estimates for private households in Scotland by age group
| Age group | Male | Female | Total |
|---|---:|---:|---:|
| 0-4 | 126,442 | 119,800 | 246,242 |
| 5-9 | 143,707 | 136,499 | 280,206 |
| 10-15 | 184,016 | 176,023 | 360,039 |
| 16-24 | 262,304 | 261,527 | 523,831 |
| 25-34 | 331,065 | 350,840 | 681,905 |
| 35-44 | 328,852 | 350,535 | 679,387 |
| 45-54 | 343,547 | 367,236 | 710,783 |
| 55-64 | 377,229 | 401,128 | 778,357 |
| 65-74 | 285,051 | 309,997 | 595,048 |
| 75+ | 204,748 | 267,767 | 472,515 |
| Total | 2,586,961 | 2,741,352 | 5,328,313 |
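Because raking adjusts one margin at a time, it can only satisfy all margins together if they describe the same population total. As an illustrative check (the dictionary names are hypothetical), the adult column of Table 6 and the 16+ rows of Table 8 both sum to 4,441,826, and the SIMD quintile totals in Table 7 sum to the Table 6 grand total of 5,328,313.

```python
# Adult (16+) population totals copied from Table 6 and Table 8.
adults_by_board = {
    "Ayrshire & Arran": 303_040, "Borders": 97_549,
    "Dumfries & Galloway": 121_954, "Fife": 301_284,
    "Forth Valley": 246_151, "Grampian": 468_643,
    "Greater Glasgow & Clyde": 958_850, "Highland": 266_490,
    "Lanarkshire": 548_548, "Lothian": 731_459, "Orkney": 18_360,
    "Shetland": 18_738, "Tayside": 338_886, "Western Isles": 21_874,
}
adults_by_age_sex = {
    ("16-24", "Male"): 262_304, ("16-24", "Female"): 261_527,
    ("25-34", "Male"): 331_065, ("25-34", "Female"): 350_840,
    ("35-44", "Male"): 328_852, ("35-44", "Female"): 350_535,
    ("45-54", "Male"): 343_547, ("45-54", "Female"): 367_236,
    ("55-64", "Male"): 377_229, ("55-64", "Female"): 401_128,
    ("65-74", "Male"): 285_051, ("65-74", "Female"): 309_997,
    ("75+", "Male"): 204_748, ("75+", "Female"): 267_767,
}
# All-person totals from Table 7 (quintile 1 = most deprived).
persons_by_simd = {1: 1_038_755, 2: 1_037_595, 3: 1_046_051,
                   4: 1_135_445, 5: 1_070_467}

# Raking needs every margin to describe the same population total.
print(sum(adults_by_board.values()), sum(adults_by_age_sex.values()))
# 4441826 4441826
print(sum(persons_by_simd.values()))  # 5328313, the Table 6 grand total
```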
1.7.3 Biological module weights
A similar process was applied to derive the weights for the biological module. The steps are outlined below.
1) Address selection weight (bw1)
New address selection weights were calculated using the same process as described for w1.
2) Dwelling unit (w2) and household selection weights (w3)
The dwelling unit and household selection weights from the main adult weight were applied as above.
3) Calibrated household weight (bw4)
The three selection weights were combined (bw1*w2*w3) and applied to the survey data before the household calibration was run, so that the survey data matched the population totals for Health Boards, Scotland-level age/sex breakdowns and the population within SIMD15 areas.
4) Adjustment for biological module selection (bw5)
In total, 33% of the main sample was allocated to the biological module. To incorporate this probability of selection, a correction was applied to the calibrated household weight (bw4). The correction was:
5) Application of adult non-response weight (w5)
For within-household non-response, the non-response weight (w5) calculated for all households was also applied to the biological module.
6) Non-response weight for biological module interview (bw6)
Not all adults who responded to the main section of the interview responded to the biological module. Information collected for the respondent in the main interview and household interview was used to model the likelihood of responding to the biological module with logistic regression.
The following variables were considered for inclusion in the model:
- Health Board
- Age/sex
- Number of adults in the household
- Employment status of household reference person
- Presence of a smoker in the household
- Marital status
- Tenure
- Urban/rural classification
- Access to a car
- Located within SIMD15 area
- Frequency of eating meals together
- Self-assessed general health
- Whether done gardening/DIY/building work in the past 4 weeks
- Whether has longstanding illness
- Highest achieved qualification
- Level of physical activity
- Economic activity (including if retired)
- Ever had high blood pressure
- Whether smokes cigarettes or drinks nowadays
- Number of natural teeth
- Whether done any housework in past 4 weeks
After running backward and forward selection procedures for the logistic regression, the following variables were included in the final model:
- Health Board
- Age/sex
- Number of adults in the household
- Located within SIMD15 area
- Marital status
- Frequency of eating meals together
- Presence of a smoker in the household
- Access to a car
- Highest achieved qualification
- Whether done any housework in past 4 weeks
- Whether done gardening/DIY/building work in the past 4 weeks
The final logistic regression model was then used to calculate the probability of response for all individuals who did respond. The biological module non-response weight (bw6) was then calculated as the reciprocal of this probability, that is, bw6 = 1/p, where p is the modelled probability of responding to the biological module.
The top 1% of bw6 values were trimmed.
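The report says only that the top 1% of bw6 was trimmed. The sketch below assumes this means capping values above the 99th percentile at the 99th percentile (rather than, say, redistributing the excess), using a nearest-rank percentile; the helper name and the 200 input values are hypothetical.

```python
import math


def trim_top_percent(weights, percent=1.0):
    """Cap weights above the (100 - percent)th percentile at that percentile.

    Uses the nearest-rank percentile definition; other definitions can
    give a slightly different cut-off.
    """
    ordered = sorted(weights)
    # Nearest-rank position (1-based) of the (100 - percent)th percentile.
    rank = max(1, math.ceil(len(ordered) * (100 - percent) / 100))
    cutoff = ordered[rank - 1]
    return [min(w, cutoff) for w in weights]


# Hypothetical bw6 values: 200 weights of 1, 2, ..., 200.
bw6 = trim_top_percent([float(x) for x in range(1, 201)])
print(max(bw6), sum(1 for w in bw6 if w == 198.0))  # 198.0 3
```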
7) Final calibration for biological module (bio24wt)
The household (bw4), biological sample correction (bw5), adult non-response (w5) and biological non-response (bw6) weights were combined (bw4*bw5*w5*bw6) and applied to the survey data.
For the final stage of biological module weighting, the weighted totals for the survey data were calibrated to match the NRS 2022 mid-year population estimates for private households for Health Boards and the age/sex distribution at Scotland level. However, because of the small sample size for the module, several categories had to be collapsed. For Health Boards, all areas except Grampian, Greater Glasgow and Clyde, Lanarkshire and Lothian were grouped together. For age, the two youngest age groups were combined.
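The category collapsing can be sketched as two mapping functions plus a helper that sums the population targets of merged categories. The helper names and the group labels "All other Health Boards" and "16-34" are hypothetical, and treating 16-24 and 25-34 as the two youngest age groups is an assumption based on the adult bands in Table 8.

```python
# Boards kept as separate categories in the biological module calibration.
KEPT_BOARDS = {"Grampian", "Greater Glasgow & Clyde", "Lanarkshire", "Lothian"}


def collapse_board(board):
    """Map a Health Board to its collapsed calibration category."""
    return board if board in KEPT_BOARDS else "All other Health Boards"


def collapse_age(age_group):
    """Merge the two youngest adult age groups (assumed 16-24 and 25-34)."""
    return "16-34" if age_group in ("16-24", "25-34") else age_group


def collapse_targets(targets, mapper):
    """Sum population totals over categories mapped to the same group."""
    collapsed = {}
    for category, total in targets.items():
        group = mapper(category)
        collapsed[group] = collapsed.get(group, 0) + total
    return collapsed


# Three adult totals from Table 6.
print(collapse_targets(
    {"Lothian": 731_459, "Fife": 301_284, "Borders": 97_549}, collapse_board
))
# {'Lothian': 731459, 'All other Health Boards': 398833}
print(collapse_age("25-34"), collapse_age("35-44"))  # 16-34 35-44
```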
1.7.4 Adult version A weights
A weight titled “Version A” was calculated for the adult respondents in the main sample who were not selected for the biological module. These weights are for analysis of questions included in the version A rotating module, which are asked only of respondents in households not selected for the biological module. The following steps were used to derive the weight:
1) Address selection weight (bw1)
As derived in the first step of the biological module weight.
2) Dwelling unit (w2) and household selection weights (w3)
The dwelling unit and household selection weights from the main adult weight were applied as above.
3) Calibrated household weight (bw4)
As derived for the biological module.
4) Adult non-response weight (w5)
For within-household non-response, the non-response weight (w5) calculated for all households was also applied to the Version A sample.
5) Final calibration for Version A weight (vera24wt)
The household (bw4) and adult non-response (w5) weights were combined (bw4*w5) and applied to the survey data. As with the main adult and biological module weights, the weighted totals for the survey data were calibrated to match the NRS 2022 mid-year population estimates for private households for Health Boards and the age/sex distribution at Scotland level.
1.7.5 Overall child weights
An overall child weight was derived for child responses from the main sample and the child boost combined. Separate logistic regression non-response weights were not required for the child samples, as the response rate for children within cooperating households was sufficiently high. The weighting steps are shown below. Steps (1) and (2) followed the same process as described in 1.7.2 above.
1) Address selection weight for main sample and child boost combined (cw1)
2) Dwelling unit (cw2) and household (cw3) selection weights
3) Selection of children within each household (cw4)
A maximum of two children was eligible for interview in each household. To ensure that children in larger households were not under-represented in the final sample, the following child selection weight was calculated for households with more than two children to compensate for their lower probability of selection:
For households with two or fewer children, cw4 = 1.
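The formula for cw4 is not reproduced in this version of the text. As a hedged illustration only, if the two interviewed children are chosen at random from the n eligible children (an assumption), each child's selection probability is 2/n and the inverse-probability weight is n/2; the function name is hypothetical.

```python
def child_selection_weight(n_children, max_selected=2):
    """Hypothetical cw4, assuming eligible children are selected at random.

    If max_selected of n_children are chosen at random, each child's
    selection probability is max_selected / n_children, so the weight is
    its inverse. Households at or below the limit get a weight of one.
    """
    if n_children <= max_selected:
        return 1.0
    return n_children / max_selected


print([child_selection_weight(n) for n in (1, 2, 3, 4, 5)])
# [1.0, 1.0, 1.5, 2.0, 2.5]
```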
4) Calibration for child interview weight (cint24wt)
The address selection (cw1), dwelling unit (cw2), household (cw3) and child selection (cw4) weights were combined (cw1*cw2*cw3*cw4) and applied to the survey data. The weighted totals for the survey data were calibrated to match the NRS 2022 mid-year population estimates for private households for Health Boards and the age/sex distribution at Scotland level.
1.7.6 Child version A weights
A weight (cvera24wt) was calculated for the child respondents in the main sample in households allocated to the version A sample. These weights are for analysis of questions included in the version A rotating module.
1.7.7 Child main sample weights
Weights were also created specifically for within-household analysis, comparing children’s characteristics with those of their parents. As data were collected for both children and adults only in the main sample, these weights were created only for children at main sample addresses. These weights (cmint24wt) were created in a similar way to the overall child weights.
1.7.8 Intake24 weights
1) Selection and SHeS non-response
The basis for the Intake24 adult weight was the main adult weight (int24wt), which adjusts for the probability of selection and non-response to the survey. This weight was rescaled to a mean of one for all adult respondents eligible for the Intake24 survey.
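Rescaling to a mean of one divides every weight by the mean weight among eligible respondents. This leaves the relative weights, and therefore weighted means and proportions, unchanged, while making the weights sum to the number of eligible respondents. The helper name and input values below are hypothetical.

```python
def rescale_to_mean_one(weights):
    """Divide each weight by the mean so the rescaled weights average 1."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


# Hypothetical int24wt values for three eligible adults.
w = rescale_to_mean_one([500.0, 1000.0, 1500.0])
print(w)  # [0.5, 1.0, 1.5]
```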
2) Intake24 non-response weight
Not all of the adults who were invited to take part in the Intake24 survey responded. Using the information collected for the respondent in the main interview and household interview, the likelihood of responding to the Intake24 survey was modelled with logistic regression.
After running backward and forward selection procedures for the logistic regression, the following variables were included in the final model for response to the Intake24 survey:
- Health Board
- Age/sex
- Highest achieved qualification
- Whether done gardening/DIY/building work in the past 4 weeks
- Presence of a smoker in the household
- Whether done any housework in past 4 weeks
- Number of adults in the household
- Ethnic background of respondent
- Level of physical activity
- Whether smokes cigarettes
- Currently drinks alcohol
- Number of children in the household
- Access to a car
- Whether has longstanding illness
- Whether provides help or care
- Ever had high blood pressure
The final logistic regression model was then used to estimate the probability of response for all individuals who did respond to the Intake24 survey. The Intake24 non-response weight (intake24_NR) was then calculated as the reciprocal of this probability, that is, intake24_NR = 1/p, where p is the modelled probability of responding to the Intake24 survey.
3) Final calibration of Intake24 adult weights (shes_intake24_wt_sc)
The adult weight (int24wt) and non-response weight (intake24_NR) were combined (int24wt * intake24_NR) and applied to the data prior to the final stage of calibration weighting, which matched weighted totals for the survey data to the NRS 2022 mid-year population estimates for Health Boards, the age/sex distribution at Scotland level and the age/sex distribution for the Greater Glasgow and Clyde Health Board.
| Weight name | Purpose of weight |
|---|---|
| int24wt | For analysis of 2024 adult data |
| bio24wt | For analysis of 2024 biological module data |
| vera24wt | For analysis of 2024 version A adult module data |
| cint24wt | For analysis of 2024 child data |
| cvera24wt | For analysis of 2024 version A child module data |
| cmint24wt | For analysis of 2024 combined child data core sample only (for within household analysis) |
| shes_intake24_wt_sc | For analysis of 2024 Intake24 data |
1.7.9 Combined weights
Weights have been produced for a number of combinations of annual sweeps to allow the analysis of combined datasets. Because of disruption to the survey at the onset of the pandemic, the survey data collected in 2020 were published as experimental statistics and are not comparable with the time series[ii]. These data have not been included in the survey trends or the combined years’ analysis.
| Weight name | Purpose of combined weight |
|---|---|
| int21222324wt | For analysis of 2021, 2022, 2023 and 2024 combined adult data |
| intsc21222324wt | For analysis of 2021, 2022, 2023 and 2024 combined adult self-completion data due to lower response to the online self-completion surveys in 2023. See the 2023 technical report chapter 1 for more information. |
| cint21222324wt | For analysis of 2021, 2022, 2023 and 2024 combined child data |
| cmint21222324wt | For analysis of 2021, 2022, 2023 and 2024 combined child data core sample only (for within household analysis) |
| bio21222324wt | For analysis of 2021, 2022, 2023 and 2024 combined depression, anxiety, suicide and self-harm data |
| int222324wt | For analysis of 2022, 2023 and 2024 combined adult data |
| intsc222324wt | For analysis of 2022, 2023 and 2024 combined adult self-completion data due to lower response to the online self-completion surveys in 2023. See the 2023 technical report chapter 1 for more information. |
| biophy222324wt | For analysis of 2022, 2023 and 2024 combined bio measurements (blood pressure, hypertension, waist circumference and saliva). Created to account for reduced bio samples taken in 2022 during recovery of the interviewer panel following the pandemic. See the 2022 technical report for more information. |
| bio222324wt | For analysis of 2022, 2023 and 2024 combined depression, anxiety, suicide and self-harm data |
| cint222324wt | For analysis of 2022, 2023 and 2024 combined child data |
| cmint222324wt | For analysis of 2022, 2023 and 2024 combined child data core sample only (for within household analysis) |
| int2224wt | For analysis of 2022 and 2024 combined adult data |
| cint2224wt | For analysis of 2022 and 2024 combined child data |
| vera2224wt | For analysis of 2022 and 2024 combined version A adult module data |
| cvera2224wt | For analysis of 2022 and 2024 combined version A child module data |
| cmint2224wt | For analysis of 2022 and 2024 combined child data core sample only (for within household analysis) |
| int2324wt | For analysis of 2023 and 2024 combined adult data |
| intsc2324wt | For analysis of 2023 and 2024 combined adult self-completion data |
| bio2324wt | For analysis of 2023 and 2024 combined depression, anxiety, suicide and self-harm data |
| cint2324wt | For analysis of 2023 and 2024 combined child data |
| cmint2324wt | For analysis of 2023 and 2024 combined child data core sample only (for within household analysis) |
In each case, the weights were calculated using the same procedure. The pre-calibration weights already calculated for the individual years, which take into account selection weighting and, except for the child weights, non-response weighting, were combined and calibrated to 2022 Health Board and age/sex population totals for private households.
References and notes
[i] A report on the development of the weighting procedures is available at: https://webarchive.nrscotland.gov.uk/3000/https://www.gov.scot/Topics/Statistics/About/Surveys/WeightingProjectReport
[ii] Scottish Health Survey – telephone survey – August/September 2020: main report. Edinburgh, the Scottish Government. Available at: https://www.gov.scot/publications/scottish-health-survey-telephone-survey-august-september-2020-main-report/
Contact
ScottishHealthSurvey@gov.scot