The Scottish Health Survey 2024 - Volume 2: Technical Report

This publication presents information on the methodology and fieldwork from the Scottish Health Survey 2024.


1.7    Weighting the data

1.7.1   Introduction

This section presents information on the weighting procedures applied to the survey data. Since 2012, the weighting for SHeS has been undertaken by the Scottish Government rather than the survey contractor (as had previously been the case), but the methodology applied was largely consistent with that of the 2008 to 2011 sweeps of the survey. The procedures for the implementation of the weighting methodology were developed by the Scottish Government working with the Methodology Advisory Service at the Office for National Statistics[i].

The calibration weighting was undertaken using the ReGenesees package for R, with a raking function used to execute the calibration.

1.7.2     Main adult weights

The main adult weight is applicable to analysis of questions asked of all adults. The overall adult weight was calculated in six steps, as follows:

1) Address selection weights (w1)

The address selection weights were calculated to compensate for unequal probabilities of selection of addresses in different survey strata. For the main sample, there were 32 strata (one for each local authority).

The address selection weight is calculated by dividing the number of PAF addresses in the stratum by the number of addresses selected for the stratum.
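To make the calculation concrete, the short Python sketch below applies this ratio to two hypothetical strata. The stratum names, the counts and the function name `address_selection_weights` are illustrative assumptions, not SHeS data or production code.

```python
# Minimal sketch of the address selection weight (w1).
# Stratum names and counts are hypothetical, not SHeS figures.

def address_selection_weights(paf_addresses, selected_addresses):
    """Return w1 for each stratum: PAF addresses / addresses selected."""
    return {
        stratum: paf_addresses[stratum] / selected_addresses[stratum]
        for stratum in paf_addresses
    }


paf = {"stratum_A": 60_000, "stratum_B": 12_000}  # addresses on the PAF
sel = {"stratum_A": 300, "stratum_B": 150}        # addresses selected
w1 = address_selection_weights(paf, sel)
print(w1)  # {'stratum_A': 200.0, 'stratum_B': 80.0}
```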

2) Dwelling unit selection weights (w2)

The Multiple Occupancy Indicator (MOI) for the PAF was used to ensure that, where there were multiple dwelling units at a single address point, they would have the same selection probability as individual addresses. However, there are likely to have been some cases where the MOI was incorrect. The following correction was applied where this was the case:

The dwelling unit selection weight is calculated by dividing the number of dwelling units recorded at the address by the PAF multiple occupancy indicator for the address.

W2 was trimmed to a minimum of 0.67 and a maximum of 3.
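A minimal Python sketch of w2 follows. It assumes that "trimmed" means values outside the range are set to the nearest bound, which the report does not spell out; the function name and inputs are hypothetical.

```python
def dwelling_unit_weight(recorded_dwelling_units, moi, lower=0.67, upper=3.0):
    """w2 = dwelling units recorded at the address / PAF MOI, trimmed to [lower, upper]."""
    raw = recorded_dwelling_units / moi
    # Assumed trimming rule: clip the raw ratio to the stated bounds.
    return min(max(raw, lower), upper)


print(dwelling_unit_weight(2, 1))  # 2.0
print(dwelling_unit_weight(5, 1))  # 3.0 (capped at the maximum)
print(dwelling_unit_weight(1, 4))  # 0.67 (raised to the minimum)
print(dwelling_unit_weight(3, 3))  # 1.0
```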

3) Household selection weights (w3)

Similarly, in a very small number of dwelling units, fieldworkers find multiple households, of which only one is selected for participation. The following correction was applied for multiple households:

The household selection weight equals the number of households within the dwelling unit.

W3 was trimmed to a maximum of 3.
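In the same hypothetical Python style, the sketch below caps w3 at 3 and forms the entry weight w1*w2*w3 used for the household calibration described next. The values of w1 and w2 are invented for illustration.

```python
def household_selection_weight(households_in_dwelling_unit, upper=3):
    """w3 = number of households in the dwelling unit, trimmed to a maximum of 3."""
    return min(households_in_dwelling_unit, upper)


# Entry weight for the household calibration: the product of the three
# selection weights (hypothetical values for w1 and w2).
w1, w2 = 200.0, 2.0
w3 = household_selection_weight(2)
entry_weight = w1 * w2 * w3
print(w3, entry_weight)               # 2 800.0
print(household_selection_weight(5))  # 3 (capped)
```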

4) Calibrated household weights (w4)

The three selection weights were combined (w1*w2*w3) before the household calibration stage. This combined weight was applied to the survey data to act as entry weights for the calibration. The execution of the calibration step then modified the entry weights so that the weighted total of all members of responding households matched the population totals for Health Boards, Scotland-level population totals for age/sex breakdown, and the population within each SIMD quintile. The population totals that were used were the National Records of Scotland’s (NRS) mid-2022 estimates for private households.

5) Adult non-response weights (w5)

All adults within selected households were eligible for interview, but within responding households not all individuals completed an interview. The profiles of household members that did not complete the interview were different from those that did. Information on all individuals within responding households was available through information gathered as part of the household interview. This allowed the differential response rates for individuals within households to be modelled using logistic regression to calculate a probability of responding based on their profiles. The logistic regression was only applicable for households containing more than one adult since households consisting of only one adult either responded to the household and individual interviews or did not respond at all.

The following variables were considered for inclusion in the model:

  • Health Board
  • Age/sex
  • Number of adults in the household
  • Employment status of household reference person
  • Presence of a smoker in the household
  • Marital status
  • Tenure
  • Urban/rural classification
  • Access to a car
  • Located within SIMD15 area
  • Frequency of eating meals together
  • SIMD quintile

Using backward and forward selection procedures for the logistic regression, the following variables were included in the final model:

  • Health Board
  • Age/sex
  • Number of adults in the household
  • Located within SIMD15 area
  • Marital status
  • SIMD quintile
  • Access to a car

The final logistic regression model was then used to calculate the probability of response for all individuals that did respond. The adult non-response weight (w5) was then calculated as the reciprocal of this probability:
The adult non-response weight is calculated by dividing 1 by the individual’s predicted probability of response.

W5 was trimmed to a maximum of 4. For households of only one adult, the non-response weight was one.
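The sketch below illustrates this step under stated assumptions. The logistic model is taken as already fitted, so a response probability is obtained by applying the inverse-logit function to a linear predictor with hypothetical coefficients; the probability is then converted to w5 with the cap of 4 and the single-adult rule. It does not reproduce the SHeS model or its variable coding.

```python
import math


def response_probability(intercept, coefficients, covariates):
    """Inverse-logit of the linear predictor from a fitted logistic regression."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-eta))


def adult_nonresponse_weight(p_response, adults_in_household, upper=4.0):
    """w5 = 1 / p, trimmed to a maximum of `upper`; single-adult households get 1."""
    if adults_in_household == 1:
        return 1.0
    if not 0.0 < p_response <= 1.0:
        raise ValueError("response probability must be in (0, 1]")
    return min(1.0 / p_response, upper)


# Hypothetical intercept-only model, so every modelled adult has p = 0.5.
p = response_probability(0.0, [], [])
print(p, adult_nonresponse_weight(p, adults_in_household=2))  # 0.5 2.0
print(adult_nonresponse_weight(0.1, adults_in_household=3))   # 4.0 (10 capped at 4)
print(adult_nonresponse_weight(0.3, adults_in_household=1))   # 1.0
```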

6) Individual calibration and final adult weight (int24wt)

The household (w4) and non-response (w5) weights were combined (w4*w5) and applied to the survey data prior to the final stage of calibration weighting, which matched weighted totals for the survey data to the NRS 2022 mid-year population estimates for Health Boards, age/sex distribution at Scotland level and age/sex distribution for the Greater Glasgow and Clyde Health Board, shown in the tables below.

Table 6: 2022 Mid-year population estimates for private households in Scotland by Health Board

Health Board                  Adults   Children      Total
Ayrshire & Arran             303,040     58,357    361,397
Borders                       97,549     18,064    115,613
Dumfries & Galloway          121,954     22,026    143,980
Fife                         301,284     61,441    362,725
Forth Valley                 246,151     50,106    296,257
Grampian                     468,643     97,183    565,826
Greater Glasgow & Clyde      958,850    191,097  1,149,947
Highland                     266,490     49,428    315,918
Lanarkshire                  548,548    115,267    663,815
Lothian                      731,459    146,772    878,231
Orkney                        18,360      3,444     21,804
Shetland                      18,738      4,112     22,850
Tayside                      338,886     65,245    404,131
Western Isles                 21,874      3,945     25,819
Total                      4,441,826    886,487  5,328,313

Table 7: 2022 Mid-year population estimates for private households in Scotland by SIMD Quintile

SIMD Quintile                     Total population
1 – 20% most deprived data zones         1,038,755
2                                        1,037,595
3                                        1,046,051
4                                        1,135,445
5 – 20% least deprived data zones        1,070,467
Total                                    5,328,313

Table 8: 2022 Mid-year population estimates for private households in Scotland by age group

Age group        Male     Female      Total
0-4           126,442    119,800    246,242
5-9           143,707    136,499    280,206
10-15         184,016    176,023    360,039
16-24         262,304    261,527    523,831
25-34         331,065    350,840    681,905
35-44         328,852    350,535    679,387
45-54         343,547    367,236    710,783
55-64         377,229    401,128    778,357
65-74         285,051    309,997    595,048
75+           204,748    267,767    472,515
Total       2,586,961  2,741,352  5,328,313
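Raking (iterative proportional fitting) repeatedly scales the weights so that each calibration margin in turn matches its population total, until all margins agree. The plain Python sketch below illustrates the idea behind the final calibration on four hypothetical respondents, with two margins standing in for Health Board and sex. It is not the ReGenesees implementation used for SHeS, and it works at person level only; it does not attempt the household calibration stage (w4), in which weighted totals of all members of responding households are matched.

```python
# Minimal raking (iterative proportional fitting) sketch in plain Python.
# Respondents, categories and targets are hypothetical.

def weighted_totals(weights, labels):
    """Sum weights within each category of one calibration margin."""
    totals = {}
    for w, label in zip(weights, labels):
        totals[label] = totals.get(label, 0.0) + w
    return totals


def rake(entry_weights, margins, targets, max_iter=1000, tol=1e-10):
    """Adjust entry weights until weighted totals match every margin's targets.

    margins: dict margin name -> list of category labels (one per respondent)
    targets: dict margin name -> dict category -> population total
    """
    weights = list(entry_weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for name, labels in margins.items():
            totals = weighted_totals(weights, labels)
            # One multiplicative factor per category brings this margin into line.
            factors = {c: targets[name][c] / t for c, t in totals.items()}
            weights = [w * factors[c] for w, c in zip(weights, labels)]
            largest_change = max(
                largest_change, max(abs(f - 1.0) for f in factors.values())
            )
        if largest_change < tol:  # every margin already matched
            break
    return weights


entry = [1.0, 1.0, 1.0, 2.0]  # hypothetical entry weights, e.g. w4*w5
margins = {
    "health_board": ["A", "A", "B", "B"],
    "sex": ["male", "female", "male", "female"],
}
targets = {
    "health_board": {"A": 60.0, "B": 40.0},
    "sex": {"male": 50.0, "female": 50.0},
}
final = rake(entry, margins, targets)
for name, labels in margins.items():
    print(name, weighted_totals(final, labels))
```

The targets for every margin must sum to the same grand total, as the age/sex and Health Board totals in Tables 6 and 8 do.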

1.7.3     Biological module weights

A similar process was applied to derive the weights for the biological module. The steps are outlined below.

1) Address selection weight (bw1)

New address selection weights were calculated using the same process as described for w1.

2) Dwelling unit (w2) and household selection weights (w3)

The dwelling unit and household selection weights from the main adult weight were applied as above.

3) Calibrated household weight (bw4)

The three selection weights were combined (bw1*w2*w3) and applied to the survey data before the household calibration was run so that survey data matched the population totals for Health Boards, Scotland-level age/sex breakdowns, and the population within SIMD15 areas.

4) Adjustment for biological module selection (bw5)

33% of the main sample was allocated to the biological module. To incorporate this probability of selection, a correction was applied to the calibrated household weight (bw4). The correction was:

The adjustment for biological module selection is calculated by dividing the number of PAF addresses in the stratum by the number of addresses in the stratum selected for the biological module, and then dividing the result by the calibrated household weight (bw4).
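Following the formula as stated, a minimal Python sketch for one hypothetical stratum and household (all values and the function name are invented for illustration):

```python
def bio_selection_adjustment(paf_addresses, bio_selected_addresses, bw4):
    """bw5 = (PAF addresses in stratum / stratum addresses selected for the
    biological module) / calibrated household weight bw4."""
    return (paf_addresses / bio_selected_addresses) / bw4


# Hypothetical stratum: 60,000 PAF addresses, 100 selected for the biological
# module, and a responding household with a calibrated weight bw4 of 250.
bw5 = bio_selection_adjustment(60_000, 100, 250.0)
print(bw5)  # 2.4
```

Because bw4 appears in the denominator, the product bw4*bw5 in this example equals the stratum ratio of PAF addresses to biological module addresses (600).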

5) Application of adult non-response (w5)

For within-household non-response, the non-response weight (w5) calculated for all households was also applicable for the biological module.

6) Non-response weight for biological module interview (bw6)

Not all adults who responded to the main section of the interview responded to the biological module. The likelihood of responding to the biological module was modelled with logistic regression, using information collected for the respondent in the main interview and household interview.

The following variables were considered for inclusion in the model:

  • Health Board
  • Age/sex
  • Number of adults in the household
  • Employment status of household reference person
  • Presence of a smoker in the household
  • Marital status
  • Tenure
  • Urban/rural classification
  • Access to a car
  • Located within SIMD15 area
  • Frequency of eating meals together
  • Self-assessed general health
  • Whether done gardening/DIY/building work in the past 4 weeks
  • Whether has longstanding illness
  • Highest achieved qualification
  • Level of physical activity
  • Economic activity (including if retired)
  • Ever had high blood pressure
  • Whether smokes cigarettes or drinks nowadays
  • Number of natural teeth
  • Whether done any housework in past 4 weeks

Using backward and forward selection procedures for the logistic regression, the following variables were included in the final model:

  • Health Board
  • Age/sex
  • Number of adults in the household
  • Located within SIMD15 area
  • Marital status
  • Frequency of eating meals together
  • Presence of a smoker in the household
  • Access to a car
  • Highest achieved qualification
  • Whether done any housework in past 4 weeks
  • Whether done gardening/DIY/building work in the past 4 weeks

The final logistic regression model was then used to calculate the probability of response for all individuals who did respond. The biological module non-response weight (bw6) was then calculated as the reciprocal of this probability:

The non-response weight for the biological module interview is calculated by dividing 1 by the individual’s predicted probability of responding to the biological module.

The top 1% of bw6 was trimmed.
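The report does not state exactly how the top 1% was trimmed. The Python sketch below assumes that values above the 99th percentile (nearest-rank definition) are capped at that percentile; the function name and the response probabilities are hypothetical.

```python
import math


def trim_top_percent(weights, percent=1):
    """Cap weights above the (100 - percent)th percentile at that percentile value.

    Assumptions: nearest-rank percentile, and capping rather than dropping.
    """
    ordered = sorted(weights)
    rank = max(1, math.ceil(len(ordered) * (100 - percent) / 100))  # 1-based rank
    cap = ordered[rank - 1]
    return [min(w, cap) for w in weights]


# Hypothetical predicted probabilities of responding to the biological module.
probabilities = [0.2] + [0.5] * 99
bw6 = [1.0 / p for p in probabilities]  # reciprocal of response probability
bw6_trimmed = trim_top_percent(bw6)
print(max(bw6), max(bw6_trimmed))       # 5.0 2.0
```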

7) Final calibration for biological module (bio24wt)

The household (bw4), biological sample correction (bw5), adult non-response (w5) and biological non-response (bw6) weights were combined (bw4*bw5*w5*bw6) and applied to the survey data.

For the final stage of biological module weighting, the weighted totals for the survey data were calibrated to match the NRS 2022 mid-year population estimates for private households for Health Boards and age/sex distribution at Scotland level. However, due to the low sample size for the module, several categories had to be collapsed. For Health Boards, all areas except Grampian, Greater Glasgow and Clyde, Lanarkshire and Lothian were grouped together. For age, the two youngest age groups were combined.
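The collapsing of calibration categories can be illustrated with the adult population totals from Tables 6 and 8. The Python sketch below assumes that the biological module was calibrated on the adult age bands shown in Table 8, so that the two youngest groups are 16-24 and 25-34; the actual bands and the group labels used for SHeS may differ.

```python
# Collapsing calibration categories for the biological module (sketch).
# Totals are the adult (16+) figures from Tables 6 and 8; the age bands and
# group labels are assumptions for illustration.

ADULTS_BY_BOARD = {
    "Ayrshire & Arran": 303_040, "Borders": 97_549,
    "Dumfries & Galloway": 121_954, "Fife": 301_284,
    "Forth Valley": 246_151, "Grampian": 468_643,
    "Greater Glasgow & Clyde": 958_850, "Highland": 266_490,
    "Lanarkshire": 548_548, "Lothian": 731_459, "Orkney": 18_360,
    "Shetland": 18_738, "Tayside": 338_886, "Western Isles": 21_874,
}
KEPT_BOARDS = {"Grampian", "Greater Glasgow & Clyde", "Lanarkshire", "Lothian"}

AGE_BANDS = ["16-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75+"]
MALE = [262_304, 331_065, 328_852, 343_547, 377_229, 285_051, 204_748]
FEMALE = [261_527, 350_840, 350_535, 367_236, 401_128, 309_997, 267_767]
ADULTS_BY_AGE_SEX = {
    **{("Male", a): n for a, n in zip(AGE_BANDS, MALE)},
    **{("Female", a): n for a, n in zip(AGE_BANDS, FEMALE)},
}


def collapse(totals, relabel):
    """Sum population totals after mapping each category to its collapsed group."""
    collapsed = {}
    for category, n in totals.items():
        group = relabel(category)
        collapsed[group] = collapsed.get(group, 0) + n
    return collapsed


def board_group(board):
    return board if board in KEPT_BOARDS else "All other Health Boards"


def age_sex_group(key):
    sex, band = key
    return (sex, "16-34") if band in ("16-24", "25-34") else key


board_targets = collapse(ADULTS_BY_BOARD, board_group)
age_sex_targets = collapse(ADULTS_BY_AGE_SEX, age_sex_group)
print(board_targets["All other Health Boards"])  # 1734326
print(age_sex_targets[("Male", "16-34")])         # 593369
```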

1.7.4     Adult version A weights

A weight titled “Version A” was calculated for the adult respondents in the main sample that were not selected for the biological module. These weights are for analysis of questions in the version A rotating module, which is asked only of respondents in households not selected for the biological module. The following steps were followed to derive the weight:

1) Address selection weight (bw1)

As derived in the first step of the biological module weight.

2) Dwelling unit (w2) and household selection weights (w3)

The dwelling unit and household selection weights from the main adult weight were applied as above.

3) Calibrated household weight (bw4)

As derived for the biological module.

4) Adult non-response weight (w5)

For within-household non-response, the non-response weight (w5) calculated for all households was also applicable for the Version A weight.

5) Final calibration for Version A weight (vera24wt)

The household (bw4) and adult non-response (w5) weights were combined (bw4*w5) and applied to the survey data. As was the case with the main adult weight and biological module weight, the weighted totals for the survey data were calibrated to match the NRS 2022 mid-year population estimates for private households for Health Boards and age/sex distribution at Scotland level.

1.7.5     Overall child weights

An overall child weight was derived for child responses from the main sample and the child boost sample combined. Separate logistic regression non-response weights were not required for the child samples, as the response rate for children within cooperating households was sufficiently high. The weighting steps are shown below. Steps (1) and (2) followed the same process as described in 1.7.2 above.

1) Address selection weight for main sample and child boost combined (cw1)

2) Dwelling unit (cw2) and household (cw3) selection weights

3) Selection of children within each household (cw4)

A maximum of two children was eligible for interview in each household. To ensure that children in larger households were not under-represented in the final sample, the following child selection weight was calculated for households with more than two children to compensate for their lower probability of selection:

The child selection weight is calculated by dividing the number of children in the household by 2.

For households with two or fewer children, cw4 = 1.
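A minimal Python sketch of cw4, with a hypothetical function name, showing the weight for households of different sizes:

```python
def child_selection_weight(children_in_household, max_selected=2):
    """cw4 = children in household / 2 if more than two children, otherwise 1."""
    if children_in_household > max_selected:
        return children_in_household / max_selected
    return 1.0


for n in (1, 2, 3, 5):
    print(n, child_selection_weight(n))
# 1 1.0
# 2 1.0
# 3 1.5
# 5 2.5
```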

4) Calibration for child interview weight (cint24wt)

The address selection (cw1), dwelling unit (cw2), household (cw3) and child selection weights (cw4) were combined (cw1*cw2*cw3*cw4) and applied to the survey data. The weighted totals for the survey data were calibrated to match the NRS 2022 mid-year population estimates for private households for Health Boards and age/sex distribution at Scotland level.

1.7.6 Child version A weights

A weight (cvera24wt) was calculated for the child respondents in the main sample in households allocated to the version A sample. This weight is for analysis of questions included in the version A rotating module.

1.7.7 Child main sample weights

Weights were also created specifically for within-household analysis, comparing children’s characteristics with those of their parents. As data on both children and adults were collected only in the main sample, these weights were created only for children at main sample addresses. The weight (cmint24wt) was created following the same steps as the overall child weight.

1.7.8 Intake24 weights

1) Selection and SHeS non-response

The basis for the Intake24 adult weight was the main adult weight (int24wt), which adjusts for the probability of selection and non-response to the survey. This weight was rescaled to a mean of one for all adult respondents eligible for the Intake24 survey.
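Rescaling to a mean of one divides every weight by the mean weight among eligible respondents, so the rescaled weights sum to the number of respondents while keeping their relative sizes. A hypothetical Python sketch, with invented int24wt values:

```python
def rescale_to_mean_one(weights):
    """Divide each weight by the mean so the rescaled weights average 1."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


# Hypothetical int24wt values for adults eligible for Intake24.
int24wt_eligible = [1000.0, 3000.0, 2000.0]
rescaled = rescale_to_mean_one(int24wt_eligible)
print(rescaled)  # [0.5, 1.5, 1.0]
```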

2) Intake24 non-response weight

Not all of the adults that were invited to take part in the Intake24 survey responded. Using the information collected for the respondent in the main interview and household interview, the likelihood of responding to the Intake24 survey was modelled with logistic regression.

Using backward and forward selection procedures for the logistic regression, the following variables were included in the final model for response to the Intake24 survey:

  • Health Board
  • Age/sex
  • Highest achieved qualification
  • Whether done gardening/DIY/building work in the past 4 weeks
  • Presence of a smoker in the household
  • Whether done any housework in past 4 weeks
  • Number of adults in the household
  • Ethnic background of respondent
  • Level of physical activity
  • Whether smokes cigarettes
  • Currently drinks alcohol
  • Number of children in the household
  • Access to a car
  • Whether has longstanding illness
  • Whether provides help or care
  • Ever had high blood pressure

The final logistic regression model was then used to estimate the probability of response for all individuals that did respond to the Intake24 survey. The Intake24 non-response weight (intake24_NR) was then calculated as the reciprocal of this probability:

The Intake24 non-response weight is calculated by dividing 1 by the individual’s predicted probability of responding to the Intake24 survey.

3) Final calibration of Intake24 adult weights (shes_intake24_wt_sc)

The adult weight (int24wt) and non-response weight (intake24_NR) were combined (int24wt * intake24_NR) and applied to the data prior to the final stage of calibration weighting, which matched weighted totals for the survey data to the NRS 2022 mid-year population estimates for Health Boards, age/sex distribution at Scotland level and age/sex distribution for the Greater Glasgow and Clyde Health Board.

Weight name            Purpose of weight
int24wt                For analysis of 2024 adult data
bio24wt                For analysis of 2024 biological module data
vera24wt               For analysis of 2024 version A adult module data
cint24wt               For analysis of 2024 child data
cvera24wt              For analysis of 2024 version A child module data
cmint24wt              For analysis of 2024 combined child data core sample only (for within household analysis)
shes_intake24_wt_sc    For analysis of 2024 Intake24 data

1.7.9     Combined weights

A number of different combinations of annual sweeps have been produced to allow analysis of combined datasets. Due to disruption to the survey at the onset of the pandemic, the survey data collected in 2020 were published as experimental statistics and are not comparable with the time series[ii]. These data have not been included in the survey trends or in the combined years’ analysis.

Weight name        Purpose of combined weight
int21222324wt      For analysis of 2021, 2022, 2023 and 2024 combined adult data
intsc21222324wt    For analysis of 2021, 2022, 2023 and 2024 combined adult self-completion data, accounting for lower response to the online self-completion surveys in 2023. See chapter 1 of the 2023 technical report for more information.
cint21222324wt     For analysis of 2021, 2022, 2023 and 2024 combined child data
cmint21222324wt    For analysis of 2021, 2022, 2023 and 2024 combined child data core sample only (for within household analysis)
bio21222324wt      For analysis of 2021, 2022, 2023 and 2024 combined depression, anxiety, suicide and self-harm data
int222324wt        For analysis of 2022, 2023 and 2024 combined adult data
intsc222324wt      For analysis of 2022, 2023 and 2024 combined adult self-completion data, accounting for lower response to the online self-completion surveys in 2023. See chapter 1 of the 2023 technical report for more information.
biophy222324wt     For analysis of 2022, 2023 and 2024 combined bio measurements (blood pressure, hypertension, waist circumference and saliva). Created to account for the reduced number of bio samples taken in 2022 during recovery of the interviewer panel following the pandemic. See the 2022 technical report for more information.
bio222324wt        For analysis of 2022, 2023 and 2024 combined depression, anxiety, suicide and self-harm data
cint222324wt       For analysis of 2022, 2023 and 2024 combined child data
cmint222324wt      For analysis of 2022, 2023 and 2024 combined child data core sample only (for within household analysis)
int2224wt          For analysis of 2022 and 2024 combined adult data
cint2224wt         For analysis of 2022 and 2024 combined child data
vera2224wt         For analysis of 2022 and 2024 combined version A adult module data
cvera2224wt        For analysis of 2022 and 2024 combined version A child module data
cmint2224wt        For analysis of 2022 and 2024 combined child data core sample only (for within household analysis)
int2324wt          For analysis of 2023 and 2024 combined adult data
intsc2324wt        For analysis of 2023 and 2024 combined adult self-completion data
bio2324wt          For analysis of 2023 and 2024 combined depression, anxiety, suicide and self-harm data
cint2324wt         For analysis of 2023 and 2024 combined child data
cmint2324wt        For analysis of 2023 and 2024 combined child data core sample only (for within household analysis)

In each case, the weights were calculated using the same procedure. The pre-calibration weights already calculated for the individual years, which take into account selection weighting and (except for the child weights) non-response weighting, were combined and calibrated to 2022 Health Board and age/sex population totals for private households.
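The pooling step can be sketched in Python as follows. For brevity the sketch calibrates on a single margin (post-stratification) rather than raking to Health Board and age/sex jointly, and all weights and totals are hypothetical. The point it shows is that the pooled records are scaled to one set of population totals, so the combined sample represents the population once rather than once per sweep.

```python
def pool_and_poststratify(yearly_records, targets):
    """Pool records from several sweeps and scale them to one set of totals.

    yearly_records: dict year -> list of (category, pre-calibration weight)
    targets: dict category -> population total (counted once, not per year)
    """
    pooled = [
        (year, category, weight)
        for year, records in yearly_records.items()
        for category, weight in records
    ]
    totals = {}
    for _, category, weight in pooled:
        totals[category] = totals.get(category, 0.0) + weight
    return [
        (year, category, weight * targets[category] / totals[category])
        for year, category, weight in pooled
    ]


# Hypothetical pre-calibration weights for two sweeps and one calibration margin.
yearly = {
    2023: [("Board A", 100.0), ("Board B", 50.0)],
    2024: [("Board A", 300.0), ("Board B", 150.0)],
}
combined = pool_and_poststratify(yearly, {"Board A": 800.0, "Board B": 400.0})
for row in combined:
    print(row)
```

Each sweep keeps its share of the pooled weight in proportion to its pre-calibration weights, while the pooled total matches the single set of population totals.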

References and notes 

[i] A report on the development of the weighting procedures is available at: https://webarchive.nrscotland.gov.uk/3000/https://www.gov.scot/Topics/Statistics/About/Surveys/WeightingProjectReport

[ii] Scottish Health Survey – telephone survey – August/September 2020: main report. Edinburgh, the Scottish Government. Available at: https://www.gov.scot/publications/scottish-health-survey-telephone-survey-august-september-2020-main-report/

Contact

ScottishHealthSurvey@gov.scot
