Scottish Social Attitudes Survey 2025: Technical Report

Technical report supporting the Scottish Social Attitudes Survey 2025 core module and attitudes towards discrimination module.


8 Weighting

Certain subgroups in the population are less likely than others to respond to surveys. This is referred to as differential non-response. These groups can end up being under-represented in the sample, which can bias the survey estimates. Weights are applied to the SSA survey that correct for these biases. Such non-response could occur within households as well as at the level of the selected postal address. Separate non-response models were constructed to deal with each of these elements of non-response. Finally, calibration weighting was used to adjust the profile of the responding sample so that it matched the population in terms of age, sex, education, tenure, ethnicity, economic activity (employment status) and Scottish Index of Multiple Deprivation (SIMD) quintiles.

The different stages of the weighting scheme are outlined in detail below.

8.1 Selection weights

Oversampling and stratification within the sample design led to an uneven probability of address selection. To account for this, address selection weights (W1) were calculated as the inverse of the selection probabilities for each of the five strata, so that the weighted number of addresses in each stratum was in the correct proportion.
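The calculation of W1 can be illustrated with a minimal sketch. The stratum labels and address counts below are hypothetical and are not taken from the SSA 2025 sample design. The only step drawn from the report is that each stratum's weight is the inverse of its selection probability.

```python
def selection_weights(population_counts, sample_counts):
    """Return W1 per stratum as the inverse of the selection probability."""
    weights = {}
    for stratum, n_population in population_counts.items():
        n_sampled = sample_counts[stratum]
        selection_probability = n_sampled / n_population
        weights[stratum] = 1.0 / selection_probability
    return weights


# Hypothetical address counts for five strata (illustrative only).
population = {"A": 100_000, "B": 200_000, "C": 300_000, "D": 250_000, "E": 150_000}
sampled = {"A": 2_000, "B": 2_000, "C": 2_000, "D": 2_000, "E": 2_000}

w1 = selection_weights(population, sampled)

# The weighted number of sampled addresses recovers each stratum's size,
# so the strata are restored to their correct proportions.
weighted_totals = {stratum: w1[stratum] * sampled[stratum] for stratum in sampled}
```

In this sketch, strata sampled at a higher rate receive smaller weights, which offsets the oversampling.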

Individuals in households with more than two people aged 16 or over have a lower probability of selection than those in households with one or two people aged 16 or over. This was accounted for by the within-household non-response modelling.

8.2 Non-response model

As described above, differential non-response can leave specific subgroups under-represented in the sample, which can bias the survey estimates. Non-response could occur at the household level, when no one from the selected address responds, or within households, when only one person responds in households with two or more adults. Where information is available about non-responding addresses, the propensity for households (at selected addresses) to respond can be modelled, and the results used to generate a non-response weight. Similarly, where information is available about responding households, the expected number of responses within these households can also be modelled. Hence, there are two components to the non-response weights: one for between household non-response and one for within household non-response. These are intended to reduce bias in the responding sample resulting from differential response to the survey.

Between household response probability was modelled using logistic regression, with the dependent variable indicating whether or not someone at each selected address responded to the survey. Responding addresses were coded 1 and non-responding addresses were coded 0. The model was fitted weighted by the selection weights (W1). A number of variables that described the character of the area in which a selected address was located, including aggregated census data and deprivation indices, were considered for possible inclusion in the response model.

The variables found to be related to household response, once the other predictors included in the model had been controlled for, were: percentage of owner-occupied properties in the Census Output Area (quintiles), percentage managerial, administrative and professional socio-economic classification (NS-SEC12) in the Census Output Area (quintiles), whether the Census Output Area is majority Urban or Rural, the Census Output Area Classification (eight categories), the Index of Multiple Deprivation (quintiles), the percentage of residents 65+ in the postcode sector (quintiles) and population density (quintiles). The model shows that the likelihood of response increases with higher rates of home ownership, and households located in urban areas had lower odds of response compared with those in rural areas. Likelihood of response was generally higher in less deprived areas compared with the most deprived IMD quintile, with the largest increase observed in the least deprived quintile. The full model is shown in Appendix Table 2. The model generated an estimated probability of responding for each selected address. From this model, the between household non-response weight was calculated as the inverse of this estimated probability of responding for each responding address (W2). A composite weight (W3) was then calculated as the product of W1 and W2.
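The step from estimated response probabilities to W2 and W3 can be sketched as follows. This is an illustration only: the probabilities are made-up values standing in for the output of the logistic regression model, and the function name is hypothetical.

```python
def between_household_weights(w1, p_respond, responded):
    """Return (W2, W3) per address; non-responding addresses get (None, None)."""
    results = []
    for weight, probability, flag in zip(w1, p_respond, responded):
        if not flag:
            # Weights are only calculated for responding addresses.
            results.append((None, None))
            continue
        if not 0.0 < probability <= 1.0:
            raise ValueError("response probability must be in (0, 1]")
        w2 = 1.0 / probability           # inverse of the estimated response probability
        results.append((w2, weight * w2))  # W3 is the product of W1 and W2
    return results


# Hypothetical inputs: selection weights, modelled probabilities, response flags.
example = between_household_weights(
    w1=[50.0, 100.0, 150.0],
    p_respond=[0.25, 0.5, 0.4],
    responded=[True, False, True],
)
```

Addresses with a low estimated probability of responding receive a larger W2, so the respondents who resemble non-respondents count for more in the weighted sample.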

Non-response within households was also modelled using logistic regression, with the dependent variable indicating whether each responding address had one or two responses to the survey. Addresses that contained only one person aged 16+ and addresses from which there was no response were not included in this stage of the non-response modelling. The model was run weighted by the composite weight (W3). As well as the area-level information used in the previous model, additional household-level variables (gathered from the survey responses that were received) such as household size, tenure, whether anyone in the household has a degree, whether the household contains children and income were also considered for possible inclusion in the model[6]. The variables found to be related to the probability of receiving two responses, once the other predictors included in the model have been controlled for, were: whether someone in the household holds a degree, housing tenure (owned outright, mortgage or renting/other), the number of adults in the household, whether the household contains children and total weekly pre-tax household income. The model also shows that households who answer the income band question are more likely to have two respondents relative to the group who refuse to answer, and that rented households are less likely to have two respondents relative to owner occupiers. The full model is shown in Appendix Table 3.

The predicted probability from this model was used to estimate the expected number of completed surveys in responding households. This was calculated as (1-p) + 2p = 1+p, where p is the probability of two responses.

The within household non-response weight (W4) was calculated as the number of people aged 16+ in the household (capped at 4) divided by the expected number of responses for each responding household, i.e. numad / (1+p), where numad is the capped number of people aged 16+ in the household. This was then combined with the previous composite weight (W3) to create the pre-calibration weight.
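A minimal sketch of the W4 calculation is given below. The household sizes and probabilities are hypothetical. The report does not state how W4 and W3 were combined, so the multiplication in the final step is an assumption made for illustration.

```python
def within_household_weight(num_adults, p_two_responses, cap=4):
    """Return W4 = numad / (1 + p), with numad capped at `cap`."""
    numad = min(num_adults, cap)
    # Expected completed surveys: one response with probability (1 - p),
    # two responses with probability p, i.e. (1 - p) + 2p = 1 + p.
    expected_responses = (1.0 - p_two_responses) + 2.0 * p_two_responses
    return numad / expected_responses


# Hypothetical households (illustrative values only).
w4_three_adults = within_household_weight(num_adults=3, p_two_responses=0.5)
w4_six_adults = within_household_weight(num_adults=6, p_two_responses=0.25)

# Assumption: the pre-calibration weight is the product of W3 and W4.
w3_example = 200.0
pre_calibration_weight = w3_example * w4_three_adults
```

The cap limits the influence of very large households, which would otherwise receive large weights.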

8.3 Calibration weighting

The final stage of weighting was to adjust the pre-calibration weight so that the weighted composition of the sample was in line with the best available population estimates of the characteristics of people (16+) in Scotland.

Only adults aged 16 or over living in Scotland were eligible to take part in the survey. Consequently, the data have been weighted to the Scottish population aged 16 and over using three sources: the 2024 mid-year population estimates published by National Records of Scotland (ONS, 2024) for age and sex; the 2021 census estimates for Scottish Index of Multiple Deprivation (SIMD) quintiles; and the latest ONS Labour Force Survey (ONS, 2025 Q2) for education, ethnicity, economic activity and housing tenure. The demographic composition of the original and final weighted sample, and how this compares with the population estimates, is shown in Appendix Table 4.

The calibration weight (SSA25_final_wt) is the final weight used in the analysis of the 2025 survey. This weight has been scaled so that the weighted total sample size equals the unweighted sample size. The final calibrated weights range from 0.13 to 5.86.
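The report does not name the calibration algorithm. The sketch below uses raking (iterative proportional fitting), one common approach, purely as an assumption, followed by the rescaling step described above. The respondent records, categories and target shares are hypothetical.

```python
def rake(records, start_weights, targets, iterations=50):
    """Iteratively scale weights so weighted shares match each target margin.

    records: list of dicts, e.g. {"sex": "F", "age": "16-44"}
    targets: {variable: {category: population share}}
    """
    weights = list(start_weights)
    for _ in range(iterations):
        for variable, shares in targets.items():
            total = sum(weights)
            # Current weighted total in each category of this variable.
            current = {category: 0.0 for category in shares}
            for record, weight in zip(records, weights):
                current[record[variable]] += weight
            # Scale each record so its category reaches the target share.
            weights = [
                weight * shares[record[variable]] * total / current[record[variable]]
                for record, weight in zip(records, weights)
            ]
    return weights


def scale_to_sample_size(weights):
    """Rescale so the weights sum to the number of respondents."""
    factor = len(weights) / sum(weights)
    return [weight * factor for weight in weights]


# Hypothetical respondents and population shares (illustrative only).
records = [
    {"sex": "F", "age": "16-44"},
    {"sex": "F", "age": "45+"},
    {"sex": "F", "age": "45+"},
    {"sex": "M", "age": "16-44"},
    {"sex": "M", "age": "45+"},
]
targets = {
    "sex": {"F": 0.52, "M": 0.48},
    "age": {"16-44": 0.45, "45+": 0.55},
}

final_weights = scale_to_sample_size(rake(records, [1.0] * len(records), targets))
share_female = sum(
    w for r, w in zip(records, final_weights) if r["sex"] == "F"
) / sum(final_weights)
share_younger = sum(
    w for r, w in zip(records, final_weights) if r["age"] == "16-44"
) / sum(final_weights)
```

In practice the starting weights would be the pre-calibration weights rather than 1.0, and the margins would cover all of the calibration variables listed above.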

8.4 Weighting efficiency and effective sample size

The effect of the weights on the precision of the survey estimates is indicated by the effective sample size (neff). The effective sample size is the size of an (unweighted) simple random sample that would achieve the same precision (that is, the same standard errors for the estimates) as the weighted design that has been implemented. If the effective sample size is close to the actual sample size, the design is efficient and gives a good level of precision. However, the overall level of precision also depends on the absolute size of the sample, as even an efficient design may yield less precise estimates if the sample size is small. The efficiency of a sample is given by the ratio of the effective sample size to the actual sample size. The effective sample size (neff) of SSA 2025 after weighting is 2,033, with an efficiency of 67%.
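The report does not give the formula used for neff. A widely used approximation is Kish's formula, in which neff is the squared sum of the weights divided by the sum of the squared weights. The sketch below assumes that formula and uses hypothetical weights.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    sum_of_squares = sum(weight * weight for weight in weights)
    return total * total / sum_of_squares


def efficiency(weights):
    """Ratio of the effective sample size to the actual sample size."""
    return effective_sample_size(weights) / len(weights)


# Hypothetical weights (illustrative only).
unequal = [0.5, 1.0, 1.5, 1.0]
equal = [1.0, 1.0, 1.0, 1.0]

neff_unequal = effective_sample_size(unequal)   # 16 / 4.5
efficiency_unequal = efficiency(unequal)
neff_equal = effective_sample_size(equal)       # equals the actual sample size
```

Under this formula, efficiency is 100% only when all weights are equal and falls as the weights become more variable.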

8.5 Sampling errors

Most of the questions asked of all sample members have a margin of error of around plus or minus two to three percentage points around the survey percentage. This means that we can be 95% certain that the true population percentage is within two to three percentage points (in either direction) of the percentage we report. However, sampling errors for percentages based only on respondents to just one or a few versions of the questionnaire, or on subgroups within the sample, are larger than they would have been had the questions been asked of everyone.

The design effect (DEFF) quantifies how a survey’s complex sampling design affects the statistical precision of survey estimates, by comparing the actual variance to that from a simple random sample of the same size. For SSA 2025, the overall DEFF is 1.50 which indicates that the variance of estimates is 50% higher than it would be under simple random sampling for a sample of the same size.

The implications of this increased variance are particularly relevant when interpreting margins of error for survey estimates. Table 4 below illustrates the adjusted margins of error around single percentage estimates, across a range of sample sizes (n = 250 to 5,000) and percentage values (from 5% to 95%). As expected, the margin of error decreases with larger sample sizes and is largest when proportions approach 50%, where statistical variability is highest.

It is important to note that while the table applies a constant DEFF of 1.50 across all estimates for simplicity, in practice, DEFF can vary between subgroups due to differences in sample structure and response variability. These subgroup-specific variations are not captured in the table but should be considered in subgroup analyses.

Table 4 Margins of error for different sample sizes with a DEFF of 1.50
Sample size (n)   5%/95%   10%/90%   20%/80%   30%/70%   40%/60%   50%/50%
250               3.3      4.6       6.1       7.0       7.4       7.6
500               2.3      3.2       4.3       4.9       5.3       5.4
750               1.9      2.6       3.5       4.0       4.3       4.4
1,000             1.7      2.3       3.0       3.5       3.7       3.8
1,500             1.3      1.9       2.5       2.8       3.0       3.1
2,000             1.2      1.6       2.1       2.5       2.6       2.7
3,000             1.0      1.3       1.8       2.0       2.1       2.2
5,000             0.7      1.0       1.4       1.6       1.7       1.7
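The report does not state the formula behind the table above. The sketch below assumes the standard normal approximation for a 95% confidence interval, with the variance inflated by the design effect: margin = 1.96 × sqrt(DEFF × p(1 − p) / n). Under that assumption it appears to reproduce the tabulated values to within rounding. The function name is hypothetical.

```python
import math


def margin_of_error(n, percentage, deff=1.50, z=1.96):
    """Margin of error in percentage points for a single percentage estimate.

    Assumes a normal approximation and a constant design effect.
    """
    p = percentage / 100.0
    standard_error = math.sqrt(deff * p * (1.0 - p) / n)
    return 100.0 * z * standard_error


# Spot checks against the table (n, percentage).
moe_250_50 = round(margin_of_error(250, 50), 1)
moe_1000_10 = round(margin_of_error(1000, 10), 1)
moe_5000_50 = round(margin_of_error(5000, 50), 1)
```

The same function can be called with a subgroup's own sample size, and with a subgroup-specific DEFF where one is available, to obtain a margin of error for subgroup estimates.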

Contact

Email: socialresearch@gov.scot
