6 Technical Notes
6.1 Source Surveys and Core Questions
Results from the three large-scale Scottish Government population surveys are published separately as National Statistics:
- Scottish Crime and Justice Survey (SCJS)
- Scottish Health Survey (SHeS)
- Scottish Household Survey (SHS)
Further information on Population Surveys in Scotland can be found on the SG website.
Since the beginning of 2012, each of the surveys has included a set of 20 core questions that provide information on the composition, characteristics and attitudes of Scottish households and adults across a number of topic areas including equality characteristics, housing, employment and perceptions of health and crime. Responses to these questions from all three surveys have been pooled to provide the Scottish Surveys Core Questions (SSCQ) dataset, with a sample size of around 20,000 responses.
Full details of the harmonised questions are available online and questionnaires are provided on the websites of each of the individual surveys.
Due to the different sampling nature of each survey, which is necessary to meet their primary aims, the number of respondents varies between different SSCQ questions. The questions were therefore batched into three groups (household questions, individual questions and crime questions), and three different sets of weights were calculated to ensure representative results. Sampling, weighting and pooled sample numbers are described separately for each survey below.
Scottish Crime and Justice Survey (SCJS) technical notes
Sampling, survey response and weighting are described in full in the SCJS technical report. Briefly, the survey consists of a simple random sample, designed to achieve a robust sample at national and subgroup level. The target sample size at national level is 6,000 interviews per year. One random adult per household is interviewed and asked all SSCQ and SCJS questions.
Scottish Health Survey (SHeS) technical notes
Sampling, survey response and weighting are described in full in the SHeS 2017 technical report. The SHeS sample is clustered in each calendar year and unclustered over four years. All adults and up to two children in each household are eligible for interview. Only one adult in each household was asked the crime and household questions, to remain in line with the SCJS sampling procedure. The SHeS sample is boosted by participating health boards. It is also boosted with additional households in order to interview more children; these additional households are excluded from the SSCQ dataset.
Scottish Household Survey (SHS) technical notes
Sampling, survey response and weighting are described in full in the SHS technical report. The SHS consists of a simple random sample with a target minimum effective sample size of 250 per local authority. The SSCQ household questions are answered by the highest income householder or their spouse/partner, and one adult is randomly selected to answer the individual and crime questions, in line with the other two surveys.
Datasets from the three source surveys were combined into three new SSCQ datasets: SSCQ household variables (19,220 responses), SSCQ individual variables (18,984 responses) and SSCQ crime variables (17,756 responses), see Table 19.
Each variable response category in each of the surveys carries a different design effect. If we were solely seeking the most efficient estimate for each variable separately, then separate scale factors could be derived for each one. However, this would restrict the use of the dataset. Rather, for each constituent survey dataset the design effects were estimated for each category and then the median design effect over all categories was used as the representative design effect of that survey. These design effects were then used along with the sample sizes to calculate the effective sample sizes (neff) and scaling factors for combining the three datasets.
Table 19: Numbers of sample and effective sample pooled from the source surveys
To combine the data the scale factors were applied to the grossing weights for the individual surveys (described in section 6.1). The neff of each survey contribution formed the basis for the scaling factors:
survey A weight scaling factor = neff (surveyA) / (sum of three survey neffs).
The weights were then re-scaled to be proportionate to the effective sample size contribution of each survey and used as pre-weights. The three pooled SSCQ datasets were then weighted again to be representative of population estimates. See SSCQ Weighting tables.
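The pooling arithmetic described above can be sketched as follows. This is an illustrative outline only: it assumes the usual definition of effective sample size (sample size divided by design effect), and the function names and input figures are hypothetical, not the published SSCQ values.

```python
def scaling_factors(sample_sizes, design_effects):
    """Return each survey's share of the pooled effective sample size.

    Assumes neff = n / deff, with deff the representative (median)
    design effect chosen for each survey.
    """
    neff = {s: sample_sizes[s] / design_effects[s] for s in sample_sizes}
    total_neff = sum(neff.values())
    return {s: neff[s] / total_neff for s in neff}


def pre_weights(grossing_weights, factor):
    """Scale one survey's grossing weights by that survey's scaling factor."""
    return [w * factor for w in grossing_weights]


# Hypothetical inputs for illustration only (not the figures in Table 19).
sample_sizes = {"SCJS": 6000, "SHeS": 4000, "SHS": 10000}
design_effects = {"SCJS": 1.2, "SHeS": 1.6, "SHS": 1.25}

factors = scaling_factors(sample_sizes, design_effects)
scjs_pre_weights = pre_weights([850.0, 920.0, 1010.0], factors["SCJS"])
```

The scaling factors sum to one by construction, so a survey with a larger effective sample contributes proportionately more to the pooled estimate. The final step of weighting the pooled data to population estimates is not shown.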
6.3 Confidence Interval Calculations
All three source surveys are stratified to ensure sufficient sample sizes in smaller local authorities. SHeS is clustered in each annual fieldwork period and, while this effect cancels out over each four-year period, it must be accounted for when producing annual results.
Confidence intervals have been calculated using a method to account for stratification and clustering (surveyfreq in SAS). Confidence intervals across all subgroup estimates in SSCQ are provided in the accompanying supplementary tables.
Confidence intervals are plotted on point estimates on all charts and figures in this report. If the intervals surrounding two different point estimates do not overlap then there is a significant difference between the two points, but if they do overlap it does not necessarily mean there is no significant difference (see further guidance). In the report text the term “significant” refers to “statistically significant” differences.
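The overlap rule can be illustrated with a short sketch. The interval formula below is a simplified normal approximation that inflates the variance by an assumed design effect; it is not the method implemented by surveyfreq in SAS, and the function names and figures are hypothetical.

```python
import math


def proportion_ci(p, n, deff=1.0, z=1.96):
    """Approximate 95% interval for a proportion p from n respondents.

    The variance is inflated by a design effect (deff) as a rough
    stand-in for the effects of stratification and clustering.
    """
    se = math.sqrt(deff * p * (1.0 - p) / n)
    return (max(0.0, p - z * se), min(1.0, p + z * se))


def intervals_overlap(a, b):
    """True if the closed intervals a=(low, high) and b=(low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical estimates, for illustration only.
ci_a = proportion_ci(0.30, 2000, deff=1.5)
ci_b = proportion_ci(0.36, 1500, deff=1.5)

# Non-overlapping intervals indicate a significant difference.
# Overlapping intervals are inconclusive and need a formal test.
clearly_different = not intervals_overlap(ci_a, ci_b)
```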
A comparison of estimates of key variables across the three constituent surveys and the SSCQ is provided in Annex A.
6.4 Statistical Disclosure Control
All estimates based on one or two respondents and displayed in main and supplementary tables have been denoted with ‘*’ to safeguard the confidentiality of respondents with rare characteristics. Cells with true zero counts are denoted with ‘.’ throughout, unless denoted ‘*’ as part of disclosure control.
For individual variables crossed with individual variables (e.g. Ethnic group by Religion), further cells with zero or low respondent numbers in the same row and column as the single response have also been suppressed with ‘*’ to ensure confidentiality. For household and geographic variables, only one further cell in the same row was suppressed, as these cross-tabulations are not transposed.
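As a minimal sketch, the primary suppression rule (estimates based on one or two respondents shown as ‘*’, true zero counts shown as ‘.’) could be applied to a single table cell as below. The function name and the one-decimal output format are assumptions for illustration, and the secondary suppression of neighbouring cells is not covered.

```python
def display_cell(estimate, respondents):
    """Return the text to show in a table cell.

    estimate    -- the estimate (e.g. a percentage) for the cell
    respondents -- the unweighted number of respondents behind it
    """
    if respondents < 0:
        raise ValueError("respondent count cannot be negative")
    if respondents == 0:
        return "."   # true zero count
    if respondents <= 2:
        return "*"   # suppressed: based on one or two respondents
    return f"{estimate:.1f}"


# Hypothetical row of (estimate, respondent count) pairs.
row = [(42.5, 310), (0.4, 2), (0.0, 0)]
shown = [display_cell(est, n) for est, n in row]
```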
6.5 Presentation of Data on Country of Birth
Due to errors in coding survey fieldwork, the country of birth of individuals born outside the UK countries and Ireland was not recorded for around 400 respondents to the Scottish Crime and Justice Survey in 2017. This complicated their assignment to the “Rest of the EU” or “Rest of the World” country of birth category. We assigned respondents with “White: Polish” ethnicity to the “Rest of the EU” category, based on the country of birth of nearly all other survey respondents with this characteristic. We imputed the remaining 331 respondents’ country of birth category with a logistic regression model based on correlated variables (ethnic group, religion, tenure, age, urban-rural area). Those born in Ireland were excluded from the “Rest of the EU” group prior to the logistic regression, as they had been correctly coded.
6.6 Presentation of Data on Religion
Table 20: Grouping of religion in the SSCQ 2017
|Base Collection Categories|Sample|SSCQ Groups|Sample|
|---|---|---|---|
|Church of Scotland|5237|Church of Scotland|5237|
|Roman Catholic|2577|Roman Catholic|2577|
|Other Christian|1637|Other Christian|1637|
6.7 Presentation of Data on Ethnic Group
Table 21: Grouping of ethnic group in the SSCQ 2017
|Base Collection Categories|Sample|SSCQ Groups|Sample|
|---|---|---|---|
|A - White - White Scottish|14908|White: Scottish|14908|
|A - White - Other British|2428|White: Other British|2428|
|A - White - Polish|281|White: Polish|281|
|A - White - Irish|171|White: Other|709|
|A - White - Gypsy/Traveller|4| | |
|A - White - Any other white ethnic group|534| | |
|C - Asian, Asian Scottish or Asian British - Pakistani, Pakistani Scottish or Pakistani British|109|Asian|355|
|C - Asian, Asian Scottish or Asian British - Indian, Indian Scottish or Indian British|111| | |
|C - Asian, Asian Scottish or Asian British - Bangladeshi, Bangladeshi Scottish or Bangladeshi British|9| | |
|C - Asian, Asian Scottish or Asian British - Chinese, Chinese Scottish or Chinese British|65| | |
|C - Asian, Asian Scottish or Asian British - Other Asian, “Asian” Scottish or “Asian” British|61| | |
|B - Mixed or Multiple Ethnic Group - Any mixed or multiple ethnic groups|42|All other ethnic groups|269|
|D - African - African, African Scottish or African British|73| | |
|D - African - Other African background|21| | |
|E - Caribbean or Black - Caribbean, Caribbean Scottish or Caribbean British|7| | |
|E - Caribbean or Black - Black, Black Scottish or Black British|7| | |
|E - Caribbean or Black - Other Caribbean or Black background|2| | |
|F - Other Ethnic Group - Arab, Arab Scottish or Arab British|30| | |
|F - Other Ethnic Group - Other|87| | |
6.8 Mental Wellbeing Scoring
Wellbeing is measured in the Scottish Health Survey using the Warwick–Edinburgh Mental Wellbeing Scale (WEMWBS) questionnaire. It has 14 items designed to assess positive affect (optimism, cheerfulness, relaxation), satisfying interpersonal relationships, and positive functioning (energy, clear thinking, self-acceptance, personal development, mastery and autonomy). The scale uses positively worded statements with a five-point response scale ranging from '1 - none of the time' to '5 - all of the time'. The total score is the sum of these responses across the 14 questions. The scale therefore runs from 14 for the lowest levels of mental wellbeing to 70 for the highest.
SWEMWBS is a shortened version of WEMWBS which is Rasch compatible. This means the seven items included have undergone a more rigorous test for internal consistency than the 14 item scale and have superior scaling properties. The seven items relate more to functioning than to feeling and therefore offer a slightly different perspective on mental wellbeing. However, the correlation between WEMWBS and SWEMWBS is high at 95.4%. The SWEMWBS scale runs from seven for the lowest levels of mental wellbeing to 35 for the highest.
SWEMWBS statements are as follows:
- I've been feeling optimistic about the future
- I've been feeling useful
- I've been feeling relaxed
- I've been dealing with problems well
- I've been thinking clearly
- I've been feeling close to other people
- I've been able to make up my own mind about things
Peaks at multiples of seven are produced by column effects, where respondents are more likely to place answers down a column giving the same response for each question. SWEMWBS scores undergo a metric conversion to correct somewhat for this effect and produce a distribution that is closer to normal, also reducing the boundary effect at the scale maximum of 35.
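A minimal sketch of the raw SWEMWBS score, before any metric conversion, is given below. The function name is hypothetical, and the conversion itself is not reproduced here because it is not specified in these notes.

```python
def swemwbs_raw_score(responses):
    """Sum seven item responses, each coded 1 (none of the time) to 5 (all of the time)."""
    if len(responses) != 7:
        raise ValueError("SWEMWBS requires exactly seven item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    return sum(responses)


# Giving the same answer down the whole column yields a multiple of seven,
# which is the source of the peaks described above.
column_scores = [swemwbs_raw_score([r] * 7) for r in range(1, 6)]
```

Here `column_scores` covers the full range of the raw scale, from seven to 35.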
6.9 Age Standardisation
When comparing sub-groups for a variable on which age has an influence, differences in age distributions are likely to affect any observed differences between groups. Age standardisation enables groups to be compared after adjusting for the effects of differences in their age distributions.
Age standardisation was carried out using the direct standardisation method: the age distribution of sub-groups was adjusted to the mid-2017 population estimates for Scotland. All age standardisation has been undertaken separately for each gender.
The age-standardised proportion $p'$ was calculated as follows, where $p_i$ is the age-specific proportion in age group $i$ and $N_i$ is the standard population size in age group $i$:

$$p' = \frac{\sum_i N_i \, p_i}{\sum_i N_i}$$

Therefore $p'$ can be viewed as a weighted mean of the $p_i$ using the weights $N_i$.
Age standardisation was carried out using the age groups 16-24, 25-34, 35-44, 45-54, 55-64, 65-74 and 75 and over, broken down by gender.
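The variance of the standardised proportion can be estimated by:

$$\operatorname{var}(p') = \frac{\sum_i N_i^2 \, p_i \, q_i / n_i}{\left(\sum_i N_i\right)^2}$$

where $q_i = 1 - p_i$ and $n_i$ is the sample size in age group $i$.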
The populations used for age standardisation are the same as those used for weighting. See the associated Weighting Base tables for details.
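Direct standardisation can be sketched in a few lines. The point estimate is the weighted mean of the age-specific proportions described above. The variance function assumes the usual binomial-based estimator, sum of N_i^2 p_i (1 - p_i) / n_i divided by (sum of N_i)^2, with n_i the sample size in age group i; that form, the function names and the input figures are all assumptions for illustration.

```python
def direct_standardise(p, N):
    """Weighted mean of age-specific proportions p using standard populations N."""
    if len(p) != len(N):
        raise ValueError("p and N must have one entry per age group")
    return sum(N_i * p_i for N_i, p_i in zip(N, p)) / sum(N)


def standardised_variance(p, N, n):
    """Assumed binomial-based variance estimate of the standardised proportion.

    n holds the sample size in each age group.
    """
    if not (len(p) == len(N) == len(n)):
        raise ValueError("p, N and n must have one entry per age group")
    numerator = sum(N_i ** 2 * p_i * (1.0 - p_i) / n_i
                    for p_i, N_i, n_i in zip(p, N, n))
    return numerator / sum(N) ** 2


# Hypothetical two-group example (not SSCQ data).
p = [0.2, 0.4]       # age-specific proportions
N = [100, 300]       # standard population in each age group
n = [50, 100]        # sample size in each age group

p_std = direct_standardise(p, N)
var_std = standardised_variance(p, N, n)
```

In practice the calculation would be run separately for each gender across the seven age groups listed above.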
6.10 Statistical Tests
Statistical tests are used throughout this publication to determine whether apparent differences are statistically significant.
For ordinal or categorical variables, a logistic regression model is used to determine whether differences between subgroups are statistically significant. Testing is relative to a “reference group” which is always the largest subgroup (see Guide to this report). This is performed using proc surveylogistic in SAS to account for the complex design of SSCQ.
To determine changes over time we use a similar technique, coding data years as a continuous integer variable.
- Change “from 2016” excludes data prior to 2016 and regresses year against the indicator variable overall or within subgroup domains or geographical areas.
- Change “from 2012” (or 2014) retains all data years (i.e. not testing 2012 (or 2014) against 2016) and indicates whether a trend exists over the longer time base.
To determine whether a change over time is statistically significant, we examine adjusted chi-squared statistics and odds ratio confidence limits. We require 95% confidence. Odds ratio confidence intervals, which indicate the strength of the signal, are required to exclude the value of 1 (lying entirely above or entirely below equal odds) at the same 95% confidence level. Cases where the two indicators disagree (i.e. where the odds ratio interval includes the value of 1 but the p-value is below 0.05, or where the p-value exceeds 0.05 but the signal is strong) are taken not to be statistically significant.
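The decision rule above can be expressed as a short check. This sketch takes the p-value and odds ratio confidence limits as given (for example, as reported by the regression output); the function name and parameters are hypothetical.

```python
def is_significant(p_value, or_lower, or_upper, alpha=0.05):
    """Apply both criteria: p-value below alpha AND odds ratio interval excluding 1.

    If the two indicators disagree, the change is treated as not significant.
    """
    interval_excludes_one = or_lower > 1.0 or or_upper < 1.0
    return p_value < alpha and interval_excludes_one


# Hypothetical results, for illustration only.
agree_significant = is_significant(0.01, 1.05, 1.30)    # both criteria met
disagree_interval = is_significant(0.04, 0.98, 1.25)    # interval includes 1
disagree_p_value = is_significant(0.07, 1.02, 1.40)     # p-value above 0.05
```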
SWEMWBS is the only continuous indicator variable in SSCQ. A regression analysis is implemented using SAS proc surveyreg to account for the complex survey design. Testing is relative to a reference category which is always the most populated subgroup in the domain.
Formal testing between subnational geographies is produced using contrasts to compare the area in question with the combined total of all other areas.