The Scottish Government’s Quality Assessment of the Office for National Statistics Labour Force Survey and Annual Population Survey data for Scotland
The Scottish Government outline their use of the Office for National Statistics Labour Force Survey and Annual Population Survey. We explore the current quality of the data and summarise what that means for Labour Market Statistics in Scotland.
3 Quality Assessment Methods: Scottish Labour Force Survey and Annual Population Survey data
This quality assessment focuses on the current status of the Labour Force Survey (LFS) and Annual Population Survey (APS) statistics for Scotland used in our regular publications and the key indicators we supply data for. To complete the quality assessment of the LFS and APS data for Scotland, we identified 12 key estimates from the LFS, and a further 229 estimates from the APS.
Specifically, we have evidenced the quality issues affecting the Scottish data and why these issues are affecting our ability to produce quality statistics on the economy that meet users’ needs. We assessed all 241 key estimates against the current data quality rules, which consider sample sizes and coefficients of variation to determine whether data is of high enough quality to publish. We also assessed each estimate against a data confidence rating that informs users of the level of confidence they can place in the estimate.
We have compared the quality of the Scottish data in 2019 (the last time point before the COVID-19 pandemic) with the quality of the most recent Scottish calendar quarter LFS and calendar year APS data available at the time of producing this analysis (November 2024 onwards). APS data for Jan-Dec 2024 is now available and shows that the sample size for Scotland is slightly larger than it was in 2023.
For the LFS data, we compare Jul-Sep 2019 with Jul-Sep 2024 for nine estimates and Oct-Dec 2019 with Oct-Dec 2023 for the remaining three estimates. For the APS data, we compare Jan-Dec 2019 with Jan-Dec 2023 for 222 estimates and Jan-Dec 2020 with Jan-Dec 2023 for the remaining seven estimates. This provides snapshots in time for pre- and post-pandemic comparison purposes but does not capture all of the variation or uncertainty seen in the Scottish data at different time points.
In the Quality Assessment Results and Conclusion sections, we refer to the time period in aggregate form for each survey for simplicity. When discussing the LFS, the time periods are referred to as 2019 and 2023/2024. When discussing the APS, the time periods are referred to as 2019/2020 and 2023.
We have provided annexes which set out the quality assessment results by publication or key use to demonstrate the impact the results will have on the key data we publish. We have also published supporting tables to allow users to see in detail the assessment we have made for each estimate.
Table 3: Primary publications and frameworks in Scotland that use Labour Force Survey and Annual Population Survey data
Publication Name | Data Source | Publication frequency | Detailed Assessment |
---|---|---|---|
Labour Market Trends | LFS | Monthly | |
Labour Market Statistics for 16 to 24 year olds | APS | Quarterly | |
People, Places, and Regions | APS | Annually | |
National Performance Framework: National Indicators | LFS and APS | Annually | |
Fair Work Convention: Fair Work Indicators | LFS and APS | In development | |
Job Quality Indicators | APS | In development | |
Coefficient of Variation Estimates
In this context, the Coefficient of Variation (CV) is the ratio of the standard error of an estimate from the survey sample to the estimate itself, expressed as a percentage. It shows the extent of variability relative to the size of the estimate, with a larger value indicating that we should treat the estimate with more caution.
Coefficient of Variation estimates are useful because they express uncertainty relative to the size of the estimate, so they do not depend on the scale or units of the estimate being calculated and can be used to compare directly across and within different characteristics of a population in the survey.
We have applied the methodology we have historically used for assessing uncertainty in the data by calculating the CVs based on the proportion/rate estimates rather than the level estimates. There are various methods which can be used to assess the uncertainty in estimates and we will continue to explore the best approach to use.
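The paper does not state the formula behind its CVs, so the sketch below is illustrative only. It assumes the textbook simple random sampling standard error for a proportion, sqrt(p(1 - p) / n), with no design factor adjustment and unweighted sample counts; the function name and figures are hypothetical and this is not the Scottish Government's production method.

```python
import math


def proportion_cv(p: float, n: int) -> float:
    """Return the coefficient of variation (%) of a proportion estimate.

    Assumes the simple random sampling standard error sqrt(p(1 - p) / n),
    i.e. a design factor of 1 (hypothetical helper, for illustration).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    standard_error = math.sqrt(p * (1 - p) / n)
    # CV is the standard error relative to the estimate, as a percentage.
    return 100 * standard_error / p


# Hypothetical example: a rate of 75% estimated from 400 respondents.
print(round(proportion_cv(0.75, 400), 2))  # → 2.89
```

Because the CV is relative to the estimate, smaller proportions from the same sample size produce larger CVs, which is one reason estimates for small population groups are more likely to fail the quality thresholds.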
Confidence Intervals
We review confidence intervals (CIs) for the key estimates we publish for Scotland. We calculate 95% confidence intervals. The CIs tell us how much uncertainty there is in an estimate, with wider CIs indicating more uncertainty.
We considered confidence intervals as part of our quality assessment, and found that all confidence intervals in the Scottish data assessed increased between the two time periods compared. As a rule, when sample sizes get smaller, and CVs increase, the confidence intervals around the estimates produced also increase. We have therefore not provided further details of the confidence intervals in this paper, instead focussing on the CV and sample sizes since these two aspects are correlated with confidence intervals and form the basis for our quality rules and confidence ratings.
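To show why confidence intervals widen as samples shrink, the sketch below computes an approximate 95% interval for a proportion. It assumes the normal approximation (estimate plus or minus 1.96 standard errors) with a simple random sampling standard error and a design factor of 1; the helper name and example figures are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def confidence_interval_95(p: float, n: int) -> tuple[float, float]:
    """Return an approximate 95% confidence interval for a proportion.

    Normal approximation p +/- 1.96 * SE, with SE = sqrt(p(1 - p) / n)
    (simple random sampling, design factor of 1). Bounds are clipped to [0, 1].
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    margin = Z_95 * standard_error
    return max(0.0, p - margin), min(1.0, p + margin)


# The same hypothetical 75% rate estimated from progressively smaller samples.
for n in (400, 100, 25):
    low, high = confidence_interval_95(0.75, n)
    print(f"n={n:>3}: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

Since the standard error scales with 1 / sqrt(n), quartering the sample size doubles the width of the interval, which is the same mechanism that drives the CV up.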
Design Factors
The Labour Force Survey and Annual Population Survey are both household surveys which provide estimates for individuals. The selection of households is based on a simple random sample of addresses, but people within households often share characteristics, such as ethnicity. Responses from individuals are therefore not random in the same way as the selection of households, and characteristics will often cluster within households.
In addition to this, the Office for National Statistics (ONS) targets a proportion of households in each Local Authority area when collecting responses. This partitioning, or stratification, of the sample into pre-defined areas also reduces the randomness of the sample.
One method for accounting for this clustering of similar characteristics and stratification of areas is to apply a design effect or design effect factor (design factor). Design effects and design factors attempt to adjust the variance of an estimate to account for loss of randomness in the sample. The design effect is used when a response may not be sampled in the future and the design factor is used when a response may be sampled in the future. As households may be repeatedly sampled in the LFS and APS, the design factor is the appropriate measure to account for the clustering and stratification.
The design factor itself is a positive number. When it is greater than 1 it indicates there is more variance in an estimate than there would be if the sample was truly random. Similarly, when it is less than 1 it indicates there is less variance in an estimate, and when it equals 1 it indicates the variance is the same as if the sample was truly random.
Scottish Government analysts do not have access to the design factors used by ONS in their calculations. As such, the quality assessments in this paper are based on CVs using a design factor of 1. This is not realistic for some estimates, such as those for ethnicity and for island local authorities. However, we would expect the design factor to be greater than 1 in most instances and not to have varied significantly between the two time periods. This in turn leads us to expect that the CVs we have calculated are smaller than they would be if the correct design factor were applied. As such, our CV estimates are likely to understate the variability of the data.
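A minimal sketch of why a design factor of 1 understates uncertainty, assuming the usual convention that a design factor multiplies the simple random sampling standard error. The design factor values below are purely illustrative, since the ONS values are not available; the helper name and estimate are hypothetical.

```python
import math


def cv_with_design_factor(p: float, n: int, design_factor: float = 1.0) -> float:
    """Return the CV (%) of a proportion with the SRS standard error scaled by a design factor."""
    srs_standard_error = math.sqrt(p * (1 - p) / n)
    # The design factor multiplies the standard error, so it multiplies the CV
    # by the same amount; a value of 1 leaves the SRS result unchanged.
    adjusted_standard_error = design_factor * srs_standard_error
    return 100 * adjusted_standard_error / p


# Hypothetical estimate: a 25% rate from a sample of 120 people.
baseline = cv_with_design_factor(0.25, 120)  # design factor of 1, as in this paper
for design_factor in (1.0, 1.2, 1.5):  # illustrative values only, not ONS figures
    cv = cv_with_design_factor(0.25, 120, design_factor)
    print(f"design factor {design_factor}: CV {cv:.1f} (assuming 1 gives {baseline:.1f})")
```

Because the adjustment is multiplicative, any true design factor above 1 raises the CV and can move an estimate across one of the thresholds used in the quality rules or confidence ratings.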
Weighting Methodology
In January 2025, ONS advised the Scottish and Welsh Governments that the standard weighting methodology applied to the LFS and APS data takes account of 32 categories, including region, sex and age. ONS also advised that their weighting methodology is appropriate for sample sizes as low as 30 and will provide an accurate national-level figure for Scotland from samples of that size. However, because sample sizes have continued to fall, ONS have re-calibrated their weighting model to account for sample sizes as low as 20 and have provided assurance that a Scotland-level estimate can still be accurately achieved from such a small sample.
Office for National Statistics Quality Rules
An article published by the ONS in May 2020 outlined disclosure rules based on sample sizes. Following a review of that article and consideration of a range of estimates regularly released by the Scottish Government, we expanded the ONS rules to additionally check the coefficient of variation for estimates with a sample size of between 3 and 10 inclusive. This additional check was introduced because the Scottish Government often release more detailed analysis for smaller populations than ONS do, and these estimates are based on smaller and more variable samples.
There are three possible outcomes when applying the data quality rules:
- Robust – estimates can be published. They are based on a larger sample size, which is likely to result in estimates of higher precision, although they will still be subject to some sampling variability.
- Less Robust – estimates can be published and used with additional caution. They are based on a smaller sample size which may result in less precise estimates. These estimates would normally be shaded and their limitations highlighted to users.
- Not Robust – estimates cannot be published as they are below the reliability threshold, due to a small sample size or because no people were recorded in this category in the survey. These estimates would be suppressed in any published outputs.
The quality assessment methodology used here applies our amended rules, which determine whether data is “robust”, “less robust”, or “not robust”.
Scottish Government Quality Rules
The amended data quality rules that the Scottish Government currently apply are based on sample sizes and the coefficients of variation. The rules are:
Robust
- estimates with a sample size greater than or equal to 26 (these estimates may still be subject to sampling variability)
Less Robust
- sample size between 11 and 25 inclusive
or
- sample size between 3 and 10 inclusive and coefficient of variation less than or equal to 20
Not Robust
- sample size less than 3
or
- sample size between 3 and 10 inclusive and coefficient of variation greater than 20
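The rules above can be sketched as a single classification function. This is an illustrative reading, not the Scottish Government's code: following the statement that the CV check applies to sample sizes of 3 to 10 inclusive, samples below 3 are treated as "not robust" whatever their CV, and the CV is assumed to be expressed as a percentage. The function name is hypothetical.

```python
def quality_rating(sample_size: int, cv: float) -> str:
    """Classify an estimate under the amended data quality rules (illustrative).

    cv is the coefficient of variation as a percentage. It is only checked
    for sample sizes of 3 to 10 inclusive; samples of 0 to 2 are always
    "not robust".
    """
    if sample_size >= 26:
        return "robust"
    if sample_size >= 11:
        return "less robust"
    if sample_size < 3:
        return "not robust"
    # Sample size 3 to 10 inclusive: the CV decides.
    return "less robust" if cv <= 20 else "not robust"


for n, cv in [(40, 8.0), (15, 22.0), (7, 18.5), (7, 24.0), (2, 5.0)]:
    print(n, cv, quality_rating(n, cv))
```

Note that an estimate with a sample of 15 is rated "less robust" even with a CV of 22, because the CV is not checked above a sample size of 10. This is the kind of nuance the data confidence ratings are designed to capture.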
We apply the above rules to the estimates included in our publications and indicators that are based on the LFS and APS data.
The current data quality rules also form part of the Statistical Disclosure Control methods applied to statistical outputs. These represent the minimum conditions that must be met in order to ensure that no individuals can be identified from the data published.
Data Confidence Rating
Estimates that are classed as “robust” and “less robust” under current rules could be:
- lower quality: at the lower end of the threshold needed to be given the quality rating, i.e. a “robust” estimate is close to being a “less robust” estimate or a “less robust” estimate is close to being “not robust”
or
- higher quality: data is comfortably in the range for sample sizes and below the threshold for uncertainty, i.e. a “robust” estimate is well clear of the “less robust” threshold or a “less robust” estimate is close to being a “robust” estimate
The standard data quality rules do not capture these nuances. This means that although some of the LFS and APS data will be classed as “robust” under the data quality rules, they may still have low sample sizes and large CVs.
To ensure we capture these important nuances and can accurately advise users about how to use the Scottish LFS and APS data, we applied data confidence ratings to each estimate we assessed. The data confidence ratings are based on established categories used by the Cabinet Office and we applied these to commonly used CV categories. We added an additional category of “No Confidence” to reflect the significant quality issues we are experiencing in some estimates for Scotland.
The categories we used are shown in Table 4.
Table 4: Description of data confidence ratings and coefficient of variation thresholds for each rating
Rating | Coefficient of Variation Category | Description |
---|---|---|
High Confidence | Coefficient of Variation less than 5 | Estimate can be used to draw accurate conclusions. Substantial trust in the information presented, which is likely to provide a good reflection of reality. |
Moderate Confidence | Coefficient of Variation equal to or greater than 5 but less than 10 | Estimate presented is a suitable, but incomplete, measure of reality; conclusions can be drawn, provided its limitations are understood. |
Limited Confidence | Coefficient of Variation equal to or greater than 10 but less than 15 | Estimate provides a restricted view on reality. It should be considered alongside other more reliable indicators and limitations must be understood before drawing conclusions or making decisions. |
Low Confidence | Coefficient of Variation equal to or greater than 15 but less than 20 | Caution should be taken when using this estimate to make decisions. Careful consideration of wider information is needed to put this estimate in context as it has a number of drawbacks and should not be taken as the principal estimate. |
No Confidence | Coefficient of Variation greater than 20 | Estimate has a number of drawbacks and should not be used. Consideration of other evidence is needed. |
Confidence not Assessed | N/A | Estimate has not been assessed and no conclusion of reliability can be made. |
This more granular look at the CV of each estimate allows us to make a better-informed assessment of the estimate’s usability for our users.
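A minimal sketch of the Table 4 mapping from CV to data confidence rating, with the CV as a percentage and `None` standing in for an estimate that has not been assessed. Table 4 does not assign a CV of exactly 20 to any rating; this sketch assumes it belongs with Low Confidence, consistent with the "CV less than or equal to 20" quality rule. The function name is hypothetical.

```python
from typing import Optional


def confidence_rating(cv: Optional[float]) -> str:
    """Map a CV (%) to the data confidence ratings in Table 4 (illustrative)."""
    if cv is None:
        return "Confidence not Assessed"
    if cv < 5:
        return "High Confidence"
    if cv < 10:
        return "Moderate Confidence"
    if cv < 15:
        return "Limited Confidence"
    # Assumption: Table 4 does not assign a CV of exactly 20, so it is
    # grouped with Low Confidence here, matching the "CV <= 20" quality rule.
    if cv <= 20:
        return "Low Confidence"
    return "No Confidence"


# A hypothetical estimate from a sample of 30 passes the "robust" sample size
# rule but, with a CV of 17, would still only merit Low Confidence.
print(confidence_rating(17.0))  # → Low Confidence
```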
Contact
For enquiries about this publication please contact:
Labour Market Statistics,
Office of the Chief Economic Adviser
Telephone: 0131 244 6773,
E-mail: LMStats@gov.scot
For general enquiries about Scottish Government statistics please contact:
Office of the Chief Statistician
E-mail: statistics.enquiries@gov.scot