Appendix B - Note on weighting for SPACES project
GUS SPACES was a follow-up study of Phase 1 participants in GUS Sweep 8, so there were three types of sample member: participants in GUS SPACES, non-participants who were given the chance to take part, and non-participants who were not given the chance to take part (GUS Phase 2 sample members). To account for:
i) the fact that GUS SPACES is based only on Phase 1 full productives;
ii) (non-)consent to take part in follow-ups;
iii) (non-)response to GUS SPACES;
all non-participants were treated as a single group of GUS SPACES unproductives, and a one-step calibration was used to adjust the achieved sample to population estimates. The GUS Sweep 8 longitudinal sample weight (w8_baby) was used as the entry weight for calibrating the GUS SPACES longitudinal weight, and the GUS Sweep 8 cross-sectional weight (w8_fullB) as the entry weight for the GUS SPACES cross-sectional weight.
As an outcome, two weights were developed for GUS SPACES.
1. MRC_child: a longitudinal weight for analysis of GUS SPACES data for children whose prime carer has responded at every earlier sweep of GUS.
2. MRC_fullC: a cross-sectional weight that should be used for any cross-sectional (Sweep 8/age 10) analysis of the GUS SPACES data (i.e. data collected about the child). All children who completed the follow-up have a cross-sectional child weight.
For the purposes of describing the weighting, respondents in GUS SPACES have been named Sample A and Sample B. These samples are defined as follows:
- Sample A - children whose carers had responded at all previous Sweeps 1-8
- Sample B - children whose carers had participated in GUS SPACES but had missed one or more interviews in GUS Sweeps 2-7.
The two samples were treated separately during the weighting, because Sample B respondents are likely to differ in their characteristics from those in Sample A, as their much lower response rates suggest. There were 737 Sample A respondents and 775 Sample A+B respondents, i.e. 38 Sample B respondents.
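The Sample A/Sample B split can be expressed as a simple rule on each child's response history. The sketch below is illustrative only: `classify`, `ALL_SWEEPS` and the input format (a set of sweep numbers at which the carer gave an interview) are hypothetical, and it assumes every GUS SPACES respondent had responded at Sweeps 1 and 8, so that any missed interview falls within Sweeps 2-7 as the definition above states.

```python
from typing import Optional, Set

ALL_SWEEPS = set(range(1, 9))  # GUS Sweeps 1-8


def classify(sweeps_responded: Set[int], in_spaces: bool) -> Optional[str]:
    """Assign a child to Sample A or Sample B (hypothetical helper).

    sweeps_responded: sweep numbers at which the carer gave an interview.
    in_spaces: whether the child took part in GUS SPACES.
    """
    if not in_spaces:
        return None  # not a GUS SPACES respondent, so in neither sample
    if ALL_SWEEPS <= sweeps_responded:
        return "A"   # carer responded at every sweep 1-8
    return "B"       # took part in SPACES but missed at least one earlier sweep
```

Keeping the two groups apart at this stage is what allows them to be weighted separately.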
Longitudinal weights (MRC_child)
Longitudinal weights were generated only for respondents in Sample A. Calibration weighting was applied: this takes the combined sample, weighted by its pre-calibration weights, and adjusts the weights using an iterative procedure. The resulting weighting factors, when applied to the combined data, make the survey estimates match a set of population estimates for a set of key variables. The population estimates in this instance are survey estimates from Sample A, weighted by the main GUS Sweep 8 longitudinal weight (w8_baby). Since the longitudinal weight corrects for the sample design and for non-response bias at each stage of GUS, the weighted Sample A estimates are the best population estimates available. The calibration variables were chosen according to the bias remaining in the data after the Sweep 8 longitudinal weights were applied to Sample A. The key variables used in the weighting are presented in the table below:
Variables used in the calibration of the longitudinal sample
- Respondent age (Ragegrp)
- Last known tenure (tenure)
- Family type (whether the respondent was a lone parent) (DhHGrsp04)
- Location of household (UR2FOLD)
- Scottish Index of Multiple Deprivation: 15% most deprived data zones (SIMD15_12)
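The appendix does not name the calibration algorithm, only that it is iterative. One common iterative method is raking (iterative proportional fitting), sketched below as an assumption rather than as the procedure actually used for GUS. Starting from the entry weights, each pass rescales the weights so that the weighted totals of one calibration variable (for example tenure or UR2FOLD) match their target totals, cycling through the variables until every margin agrees. The function name `rake`, the record format and the tolerance are hypothetical choices.

```python
from collections import defaultdict


def rake(records, weights, targets, max_iter=100, tol=1e-8):
    """Raking (iterative proportional fitting) of survey weights.

    records: list of dicts mapping calibration variable -> category
    weights: list of entry weights, one per record
    targets: dict mapping variable -> {category: target weighted total}
    Returns weights whose weighted marginal totals match the targets.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, cat_totals in targets.items():
            # Current weighted total in each category of this variable.
            current = defaultdict(float)
            for rec, wi in zip(records, w):
                current[rec[var]] += wi
            # Scale every record so this variable's margin hits its targets.
            for i, rec in enumerate(records):
                factor = cat_totals[rec[var]] / current[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        # Stop once a full pass leaves every weight essentially unchanged.
        if max_change < tol:
            break
    return w
```

For raking to converge, each variable's targets must sum to the same overall total; otherwise the margins cannot all be matched at once.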
Cross-sectional weights (MRC_fullC)
Cross-sectional weights were generated for all GUS SPACES respondents (Sample A+B). The same calibration approach was applied: it takes the combined sample, weighted by its pre-calibration weights, and adjusts the weights using an iterative procedure. The resulting weighting factors, when applied to the combined data, make the survey estimates match a set of population estimates for a set of key variables. The population estimates in this instance are survey estimates from Sample A+B, weighted by the GUS Sweep 8 cross-sectional weight (w8_fullB). The calibration variables were chosen according to the bias remaining in the data after the Sweep 8 cross-sectional weights were applied to Sample A+B. As Sample A+B differs from Sample A by only 38 observations, the key variables used in the weighting are exactly the same as for the longitudinal weights.
Sample efficiency of GUS SPACES data
Adding weights to a sample can affect its efficiency. If the weights are very variable (i.e. some are very high and/or very low), the weighted estimates will have a larger variance. More variance means larger standard errors and wider confidence intervals, so there is less certainty over how close the estimates are to the true population value.
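A standard way to quantify this, Kish's approximate design effect due to weighting, is not stated in this appendix and is given here only as a general illustration. For $n$ respondents with weights $w_1, \dots, w_n$:

$$
\mathrm{deff}_w = \frac{n \sum_{i=1}^{n} w_i^2}{\left( \sum_{i=1}^{n} w_i \right)^2} = 1 + \mathrm{CV}^2(w),
\qquad
\mathrm{CV}^2(w) = \frac{\frac{1}{n} \sum_{i=1}^{n} (w_i - \bar{w})^2}{\bar{w}^2}
$$

Equal weights give $\mathrm{deff}_w = 1$; the more the weights vary, the larger the variance inflation. For example, weights with a coefficient of variation of 0.5 give $\mathrm{deff}_w = 1.25$, so standard errors grow by a factor of about $\sqrt{1.25} \approx 1.12$.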
The effect of the sample design on the precision of survey estimates is indicated by the effective sample size (neff). The effective sample size measures the size of an (unweighted) simple random sample that would have provided the same precision (standard error) as the design being implemented. If the effective sample size is close to the actual sample size then we have an efficient design with a good level of precision. The lower the effective sample size, the lower the level of precision. The efficiency of a sample is given by the ratio of the effective sample size to the actual sample size. The range of the weights, the effective sample size and sample efficiency for both sets of weights are given in the table below.
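The quantities reported for each weight (range, effective sample size and efficiency) can be computed from the weights alone. The sketch below assumes Kish's formula, neff = (sum of w)^2 / (sum of w^2); the appendix does not say which formula was used, and this measure reflects only the variability of the weights, not clustering or stratification. `weight_summary` is a hypothetical helper.

```python
def weight_summary(weights):
    """Range of weights, Kish effective sample size and sample efficiency."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    neff = total ** 2 / total_sq  # Kish: (sum of w)^2 / sum of w^2
    return {
        "min": min(weights),
        "max": max(weights),
        "n": n,
        "neff": neff,
        "efficiency": neff / n,  # effective sample size as a share of n
    }
```

With equal weights the efficiency is 1 (100%); for the two weights 0.5 and 1.5, neff is 1.6 and the efficiency 0.8.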
Range of weights and sample efficiency
Applying the weights
For any cross-sectional analysis, i.e. any analysis using only GUS SPACES and GUS Sweep 8 data, the MRC_fullC weight should be applied.
The longitudinal weight (MRC_child) should be used for any analysis that combines GUS SPACES data with information from previous GUS sweeps.
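A minimal sketch of this rule, assuming the data are held as a list of dicts keyed by variable name and that children without a longitudinal weight (Sample B) carry a missing or zero MRC_child value; how the released file actually codes this should be checked. `choose_weight` and `weighted_proportion` are hypothetical helpers, not part of the data release.

```python
def choose_weight(uses_earlier_sweeps: bool) -> str:
    # Longitudinal weight when earlier GUS sweeps are combined with SPACES
    # data; otherwise the cross-sectional weight.
    return "MRC_child" if uses_earlier_sweeps else "MRC_fullC"


def weighted_proportion(rows, var, value, weight_var):
    """Weighted share of rows where rows[var] == value.

    Rows with a missing or zero weight are skipped, e.g. Sample B
    children when weight_var is MRC_child.
    """
    num = den = 0.0
    for row in rows:
        w = row.get(weight_var)
        if not w:
            continue
        den += w
        if row[var] == value:
            num += w
    return num / den
```

For example, `weighted_proportion(rows, var, 1, choose_weight(False))` gives a cross-sectional estimate weighted by MRC_fullC.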
Description of weight variables in the data file
- MRC_child: GUS SPACES Weights for Longitudinal Child sample
- MRC_fullC: GUS SPACES Weights for Cross Sectional Child sample
Email: Ganka Mueller, firstname.lastname@example.org
Phone: 0300 244 4000 – Central Enquiry Unit
The Scottish Government
St Andrew's House