Appendix 3: GHQ model development
Several models are discussed in this section. Unless stated otherwise, all odds ratios are for the sample with complete data on every variable contained in any of the 'final' models, so that the models can be compared directly. This restricted sample contained 1,795 respondents, because some of the variables were collected only in the nurse subsample.
In the initial logistic regression model containing only age, sex and residence (Model 1), residence in Greater Glasgow and Clyde had an odds ratio of 1.67, representing increased odds of a high GHQ-12 score for residents of Greater Glasgow and Clyde compared with the rest of Scotland. When SIMD was added to the model (Model 2), the odds ratio decreased slightly to 1.61, suggesting that a small part of the increased odds was explained by the different levels of deprivation in Greater Glasgow and Clyde. McFadden's pseudo R² was 0.027 for Model 1 and 0.040 for Model 2, showing that Model 2 was a better fit to the data.
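The two summary measures used above can be sketched as follows. McFadden's pseudo R² is defined as one minus the ratio of the fitted model's log-likelihood to that of the intercept-only (null) model. The second function is one common rough way of expressing how much of an excess odds ratio is accounted for by an added covariate; it is an illustrative assumption and not a calculation reported in this appendix. The function names are hypothetical.

```python
def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(null).

    ll_model: log-likelihood of the fitted logistic model.
    ll_null:  log-likelihood of the intercept-only model, fitted to the
              same sample (otherwise the ratio is not meaningful).
    """
    if ll_null == 0:
        raise ValueError("null log-likelihood must be non-zero")
    return 1.0 - ll_model / ll_null


def excess_odds_explained(or_before: float, or_after: float) -> float:
    """Share of the excess odds (OR - 1) removed after adjustment.

    Assumption: a rough descriptive summary on the odds ratio scale,
    not the method used by the report.
    """
    if or_before == 1.0:
        raise ValueError("no excess odds to explain when the OR is 1")
    return (or_before - or_after) / (or_before - 1.0)


# Odds ratios for residence quoted in the text: Model 1 and Model 2.
share = excess_odds_explained(1.67, 1.61)
print(f"Share of excess odds explained by SIMD: {share:.1%}")
```

On this rough measure, adding SIMD accounts for a little under one tenth of the excess odds, which is consistent with the "small part" described in the text.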
When the socio-economic variables were added to the model and backward selection was performed, age, sex, SIMD, receiving income-related benefits, economic activity and marital status remained in the model (Model 3). Residence in Greater Glasgow and Clyde had an odds ratio of 1.48, which was just short of significance at the 5% level. However, when the whole available sample was included in the model the odds ratio was significant, so residence was kept in the model. The same applied to SIMD and income-related benefits: they were not significant predictors in the restricted sample, but remained in the model because they were significant when the full available sample was included. McFadden's pseudo R² for the model with the restricted sample was 0.094, showing this model to be a better fit than Model 2.
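The selection step described above can be sketched as a backward elimination loop in which some variables are forced to stay in the model. The appendix does not state the exact removal criterion, so the sketch assumes that the least significant variable is dropped while its p-value exceeds 0.05. The function `backward_select`, the callback `fit_pvalues`, the variable `noise_var` and all p-values in the demonstration are hypothetical and invented purely for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List


def backward_select(
    candidates: Iterable[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    keep: FrozenSet[str] = frozenset(),
    alpha: float = 0.05,
) -> List[str]:
    """Backward elimination with forced-in variables.

    fit_pvalues refits the model on the given variables and returns one
    p-value per variable (for a categorical variable this would be a
    joint test across its categories). Variables in `keep` are never
    removed, mirroring predictors retained because they were
    significant in the larger sample.
    """
    selected = list(candidates)
    while True:
        pvalues = fit_pvalues(selected)
        removable = [v for v in selected if v not in keep]
        if not removable:
            return selected
        worst = max(removable, key=lambda v: pvalues[v])
        if pvalues[worst] <= alpha:
            return selected  # every removable variable is significant
        selected.remove(worst)


# Demonstration with invented, fixed p-values (a real fit would
# re-estimate them after each removal).
TOY_P = {"age": 0.001, "sex": 0.01, "residence": 0.06,
         "marital_status": 0.03, "noise_var": 0.40}


def toy_fit(variables: List[str]) -> Dict[str, float]:
    return {v: TOY_P[v] for v in variables}


final = backward_select(TOY_P, toy_fit, keep=frozenset({"residence"}))
print(final)
```

In the demonstration, `residence` survives despite a p-value above the threshold because it is listed in `keep`, while `noise_var` is removed.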
When the behavioural variables were added to the model and backward selection was performed (Model 4), the variables that remained in the model were age, sex, residence, receiving income-related benefits, economic activity, educational qualifications, marital status, smoking status, potential problem drinking, abstaining from alcohol and physical activity level. Residing in Greater Glasgow and Clyde had an odds ratio of 1.45, which was again not quite significant at the 5% level, but residence was significant in the model fitted to the full available sample. The same was true of receiving income-related benefits, educational qualifications and smoking status, so all of these remained in the model. McFadden's pseudo R² for the model with the restricted sample was 0.117, showing this model to be a better fit than Model 3.
The biological variables were then added to the model and backward selection was performed. All of the blood analytes dropped out of the model, so it was re-run without them; this allowed the nurse sample to be used rather than the blood sample, increasing the sample size. After backward selection, the variables remaining in the model (Model 5) were age, economic activity, marital status, binge drinking, potential problem drinking, physical activity level and waist-hip ratio. Because waist-hip ratio was measured at the nurse visit, this model cannot be re-run on a larger sample, so it is not possible to know whether residence in Greater Glasgow and Clyde would have been significant in a larger sample, as it was in the earlier models. McFadden's pseudo R² for the model with the restricted sample was 0.097, which is lower than the 0.117 for Model 4, so Model 4 was the best-fitting model.
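The comparison of fit across the five models can be laid out as a short sketch using the pseudo R² values quoted in this appendix. The comparison is only meaningful because every value refers to the same restricted sample; the variable names are hypothetical.

```python
# McFadden's pseudo R-squared for each model, as quoted in the text
# (all fitted to the same restricted sample).
pseudo_r2 = {
    "Model 1": 0.027,
    "Model 2": 0.040,
    "Model 3": 0.094,
    "Model 4": 0.117,
    "Model 5": 0.097,
}

# The best-fitting model is the one with the highest pseudo R-squared.
best_model = max(pseudo_r2, key=pseudo_r2.get)

# Rank the models from best to worst fit.
ranking = sorted(pseudo_r2, key=pseudo_r2.get, reverse=True)

print(best_model)
print(ranking)
```

Note that McFadden's pseudo R² does not penalise the number of predictors, so this ranking describes fit only and not parsimony.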