Scottish Schools Adolescent Lifestyle and Substance Use Survey (SALSUS): technical report 2018

Information on the fieldwork and data processing for the 2018 Scottish Schools Adolescent Lifestyle and Substance Use Survey (SALSUS).

Data processing

This section covers the procedures used during the data processing stage. More detailed explanations are provided for variables that had to be derived from responses to questions.

Data Specification

Appendix D contains the full data specification that was followed in the data processing. Along with the question number and variable name, it shows the base for each question and rules that were applied when editing the data; for example, how missing values were treated and what happened when pupils did not follow the survey routing correctly (if using the paper questionnaire).

Specification of key derived variables

Scottish Index of Multiple Deprivation and urban/rural classification

The Scottish Index of Multiple Deprivation (SIMD) is a scale used to determine the relative deprivation of small areas across Scotland. An aggregate score is reached by combining 38 indicators from seven domains: current income; employment; health; education, skills and training; housing; geographic access to services; and crime. Postcodes were collected from pupils and matched to the 2016 version of SIMD to establish the rank of the areas they lived in. This was reported in quintiles, with 1 being the 20% most deprived areas and 5 being the 20% least deprived areas.
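As an illustration of the quintile reporting, the following minimal sketch maps an SIMD rank to a quintile. It assumes that rank 1 is the most deprived area and that SIMD 2016 ranks 6,976 data zones; the function name and the `n_zones` parameter are hypothetical and are not taken from the survey's data specification.

```python
def simd_quintile(rank, n_zones=6976):
    """Map an SIMD rank (1 = most deprived) to a quintile from 1 to 5.

    ASSUMPTION: n_zones is the number of ranked data zones (6,976 is
    assumed here for SIMD 2016); pass another value if it differs.
    """
    if not 1 <= rank <= n_zones:
        raise ValueError("rank is outside the range of ranked data zones")
    # Ceiling division in integer arithmetic, so boundaries are exact.
    return (rank * 5 + n_zones - 1) // n_zones


print(simd_quintile(1))      # most deprived fifth
print(simd_quintile(6976))   # least deprived fifth
```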

Overall, 65% of pupils (n=15,141) who returned questionnaires did not provide information on their postcode or gave incomplete postcode information. This is higher than in previous years, with 37% not providing an answer in 2013 and 43% in 2015. This in part reflects the increasing proportion of pupils completing the survey online (56% of those who completed a paper survey did not provide a valid postcode, compared with 68% of those who completed the survey online).[19] Mode and non-response are discussed further in relation to individual questions (including postcode) in the 2015 Mode Effect Study.

Complete postcode information is important because it is used to assign SIMD categories. Due to the high number of pupils with missing postcode information, missing postcodes were imputed. This was done by sorting the data by class within schools. If a postcode was missing, the postcode of the preceding person was copied, provided they were in the same class. This allowed all pupils to be included in the SIMD analysis.
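The imputation rule described above can be sketched as follows. This is a minimal illustration, not the survey's processing code: the record layout (dictionaries with `school`, `class` and `postcode` keys) and the added `imputed` flag are assumptions made for the example.

```python
def impute_postcodes(pupils):
    """Fill missing postcodes from the preceding pupil in the same class.

    ASSUMPTION: each pupil is a dict with 'school', 'class' and 'postcode'
    keys; a missing postcode is None or an empty string.
    """
    # sorted() is stable, so the original order within each class is kept.
    ordered = sorted(pupils, key=lambda p: (p["school"], p["class"]))
    result = []
    previous = None
    for pupil in ordered:
        record = dict(pupil)  # copy, so the input records are not modified
        record["imputed"] = False
        same_class = previous is not None and (
            (previous["school"], previous["class"])
            == (record["school"], record["class"])
        )
        if not record.get("postcode") and same_class and previous.get("postcode"):
            record["postcode"] = previous["postcode"]
            record["imputed"] = True
        result.append(record)
        previous = record
    return result


pupils = [
    {"school": "A", "class": "2B", "postcode": "EH1 1AA"},
    {"school": "A", "class": "2B", "postcode": None},
    {"school": "A", "class": "2C", "postcode": None},
]
for row in impute_postcodes(pupils):
    print(row)
```

In this sketch a pupil who is first in their class and has no postcode stays unmatched, because there is no preceding pupil in the same class to copy from.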

Although a large proportion of postcodes were imputed, it is important to note that SIMD still presents the best measure of deprivation for the analysis of survey findings. Had postcodes not been imputed, analysis by SIMD could have been problematic. For example, if pupils from more deprived areas were less likely to provide a postcode, they would be excluded from analysis by SIMD, leaving smaller base sizes. We compared responses to the family affluence question[20] with postcode non-response, and pupils who self-identified as less well-off were less likely to give a postcode. This suggests that, without postcode imputation, pupils living in deprived areas would indeed be more likely to be underrepresented in the SIMD analysis.

The fact that base sizes are increased as a result of the imputation reduces the chance of a Type II error[21].


The Goodman Strengths and Difficulties Questionnaire (SDQ) was used to explore the relationship between substance use and mental health. The 'Strengths and Difficulties Questionnaire' was designed by Robert Goodman (1997) and is widely used by researchers, clinicians and education professionals. The questionnaire comprises 25 questions that are grouped into five scales, with each scale including five questions. The scales are:

  • emotional symptoms
  • conduct problems
  • hyperactivity/inattention
  • peer relationship problems
  • pro-social behaviour

Information on how to score the self-completed SDQ was obtained from the website. For each item in each of the five scales, the responses 'Not true', 'Somewhat true' and 'Certainly true' are each assigned a value from 0 to 2 (see Table 12). A total score of 0 to 10 is possible for each of the five scales.

Table 12: Values assigned to each item in each scale of the SDQ

Item | Variable name | Not True | Somewhat True | Certainly True
Emotional Symptoms Scale
I get a lot of headaches, stomach aches or sickness | somatic | 0 | 1 | 2
I worry a lot | worries | 0 | 1 | 2
I am often unhappy, downhearted or tearful | unhappy | 0 | 1 | 2
I am nervous in new situations. I easily lose confidence | clingy | 0 | 1 | 2
I have many fears, I am easily scared | afraid | 0 | 1 | 2
Conduct Problems Scale
I get very angry and often lose my temper | tantrum | 0 | 1 | 2
I usually do as I am told | obeys | 2 | 1 | 0
I fight a lot. I can make other people do what I want | fights | 0 | 1 | 2
I am often accused of lying or cheating | lies | 0 | 1 | 2
I take things that are not mine from home, school or elsewhere | steals | 0 | 1 | 2
Hyperactivity Scale
I am restless. I cannot stay still for long | restles | 0 | 1 | 2
I am constantly fidgeting or squirming | fidgety | 0 | 1 | 2
I am easily distracted. I find it difficult to concentrate | distrac | 0 | 1 | 2
I think before I do things | reflect | 2 | 1 | 0
I finish the work I am doing. My attention is good | attends | 2 | 1 | 0
Peer Problems Scale
I am usually on my own. I generally play alone or keep to myself | loner | 0 | 1 | 2
I have one good friend or more | friend | 2 | 1 | 0
Other people my age generally like me | popular | 2 | 1 | 0
Other children or young people pick on me or bully me | bullied | 0 | 1 | 2
I get on better with adults than with people my own age | oldbest | 0 | 1 | 2
Pro-social Scale
I try to be nice to other people. I care about their feelings | consid | 0 | 1 | 2
I usually share with others (food, games, pens etc.) | shares | 0 | 1 | 2
I am helpful if someone is hurt, upset or feeling ill | caring | 0 | 1 | 2
I am kind to younger children | kind | 0 | 1 | 2
I often volunteer to help others (parents, teachers, children) | helpout | 0 | 1 | 2

Item scores were summed to give an overall score for each of the five scales. Total Difficulties scores were also calculated as an overall measure of psychiatric health by summing the scores for Emotional Symptoms, Conduct Problems, Hyperactivity and Peer Problems, but excluding scores for Pro-Social Behaviour. Total Difficulties scores can range from 0 to 40.

Some pupils did not answer one or more of the 25 SDQ items. For a scale score to be calculated, a pupil had to answer at least three of the five items in that scale. For example, if a pupil left three or more of the five Emotional Symptoms items unanswered, an Emotional Symptoms score could not be calculated for that pupil. The same pupil may have answered all of the items in the Conduct Problems scale and, in this case, would have a Conduct Problems score. Total Difficulties scores were only calculated for pupils who had scores for each of the four component scales.
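The scoring and missing-data rules above can be sketched as follows, using the variable names and item values from Table 12. The report does not state how a scale score was produced when only three or four items were answered; this sketch assumes the answered items are pro-rated to a five-item total, which is an assumption rather than the documented SALSUS method. The function names and dictionary layout are hypothetical.

```python
SCALES = {
    "emotional": ["somatic", "worries", "unhappy", "clingy", "afraid"],
    "conduct": ["tantrum", "obeys", "fights", "lies", "steals"],
    "hyperactivity": ["restles", "fidgety", "distrac", "reflect", "attends"],
    "peer": ["loner", "friend", "popular", "bullied", "oldbest"],
    "prosocial": ["consid", "shares", "caring", "kind", "helpout"],
}
# Items scored 2/1/0 rather than 0/1/2 in Table 12.
REVERSED = {"obeys", "reflect", "attends", "friend", "popular"}
VALUES = {"not true": 0, "somewhat true": 1, "certainly true": 2}


def scale_score(responses, scale):
    """Score one SDQ scale, or return None if fewer than 3 items were answered."""
    values = []
    for name in SCALES[scale]:
        answer = responses.get(name)
        if answer in VALUES:
            value = VALUES[answer]
            values.append(2 - value if name in REVERSED else value)
    if len(values) < 3:
        return None
    # ASSUMPTION: pro-rate to five items, rounding half up. With all five
    # items answered this is simply the sum of the item values.
    return (2 * 5 * sum(values) + len(values)) // (2 * len(values))


def total_difficulties(responses):
    """Sum the four difficulty scales; None if any of them has no score."""
    parts = [scale_score(responses, s)
             for s in ("emotional", "conduct", "hyperactivity", "peer")]
    return None if None in parts else sum(parts)


all_not_true = {name: "not true" for items in SCALES.values() for name in items}
print(total_difficulties(all_not_true))
```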

Scores for each of the five scales and the Total Difficulties score were grouped into categories of Normal, Borderline, and Abnormal (Table 13). These groupings are used in psychiatry to aid identification of pupils who are likely to have mental health disorders. The terminology used to describe SDQ scores is borrowed from the original questionnaire. While the terms 'Normal', 'Borderline' and 'Abnormal' may seem out-dated in the context of the language used to describe mental wellbeing today, they have been retained in this report to draw comparisons to previous years.

Table 13: Strengths and difficulties scoring

Score | Category: Normal | Category: Borderline | Category: Abnormal
Total difficulties score | 0-15 | 16-19 | 20-40
Emotional symptoms | 0-5 | 6 | 7-10
Conduct problems | 0-3 | 4 | 5-10
Hyperactivity/inattention | 0-5 | 6 | 7-10
Peer relationship problems | 0-3 | 4-5 | 6-10
Pro-social behaviour | 6-10 | 5 | 0-4
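The groupings in Table 13 can be expressed as a small lookup. This is a sketch only; the scale keys and the function name are hypothetical labels chosen for the example.

```python
# Upper bounds of the Normal and Borderline bands from Table 13.
BANDS = {
    "total": (15, 19),
    "emotional": (5, 6),
    "conduct": (3, 4),
    "hyperactivity": (5, 6),
    "peer": (3, 5),
}


def sdq_category(scale, score):
    """Return 'Normal', 'Borderline' or 'Abnormal' for a scale score."""
    if score is None:
        return None  # no score could be calculated for this pupil
    if scale == "prosocial":
        # Pro-social behaviour runs the other way: higher scores are better.
        if score >= 6:
            return "Normal"
        return "Borderline" if score == 5 else "Abnormal"
    normal_max, borderline_max = BANDS[scale]
    if score <= normal_max:
        return "Normal"
    if score <= borderline_max:
        return "Borderline"
    return "Abnormal"


print(sdq_category("total", 17))
print(sdq_category("prosocial", 4))
```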


Since 2010 the survey has included the Warwick-Edinburgh Mental Well-being Scale (WEMWBS). Developed as a tool for measuring mental well-being at a population level, the scale comprises 14 positively worded statements that relate to an individual's state of mental well-being (thoughts and feelings). Pupils were asked to indicate how often they had had such thoughts and feelings over the last two weeks.

The overall score was calculated by totalling the scores for each item (minimum possible score was 14 and the maximum was 70). The higher a person's score, the better their level of mental well-being. The mean was used as a measure of the average score and to compare different groups. Scores were calculated for pupils who gave a valid response to each of the 14 questions.
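A minimal sketch of the WEMWBS calculation is shown below. It assumes each of the 14 items is coded from 1 to 5, which is consistent with the stated minimum of 14 and maximum of 70; the function names are hypothetical.

```python
from statistics import mean


def wemwbs_score(items):
    """Total the 14 item scores; None unless every item has a valid response.

    ASSUMPTION: each item is coded as an integer from 1 to 5, and a
    missing or invalid response is represented by any other value.
    """
    if len(items) != 14:
        raise ValueError("WEMWBS has exactly 14 items")
    if any(value not in (1, 2, 3, 4, 5) for value in items):
        return None
    return sum(items)


def mean_wemwbs(pupil_items):
    """Mean score across the pupils for whom a score could be calculated."""
    scores = [wemwbs_score(items) for items in pupil_items]
    valid = [score for score in scores if score is not None]
    return mean(valid) if valid else None


group = [[3] * 14, [4] * 14, [5] * 13 + [None]]
print(mean_wemwbs(group))
```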

Family Structure

A variable on family structure (famstat) was computed for inclusion in a small number of tables in the National Overview and topic reports. This variable represents pupils' family structures in their main home only, and does not include information about a second home, if one exists. In the family structure variable:

  • pupils with a 'single parent' live with their own mother or father,
  • pupils with a 'step-parent' live with one of their own parents and one step-parent,
  • pupils with 'both parents' live with both of their own parents, and
  • pupils with an 'other' family structure do not live with either of their own parents and may live with foster parents, grandparents, older siblings, in a residential care home, or with other family members not represented.

There were 1,403 pupils for whom there was no reported family information. These pupils have been excluded from analysis involving the family structure variable.
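The family structure categories can be sketched as a simple classification. The input format (a collection of labels for the people a pupil lives with in their main home) and the labels themselves are assumptions made for illustration; they are not the questionnaire's actual response codes, and the precedence between categories is likewise assumed.

```python
def family_structure(lives_with):
    """Classify a pupil's main-home family structure (illustrative only).

    ASSUMPTION: lives_with is a collection of labels such as 'mother',
    'father', 'stepmother', 'stepfather' or 'grandparent'.
    """
    if not lives_with:
        return None  # no family information reported: excluded from analysis
    members = set(lives_with)
    own_parents = members & {"mother", "father"}
    if len(own_parents) == 2:
        return "both parents"
    if len(own_parents) == 1:
        has_step_parent = bool(members & {"stepmother", "stepfather"})
        return "step-parent" if has_step_parent else "single parent"
    return "other"


print(family_structure(["mother", "stepfather"]))
print(family_structure(["grandparent"]))
```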

Classification of smoking status

Pupils were classified as 'regular smokers' (defined as usually smoking at least one cigarette a week), 'occasional smokers' (defined as currently smoking but less than one cigarette a week) or 'non-smokers' (pupils who had never smoked or who were not current smokers) using a variable (smokstat) derived from question 7 of the questionnaire.

As not all pupils responded to question 7, it was not possible to classify all pupils as regular smokers, occasional smokers or non-smokers. Pupils who could not be classified were excluded from the tables that use smoking status as an investigatory variable. However, the pupils with unknown smoking status were included in the 'all pupils' category.

Classification of parents' and siblings' smoking status

Parents' smoking status was derived from responses to question 23. The variable 'parsmoke' includes pupils whose parents do not smoke daily, who have at least one parent who smokes daily, and those who do not see either parent. The variable has a high number of missing values because pupils who say their parents (or one parent) smoke occasionally are excluded, as are pupils who do not know the smoking status of their parents.

Whether or not pupils' siblings smoke was also derived from question 23. This variable, 'sibsmoke', was derived in the same way as 'parsmoke', and thus has the same limitation: it does not include pupils whose siblings smoke occasionally or pupils who do not know their siblings' smoking status.


A variable was derived using question 49 to capture whether pupils had taken any drugs in the last month, in the last year, more than a year ago or never. A bogus drug, 'semeron', was included in the list of drugs presented in questions 49 and 55 in the questionnaire. This was included to highlight where pupils might be exaggerating their drug use, i.e. answering that they had used a drug when they did not know what it was.

The analysis was set up to exclude pupils who reported that the only drug they had ever used was semeron from tables that report on the use of any drugs. No pupils reported using semeron and no other drugs.

Historically, pupils who claimed to have taken semeron but also reported taking other drugs were included in the analysis. This approach has been kept the same to ensure that the trend data is not affected.
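The exclusion rule for the bogus drug can be sketched as a single check. The input format (a collection of the drug names a pupil reported ever using) and the function name are assumptions made for the example.

```python
def include_in_drug_use_tables(drugs_ever_used):
    """Return False only when semeron is the sole drug a pupil reported.

    Pupils who reported semeron alongside other drugs stay in the
    analysis, as do pupils who reported no drug use at all.
    """
    return set(drugs_ever_used) != {"semeron"}


print(include_in_drug_use_tables(["semeron"]))
print(include_in_drug_use_tables(["semeron", "cannabis"]))
```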

A variable was also derived to capture whether or not pupils have ever been offered any drugs listed in question 49. There were no pupils who reported having been offered semeron but no other drug.

Age at which first smoked, drank, got drunk and took drugs

Question 73 asks pupils to report at what age they first smoked a cigarette (more than a puff), drank alcohol (more than a sip), got drunk and used drugs. For consistency with previous waves, ages below five were presumed to be unlikely. Because the SALSUS paper questionnaires were entered through scanning, it is probable that some values below 5 represent errors where the scanner failed to register the digit 1 preceding the value (e.g. read in 4 instead of 14). Rather than exclude pupils who reported an age between 0 and 4, we chose to add a value of 10 to these ages so as to keep them in the calculations without skewing the averages. This had a minimal effect on the distribution of ages, but enabled more accurate calculation of average ages of first substance use.

Pupils who gave a valid age for first use of a substance but had previously indicated (in other questions) that they had never used this substance were excluded from the analysis.
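The age adjustment and the consistency check can be sketched together. The function and parameter names are hypothetical; `ever_used` stands for whatever the pupil indicated in the other questions about the same substance.

```python
def corrected_first_use_age(age, ever_used):
    """Return the age used in the analysis, or None if the pupil is excluded.

    ASSUMPTION: ever_used is False only when the pupil indicated elsewhere
    that they had never used the substance.
    """
    if age is None or ever_used is False:
        return None
    # Ages 0 to 4 are treated as scanning errors with a missing leading 1.
    return age + 10 if 0 <= age <= 4 else age


print(corrected_first_use_age(4, True))    # treated as 14
print(corrected_first_use_age(13, True))   # unchanged
print(corrected_first_use_age(12, False))  # excluded
```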

Calculating paternal and maternal knowledge scores

Pupils' perceptions of their parents' knowledge of their behaviours were assessed in questions 45 and 46 of the questionnaire. Pupils were asked how much they thought their mother and their father knew about five factors relating to their friends and activities:

  • 'Who your friends are'
  • 'How you spend your money'
  • 'Where you are after school'
  • 'Where you go at night'
  • 'What you do with your free time'.

The response option 'I think s/he knows a lot' was given a value of 2, 'I think s/he knows a little' a value of 1, and 'I don't think s/he knows anything' a value of 0 for each of the 5 items listed above. The values for each of the items were aggregated to give total maternal and paternal scores, represented in the derived variables 'mumscore' and 'dadscore' respectively. These variables have values ranging from 0 to 10. A maternal or paternal knowledge score could only be calculated for pupils who answered all five of the relevant items in questions 45 and 46.

Pupils' perceptions of parental knowledge vary by age group and sex. To take account of this variation, paternal and maternal knowledge scores were banded into below median, at median or above median scores separately for each age group: 13 year olds and 15 year olds.
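The knowledge scores and the median banding can be sketched as follows. The shortened response labels, function names and age-group keys are hypothetical; the median is computed within each age group over pupils with a valid score.

```python
from statistics import median

# Shortened, hypothetical labels for the three response options.
KNOWLEDGE_VALUES = {"a lot": 2, "a little": 1, "nothing": 0}


def knowledge_score(answers):
    """Sum the five item values (0 to 10); None unless all five were answered."""
    if len(answers) != 5 or any(a not in KNOWLEDGE_VALUES for a in answers):
        return None
    return sum(KNOWLEDGE_VALUES[a] for a in answers)


def band_by_median(scores_by_age_group):
    """Band each score against the median of its own age group."""
    bands = {}
    for group, scores in scores_by_age_group.items():
        valid = [s for s in scores if s is not None]
        if not valid:
            bands[group] = [None for _ in scores]
            continue
        mid = median(valid)
        bands[group] = [
            None if s is None
            else "below median" if s < mid
            else "above median" if s > mid
            else "at median"
            for s in scores
        ]
    return bands


print(knowledge_score(["a lot", "a lot", "a little", "a lot", "nothing"]))
print(band_by_median({"13 year olds": [4, 8, 10], "15 year olds": [2, 6, 6]}))
```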

