
Scottish Climate Survey: technical report 2024 to 2025

Technical report supporting the publication 'Scottish Climate Survey: main findings'


Data validation and management

Questionnaire versions

As described in earlier sections, the data were collected from two sources: an online questionnaire and a postal questionnaire. The online questionnaire included built-in routing and checks, whereas the postal questionnaire relied on respondents navigating it correctly, with no constraint on the answers they could give.

In addition, the online data were available immediately in raw form, whereas the postal questionnaire data had to be scanned and keyed in a separate process. Tick-box answers were captured by scanning, and numbers and other verbatim answers were captured by keying, with the data then coded into an ASCII text string.

In line with standard procedures for a mixed-mode survey such as this, the online questionnaire was taken as the basis for data processing. Once it had been processed, a data map was used to match the data from both postal questionnaire versions to the online data.

A wide range of edits was carried out on the data, followed by numerous checks, as outlined below.

Data editing

Postal data – forced edits

The postal data were subject to errors introduced by respondents, so edits were required. Five key principles were drawn upon when editing the postal data:

  1. Back editing a filtered question where the filter had a single coded possibility. If a filtered question was answered but the origin (filter) question contradicted that answer (blank or different), the origin question was changed to the answer implied by the filtered question.
  2. No back edits were made if the filter had more than one possibility; in these cases a forward edit was applied (answers were deleted from the filtered questions).
  3. No back edits were made if the only answer was a ‘negative’ response (‘none’, ‘don’t know’). Forward edits were applied to these questions.
  4. If a positive answer was given alongside a negative one (a reason plus ‘None’/‘Don’t know’), the negative answers were removed.
  5. If a question that should have had only one answer was incorrectly answered as a multi-code question, a digit from the respondent ID was used to randomly select one of the answers.

Any questions that could not be edited using the principles above, or which were incorrectly left blank or unanswered, were marked as ‘Not stated’ in the data. The ‘Not stated’ responses were excluded from the bases used for analysis at each question.
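
The editing principles above can be sketched in miniature. The question names, answer codes, and record layout below are hypothetical; this is only an illustration of the logic, not the production editing code.

```python
NEGATIVE_CODES = {"None", "DK"}

def back_edit(record, origin_q, filtered_q, implied_origin_answer):
    """Principle 1: if a single-route filtered question was answered but the
    origin question contradicts it, overwrite the origin question."""
    if record.get(filtered_q) is not None and record.get(origin_q) != implied_origin_answer:
        record[origin_q] = implied_origin_answer

def forward_edit(record, origin_q, filtered_q, routing_answers):
    """Principles 2-3: if the origin answer does not route to the filtered
    question, delete the filtered answer instead of back editing."""
    if record.get(origin_q) not in routing_answers:
        record[filtered_q] = None

def drop_negatives(record, multi_q):
    """Principle 4: remove 'None'/'DK' when given alongside a positive answer."""
    answers = record.get(multi_q) or []
    positives = [a for a in answers if a not in NEGATIVE_CODES]
    if positives:
        record[multi_q] = positives

def resolve_multi_to_single(record, single_q, respondent_id, digit_pos=0):
    """Principle 5: a digit of the respondent ID picks one of the answers
    mistakenly given at a single-code question."""
    answers = record.get(single_q) or []
    if len(answers) > 1:
        digit = int(str(respondent_id)[digit_pos])
        record[single_q] = [answers[digit % len(answers)]]
```

Using the respondent ID digit (principle 5) makes the selection reproducible, unlike a fresh random draw.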

One of the most common mistakes made by respondents to the postal questionnaire was found at the question asking them to name the most effective actions someone living in Scotland could take to reduce their contribution to climate change (QD1). The instructions at this question asked them to “select the top four most effective actions, and write the number 1, 2, 3, or 4 next to these actions, with "1" being the most effective and "4" being the least effective of the four.” However, 322 postal respondents wrote ‘X’s at this question rather than numbers. As a result, these respondents were excluded from the bases of the follow-up questions (QD1_1 to QD1_3), which ranked the actions from most to least effective.
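
A validity check of this kind could be sketched as follows (the exact rule applied in processing is not documented here; this assumes a valid QD1 response uses each of the numbers 1 to 4 exactly once):

```python
def qd1_is_valid_ranking(marks):
    """marks: the non-blank entries a respondent wrote next to the QD1 actions.
    Returns True only if the numbers 1-4 each appear exactly once, so 'X'
    marks (or partial rankings) are flagged for exclusion from follow-ups."""
    entered = [m.strip() for m in marks if m and m.strip()]
    return sorted(entered) == ["1", "2", "3", "4"]
```

Respondents failing this check would then be dropped from the bases of QD1_1 to QD1_3.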

Duplicate responses

Nine cases were removed from the data because the same respondent at the same address completed both the online and the postal survey[1]. In these instances, the online questionnaire was prioritised, as it represents a more complete set of data.
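
The deduplication rule can be illustrated like this, matching on the same fields noted in the footnote (name, gender, age) plus address; the field names and data layout are hypothetical:

```python
def remove_mode_duplicates(cases):
    """cases: list of dicts with 'address', 'name', 'gender', 'age', 'mode'.
    Where the same person appears in both modes, the online case is kept."""
    kept = {}
    for case in cases:
        key = (case["address"], case["name"], case["gender"], case["age"])
        if key not in kept or case["mode"] == "online":
            kept[key] = case  # an online case overwrites a postal duplicate
    return list(kept.values())
```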

Base sizes

Since more than one respondent per household could participate in the survey, for any questions requiring a factual response the results were based on households (rather than all respondents), to avoid double counting. For these questions, the answers of the respondent who said they were responsible for paying the energy bills in their household (at question BILLS) were prioritised. If both respondents from a household said they were responsible for paying the energy bills, one respondent was randomly selected from the household.

The following questions used the household as the base for analysis:

  • QB1
  • QB2
  • QB3
  • QB4
  • QB5
  • QB6
  • QC1
  • HOUSETYPE
  • AFFORD
  • CUTBACK
  • QF1
  • QF2
  • QF3
  • QF4
  • QF5
  • QF6
  • QF7
  • QF8
  • QF9

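The per-household selection described above can be sketched as follows. Field names are hypothetical, and the fixed seed (not specified in the survey documentation) is assumed here only to make the random tie-break reproducible:

```python
import random

def select_household_respondent(respondents, seed=2024):
    """respondents: dicts with a 'pays_bills' flag, all from one household.
    The bill payer is prioritised; if several (or none) claim to pay the
    bills, one respondent is drawn at random from the relevant pool."""
    payers = [r for r in respondents if r["pays_bills"]]
    pool = payers or respondents  # fall back if nobody reports paying bills
    return random.Random(seed).choice(pool) if len(pool) > 1 else pool[0]
```
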
Coding

Coding was carried out on five questions with an ‘Other – specify’ response option (to be used if a respondent’s preferred response option was not available from the list provided)[2]. Coding is the process of analysing the content of each response based on a system where unique summary ‘codes’ are applied to specific words or phrases contained in the text of the response. The application of these summary codes and sub-codes to the content of the responses allows systematic analysis of the data.

Ipsos used a web-based system called Ascribe[3] to manage the coding of all the text in the responses. Responses were uploaded into the Ascribe system, where a member of the Ipsos coding team then worked systematically through the comments and applied a code to each relevant piece of text.

The Ascribe system allowed for detailed monitoring of coding progress, and the organic development of the coding framework (i.e. the addition of new codes to new comments). A team of coders worked to review all the responses after they were uploaded on Ascribe, with checks carried out on 5% of responses.
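
The real coding was carried out by people in the Ascribe system, but the workflow can be illustrated in miniature. The keyword-matching frame below is purely a toy, and the 5% check sample mirrors the check rate described above:

```python
import random

# Toy code frame: code label -> trigger phrases (illustrative only)
CODE_FRAME = {
    "COST": ["expensive", "afford", "price"],
    "TRANSPORT": ["bus", "car", "train"],
}

def apply_codes(response):
    """Return the sorted list of frame codes triggered by a verbatim response."""
    text = response.lower()
    return sorted(code for code, terms in CODE_FRAME.items()
                  if any(term in text for term in terms))

def draw_check_sample(coded_responses, fraction=0.05, seed=1):
    """Select ~5% of coded responses for a quality check."""
    n = max(1, round(len(coded_responses) * fraction))
    return random.Random(seed).sample(coded_responses, n)
```

In practice the frame developed organically, with coders adding new codes as unanticipated themes appeared in the responses.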

Data checks

Ipsos created a set of data tables based on a table specification which outlined the details of each question, the base unit for analysis, the different demographic subgroups to include for analysis (such as age, gender or rurality) and details of any combined variables[4] to include.

The tables were checked against the table specification to ensure that: all questions and all categories from each question were included; base sizes were correct (e.g. for filtered questions); base text was correct; the combined variables and sub-groups chosen for analysis added up and used the correct categories; nets were summed using the correct codes; and summary and recoded tables were included.

Any errors spotted in the data were specified to the data processing team in a change request form. The data processing team then amended the tables accordingly, and the tables were rechecked after the changes were made; checking was an iterative process, with affected tables re-run until no errors remained.
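
One of the table checks, verifying that a net row equals the sum of its component codes (as in the ‘Agree’ example in footnote [4]), can be sketched like this; the table layout and row labels are hypothetical:

```python
def check_net(table, net_label, component_labels):
    """table: dict mapping row label -> count for one table column.
    Returns True if the net row equals the sum of its components."""
    expected = sum(table[label] for label in component_labels)
    return table[net_label] == expected

agree_table = {"Strongly agree": 120, "Tend to agree": 210, "Agree (net)": 330}
check_net(agree_table, "Agree (net)", ["Strongly agree", "Tend to agree"])
```

A failing check of this kind would be raised with the data processing team via a change request form.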


[1] Where the same name, gender and age was given in the contact details section for both surveys.

[2] QC2, QE2, QF5, QF6 and Jobsector.

[3] Ascribe Coder software: https://ascribe.voxco.com/

[4] For example, combining ‘Strongly Agree’ and ‘Tend to agree’ responses to create an ‘Agree’ variable.

Contact

Email: emily.creamer@gov.scot
