Joined up data for better decisions: A strategy for improving data access and analysis

This strategy document sets out how we will achieve our ambitions: to build on existing programmes to create a culture where legal, ethical, and secure data linkage is accepted and expected; to minimise risks to privacy; and to facilitate full realisation of the benefits of data linkage. This strategy is published alongside 'Joined-up data for better decisions: Guiding Principles for Data Linkage'.

Section 1: Scope and Explanation of Data Linkage

For the purposes of this framework, data linkage is the joining of two or more administrative or survey datasets, which greatly increases the power of the analysis that is then possible with the data.

This framework is concerned exclusively with the linkage of data for research and statistical purposes where there is no direct impact on an individual because of information about that individual being linked. Examples can be seen as falling into three categories:

  • Development and production of Official Statistics, including the production of aggregate statistical information.
  • Production and dissemination of research resources, such as longitudinal statistical products like the Scottish Longitudinal Study.
  • Ad-hoc research projects, or linkages conducted to answer specific research questions using statistical analyses, such as the West of Scotland Coronary Outcomes Prevention Study.

This framework concerns linkages for research and statistical purposes only. It does not cover the sharing of personal information about an individual between organisations in order to deliver a co-ordinated service to that person. Data linkage for that purpose raises a different set of legal, ethical, and logistical issues. The following examples are all beyond the scope of this framework:

  • A Child Protection Officer sharing a particular family's case file with a School and the Police, in order that all three can work together to protect a child at risk.
  • A Local Authority sharing information about named individuals claiming Housing Benefit with any other organisation for the purpose of combating fraud.
  • A GP sharing information about an individual patient's symptoms or diagnosis with a hospital in order that the patient receives a co-ordinated service from all parts of the health service.

Two examples of data linkage for statistics are given below.

Example 1: The Scottish Health Survey

The Scottish Health Survey is a sample survey conducted through face-to-face interviews. Respondents are asked about a wide range of health issues including smoking, alcohol intake, diet, and levels of physical activity. Some biological measures are also taken, such as waist and hip circumference.

All aspects of the Scottish Health Survey, including data linkage, are approved by the National Research Ethics Service before being conducted.

All respondents are asked to consent to their name, address and date of birth being sent to the Information Services Division of NHS Scotland (ISD) so that their responses to the Health Survey can be linked with records holding data on medical diagnoses, in-patient and out-patient visits to hospital, and other information about cancer registration, GP registration and mortality.

Where the respondent gives consent for linkage the following process then occurs:

  • First, respondents' name, address, date of birth and a unique serial number (which is different to that used on the publicly available survey dataset) are separated from the rest of the health survey dataset (all the responses to the health survey questions) and sent by the survey contractors to ISD.
  • ISD then link respondents' name, address, date of birth and unique serial number with the health records, and delete the respondents' name, address and date of birth. This leaves a file of unique serial numbers and administrative health data. This file is then sent to a named analyst in Scottish Government.
  • The Scottish Government analyst then merges that file with the data collected through the Health Survey, using the unique serial number. The unique serial number is then deleted and a new random one added.
  • This dataset is then analysed, results are checked for risk of disclosure, and the aggregate results and conclusions are disseminated as widely as possible.

In this example, all data are sent between the three organisations by secure FTP (File Transfer Protocol) servers and can be accessed only by a small number of named people in each organisation. None of the three organisations - Scottish Government, the contractor, or ISD - has access to both survey and health records with personal identifiers attached at any time.
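The separation of roles described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records, the field names, the function names and the use of exact matching on name, address and date of birth are hypothetical, and it does not represent the actual systems or matching methods used by the contractor, ISD or Scottish Government.

```python
import secrets

IDENTIFIERS = ("name", "address", "dob")

# Hypothetical survey records held by the survey contractor.
survey = [
    {"serial": "S001", "name": "Person One", "address": "1 High St",
     "dob": "1970-01-01", "consent": True, "smoker": True},
    {"serial": "S002", "name": "Person Two", "address": "2 Low Rd",
     "dob": "1985-06-15", "consent": True, "smoker": False},
    {"serial": "S003", "name": "Person Three", "address": "3 Mid Ln",
     "dob": "1990-03-20", "consent": False, "smoker": False},
]

# Hypothetical administrative health records held by ISD.
health_records = [
    {"name": "Person One", "address": "1 High St", "dob": "1970-01-01", "admissions": 3},
    {"name": "Person Two", "address": "2 Low Rd", "dob": "1985-06-15", "admissions": 0},
]


def split_survey(records):
    """Contractor: separate identifiers from responses, keeping the serial on both."""
    # Only the identifiers of consenting respondents are sent for linkage.
    id_file = [{k: r[k] for k in ("serial",) + IDENTIFIERS}
               for r in records if r["consent"]]
    responses = [{k: v for k, v in r.items() if k not in IDENTIFIERS}
                 for r in records]
    return id_file, responses


def link_at_isd(id_file, health):
    """ISD: match on identifiers (exact match assumed), then delete the identifiers."""
    lookup = {tuple(h[k] for k in IDENTIFIERS): h for h in health}
    linked = []
    for row in id_file:
        match = lookup.get(tuple(row[k] for k in IDENTIFIERS))
        if match is not None:
            out = {"serial": row["serial"]}
            out.update({k: v for k, v in match.items() if k not in IDENTIFIERS})
            linked.append(out)
    return linked


def merge_at_sg(responses, health_file):
    """Analyst: merge on serial, then replace the serial with a new random number."""
    by_serial = {h["serial"]: h for h in health_file}
    merged = []
    for r in responses:
        h = by_serial.get(r["serial"])
        if h is None:
            continue  # this sketch keeps linked respondents only
        row = {**r, **h}
        del row["serial"]
        row["analysis_id"] = secrets.token_hex(8)
        merged.append(row)
    return merged


id_file, responses = split_survey(survey)
health_file = link_at_isd(id_file, health_records)
analysis_dataset = merge_at_sg(responses, health_file)
```

In this sketch each function stands in for one organisation and receives only the file that organisation would see, so no single step handles survey responses, health data and personal identifiers together.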

Example 2: The Scottish Health and Ethnicity Linkage Study

The Scottish Health and Ethnicity Linkage Study (SHELS) aims to explore health experiences across different ethnic groups in Scotland. The project involves the linking of Census 2001 data, from which ethnic group can be obtained, to health data, which show the number of times an individual has experienced a disease or health condition.

To do the linkage, Census Division, within National Records of Scotland (NRS), sent a dataset to a safe haven physically located within an NRS building. This dataset contained name, date of birth, address and a unique serial number for each person who completed a Census questionnaire in 2001. No other Census variables were included on this dataset.

Information Services Division of NHS Scotland (ISD) also sent a dataset, derived from health records, to this safe haven. This dataset contained name, date of birth and address, together with a unique serial number which is an encrypted version of the Community Health Index (CHI) number. No health-related variables were on this dataset. A named ISD employee linked the two datasets in the safe haven and then removed name, date of birth and address to leave a file containing the encrypted CHI number and the unique Census number. This file, called the index, was sent to a named individual within Demography Division of NRS.

ISD then prepared a separate dataset containing the encrypted CHI number and health data, but not name or address, and sent this to Demography Division. Census Division then sent a file to Demography Division containing the Census unique serial number and some demographic information, but again not name or address.

Demography Division then put these two datasets together using the index, resulting in a linked dataset containing health information and ethnic group, but not name or address. This dataset is stored in the safe haven within NRS, where analysis takes place, and all results are checked for risk of disclosure before being released from the safe haven.
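The role of the index in this linkage can be illustrated with a minimal sketch. All records, field names and function names here are hypothetical, exact matching on name, date of birth and address is an assumption made for simplicity, and the text does not state whether the two keys are retained on the linked dataset (this sketch drops them).

```python
IDENTIFIERS = ("name", "dob", "address")

# Hypothetical identifier-only files sent to the safe haven.
census_ids = [
    {"census_serial": "C1", "name": "Person One", "dob": "1970-01-01", "address": "1 High St"},
    {"census_serial": "C2", "name": "Person Two", "dob": "1985-06-15", "address": "2 Low Rd"},
    {"census_serial": "C3", "name": "Person Three", "dob": "1990-03-20", "address": "3 Mid Ln"},
]
health_ids = [
    {"enc_chi": "X9", "name": "Person One", "dob": "1970-01-01", "address": "1 High St"},
    {"enc_chi": "X7", "name": "Person Two", "dob": "1985-06-15", "address": "2 Low Rd"},
]

# Hypothetical payload files sent to Demography Division, keyed without identifiers.
health_data = {"X9": {"episodes": 2}, "X7": {"episodes": 0}}
census_data = {
    "C1": {"ethnic_group": "Group A"},
    "C2": {"ethnic_group": "Group B"},
    "C3": {"ethnic_group": "Group A"},
}


def build_index(census_ids, health_ids):
    """Safe haven: match the two identifier files, keep only the pair of keys."""
    by_person = {tuple(r[k] for k in IDENTIFIERS): r["enc_chi"] for r in health_ids}
    index = []
    for r in census_ids:
        enc_chi = by_person.get(tuple(r[k] for k in IDENTIFIERS))
        if enc_chi is not None:
            # Name, date of birth and address are not carried into the index.
            index.append((enc_chi, r["census_serial"]))
    return index


def join_via_index(index, health_data, census_data):
    """Demography Division: join the two payload files using the index alone."""
    linked = []
    for enc_chi, census_serial in index:
        h = health_data.get(enc_chi)
        c = census_data.get(census_serial)
        if h is not None and c is not None:
            linked.append({**c, **h})
    return linked


index = build_index(census_ids, health_ids)
linked_dataset = join_via_index(index, health_data, census_data)
```

The step that sees names and addresses (build_index) never receives health or Census variables, and the step that sees those variables (join_via_index) never receives names or addresses.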

Neither ISD nor NRS, nor any individual within these organisations, has access to both the Census and health data with personal identifiers attached. Explicit consent from data subjects was not sought for this study. All methods were approved by the Multicentre Research Ethics Committee (Scotland), the Community Health Index Advisory Group and the NSS-NRS Privacy Advisory Committee.


