Following extensive consultation, the Guiding Principles for Data Linkage were published on 8 November 2012 to support data users, data controllers and other decision makers (for example ethics and privacy committees) in their responsible use of data for statistics and research purposes.
There are six Guiding Principles which should be considered before undertaking a data linkage project. These can be viewed at www.scotland.gov.uk/GuidingPrinciplesforDataLinkage.
Below are some good practice examples of how the Principles are being applied.
1. Public Interest
The Scottish Health and Ethnicity Study (SHELS) was established to explore health experiences across different ethnic groups in Scotland. The project linked Census 2001 data to health data, so that each individual's ethnic background could be related to the number of times they experienced a disease or health condition. With respect to public interest, a professional ethicist was asked to judge whether the research struck an appropriate balance between individuals' right to data privacy and the potential benefits to society, and concluded that it did.
Linking these data has enabled research into cardiovascular disease, cancers, breast cancer screening, maternal and child health, and mental health across ethnic groups. For example, differences in heart failure rates between ethnic groups within Scotland have been identified: 'Other White British' and Chinese men experienced less heart failure than White Scottish men, while Pakistani men had the highest incidence. A similar pattern was found among women in Scotland. Further work indicates that these differences are not explained by socioeconomic differences (the analysis was adjusted for highest level of education). These results support the need for further research in this area so that health care can be targeted to the needs of different ethnic groups across Scotland.
2. Governance and Public Transparency
The Scottish Longitudinal Study (SLS) was set up in response to a lack of longitudinal datasets in Scotland. It provides high-quality data offering insights into the health and social status of the Scottish population and how these change over time. The SLS has been built by linking data from administrative and statistical sources, including Census data from 1991 onwards, vital events data (births, deaths and marriages), NHS Central Register (NHSCR) data (information on migration into and out of Scotland) and education data (Schools Census and Scottish Qualifications Authority data). SLS data can also be linked to health data held by the NHS Information Services Division (ISD); however, these health data are not held permanently within the SLS database.
To ensure that the data are used responsibly, the SLS has established a governance structure comprising a Steering Committee and a Research Board. The Steering Committee is responsible for data protection, confidentiality and security issues on behalf of the SLS, and is made up of representatives from the Longitudinal Studies Centre Scotland, the National Records of Scotland, the Office for National Statistics, the National Health Service Central Register, ISD and the Education Department of the Scottish Government.
The Research Board reports to the Steering Committee and is responsible for assessing applications to use the SLS. It is made up of representatives from the main data providers and the SLS team, as well as an independent academic. A list of the members of both the Steering Committee and the Research Board, with their affiliations, can be found on the SLS website (http://sls.lscs.ac.uk/about/people/).
3. Privacy
There are three aspects of Privacy that should be considered when linking data: consent, anonymisation and security. The examples below illustrate each in turn.
Established by the Scottish Government, the Scottish Health Survey (SHS) provides information about the health and lifestyle of people across Scotland that cannot be obtained from other sources. Its aims include estimating the prevalence of specific health conditions and their associated risk factors, identifying any regional variation across Scotland, and monitoring the nation's health over time.
During a face-to-face interview, respondents are asked whether they consent to their name, address and date of birth being sent to the Information Services Division (ISD) of NHS Scotland. This allows their survey responses to be linked with health records covering medical diagnoses, in-patient and out-patient hospital visits, cancer registration and GP registration. Where consent is not given, no linkage takes place.
Linking respondents' Health Survey data to their health records provides powerful insights into the general health of the Scottish population, allowing strategies to be developed to meet anticipated changes in healthcare requirements.
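As a minimal sketch of the consent rule described above, the identifiers passed on for linkage can be filtered so that non-consenting respondents are never included. The `Respondent` record, its `consented_to_linkage` flag and the function name are hypothetical and are not taken from the Scottish Health Survey itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Respondent:
    # Hypothetical survey record; field names are illustrative only.
    survey_id: str
    name: str
    address: str
    date_of_birth: str
    consented_to_linkage: bool


def identifiers_for_linkage(respondents: List[Respondent]) -> List[Dict[str, str]]:
    """Return name, address and date of birth for consenting respondents only.

    Respondents who did not consent are left out entirely, so their details
    are never sent for linkage and their survey answers stay unlinked.
    """
    return [
        {
            "survey_id": r.survey_id,
            "name": r.name,
            "address": r.address,
            "date_of_birth": r.date_of_birth,
        }
        for r in respondents
        if r.consented_to_linkage
    ]
```

Given one consenting and one non-consenting respondent, the function returns a single record, so only the consenting respondent's details would be sent on.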
The Scottish Government Education Department holds information on both looked-after children and school outcomes (such as attendance and qualifications). For the past three years this information has been linked to assess the educational outcomes of looked-after children compared with their peers. The resulting annual publications have proved extremely useful, both for measuring the gap between the outcomes of looked-after and non-looked-after children and for understanding the range of outcomes among looked-after children.
A number of steps are taken to ensure that the linked data are anonymised. Names and dates of birth are removed from both datasets (looked-after children and school outcomes) before they are received by the Scottish Government. The results of the linked data research are produced as summary figures and tables showing, for example, the percentage of looked-after and non-looked-after school leavers who left school aged under 16. All findings (termed 'outputs') are disclosure-checked by Scottish Government statisticians as part of the quality assurance process: a responsible person checks the results to make sure that individuals cannot be identified from the figures once they are published.
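The two steps described above (removing direct identifiers, then checking outputs before publication) can be sketched as follows. The field names and the rule of suppressing any non-zero count below five are assumptions made for illustration; actual disclosure control rules are set by the statisticians responsible and are usually more involved.

```python
from collections import Counter
from typing import Dict, List, Optional

DIRECT_IDENTIFIERS = {"name", "date_of_birth"}
MIN_CELL_COUNT = 5  # hypothetical threshold chosen for illustration


def strip_identifiers(records: List[dict]) -> List[dict]:
    """Drop names and dates of birth before the records are passed on."""
    return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS} for r in records]


def _is_small(count: int) -> bool:
    # A non-zero count below the threshold could point to identifiable people.
    return 0 < count < MIN_CELL_COUNT


def pct_leavers_under_16(records: List[dict]) -> Dict[str, Optional[float]]:
    """Percentage of school leavers aged under 16, by looked-after status.

    Returns None for any group whose figures fail the disclosure check.
    """
    totals: Counter = Counter()
    under_16: Counter = Counter()
    for r in records:
        group = "looked_after" if r["looked_after"] else "not_looked_after"
        totals[group] += 1
        if r["age_at_leaving"] < 16:
            under_16[group] += 1

    summary: Dict[str, Optional[float]] = {}
    for group, total in totals.items():
        count = under_16[group]
        if total < MIN_CELL_COUNT or _is_small(count) or _is_small(total - count):
            summary[group] = None  # suppressed: too small to publish safely
        else:
            summary[group] = round(100 * count / total, 1)
    return summary
```

Checking both the under-16 count and its complement matters: if nearly every leaver in a group left before 16, the few who did not would be just as identifiable.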
The Scottish Health and Ethnicity Study (SHELS) was established to explore health experiences across different ethnic groups in Scotland by linking Census and health data. SHELS has a number of security measures in place to ensure responsible use of linked data. For example, linkage is performed on a stand-alone computer in a National Records of Scotland (NRS) 'safe-haven', which is essentially a secure office containing a computer on which research on the linked data is carried out. Data are transferred to the safe-haven on encrypted memory sticks by trained staff who have undergone baseline security clearance. A one-way hashing method is applied to any data that could be used to identify an individual, namely the Community Health Index number and the Census number.
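The text does not say which hashing method SHELS uses. As an illustrative sketch only, the snippet below uses a keyed hash (HMAC-SHA256 from Python's standard library) and assumes that a secret key is held by the linkage team. A keyed hash is chosen here because an unkeyed hash of a short, structured identifier such as a 10-digit CHI number could be reversed simply by hashing every possible value. The function name, key and example value are hypothetical.

```python
import hashlib
import hmac


def hash_identifier(identifier: str, secret_key: bytes) -> str:
    """Return a one-way keyed hash (HMAC-SHA256) of an identifier.

    The same identifier and key always produce the same digest, so hashed
    records can still be matched to each other. Without the secret key there
    is no practical way to recover the identifier by trying candidate values.
    """
    # Normalise spacing so "010170 1234" and "0101701234" hash identically.
    normalised = "".join(identifier.split())
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"example-secret-key"  # hypothetical; a real key would be kept secret
    print(hash_identifier("0101701234", key))
```

Because the digest is deterministic for a given key, two datasets hashed with the same key can still be joined on the hashed value without either exposing the original identifier.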
Access to the safe-haven is limited and controlled by a keypad on the door, so that only authorised personnel can enter. To further safeguard the data, the SHELS safe-haven computer system has been modified so that all activity undertaken by researchers is monitored, ensuring that only approved work is carried out. All external devices, such as printers and USB devices (for example, external hard drives and memory sticks), are disabled to prevent data being taken out of the safe-haven. NRS also maintains a register of visits to the safe-haven.
4. Access and Personnel
The Scottish Longitudinal Study (SLS) provides high-quality data offering insights into the health and social status of the Scottish population and how these change over time. The SLS has taken a number of steps to ensure that people accessing the data for research and statistical purposes do so responsibly. For example, a Data Access Agreement setting out the terms and conditions of use between the data providers and the SLS is drawn up for each dataset needed to undertake a project. Both parties agree the Data Access Agreement before any work is carried out, and it is reviewed annually.
The SLS also maintains an Access Control Policy, which defines the access rights and security controls for SLS staff who need to use the data and systems as part of their job. Only a limited number of staff have access to the full database. Users never have access to the full range of data held by the SLS: they can access only the information (termed 'variables') that they have requested for their project and that has been approved by the Research Board.
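A minimal sketch of this variable-level access control follows. It assumes a hypothetical register that maps each approved project to the variables the Research Board signed off; the project ID and variable names are invented for illustration. Any variable not on the approved list is simply never copied into the user-specific extract.

```python
from typing import Dict, List, Set

# Hypothetical register of variables approved by the Research Board, per project.
APPROVED_VARIABLES: Dict[str, Set[str]] = {
    "project_001": {"person_id", "sex", "age_band", "council_area"},
}


def extract_for_project(project_id: str, records: List[dict]) -> List[dict]:
    """Build a user-specific extract containing only approved variables."""
    approved = APPROVED_VARIABLES.get(project_id)
    if approved is None:
        # No approval on record: refuse rather than fall back to full access.
        raise PermissionError(f"no approved variables for {project_id!r}")
    return [{k: v for k, v in r.items() if k in approved} for r in records]
```

Failing closed (raising an error for an unknown project) reflects the principle that users never see more than they have been approved for.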
Once a project has been approved by the Research Board, the SLS user and any collaborators must complete an SLS Undertaking Form, which covers disclosure, confidentiality and ethical issues relating to users' responsibilities when handling SLS data. Users must also read and sign the Census Confidentiality Undertaking and, if using NHS Information Services Division health data, must complete one of the Scottish Health Informatics Programme (SHIP) approved training courses.
Following approval and completion of the Undertaking Forms, a user-specific SLS dataset is created. Because of the sensitive nature of the data, it is accessible only on non-networked, password-protected computers at the safe setting in Edinburgh. The SLS user is given a username and password that are valid only for the duration of their pre-booked session at the safe setting.
One further step that the SLS take to limit any risk of an individual being identified by linked data is to ensure that no one involved in the project has access to the linked data in its entirety. This is achieved using a process called Separation of Functions. The following example outlines how a project linking Census data (used in the SLS) to the NHS Central Register (NHSCR) might be undertaken to ensure Separation of Functions:
The Census team adds a new tag (termed a 'unique identifier') to each record in the SLS sample of its data. It then sends a list of these tags, along with the linking information (for instance forename, surname, sex, date of birth and postcode), to a third party, who links the Census records to the NHSCR data. Once the linkage is complete, the linking information (names, dates of birth and so on) is removed and the results are sent to the SLS team. The Census team then sends its own data, with the linking information removed, to the SLS team, which uses the unique identifiers to link the two datasets.
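The Separation of Functions steps above can be sketched as three functions, one per party, so that no single function ever holds both the linking information and the combined data. This is an illustrative sketch only. The field names, the use of random UUIDs as the unique identifier and the exact matching on the linking fields are all assumptions; real linkage typically uses more tolerant, often probabilistic, matching.

```python
import uuid
from typing import Dict, List, Tuple

# Linking information named in the example above.
LINKING_FIELDS = ("forename", "surname", "sex", "date_of_birth", "postcode")


def census_team_prepare(sample: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Census team: tag each record, then split it into linking info and attributes."""
    linking_info, attributes = [], []
    for rec in sample:
        uid = uuid.uuid4().hex  # new unique identifier with no meaning of its own
        linking_info.append({"uid": uid, **{f: rec[f] for f in LINKING_FIELDS}})
        attributes.append(
            {"uid": uid, **{k: v for k, v in rec.items() if k not in LINKING_FIELDS}}
        )
    return linking_info, attributes


def third_party_link(linking_info: List[dict], nhscr: List[dict]) -> List[dict]:
    """Third party: match on linking info, then return results with it removed."""
    index: Dict[tuple, dict] = {tuple(r[f] for f in LINKING_FIELDS): r for r in nhscr}
    results = []
    for rec in linking_info:
        match = index.get(tuple(rec[f] for f in LINKING_FIELDS))
        if match is not None:
            results.append(
                {"uid": rec["uid"],
                 **{k: v for k, v in match.items() if k not in LINKING_FIELDS}}
            )
    return results


def sls_team_join(attributes: List[dict], results: List[dict]) -> List[dict]:
    """SLS team: join the two de-identified inputs on the unique identifier only."""
    by_uid = {r["uid"]: r for r in results}
    return [{**a, **by_uid[a["uid"]]} for a in attributes if a["uid"] in by_uid]
```

In this sketch the third party sees names and dates of birth but no census answers, and the SLS team sees census answers and NHSCR data but no names or dates of birth, mirroring the split described above.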
Beyond 2011 is a programme established by the National Records of Scotland to explore the future provision of population and socio-demographic statistics in Scotland. Before Beyond 2011 data are linked to other data sources, users must sign a Data Sharing Agreement, which sets out the rules for data users, including strict internal access control policies. If the Data Sharing Agreement is breached, the organisation providing the data to be linked can terminate the agreement. It can also request that the National Records of Scotland destroy the data linked for that project in accordance with Her Majesty's Government (HMG) Information Assurance Standards.
To further ensure responsible use of data, National Records of Scotland employees using Census data for the Beyond 2011 programme are subject to the offences set out in the Census Act 1920. Under the Act, it is an offence for the Registrar General, anyone under their control or any supplier of services to them to release personal census information without lawful authority. The offence is punishable by a fine and/or imprisonment for up to two years.
For hard copies of the Guiding Principles for Data Linkage or any further information, please contact Claire Wainwright by email Claire.Wainwright@gov.scot or by phone on 0131 244 0618.