A Scotland-wide Data Linkage Framework for Statistics and Research: Consultation Paper on the Aims and Guiding Principles

The main purpose of this consultation is to seek views on the aims of the Data Linkage Framework and a draft set of guiding principles.

2. The benefits of data linkage and the need for a Scotland-wide framework

This section sets out the long-term benefits that can arise from data linkage, followed by a summary of the challenges that prevent more data linkage from occurring. This demonstrates why a strategic framework is needed to help overcome the challenges and encourage data linkage, particularly across sectors (e.g. health, education and justice).

Benefit 1: Data linkage will help speed up cycles of improvement through the delivery of a higher quality cross-sectoral evidence base to inform public policy and strategic spending decisions

High quality policies and delivery of services are heavily dependent on high quality information about what problems exist, how they come about and what works in tackling them.

By making better use of data that already exist, the evidence base that forms the foundation of public services in Scotland will be improved. This will in turn allow for improved decision-making, and allow funding to be better targeted at delivery mechanisms that have been shown to work.

For example, it is known that:

  • School attendance for Looked After Children is lower than for the population as a whole. Looked After Children are 8 times more likely to be excluded from school than other children.
  • The average tariff score (a measure of attainment) for Looked After Children leaving school is 67 compared to 372 for all school leavers.
  • 59% of Looked After Children leave school for a positive destination compared to 87% of the population as a whole.

These insights were achieved through linking data from Local Authority social work service departments, publicly funded schools, the Scottish Qualifications Authority and Skills Development Scotland. It is exactly this kind of evidence that is needed for improvement in policies and delivery to occur. If health data or income records were also linked in to these analyses, a more detailed picture of how outcomes for Looked After Children come about could be formed. Public services would then be able to direct resources and support more usefully to where they are most needed.

Benefit 2: Enable better use of existing data to develop efficient methods of producing demographic and census-type statistics

Linking administrative datasets offers the potential to provide more frequent and up-to-date counts of the population and basic information on its characteristics, such as gender and age profiles. This could remove the requirement for an expensive decennial census, or at least reduce the scale and complexity of it, as well as increasing the timeliness of key population statistics central to funding decisions and strategic decision making across Scotland.

National Records of Scotland is assessing the feasibility of replacing traditional census enumeration with alternative methods, and data linkage is part of this research. Stakeholders will be consulted on alternative census options in due course. A similar programme of work is being taken forward by the Office for National Statistics (ONS) to establish and test models that will, in future, meet users' needs for census-type statistics in England and Wales. National Records of Scotland will work with the ONS to maximise the potential for harmonious statistical outputs (i.e. population and socio-demographic statistics) where practicable and mutually beneficial.

Benefit 3: Data linkage will increase the power of official statistics available to all

Scottish Government and others collate statistics from Local Authorities and other public bodies, quality assure those data, analyse them and publish aggregate statistics. This provides all interested parties with a national picture, and comparable information for benchmarking, which is a vital component of performance management and improvement. Where data provided to Scottish Government can be linked, safely and securely, at a record level, the value and power of the analysis can be greatly enhanced. This is already occurring in the education sector, through ScotXed - the Scottish Exchange of Education Data programme.

The development of ScotXed provided the impetus and platform for improving the data held by schools and Local Government, and sharing those data at an individual pupil level safely and securely. This has allowed for linkages to be made between different aspects of the educational experience as well as allowing data from different parts of local government (e.g. social work and education) that were previously disconnected to be linked. It has also allowed the outcomes for pupils after education to be linked with educational experiences.

The statistics produced are published as National Statistics, used for additional analysis to support policy development, and securely shared with academics to assist their research. Their National Statistics status means they have been assessed by the UK Statistics Authority as being produced and explained to high standards and that they serve the public good. For example, the "Summary statistics for attainment, leaver destinations and school meals" National Statistics publication draws on attainment data shared with Scottish Government by the Scottish Qualifications Authority, linked with data shared by Skills Development Scotland on pupil destinations after school and information shared by all 32 Local Authorities on their pupil populations. This is an excellent example of collaborative working: sharing data so that they can be linked adds significant value to what is available from the separate data sets.

A central component of ScotXed was the investment in data quality. It was not just a question of linking the data that already existed - new collections had to be developed and existing collections had to be improved to make linking them worthwhile. A programme of investment to improve the quality of existing data, and the methods of sharing information securely and efficiently so that those data can be included in future linkage projects, will be a necessary component of the Data Linkage Framework.

Experience tells us that advances in the quality of data held in administrative systems across different organisations occur only with sufficiently high-level endorsement of the activity and resource investment. Experience also tells us that the benefits to the data custodians in terms of improving management information are well received. Strategic decisions about when and where to focus investment, based on consultation with data custodians, policy makers and analysts, will be necessary.

Benefit 4: Data linkage will allow relatively low cost longitudinal research to be conducted both retrospectively and prospectively, informing preventative spend

Longitudinal surveys, which interview the same individuals multiple times over many years, are expensive and difficult to implement in a robust way and are limited to the extent that they only provide information about the cohort under study. Where administrative data sets exist, data linkage offers a more efficient way to get much of this information, to a higher level of quality, and to constantly refresh the cohort being studied giving much richer, more broadly applicable results. This allows longitudinal surveys to focus on questions that cannot be answered from linked administrative data such as those relating to attitudes, thoughts and feelings.

One longitudinal linkage study that is already underway is the Scottish Longitudinal Study (SLS). This covers 5.3% of the Scottish population, and has been created using data available from the 1991 and 2001 Censuses, data from civil registration (births, deaths, marriages), GP registrations on migration in or out of Scotland, and information on attendance and attainment from Scottish publicly funded schools. It also has the facility to incorporate information from NHS records on cancer registrations and hospital admissions.

The Scottish Longitudinal Study has been used to explore some key health and social issues for Scotland:

  • Boyle, Feng and Raab[1], for example, showed that there is an increased mortality risk due to widowhood. It was previously thought that this 'widowhood effect' could be due to selection, i.e. married couples often share characteristics related to health risks. However, the study found that risk was the same regardless of the cause of death of the spouse, suggesting that this is a causal effect, rather than a result of selection. This study has therefore been able to highlight the importance of health interventions to support widows.
  • Clemens, Boyle and Popham[2] showed that being unemployed is related to significantly higher odds of death within 7 years relative to being employed, that this is true regardless of other socioeconomic circumstances, and cannot be simply explained by people becoming unemployed because of illness.
  • The SLS has also been used to look at sectarianism, testing views that sectarianism is a continual characteristic of Scottish society. Holligan and Raab's analysis[3] showed that over time there is an increasing number of inter-sectarian partnerships amongst younger people, and that this is true across all of Scotland, but the higher proportion of Roman Catholics in the West of Scotland leads to a much higher proportion of inter-sectarian couples there. Furthermore, a high proportion of Roman Catholics continue their religious practice, even when part of a religiously mixed couple. Taken together these findings suggest a breakdown of sectarianism in Scotland between Roman Catholics and others.

Longitudinal data provide a better basis for understanding the process of causality and allow for a better understanding of how outcomes are achieved, as data from different points in the input → process → outcome chain could be linked. This will allow policy makers to understand how a situation has arisen and design more effective policies. For example, linking data about adults' employment status back to school attendance and educational attainment records, and any children's services (such as support to attend a childcare centre and any social work support given to the family), would help understand what long term impact children's services have on people's lives. That understanding could then be used to inform decisions about overall spending, in particular 'preventative spend' whereby investment early on saves money in the long run as it avoids problems, such as poor health, or people getting involved in criminality.

Currently, this kind of evaluation is possible only through relatively costly bespoke pieces of longitudinal research that take a number of years to produce results. Longitudinal studies that make use of administrative data also provide material to study the effect of policy interventions enacted in the past as natural experiments.

Benefit 5: Increase the capacity to robustly evaluate the costs, benefits and risks of new health, social, educational and associated programmes

Linkage of clinical trial data with records of subsequent hospitalisations and mortality is a highly efficient way of assessing the long term benefits and risks associated with new medical treatments. Randomised controlled trials are acknowledged to be the most effective way of ascertaining whether medical treatments are effective, but extended follow-up of those involved in the trials to assess long term safety and effectiveness is difficult and expensive. Record linkage can markedly reduce the costs of long-term monitoring. It can also extend follow-up to determine whether benefits are sustained, and to explore risks of adverse events that occur too rarely to be detected within the initial study.

Important Scottish trials that are being extended through record linkage include the West of Scotland Coronary Outcomes Prevention Study[4] and PROSPER[5] trials, which helped to establish the effectiveness of cholesterol-lowering drugs for primary prevention of heart disease.

The creation of large-scale biobanks containing biological samples along with demographic, social and economic data is now one of the key ways in which researchers hope to extend understanding of the process of healthy ageing and the causes of chronic and life-threatening diseases. Scotland is a participant in the UK Biobank[6] and hosts other more specialised biobanks such as the Generation Scotland: Scottish Family Health Study.[7] Participants in these studies are asked to consent to follow-up of events such as the onset of disease or death. Record linkage allows such follow-up to be conducted more completely and economically than would otherwise be possible.

Benefit 6: Data linkage will provide globally unique exemplars of research excellence, enhancing Scotland's reputation and attracting investment and job creation to Scotland

The West of Scotland Coronary Outcomes Prevention Study, PROSPER, the Scottish Health Informatics Programme and the Scottish Longitudinal Study are all recognised across the UK and internationally as producers, enablers and products of high quality research. Academic interest in them brings research funding to Scotland, contributing to the Scottish economy and employment opportunities.

The capacity to link individual hospital records with other sources of data was developed relatively early in Scotland, and the many effective uses of such linkages have gained Scotland an enviable reputation as a centre for clinical, epidemiological and health services research. Expanding on these successes and enabling and facilitating cross-sectoral linkages will increase interest and improve the reputation of Scotland as the most attractive place for doing research in Europe.

International examples

In Western Australia, cross-sectoral data linkage is relatively advanced and all of the benefits outlined above are already being seen. A report in 2008 concluded that because of the Western Australia data linkage system:

"Longitudinal studies have become cheaper and more complete; deletion of duplicate records and correction of data artifacts have enhanced the quality of information assets; data linkage has conserved patient privacy; community machinery necessary for organised responses to health and social problems has been exercised; and the commercial return on research infrastructure investment has exceeded 1000%. Most importantly, there have been unbiased contributions to medical knowledge and identifiable advances in population health arising from the research."[8]

The Western Australia Developmental Pathways Project is linking a range of routinely collected data to investigate the perinatal, familial and early childhood characteristics of young adults who develop mental illness, criminal behaviours or early worklessness. Data are being linked from:

  • Health: Perinatal, Home Visitor/Public Health Nurse, Mental Health and Substance-Abuse Registers
  • Child and family services: including Child Protection records
  • Justice: including Juvenile and Adult Orders, Custodial Data, Police Incidents and Cautioning, Convictions
  • Education: Absenteeism, Suspensions, Standardized Test Results
  • Social welfare: Benefit recipient status through mid-adulthood.

Researchers are using these linked data to assess whether early characteristics can be used to help target sub-populations at risk, for parental and early-years interventions.

Data linkage is similarly advanced in Manitoba (Canada) where the Centre for Health Policy is conducting a Foetal Alcohol Spectrum Disorder (FASD) study using databases analogous to those listed above for the Western Australia Pathways Project plus a register of FASD cases which contains data on hospitalisation, psychological referral, and special education. Researchers are using these linked data to ask: What are the early-life familial and child markers of risk for diagnosed and suspected FASD? How do the burden of illness/disability, labour-market non-participation, and benefit receipt among those affected compare to non-affected children matched on appropriate factors? Can these risk-markers for FASD be used to target families and neighbourhoods at high risk, with preventative, remedial educational and care interventions?

Consultation question 1:

Are there any benefits of data linkage for statistical and research purposes that are not sufficiently described here?

If yes, please describe the further benefits.

Current challenges preventing more effective and efficient data linkages

As is clear from the examples given above, benefits of data linkage are already being seen in Scotland. The benefits could be much greater, though, if the capacity of analysts to conduct data linkages for research and statistical purposes safely, securely, legally, ethically and efficiently were increased.

Currently, it is difficult and time-consuming for data linkage projects to get off the ground. There are four main reasons for this:

1. Uncertainty about the legalities and public acceptability of data sharing and linkage.[9]

There is considerable variation in the interpretation of the legal and regulatory environment. Data custodians may often be unsure whether they can legally and appropriately make data available for linkages and so, to be on the safe side, turn down requests for access to data.

This is exacerbated by concern amongst data custodians that the public may not consider sharing of their data acceptable, and would be unlikely to consent to personal data being used for research and statistical purposes if asked.

Personal privacy might therefore be protected, but potentially valuable research findings, which could benefit large parts of the community including the data-subjects themselves, are lost. It is not clear, however, that such a cautious approach is either necessary or sufficient to protect privacy, or that it strikes an appropriate balance between the interests in play.

2. Incomplete data, or data that cannot be linked.

As mentioned above, not all data that exist are of good enough quality to be worth linking. Administrative systems may be out of date, incomplete or used so inconsistently that the data within them are not suitable for statistical or research purposes.

In addition, there are several different "unique identifiers" in use across Scotland, which makes it extremely complex to link datasets from different systems. This is particularly true across different sectors and partly explains why cross-sectoral linkages are so rare. For example, the Community Health Index (CHI) Number and the NHS Registration Number are both used in parts of the health sector; the Scottish Candidate Number is used in parts of the education sector; and other sectors use the Criminal Justice Reference Number and the Armed Forces Number. There are many more examples, and no single unique identifier covers the whole population.
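
To illustrate the difficulty, the following is a minimal sketch of one way two datasets with different identifiers might be linked, assuming both also hold a name and date of birth. It is not a description of any method used in Scotland. The field names (`chi`, `scn`, `forename`, `surname`, `dob`), the records and the secret key are all hypothetical, and exact matching on a keyed hash is a deliberate simplification: in practice, spelling variations and missing values usually call for more tolerant (for example probabilistic) matching.

```python
import hashlib
import hmac

# Hypothetical secret, assumed to be held only by the party doing the linkage.
SECRET = b"illustrative-secret-not-for-real-use"


def link_key(record, secret=SECRET):
    """Build a keyed hash from the normalised name and date of birth."""
    parts = (record["forename"], record["surname"], record["dob"])
    normalised = "|".join(p.strip().lower() for p in parts)
    return hmac.new(secret, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(left, right, secret=SECRET):
    """Pair records from two datasets whose link keys are identical."""
    index = {}
    for rec in right:
        index.setdefault(link_key(rec, secret), []).append(rec)
    return [(a, b) for a in left for b in index.get(link_key(a, secret), [])]


# Invented records: each dataset carries its own sector identifier.
health = [
    {"chi": "H-001", "forename": "Ann", "surname": "Example", "dob": "2001-02-03"},
    {"chi": "H-002", "forename": "Bob", "surname": "Sample", "dob": "1999-12-31"},
]
education = [
    {"scn": "E-900", "forename": " ann ", "surname": "EXAMPLE", "dob": "2001-02-03"},
    {"scn": "E-901", "forename": "Cara", "surname": "Placeholder", "dob": "2000-06-15"},
]

pairs = link_records(health, education)
linked_ids = [(h["chi"], e["scn"]) for h, e in pairs]
print(linked_ids)
```

Only the first health record finds a partner here, because the other individuals appear in one dataset only. Any person whose name is recorded differently in the two systems would be missed, which is one reason linkage without a common identifier requires specialist methods.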

3. Limited capacity for secure exchange and access to data.

For data linkage to occur securely and safely, the technological infrastructure must exist, including the IT hardware and software to collect data securely, to conduct linkages, and to provide routes for analysts to access the linked data.

Routes to access linked data may vary from secure e-mail transfer through to 'safe havens'. Safe havens are stand-alone (not networked) computers within secure premises, which only authorised and trusted researchers can arrange to use to do their analysis.

4. Limited capacity of public sector organisations to analyse and make use of linked data.

The analytical expertise required includes the methodological know-how to conduct linkages, particularly when different unique identifiers are used; to understand the linked dataset, particularly when it has been created using incomplete data; and to effectively analyse linked datasets, particularly longitudinal datasets. None of this is simple and many public sector organisations do not have the resources to invest in analytical training and development of staff to create the capacity needed.

Consultation question 2:

Are there challenges or barriers preventing more effective and efficient data linkages for statistical and research purposes taking place that are not sufficiently described here?

If yes, please describe the challenges or barriers.

Why a data linkage framework is needed

The Data Linkage Framework aims to address these problems, and to widen the range of data linkages that can be carried out, without impinging inappropriately on personal privacy of data subjects.

Delivering a strategic data linkage framework in collaboration with major stakeholders across Scotland will capitalise on the direct benefits outlined above more quickly and effectively than if progress continued piecemeal, sector by sector or dataset by dataset. By co-ordinating what is currently a fragmented landscape of activities under a strategic framework there would be immediate benefits in terms of efficiency through the sharing of ideas, solutions, best practice and methods across different data linking projects.

This framework will allow the statistical system in Scotland to be enhanced and in so doing, make advances on the evidence base - particularly in terms of a joined-up understanding of how outcomes are achieved - that will allow for more informed spending on public services and early interventions that save money in the long run.


Email: Andrew Paterson
