A Scotland-wide Data Linkage Framework for Statistics and Research: Consultation Analysis

Analysis of responses to a Scottish Government consultation on the aims of the Data Linkage Framework and a draft set of guiding principles.

Consultation Question 2: Challenges or barriers

The consultation paper set out a range of challenges to data linkage and asked: Are there challenges or barriers preventing more effective and efficient data linkages for statistical and research purposes taking place that are not sufficiently described here?

The table below shows that around two thirds (34 of 49) of those who responded to the yes/no question felt that there were challenges beyond those described in the consultation paper.

Are there challenges or barriers preventing more effective and efficient data linkages for statistical and research purposes taking place that are not sufficiently described here?
Type of respondent           | Yes, there are further challenges | No, the challenges have been identified | No answer
Data custodian               |  7                                |  4                                      |  0
Data user                    | 20                                |  9                                      |  1
Data subject                 |  3                                |  2                                      |  0
Multiple categories selected |  3                                |  0                                      |  4
No selection                 |  1                                |  0                                      |  7
Total count                  | 34                                | 15                                      | 12
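
As a check on the 'around two thirds' figure, the short Python sketch below recomputes the totals and the share of 'yes' answers from the counts in the table above. The variable names are illustrative and are not part of the consultation analysis.

```python
# Counts copied from the table above: (yes, no, no answer) per respondent type.
responses = {
    "Data custodian": (7, 4, 0),
    "Data user": (20, 9, 1),
    "Data subject": (3, 2, 0),
    "Multiple categories selected": (3, 0, 4),
    "No selection": (1, 0, 7),
}

total_yes = sum(yes for yes, _, _ in responses.values())
total_no = sum(no for _, no, _ in responses.values())
total_no_answer = sum(na for _, _, na in responses.values())

# Share of "yes" among those who actually answered the yes/no question.
answered = total_yes + total_no
share_yes = total_yes / answered

print(total_yes, total_no, total_no_answer)  # 34 15 12
print(f"{share_yes:.0%}")  # 69%
```

The share is calculated over the 49 respondents who answered yes or no; the 12 who gave no answer are excluded from the denominator.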

General comments

One respondent took issue with the framing of this section of the consultation paper, feeling that challenges were presented as barriers to overcome rather than constraints to be respected.

One view expressed was that a centralised process, as outlined in the consultation paper, could itself act as a barrier to the development of local linkage work. In particular, this could affect the development of multi-agency linked data on service demand, which is often required within relatively short timescales.

Additionally, respondents were keen to establish the impact of a limited number of data custodians agreeing to participate, raising the question of whether this would affect cost efficiency and the ability to produce meaningful results. It was pointed out that if just one data custodian is unwilling to release data for linkage then the linkage fails. A possible solution would be to create mechanisms not solely dependent on trust in particular people or bodies.

Comments relating to specific challenges

The bulk of responses to this consultation question were detailed expansions or variations of the challenges identified in the consultation paper. Comments are gathered below under the descriptions used in the consultation paper.

Challenge 1: Uncertainty about the legalities and public acceptability of data sharing and linkage

Legal/statutory issues
A number of respondents highlighted particular challenges around legal and statutory issues. More than one respondent made clear that, in particular, any data linkage must comply with the Data Protection Act.

It was noted that government departments or datasets often have distinct legal constraints which limit what may be done with the data. One organisation (SCRA) noted concern over their 'ability to share information except where an express or implied statutory power can be identified'. Such a power would need to be identified to enable participation in the framework.

One respondent felt that the consultation document was 'cavalier' when addressing the serious legal challenges of data sharing and linkage and did not pay sufficient heed to the legal barriers in place.

It was reported that uncertainty regarding what is legally permissible has led to various bodies and individuals operating 'inconsistently and over-cautiously'. However, it was also suggested that some of the inconsistency in willingness to share data may be down to cultural differences between data holders, or to a sense of 'territoriality' stemming from the level of resources already invested in developing approaches to data linkage.

Public Acceptability
With regard to public acceptability, it was felt that the current low levels of public engagement and disproportionately negative media coverage presented a challenge. A number of respondents identified that an important step would be to explain the purposes of data linkage to the public, in order both to raise awareness and to address the public's concerns. To this end, one respondent proposed that high quality research outputs could be used to raise awareness of the importance of data collection and the quality of data collected.

Additional views on the public acceptability of data linkage were provided by one respondent who cited research from the Child Medical Records for Safer Medicines (CHIMES). The research indicated that children/young people and parents/guardians have a limited knowledge of how routinely collected healthcare data is currently used but assume that the NHS use health data to improve and safeguard population health. Consent and assent were seen as important in enabling support of data. The research also suggested that concerns increase as the number of linked data sources increases and when there are commercial interests involved.

Privacy and Public Interest
A number of respondents raised points relating to the issue of privacy, with one concerned over the suggestion of a "balance of interests" between personal privacy and public interest. They argued that it would be improper to seek to dismiss valid privacy concerns by appealing to a wider public interest.

It was suggested that the framework could be clearer or more forceful with regard to anonymisation after linkage, and could outline what steps are taken to ensure that individuals cannot be identified from any outputs. The Information Commissioner's Office flagged up their forthcoming Anonymisation Code of Practice, which would be relevant to this.

More than one respondent highlighted the role of data custodian as being a particularly challenging one with regard to the management of privacy. It was felt that data custodians might require support in order to effectively balance the privacy of the individual citizen with the wider public interest. Several comments highlighted the confusion that exists about how to identify who a data controller is (particularly once data have been shared) and what their responsibilities are.

Challenge 2: Incomplete data, or data that cannot be linked

Data Consistency
Several respondents identified challenges around data consistency.

Inconsistency of data definition, i.e. the use of one term with several meanings or of different terms with the same meaning, could present challenges. Equally, inconsistency in the recording of time and place between different datasets was noted as potentially problematic if the datasets are to be linked.

Additionally, changes in coding systems between one period and another could result in discontinuity within the data to be linked, making comparisons over time problematic. Ensuring this comparability over time was argued to require specific attention as it could be compromised as data collections are 'improved'.

Unique Identifiers
One respondent suggested that once indices (for example the Community Health Index and National Insurance Number) are reconciled, linking data is reasonably straightforward.

Another respondent argued that a comprehensive data linkage network requires the existence of a common identifier and mappings to operational unique identifiers within data sources. They felt that by placing the mappings with the data sources themselves, and designing the mapping operation such that the indexer does not know for sure the subject population of a data source, some of the public acceptability challenges could be circumvented.

References to unique identifiers led one respondent to caution that some demographic groups could withdraw cooperation from data gathering systems. It was suggested that some groups might feel threatened by the possibility of a single identifier being created and held centrally - potentially enabling a 'data profile' for each person in Scotland.

Linking Practicalities
A number of respondents commented on various challenges around the practicalities of data linkage. It was argued that there need to be both robust assumptions on data linkages where information common to multiple individuals is linked, and robust systems for removing duplicates.

Mismatching was highlighted as a potential challenge, in particular the impact this would have on any inferences. This was predicted to become more problematic as more data sets are linked.
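
The concern that mismatching becomes more problematic as more data sets are linked can be illustrated with a simple calculation. The Python sketch below assumes, purely for illustration, that data sets are linked in a chain and that each linkage step is correct independently with the same probability. Neither the independence assumption, the function name chain_accuracy, nor the 98% figure comes from the consultation responses.

```python
def chain_accuracy(per_link_accuracy: float, n_datasets: int) -> float:
    """Probability that a record is correctly linked across all data sets,
    assuming (hypothetically) independent links of equal accuracy."""
    if n_datasets < 1:
        raise ValueError("n_datasets must be at least 1")
    # Linking n data sets in a chain takes n - 1 linkage steps.
    return per_link_accuracy ** (n_datasets - 1)


# Illustrative per-link accuracy of 98% (an assumed value, not a measured one).
for n in (2, 3, 5, 10):
    print(n, round(chain_accuracy(0.98, n), 3))
```

Under these assumptions, a 98% per-link accuracy falls to roughly 83% of records being correctly linked across ten data sets, which is one way of seeing why inferences could be affected as linkage is extended.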

On a related point, one respondent was keen to establish what the impact of the 'inevitably less than perfect' data linkage methods would be on the ability to produce accurate counts. This was a particular concern given the suggestion that data linkage will provide Census-type information and potentially replace the Census entirely. It was argued that there would be a continued requirement to invest in alternative methods to capture data on 'undocumented and socially excluded populations.'

One of the less-populated local authorities noted that their statistical output often contains numbers which are too small to be published without risk of identifying individuals. National tables therefore often contain gaps for their data, making them of limited use. They felt that it was not clear whether this situation would be improved by data linkage, or whether the small numbers issue would still apply.
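
The small numbers issue described above can be sketched as a simple suppression rule. The Python example below is a minimal illustration only: the function name suppress_small_counts, the threshold of 5 and the authority names and counts are all hypothetical and are not taken from the consultation paper or from any particular disclosure control policy.

```python
from typing import Optional


def suppress_small_counts(
    counts: dict[str, int], threshold: int = 5
) -> dict[str, Optional[int]]:
    """Replace counts below the threshold with None (a gap in the table).

    The threshold of 5 is a hypothetical value chosen for illustration.
    Note that zero counts are also suppressed by this simple rule.
    """
    return {area: (n if n >= threshold else None) for area, n in counts.items()}


# Hypothetical counts for illustration only.
counts = {"Authority A": 120, "Authority B": 3, "Authority C": 48}
published = suppress_small_counts(counts)
print(published)
# {'Authority A': 120, 'Authority B': None, 'Authority C': 48}
```

A rule of this kind produces the gaps in national tables that the respondent describes. Whether linkage would change the underlying counts enough to avoid such gaps is the open question raised in the response.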

Challenge 3: Limited capacity for secure exchange and access to data

One respondent highlighted the challenges stemming from handling what may be large volumes of data, and the software needed for matching and for visualisation of said data. However, they also argued that close collaboration between the statistical and data mining communities may offer some solutions to these challenges.

More than one respondent acknowledged data security as a priority and it was highlighted that any increased concentration of data stemming from data linkage would increase the risk of mishandling or loss of sensitive personal data.

However, one respondent cautioned against sacrificing research functionality as a result of security being overly restrictive. A specific example was given in which the application of rules can censor access to low but informative counts, even when the potential for deductive disclosure is remote and the low counts may be of specific public health importance.

It was noted that safe havens are not limited to stand-alone computers as suggested in the consultation paper. It would be possible to restrict access to secure access points with no capacity to extract data.

Challenge 4: Limited capacity of public sector organisations to analyse and make use of linked data

A number of respondents reinforced the challenge identified in the consultation document regarding a lack of relevant expertise and knowledge amongst the individuals and organisations that would be involved in data linkage.

Firstly, respondents described a lack of knowledge regarding the datasets available and their characteristics. It was felt that even before practical challenges were considered it would be necessary to ensure that organisations and individuals know enough about what data could potentially be made available via linkage and how it could be used to answer research questions.

Secondly, there was concern that there may not currently be sufficient skills in data linkage methods distributed amongst relevant organisations. Thirdly, there was a perceived lack of the skills required to analyse linked information.

Finally, it was argued that there is currently a lack of awareness of individuals' or organisations' responsibilities as data custodians or users. This can result in data being linked inappropriately or not being shared due to unwarranted risk aversion.

As one respondent noted:

'Speaking from experience of linking data across a number of organisations - there tends to be a general lack of awareness amongst researchers of their individual roles and responsibilities as data custodians/users if linked data are to be made publicly available' (Clare Baker)

With these potential shortcomings in mind, one respondent argued that it would be better to focus on how analysis can be undertaken by people who do have sufficient training, by involving universities or government specialists, rather than encouraging analysis of sensitive datasets by people who do not have the requisite skills and knowledge.

As well as staff issues, further resourcing issues were identified. It was argued that organisations will need improved systems to provide good quality, up-to-date and safe data, and it was suggested that the various financial and training costs borne by these organisations should be acknowledged.

As one respondent noted:

'Local Authorities may find it difficult to justify the cost of focusing on data systems when facing financial constraints. And yet it is the Local Authority data that will [be] needed for many research studies.' (Brigid Daniel)

This point was particularly relevant for smaller organisations and local authorities. Depending on the means of data linkage, the time, cost and resources required could act as a barrier to data linkage and effectively exclude smaller authorities from the process.

One respondent felt that any resource gap could be removed or narrowed through effective partnership working with the private sector.

Commercial activity

More than one respondent raised the issue of commercial involvement in data linkage. One respondent argued that

'More consideration of the role of private enterprises and global corporations in the use of the Scottish population's data would be worthwhile.' (Centre for Population Health Studies, Edinburgh University)

Another suggested that

'There needs to be a mechanism that allows a route to access to data for commercial researchers' (ABPI Scotland)

Differences between respondent categories

In the analysis of the comments relating to challenges, the type of respondent (data user, subject or custodian) was examined. The different types of respondents were found to hold broadly similar views.

However, two of the more substantial comments came from data subjects. These were: that concern over the creation of data profiles could lead some demographic groups to withdraw cooperation from data gathering systems; and that the consultation paper's balance, as well as its 'cavalier' treatment of legal issues, were worrying.

Additionally, respondents who identified themselves as data users were most likely to highlight challenges around the role of data custodians and the inconsistency in organisations releasing data.


