Genomic medicine strategy 2024 to 2029

Our strategy for transforming genomic medicine across Scotland from 2024 to 2029.

16. Data and Digital Infrastructure

Developing national solutions for the management, storage and analysis of genomic data, and implementing secure and scalable infrastructure, is the biggest challenge faced within genomic medicine in Scotland, and one that underpins our ability to deliver on almost every one of our strategic aims. A national, federated model is needed to digitally connect the genomic laboratories, to ensure genomic data and reports are available across the health and care system, and to support population-level healthcare screening and decision-making.


The success of a genomic medicine service is grounded in the ability to securely collect, test, store, process and analyse samples and data for diagnostic, prognostic and predictive testing. It also needs to support reanalysis as new knowledge develops, to allow reuse across different specialties throughout a patient’s life and be accessible for research as a key driver for innovation. Genomic datasets in humans can be large and, as the national test directories expand and include new testing technologies, the volume and complexity of this data will increase significantly, and at pace.

Where we are now

In keeping with health and social care systems across Scotland, the genomic medicine laboratories have developed and implemented different IT and eHealth solutions over time. The NHS genomic laboratories exist within virtual silos: unable to connect digitally with one another, reliant on local variant repositories and with no national data return or integration with national registries.

Scotland has well-established research centres with high-performance computing capacity, but there is an urgent need for national-level storage and services capable of supporting high-volume data and analysis. The Scottish Government has committed, through the Digital Health and Care Strategy and the Health and Social Care Data Strategy, to improve technology and infrastructure across health and social care services to support recovery and reform.[25, 40] It is vital that the digital infrastructure and data solutions developed for genomic medicine are harmonised with the wider digital infrastructure in Scotland.

A pilot project within the National Digital Platform (NDP) MediaStore is in development with an Application Programming Interface (API) to test the transfer of genomic data between laboratories. This has intersected with a wider transformational project around data standardisation across the genomic laboratories, as part of the groundwork for the national Laboratory Information Management System (LIMS) currently under development.

Recognising the need for more than just a genomic data repository, the Scottish Government supported the development of the WES service. Through a novel partnership between NHS Lothian, the University of Edinburgh and the EPCC, it has shown the benefits for a clinical service of working with an academic core of bioinformaticians and having access to a software-rich, high-performance computing environment. However, this service remains within the research space. We know that the value of genomic medicine to individual patients, and for the population as a whole, lies not only in its ability to link with other sources of data at a national level, but also in the ability to share genomic intelligence across the UK, Europe and internationally to support clinical care, research and innovation.

Where we want to be

We want to develop a secure, scalable infrastructure that allows large-scale genomic data storage and analysis, with the ability to use data across all Health Boards, laboratories and academic centres, and that is compatible with the wider Scottish Government Digital Strategy and Data Strategy. We want to support coordinated care by ensuring that results are available both across NHS services and through a federated data model which allows data to be shared securely across the UK and internationally where appropriate.

Data standardisation and adoption of core data standards

We will continue to support the work of the SSNGM transformation team in standardising the data generated across our genomic laboratories to ensure that they can work effectively as part of a joined-up service, and to support national-level data returns. We will work with the National Data Standards Board, and in alignment with international data standards such as the Mondo Disease Ontology and those developed by the Global Alliance for Genomics and Health (GA4GH), to support data sharing across the UK and beyond. The use of common data models and standards across all data systems that operate across the genomic laboratories is fundamental to the gathering of genomic intelligence and its translation into datasets that can be used for research and to support policy-making.

Laboratory Information Management System (LIMS)

A nationwide LIMS is technically complex but the advantages are significant: it supports sample receipt and tracking and standardised test ordering and reporting, and enforces data standards by design. A consortium of Health Boards has commissioned a national LIMS and is developing a bespoke genomics module with input from across the genomics laboratories, and in collaboration with pathology laboratories.

Development of a secure genomic data repository within the NDP

The National Digital Platform (NDP) is a central component of Scotland’s national digital infrastructure which will allow real-time data and information from health and care records, and the necessary tools and services, to be accessed securely and safely. In the long term, the potential offered by the NDP and the applications it can support is promising. A pilot NDP MediaStore project is intended as an initial step and we will work across Scottish Government and with NES to explore the development of a genomics data repository architecture. We will also identify solutions for known technical challenges around the need for high-capacity networks to support data transfer across Health Boards and for large-volume data storage capacity. As work in this area progresses, opportunities to share lessons learned should be taken to the mutual benefit of human and pathogen genomic workstreams.

Genomic Variant Repository

The identification of medically significant variants and the use of genotype-phenotype resources are important tools in making sense of the huge volumes of information that genomic testing generates. Within the genomic laboratories, in keeping with most clinical laboratories, clinical scientists use large genomic data sets, their own local laboratory and clinical data, variant libraries and ad hoc data mining. This can result, however, in differences across laboratories in the interpretation of similar genomic abnormalities, since complete data sets are not centralised or available across Scotland. Furthermore, our understanding of the relationship between a target and a disease is not always fixed, and interpretations can change over time for some genomic abnormalities as genomic intelligence increases. We want to explore the need for a Scottish NHS variant repository.

Software-rich high-performance computing environment

Central to a genomic medicine service infrastructure will be the ability not only to store and share genomic data but also to analyse it within a high-performance computing environment and to employ different software solutions for data analysis. The ‘last mile’ of analysis is dependent on the identification of raw genomic data, generation of sequencing reads (primary analysis) and alignment (secondary analysis). This is vital for the expansion of testing technologies such as WGS, WES and large NGS panels, which generate complex genomic information in greater volumes. We recognise that this component is needed as a matter of urgency and may need to involve commercial applications or novel solutions such as AI/machine learning interacting with, or alongside, the NDP structure.

To this end, we will conduct an options appraisal to scope the requirements, delivery timescales and potential for integration as part of wider digital infrastructure. In doing so we need to build on the capability and expertise gained from the work outlined in Case Study 19.1 and the world-class High Performance Computing (HPC) environment managed by the EPCC. Genomic data file formats (for example BAM and VCF file types) are standardised and well supported across both academic and industry software, which will help mitigate the difficulties in moving data between commercial and open-source systems.
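As an illustration of why standardised file formats ease movement between systems, the sketch below reads the fixed leading columns of VCF data lines (CHROM, POS, ID, REF, ALT) using only the Python standard library. It is a minimal sketch, not a description of any tooling used by the laboratories: the `Variant` class, the `parse_vcf_lines` function and the example records are hypothetical and invented for illustration, and a production service would be expected to use an established, validated VCF library.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Variant:
    """Hypothetical container for the core fields of one VCF record."""
    chrom: str
    pos: int
    ref: str
    alt: tuple[str, ...]


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per VCF data line, skipping header lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Lines starting with '#' are meta-information or the column header.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # VCF data lines carry at least eight tab-separated columns.
        if len(fields) < 8:
            raise ValueError(f"Malformed VCF data line: {line!r}")
        chrom, pos, _record_id, ref, alt = fields[:5]
        # Multiple alternate alleles are comma-separated in the ALT column.
        yield Variant(chrom, int(pos), ref, tuple(alt.split(",")))


# Invented example content; it does not represent real patient data.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\t.\tA\tG\t50\tPASS\t.",
    "2\t67890\t.\tC\tT,TA\t99\tPASS\t.",
])

variants = list(parse_vcf_lines(EXAMPLE_VCF.splitlines()))
for variant in variants:
    print(variant)
```

Because the column layout is fixed by the format, the same few lines would apply whether the file was produced by a commercial or an open-source pipeline.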

Data federation: a decentralised data model with data governance standards defined centrally and a shared data infrastructure layer to support data sharing and interoperability across organisations and disciplines.

Genomic reports and interpretation

While the components outlined above focus on the genomic information generated by laboratories and subsequent analysis, there is also a pressing need to consider access to genomic diagnostic reports and interpretation. Reports are currently stored within individual Health Board Scottish Care Information (SCI) Store systems, frequently in PDF formats. These formats make it difficult to share or data-mine the vital clinical information that the reports contain, and the reports often follow reporting standards that were not designed with patients as the intended end users. We need to ensure, in conjunction with the wider SSND, that a whole-system approach is taken so that diagnostic information is securely accessible to those who need it across health and care systems, including primary care. We need to consider, as a whole, how these reports and the reporting standards used fit with the wider ambitions to provide access for patients to their own health records.

Data interoperability

Underlying these components is the importance of data interoperability: the ability to integrate genomic data with other health and care data in Scotland at a local, regional and national level, and to link up across the UK and internationally to share genomic intelligence. In Scotland, we have the Community Health Index (CHI), which is used for health care purposes and uniquely identifies a person. The CHI acts as a principal means of recording demographic information for patients within the NHS. A technical change programme is underway to revise the CHI and make it more flexible and functional across the wider health and social care system. Within the development of data solutions and digital infrastructure, there is a need for CHI-linked genomic data, in conjunction with common data models and standards. This would support the gathering of intelligence and its translation into datasets for research, and integration with other sources of data to allow real-world evidence studies of the value of different genomic technologies and tests across Scotland.
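The idea of CHI-linked genomic data can be sketched as a simple record linkage in which the CHI is treated as an opaque shared key. This is a minimal sketch under stated assumptions, not a description of any existing NHS Scotland system: the `link_by_chi` function, the field names (`chi`, `test`, `variant_count`, `health_board`) and the placeholder identifiers are all hypothetical, and a real implementation would sit within the appropriate information governance controls.

```python
from typing import Iterable


def link_by_chi(
    genomic_results: Iterable[dict],
    registry_records: Iterable[dict],
) -> tuple[list[dict], list[dict]]:
    """Join genomic results to registry records on a shared 'chi' key.

    Returns (linked, unmatched): merged records where a registry entry
    exists, and genomic results for which no registry entry was found.
    """
    # Index the registry once so each lookup is a dictionary access.
    registry_by_chi = {record["chi"]: record for record in registry_records}
    linked: list[dict] = []
    unmatched: list[dict] = []
    for result in genomic_results:
        match = registry_by_chi.get(result["chi"])
        if match is None:
            unmatched.append(result)
        else:
            # Combine both sources into one record keyed by the same CHI.
            linked.append({**match, **result})
    return linked, unmatched


# Placeholder identifiers and values, invented for illustration only.
genomic_results = [
    {"chi": "0000000001", "test": "panel-A", "variant_count": 3},
    {"chi": "0000000003", "test": "panel-B", "variant_count": 0},
]
registry_records = [
    {"chi": "0000000001", "health_board": "Board X"},
    {"chi": "0000000002", "health_board": "Board Y"},
]

linked, unmatched = link_by_chi(genomic_results, registry_records)
print(linked)
print(unmatched)
```

Returning unmatched results separately, rather than discarding them, reflects the point that linkage quality depends on a consistent identifier being recorded across every system involved.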

Working to address racialised inequalities within data systems

The Expert Reference Group on COVID-19 and Ethnicity, established by the Scottish Government in 2020 to consider the impact of COVID-19 on minority ethnic communities, considered evidence of systemic data inadequacies, risks and harms within Scotland.[22] There is also a known lack of diversity within genomic data and variant repositories, and there are structural biases within many new tools and technologies. We will work with the Anti-Racism Observatory to ensure that infrastructure and processes developed to support genomic medicine in Scotland do not perpetuate racialised systemic inequity, and that there are clear guidelines on how race and ethnicity data is collected and used responsibly as risk markers rather than risk factors.

Data for research, innovation and service improvement

We will also find ways to capture data on genomic medicine service processes and clinical outcomes to assess their reach, effectiveness, adoption, implementation and maintenance, in conjunction with work around a national genomic data return as detailed below.

What will this mean for people of Scotland?

Having a fit-for-purpose data infrastructure will ensure that people’s genomic data is stored safely and securely. It will also enable scientists and clinicians to get the most out of working with data to improve people’s lives, with interoperability across systems to enable greater collaboration.


