Open Data Consultancy Final Report

This report presents the outcomes of the Open Data Consultancy study that Swirrl IT Limited was commissioned to carry out for the Scottish Government from September to November 2013.

‘Open Data’ is data that is accessible to anyone (usually via the internet), in a machine readable form, free of restriction on use. Adoption of this approach to information sharing is growing in the public sector, with anticipated benefits for transparency, efficiency of government and economic growth.

7. Enabling effective uptake of linked open data across the Scottish public sector

Earlier in this report we have described the potential benefits that can be derived from using open data to make public sector information more accessible. We have described the process and infrastructure involved in creating the Open Data Scotland pilot and discussed a possible approach to applying open data technologies to the needs of internal data management.

If, after assessing the outcomes of this study, the Scottish Government wishes to apply this approach more widely, what steps must it take to enable successful large scale implementation of the open data approach?

Moving to a situation where all or most of the government's non-personal data is available as linked open data is a significant task and can't happen overnight. Any 'big bang' approach is likely to fail. This chapter sets out the most important supporting activities that need to be put in place and a high level vision of how such a transition might work.

Broadly speaking, the necessary activities and changes required for a large scale implementation of linked open data publishing can be grouped into four main categories:

  • Standards
  • Knowledge and skills
  • Tools
  • Culture

We will address each of these in turn.

7.1 Standards

By 'standards', we mean the set of technical specifications and procedures that the Scottish Government should mandate to ensure that open data production is sufficiently consistent. Consistency makes data from different parts of government interoperable, gives users a predictable experience when working with government data and helps to ensure high quality.

Choosing how much to standardise is a matter of judgement. Being too rigid leads to a central monolithic approach that is too slow moving and too inflexible to succeed. Being too flexible means that different publishing initiatives are disjointed and some may be of insufficient quality.

In the view of the authors, the government should aim to enable a new ecosystem of data publishing and consumption. This ecosystem should be based around consistent patterns, but allow a distributed approach. Different parts of the public sector can choose their own software solutions, work in house or through external suppliers (with a choice of suppliers), and act independently or share facilities with other parts of the public sector, as long as they act within a framework that ensures sufficient consistency and interoperability. Setting and applying common standards avoids vendor lock-in and promotes competition amongst suppliers. The distributed, decentralised approach spreads the effort, ensures that people who understand the data are directly involved in the process of distributing it and encourages innovation in this developing area.

By definition, adopting a linked data approach brings with it the obligation to meet certain non-proprietary technical standards established by the IETF[38] and W3C, around use of HTTP as a data transfer protocol, RDF as a data representation format, and provision of specific HTTP-based Application Programming Interfaces. This already leads to a high level of interoperability.
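As an illustration of the HTTP-based access described above, the following minimal sketch shows how a client might ask for an RDF representation of a resource through HTTP content negotiation. The URI and the function name rdf_request are hypothetical and used only for illustration; the media types are the standard ones for each RDF serialisation. The request is built but deliberately not sent.

```python
from urllib.request import Request

# Standard media types for common RDF serialisations.
RDF_MEDIA_TYPES = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "ntriples": "application/n-triples",
    "jsonld": "application/ld+json",
}


def rdf_request(uri: str, fmt: str = "turtle") -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    return Request(uri, headers={"Accept": RDF_MEDIA_TYPES[fmt]})


# Hypothetical identifier: example.org is a placeholder domain.
req = rdf_request("http://example.org/id/area/AREA001", "turtle")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would send it; a linked data server that supports content negotiation would then be expected to respond in the requested format.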

The Scottish Government will need to agree an additional layer of standards, to sit on top of these fundamentals.

These should cover:

  • Design of URI patterns
  • A list of standard vocabularies for commonly occurring types of information
  • Sets of URIs and associated reference information for 'core reference data' - the kinds of data that get referred to in many other datasets and are a key point of interconnection for different data. This includes identifiers for geographical areas, time intervals, important government assets, government departments and so on.
  • Guidelines for how public sector publishers should go about extending the core standards to meet their own more specific needs.
  • Metadata standards for describing datasets and the processes that have led to their creation. This could follow or be inspired by the requirements of the Open Data Certificate from the ODI, as well as W3C metadata schemes such as DCAT[39] and VoID[40].
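To make the first and last items in this list more concrete, the sketch below shows one possible URI pattern and a minimal DCAT description of a dataset, written out as Turtle. The base URI, the /id/ and /data/ path segments, the function names and the sample values are all assumptions made for illustration; they are not patterns proposed by this report. The dcat: and dcterms: terms used are from the published DCAT and Dublin Core vocabularies.

```python
import re

BASE = "http://example.org"  # hypothetical base URI


def slugify(label: str) -> str:
    """Lower-case a label and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def identifier_uri(kind: str, code: str) -> str:
    """URI for a real-world thing, such as an area or an organisation."""
    return f"{BASE}/id/{slugify(kind)}/{code}"


def dataset_uri(title: str) -> str:
    """URI for a dataset, derived from its title."""
    return f"{BASE}/data/{slugify(title)}"


def turtle_string(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def dcat_description(title: str, publisher: str, modified: str) -> str:
    """Return a minimal DCAT description of a dataset as Turtle text."""
    publisher_uri = identifier_uri("organisation", slugify(publisher))
    return "\n".join([
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{dataset_uri(title)}> a dcat:Dataset ;",
        f"    dcterms:title {turtle_string(title)}@en ;",
        f"    dcterms:publisher <{publisher_uri}> ;",
        f"    dcterms:modified {turtle_string(modified)}^^xsd:date .",
    ])


# Invented sample values, for illustration only.
print(dcat_description("Population Estimates", "Example Department", "2013-11-01"))
```

Deriving every URI from a small number of functions like these is one way of keeping patterns consistent across publishers; the patterns themselves would be a matter for the agreed standards.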

At the next level of detail, individual departments or groups may develop their own more specific standards for commonly occurring concepts, for example an ontology to represent measures of economic activity, or a concept scheme for age ranges that can be used across all SNS indicators that have a breakdown by age.
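A concept scheme for age ranges of the kind mentioned above could be expressed using the W3C SKOS vocabulary. The following is a minimal sketch that generates such a scheme as N-Triples; the example.org URIs, the slug and label conventions and the age bands themselves are assumptions chosen for illustration.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical URIs: example.org is a placeholder domain.
SCHEME = "http://example.org/def/concept-scheme/age-range"
CONCEPT_BASE = "http://example.org/def/concept/age-range"


def age_range_triples(bands):
    """Yield N-Triples lines for a SKOS concept scheme of age bands.

    bands is a list of (lower, upper) pairs; upper is None for an
    open-ended band such as '65 and over'.
    """
    yield f"<{SCHEME}> <{RDF_TYPE}> <{SKOS}ConceptScheme> ."
    for lower, upper in bands:
        if upper is None:
            slug, label = f"{lower}-plus", f"{lower} and over"
        else:
            slug, label = f"{lower}-{upper}", f"{lower} to {upper}"
        concept = f"{CONCEPT_BASE}/{slug}"
        yield f"<{concept}> <{RDF_TYPE}> <{SKOS}Concept> ."
        yield f"<{concept}> <{SKOS}inScheme> <{SCHEME}> ."
        yield f'<{concept}> <{SKOS}prefLabel> "{label}"@en .'


# Illustrative bands only; real bands would be agreed by the data owners.
for line in age_range_triples([(0, 15), (16, 64), (65, None)]):
    print(line)
```

Once a scheme like this is published, every indicator with an age breakdown can refer to the same concept URIs, which is what makes the breakdowns comparable across datasets.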

7.2 Knowledge and Skills

The approach we have described here involves techniques that are new to most statisticians and other public sector staff working with data. It is important that the government develops a base level of awareness and knowledge of open data and linked data amongst those who work regularly with data, and also a centre of expertise within government that can provide advice to others and support learning.

Day to day activities can be supported by appropriate tools, and assistance from external experts can be used where required, but it is important that the overall direction of open data policy is internally led. Sufficient knowledge of the technologies and standards must exist within the government to support sound decisions, whether on operational details or during procurement processes.

The range of relevant skills includes:

  • Understanding of the RDF approach to data representation
  • Data modelling for linked data and familiarity with common practices for URI and vocabulary design
  • Programming techniques for data transformations
  • Programming techniques for presenting data on the web

Developing a skill base in this area will take time and planning, but the essential background and education needed is already in place amongst many staff in Analytical Services, Information Services and elsewhere in the Scottish Government.

7.3 Tools

There is already an established and maturing market in open source and proprietary software for triple stores to support the underlying storage and querying of linked data. Presentation of open and linked data on the web tends to rely on commonly available web development and visualisation tools.

The main area where further tool development is required is in supporting the process of transforming data from its native form into RDF - to support steps 2, 3 and 5 of the Linked Data Cookbook approach described in Section 5.1.2.

A variety of software tools exist to support this process, but none is very mature and more development is required. In the area of statistical data, many datasets are best represented using the RDF Data Cube approach. Because this gives the output a common structure, it narrows the problem that a transformation tool has to solve, and so is a natural area on which to concentrate initial effort.
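The kind of transformation involved can be sketched as follows: each row of a simple statistical table becomes one observation in the RDF Data Cube vocabulary, written out as N-Triples. This is a minimal illustration under stated assumptions, not a description of the tooling used in the pilot. The column names (area, year, value), the example.org URIs and the sample figures are invented; the qb: and SDMX terms are from the published vocabularies, and the use of reference.data.gov.uk year URIs is one possible choice for identifying time periods.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
YEAR_BASE = "http://reference.data.gov.uk/id/year"  # assumed choice of period URIs
BASE = "http://example.org"  # hypothetical base URI

# Invented sample data, for illustration only.
SAMPLE_CSV = "area,year,value\nAREA001,2012,10.5\nAREA002,2012,12.0\n"


def csv_to_observations(csv_text: str, dataset_slug: str) -> list:
    """Turn each CSV row into the N-Triples for one qb:Observation."""
    dataset = f"{BASE}/data/{dataset_slug}"
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        area, year, value = row["area"], row["year"], row["value"]
        # One observation URI per combination of dimension values.
        obs = f"{dataset}/{area}/{year}"
        lines += [
            f"<{obs}> <{RDF_TYPE}> <{QB}Observation> .",
            f"<{obs}> <{QB}dataSet> <{dataset}> .",
            f"<{obs}> <{SDMX_DIM}refArea> <{BASE}/id/area/{area}> .",
            f"<{obs}> <{SDMX_DIM}refPeriod> <{YEAR_BASE}/{year}> .",
            f'<{obs}> <{SDMX_MEASURE}obsValue> "{value}"^^<{XSD_DECIMAL}> .',
        ]
    return lines


for line in csv_to_observations(SAMPLE_CSV, "sample-indicator"):
    print(line)
```

A complete data cube would also need a data structure definition declaring the dimensions and measures; that part is omitted here to keep the sketch short.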

This is a developing area, and sharing challenges, experience and expertise with other public sector open data publishers is recommended.

7.4 Culture

The experience of the authors during this open data consultancy project has been that Scottish Government staff are keen to investigate new possibilities that might allow them to improve their service to the public. There is already a culture that understands the growing importance of, and opportunities arising from, applying digital technology to public services.

Implementing any significantly new approach in a large organisation will always face challenges, however. There is the unavoidable inertia that comes with any large organisation with a long history (whether private or public sector). The public sector emphasises reliability and trustworthiness, and this naturally leads to an element of risk aversion. When publishing data that is derived from personal or sensitive data, it is of course very important that new approaches maintain the existing care around proper handling of personal data.

A gradual introduction of new technology, with ongoing reviews of benefits and engagement with the users, allows the culture to evolve alongside the use of technology.

Applying the 'Open Data Engagement'[41] approach proposed by Tim Davies and others helps to develop a new mindset, in which a primary purpose of the main curators of a collection of data is to communicate that data effectively to others who may want to use it.

Making this new data ecosystem work will require establishing a culture and goals in which the success of a data curator is measured by how others use their data. The value of open data is created when someone puts it to good use: choices of technology and design of processes should be made with that in mind.


Email: Sara Grainger
