2. Open data overview
2.1 What is open data?
Open data in a government context can be defined as data that is:
- accessible (ideally via the internet) at no more than the cost of reproduction, without limitations based on user identity or intent;
- in a digital, machine readable format for interoperation with other data; and
- free of restriction on use or redistribution in its licensing conditions.
2.2 Why is open data useful?
The prime objective of open data is to get the right information to the people who need or want it, in a form that allows them to use it. The promise of open data is to achieve this quickly and cheaply.
Systematic use of open data is relatively new and the availability of quantitative evidence on cost and impact is still limited. However, a number of early success stories are discussed below.
The audience for government open data is broad: it includes the government itself, other public sector organisations (such as local authorities and public bodies), businesses that use public sector data in their operations, businesses that add value to public sector data and resell it, academic researchers, charities and other civil society organisations, and individual members of the public.
The uses for such data are diverse and will vary across these groups. The potential benefits they derive from using the data are often grouped into three broad categories:
1. transparency and accountability of government
2. efficiency and effectiveness of the public sector
3. supporting economic growth
We will consider each of these in turn.
1. Transparency and accountability
The government is elected by the people and the activities of government are funded by taxpayers: it is therefore a reasonable expectation that the citizens of a country are able to see what their government is doing and how their money is being spent.
The Open Government Partnership, of which the UK is a member, is founded on the basis that more transparent government contributes significantly to the goal of "improving the quality of governance, as well as the quality of services that citizens receive".
2. Efficiency and effectiveness of the public sector
Information sharing via open data is not an end in itself, but rather something that can assist the essential functions of public sector organisations. Making relevant data more accessible and usable has the potential to assist the government to:
- know whether policies are working
- design new policies or adapt and improve existing ones
- target investment
- improve targeting and delivery of services
- enable collaboration between organisations in service delivery
- improve data quality by increasing data visibility and enabling third parties to contribute improvements.
"Scotland's Digital Future: Delivery of Public Services" describes a vision for Scotland where
"digital technology provides a foundation for innovative, integrated public services that cross organisational boundaries and deliver to those in most need, and for services for business that promote growth".
The foreword to this report also illustrates the central role of data in digital delivery of public services. It is highly significant that this vision is one where services cross organisational boundaries. Not only do different parts of central government need to share information effectively, but information must also be shared with local government, health services and other public sector organisations - and often also with charities, civil society groups, businesses and individuals.
Recognising the central importance of data, as opposed to the applications or systems that hold it, is a necessary step in escaping from the data silo problems all large organisations face. Making data interoperable and re-usable offers the possibility of large cost savings by reducing search costs, reducing duplication, reducing data processing time and reducing mistakes.
3. Supporting economic growth
Alongside transparency and efficiency of government, the other central objective of the open government data movement is to encourage economic growth. The 2013 Shakespeare Review of Public Sector Information commissioned a Market Assessment carried out by Deloitte. The study analyses the ways in which public sector information is used by different market segments and estimates the economic value arising from use of the information. The study concludes that UK-wide the direct benefits are around £1.8 billion per year and the indirect benefits are around £5 billion per year. A number of barriers to effective use of data are identified - if these are overcome, the overall benefits could be significantly higher.
While the details of the study are informative, the overall implication is a simple one: enabling greater use of government data, both within the public sector and by the private sector, will bring important economic benefits. The message for a public sector data owner boils down to: how can I help more people use my data?
2.3 Examples of open data publishing and exploitation
There is a growing portfolio of open data available in Scotland and across the UK. The Scottish Government and other public sector organisations in Scotland publish a large quantity of open data, notably the Scottish Neighbourhood Statistics, but also significant other resources on, for example, education, health and the environment.
The UK government's main open data site, http://data.gov.uk, is a central resource for public sector open data in the UK, listing thousands of openly available datasets. This includes data about Scotland, particularly spatial data related to the INSPIRE directive and data on non-devolved issues.
There is also a growing collection of local government initiatives around open data, for example:
- The Glasgow Future City project is investigating a number of innovative uses of open data in the context of city management
- Councils including open datasets on their websites, for example Aberdeen City
- Dedicated city data websites, eg London and Manchester
- Local authority information system or 'observatory' sites, such as KnowFife and the Hampshire Hub
Many of these open data publishing initiatives are based around a catalogue of datasets, providing a list of available data, with descriptive information and a link to where the data can be downloaded. Other initiatives have taken a richer approach using linked data and other types of Application Programming Interface (API) to enable data to be exploited by other applications. Examples include:
- The Office for National Statistics 'NOMIS' API
- The Department for Communities and Local Government 'Open Data Communities' site
- Environment Agency data on bathing waters
- Transport for London live data feeds
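To make the catalogue approach described above more concrete, the following minimal Python sketch models a single catalogue entry as descriptive information plus a download link, and serialises it to JSON so that other applications could read it. The field names, the example dataset and the example.org URL are illustrative assumptions; they are not the schema of data.gov.uk or of any other site mentioned here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CatalogueEntry:
    """One record in a hypothetical dataset catalogue."""
    title: str
    description: str
    publisher: str
    licence: str
    download_url: str
    formats: list[str] = field(default_factory=list)


def to_json(entry: CatalogueEntry) -> str:
    """Serialise an entry so that other applications can read it."""
    return json.dumps(asdict(entry), indent=2, sort_keys=True)


def is_machine_readable(entry: CatalogueEntry) -> bool:
    """Rough check: does the entry offer at least one structured format?"""
    structured = {"csv", "json", "xml", "rdf"}  # assumed list, not exhaustive
    return any(f.lower() in structured for f in entry.formats)


# Hypothetical entry, for illustration only.
example = CatalogueEntry(
    title="Example neighbourhood statistics",
    description="Illustrative entry only.",
    publisher="Example Council",
    licence="Open Government Licence",
    download_url="https://example.org/data/neighbourhood.csv",
    formats=["CSV"],
)
print(to_json(example))
```

A catalogue of this kind tells a user that a dataset exists and where to download it. The richer API and linked data approaches listed above go further by allowing applications to query the data itself.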
It is easier to gather data on the supply of open data than it is to assess systematically how it has been effectively used. Usage data tends to be anecdotal and based on isolated representative examples. Nonetheless, it can easily be seen how these examples could be replicated more broadly. Many examples are of small impact in themselves, but uses of data can be so broad and so diverse that the total impact is significant.
Comparison of spending data between local authorities has helped some to identify where better deals can be had, sometimes leading to shared procurement approaches.
Release of open data on NHS General Practice prescriptions in England enabled Prescribing Analytics to compare the spending on branded versus generic statins across England, concluding that if best practices were applied in all GP practices, around £200 million per year could be saved on the costs of these drugs.
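The reasoning behind this kind of estimate can be sketched in a few lines of Python. This is not the method used by Prescribing Analytics, which is not described here; it is a simplified illustration that assumes each practice could reduce its share of branded items to a chosen target. The practice names, item counts, prices and target share are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class PracticeSpend:
    """Statin prescribing for one practice (all figures hypothetical)."""
    name: str
    branded_items: int
    generic_items: int
    branded_cost_per_item: float  # pounds
    generic_cost_per_item: float  # pounds


def potential_saving(p: PracticeSpend, target_branded_share: float) -> float:
    """Saving if branded items above the target share were switched to generics."""
    total = p.branded_items + p.generic_items
    if total == 0:
        return 0.0
    allowed_branded = target_branded_share * total
    excess = max(0.0, p.branded_items - allowed_branded)
    return excess * (p.branded_cost_per_item - p.generic_cost_per_item)


# Invented figures, for illustration only.
practices = [
    PracticeSpend("Practice A", 400, 600, 20.0, 2.0),
    PracticeSpend("Practice B", 50, 950, 20.0, 2.0),
]
target = 0.05  # assumed best-practice share of branded items

for p in practices:
    print(f"{p.name}: GBP {potential_saving(p, target):,.2f}")
total_saving = sum(potential_saving(p, target) for p in practices)
print(f"Total: GBP {total_saving:,.2f}")
```

Summing such per-practice figures across every practice in a published dataset is what turns many small differences into a large headline number.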
This analysis could have been carried out by the relevant authorities without use of open data, but it did not take place until the data was openly available, or at least, if it did, it was not acted upon. Many aspects of open data involve doing existing things better or more cheaply rather than doing things that were impossible before. Most organisations suffer from pressure on time and resources. Lowering the barrier to information access can mean that opportunities that were previously ignored can now be exploited.
In addition to use of open data for large scale cost analysis, there are numerous examples of small scale efficiency improvements and new opportunities created by the existence of high quality open data. The impact in each individual case is small, but if replicated many times, the overall effect can be significant.
Transport for London (TfL) offers a number of APIs to provide live access to bus and train locations and arrival times. This has led to the creation of numerous travel information apps, assisting users in journey planning and avoiding known disruptions. TfL has estimated the impact of the accumulated small savings in journey and waiting times that better travel information has created. It has concluded that this leads to a return on investment of around 58:1 for its open data initiatives.
Another important example is the widespread use of online mapping tools, such as Google Maps (which in the UK makes heavy use of open data from the Ordnance Survey). The existence of instantly accessible high quality mapping and photography has revolutionised journey planning for businesses and individuals. The time savings may amount to only a few minutes per journey, but applied hundreds of millions of times, this amounts to a notable increase in productivity.
An important consideration, illustrated by some of these examples, is the role of intermediaries in serving the needs of small niche markets. The owner or publisher of open data cannot predict or directly serve the needs of all users of the data. But by making the data openly available in a re-usable form, other businesses or organisations that understand the needs of a particular market or community can process or combine selected data in order to meet those needs. This underlines the importance in public sector information publishing of allowing commercial re-use of data. Allowing businesses to add value to the data for their customers amplifies the overall benefits of making the data available.
2.4 Costs and risks
While open data has the potential to bring many benefits, there are of course also costs and risks associated with it.
Sharing data in a way that encourages and enables others to use it requires effort on the part of the data publisher: to organise and annotate data and present it in a highly usable form. The process of publishing 5-star open data is explained in detail in Chapter 5 in the context of the pilot. The main additional costs associated with open data publishing arise from the time of well-qualified staff in organising, transforming and presenting their data. Depending on available in-house skills, it may be necessary to bring in external experts to assist.
Publishing open data will lead to a certain amount of cost in software licences and web hosting, but these costs are likely to be relatively modest in comparison with the cost of staff time.
Implementing open data publishing at a significant scale will also require a training and skills development programme, which will have its own cost.
Most of the costs in this process lie with the publisher and many of the benefits come to the end user of the data. Since the essence of open data is not to charge for it, this imbalance can sometimes be seen as an obstacle. In a public sector context, however, it is important to note that the biggest users of public sector data are often other parts of the public sector. Also, government has a role in promoting the success of the private sector economy, so investment in assisting the public sector as a whole and the economy as a whole is a legitimate and justified approach. Hence, costs incurred by the group publishing the data can be offset against the overall reduction in the cost of discovery, distribution and exploitation of data across the public sector.
There can be risks associated with open data publication, arising primarily from making data available to people who did not previously have access to it, and having limited influence on what they do with it. Commonly raised risks include:
- there could be mistakes in preparing the data
- users of the data may misunderstand what it means or how it can be used
- data quality may vary, and the purposes for which the data is valid may need to be defined
- lack of control on how users will use the data
- managing user expectations
- creating more work for data producers, especially if data needs to be published to a strict schedule
- resulting analyses may have unintended political or policy consequences
- protecting interests and privacy of individuals
These are legitimate concerns and the approach to data publication should be designed to mitigate these risks. An important concept is the idea of enabling and promoting 'responsible re-use' of data. In essence this involves ensuring that the provenance, meaning and possible limitations of data are clearly documented and easily accessible to potential users of the data. In most cases, misrepresentation of data by users is not malicious and the risk of it can be reduced by clear communication. In cases of deliberate misrepresentation of data, the data owner can point interested parties to explanatory material and ensure that sufficient context is available to allow erroneous claims to be corrected or argued against.
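One simple way a publisher might support responsible re-use is to check, before release, that each dataset carries notes on its provenance, meaning and limitations. The Python sketch below illustrates the idea; the `DatasetNotes` structure and its field names are hypothetical and do not correspond to any particular metadata standard.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetNotes:
    """Context a publisher might attach to a dataset (hypothetical fields)."""
    provenance: str = ""   # where the data came from and how it was collected
    meaning: str = ""      # what the values represent, including units
    limitations: str = ""  # known gaps, caveats or unsuitable uses


def missing_notes(notes: DatasetNotes) -> list[str]:
    """Return the names of any documentation fields left blank."""
    return [f.name for f in fields(notes) if not getattr(notes, f.name).strip()]


# Invented draft record with one field still to be written.
draft = DatasetNotes(
    provenance="Collected from annual returns.",
    meaning="Counts per area.",
)
print(missing_notes(draft))
```

A check like this does not guarantee that the notes are clear or correct, but it makes gaps in documentation visible before the data reaches users.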