Open Data Consultancy Final Report

This report presents the outcomes of the Open Data Consultancy study that Swirrl IT Limited was commissioned to carry out for the Scottish Government from September to November 2013.

‘Open Data’ is data that is accessible to anyone (usually via the internet), in a machine-readable form and free of restrictions on use. Adoption of this approach to information sharing is growing in the public sector, with anticipated benefits for transparency, efficiency of government and economic growth.


4. Requirements for effective use of open data

Earlier in this report we noted that the value of open data arises at the point of use. Therefore, to create the greatest return on investment from open data publishing, all aspects of the process should be designed with the use of the data in mind.

The ideal situation is that all data which can be open (because it contains no sensitive personal or security-related information) is published openly. Within that long-term objective, priority should be given to releasing data for which there is a known or expected demand. This choice can be made based on the knowledge of the data owners and their discussions with users, or via a more formal user engagement process.

It is then important that potential users of the data know that it exists and is available. This can be tackled in various ways and typically a combination of approaches is best.

  • Web search. Many people's first port of call when looking for something is a web search engine. Ensuring that published open data can be found and indexed by search engines will make it easier for users to find.
  • Catalogue. Maintaining a central list of available government data gives users a single starting point for a data search. Making such a catalogue searchable requires consistent and up-to-date metadata for each entry. Manually entering information in a catalogue is time-consuming and rarely a top priority for data owners, which means that many dataset catalogues suffer from poor quality and incomplete metadata. Finding ways to automate as much of this process as possible is important (a sketch of one possible approach follows this list).
  • Social awareness raising. Within various user communities, talking to people (in person, at events or via online forums) is an important part of helping them to know what is available. This can be assisted by technical aspects of a system - for example making it easy to provide a web link to a specific dataset or part of a dataset, so that it can be referred to in an email or on social media.
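As a rough illustration of the automation point above, the following Python sketch uses the rdflib library to generate DCAT-style catalogue entries from a simple register of datasets. The register, dataset titles and URLs are invented placeholders; a real system would draw these details from the publishing platform rather than a hard-coded list.

    # Minimal sketch: generating catalogue metadata automatically (placeholder data).
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, DCTERMS

    DCAT = Namespace("http://www.w3.org/ns/dcat#")

    # Hypothetical register of published datasets: (title, landing page, last updated)
    datasets = [
        ("School meals uptake", "https://example.gov.scot/data/school-meals", "2013-10-01"),
        ("Household recycling rates", "https://example.gov.scot/data/recycling", "2013-09-15"),
    ]

    g = Graph()
    g.bind("dcat", DCAT)
    g.bind("dcterms", DCTERMS)

    catalog = URIRef("https://example.gov.scot/catalogue")
    g.add((catalog, RDF.type, DCAT.Catalog))

    for title, page, modified in datasets:
        ds = URIRef(page)
        g.add((ds, RDF.type, DCAT.Dataset))
        g.add((ds, DCTERMS.title, Literal(title)))
        g.add((ds, DCAT.landingPage, URIRef(page)))
        g.add((ds, DCTERMS.modified, Literal(modified)))
        g.add((catalog, DCAT.dataset, ds))

    # Emit the catalogue metadata in a standard, machine-readable form.
    print(g.serialize(format="turtle"))

Because the output is standard DCAT, the same generated metadata can feed a catalogue, a search index and third-party aggregators without further manual effort.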

Once the user finds data that is potentially of interest, it is very important that they are able to make an informed decision on whether it is suitable for their chosen purpose. Datasets need to be associated with good quality metadata that explains:

  • Meaning and definitions of terms used in the data
  • Where the data came from and how it was processed
  • Limitations in how the data can be meaningfully applied
  • Quality considerations
  • Whether the data will be maintained in the future, or whether it is an experimental, short-term or one-off publication.

This last point is an important part of enabling 'serious' use of data, whether in the private or public sector. For a business to start making use of data in its operations, or to incorporate open data into a product that it sells to its customers, requires up-front investment. Businesses will only be willing to do that if they believe they will have sufficient time to recoup a return on that investment, so they need confidence that the data they are using will still be maintained and available for some period in the future. Similar considerations apply to other public sector users of data: incorporating a particular data source into a process or procedure also takes investment.

That doesn't mean that a data owner must commit to maintaining all of their data in perpetuity. There are often good reasons for short term or experimental publishing of some data. The important thing is to communicate clearly to potential users of the data what the publisher's intentions are in this regard.

The 'Open Data Certificate'[25] introduced recently by the Open Data Institute is a useful tool in evaluating whether a dataset is presented in a way that enables and promotes use.

One of the risks often raised around publishing open data is that users will misunderstand the data and use it in ways that are not justified. This risk cannot be completely eradicated, but 'responsible re-use' can be encouraged by ensuring that the data is associated with good quality and thorough metadata and documentation that helps a data user understand what it means, where it came from and possible limitations they should be aware of. This can be embedded in the data itself or linked to from the data.

There is a spectrum of types of users and a range of ways in which they want to make use of open data. To maximise the value of data, it is important to serve the needs of all of these users. Some data publishing activities may choose to focus on one audience type, but it is important to acknowledge the range of audiences and consider how their needs will be met.

Figure 2 presents one way of dividing the overall potential audience into categories, based on their objectives in using data and their familiarity with web technologies. For each audience category, it shows the data presentation approaches that are likely to be most appropriate.

An important aspect of linked data is that, although only a small proportion of data users want to use it directly, it makes it easy to create other formats and other ways of presenting the data to suit the needs of the rest of the data audience. The hard work of structuring the data and allowing it to be queried and filtered has already been done, and that greatly simplifies the process of generating the other forms of presentation.
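To make this concrete, the short Python sketch below (again using rdflib, with a made-up two-observation graph and invented namespaces) shows how data that has been structured as linked data can be queried with SPARQL and re-presented as CSV with very little additional effort.

    # Minimal sketch: re-presenting linked data as CSV (illustrative data only).
    import csv
    import io
    from rdflib import Graph

    # A few illustrative observations expressed as linked data (Turtle).
    TURTLE = """
    @prefix ex: <http://example.org/def/> .
    @prefix obs: <http://example.org/data/> .

    obs:o1 ex:area "Aberdeen" ; ex:year 2012 ; ex:value 41.3 .
    obs:o2 ex:area "Glasgow"  ; ex:year 2012 ; ex:value 38.7 .
    """

    g = Graph()
    g.parse(data=TURTLE, format="turtle")

    # Query and filter once the data is structured; reformatting is then trivial.
    rows = g.query("""
        PREFIX ex: <http://example.org/def/>
        SELECT ?area ?year ?value
        WHERE { ?obs ex:area ?area ; ex:year ?year ; ex:value ?value . }
        ORDER BY ?area
    """)

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["area", "year", "value"])
    for area, year, value in rows:
        writer.writerow([str(area), int(year), float(value)])

    print(out.getvalue())

The same query-and-reshape step could just as easily produce JSON for a web application or an HTML table for a report page.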

Figure 2 Publishing open data for multiple audiences

The general public, bloggers, journalists and researchers most often want to view data in their browser in the form of reports, infographics or simple apps. In some cases they may want to download the underlying data for a closer look in their preferred software tools.

For researchers and data analysts, the ability to download data for further processing is a higher priority. The choice of format for downloads depends on the type of data and its likely uses (and the diagram does not attempt to list all possibilities) but CSV or spreadsheet formats are the most popular.

Web developers generally want to take a selection of the available data and incorporate it into their own application. They are often working to create a tailored presentation of information for a particular audience. Sometimes a developer will download data and manage it in their own database, but in most cases some form of programmatic interface to the data is preferred.

A subset of developers will use linked data or SPARQL directly, often to act as an intermediary serving the needs of a user group further up the spectrum.
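The following Python sketch illustrates that intermediary pattern, assuming a hypothetical SPARQL endpoint and invented property URIs: a developer queries the endpoint with the SPARQLWrapper library and reshapes the JSON results into a simple structure that a web application can consume directly.

    # Minimal sketch: a developer acting as an intermediary over a SPARQL endpoint.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "https://example.gov.scot/sparql"  # placeholder, not a real service

    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery("""
        PREFIX ex: <http://example.org/def/>
        SELECT ?area ?value
        WHERE { ?obs ex:area ?area ; ex:value ?value . }
        LIMIT 100
    """)
    sparql.setReturnFormat(JSON)

    results = sparql.query().convert()

    # Reshape the SPARQL JSON results into a plain list of records for the application.
    records = [
        {"area": b["area"]["value"], "value": float(b["value"]["value"])}
        for b in results["results"]["bindings"]
    ]
    print(records)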

Most users, even the most technical ones, will make their first contact with a published collection of data by reviewing web pages and visualisations to get a feel for what data is available, before digging deeper to see how the data is structured and how they can access it in a machine readable way.

All users of the data are therefore well served by thoughtful design. Once a user finds a website that presents data to them, whether they continue working with it, or abandon it and look elsewhere, is heavily influenced by their initial experience of using the site. Careful consideration of the user experience of the site is important: this incorporates information architecture, navigation processes, page layout and speed of response of the site for common tasks.
