3. Linked data
3.1 What is linked data?
Linked Data is an approach to exploiting the strengths of the World Wide Web to enable effective large-scale discovery of, access to and integration of data. The architecture of the web has proven to be extremely scalable and powerful, leading to the enormous collection of information and services now available online. Linked Data is about extending the principles of the web into the domain of structured data.
It incorporates a number of important principles: assign global web-accessible identifiers to real-world things like people, places and events; use the mechanisms of the web to provide information about these things via their identifiers; and exploit web links to connect one piece of data to another, to help with discovering new information and comparing or combining it with what you already have.
Sir Tim Berners-Lee described Linked Data via these four principles in his original note on the topic:
- Use URIs as names for things.
- Use HTTP URIs so that people can look up those names.
- When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
- Include links to other URIs, so that they can discover more things.
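The second and third principles can be sketched in code. The Python fragment below prepares, without sending, an HTTP request for a thing's identifier and asks for an RDF description through the Accept header. The URI is a hypothetical example, the function name is illustrative, and the use of content negotiation for Turtle is an assumption about how a publisher chooses to serve its data; not every linked data site works this way.

```python
from urllib.request import Request

# Hypothetical identifier for a real-world thing (not a real dataset URI).
THING_URI = "http://example.org/id/local-authority/E00000001"


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET asking for RDF that describes the thing named by `uri`."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data identifiers should be HTTP URIs")
    # The Accept header states which representation the client would like.
    return Request(uri, headers={"Accept": media_type})


request = build_lookup_request(THING_URI)
# Passing `request` to urllib.request.urlopen would perform the lookup; that
# step is left out so the sketch runs without network access.
```

The same identifier therefore serves both as a name for the thing and as the address at which information about it can be requested.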
To encourage government data owners to make their data available openly and in an accessible way, Berners-Lee developed the 'Five stars of open data', to illustrate the steps an organisation can take, from basic publication through to a fully described machine readable approach.
* Available on the web (whatever format) but with an open licence, to be Open Data
** Available as machine-readable structured data (e.g. Excel instead of image scan of a table)
*** As (2) plus non-proprietary format (e.g. CSV instead of Excel)
**** All the above plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
***** All the above, plus: link your data to other people's data to provide context
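The scheme is cumulative: each star presupposes the ones before it. A minimal sketch of that rule follows; the function and parameter names are illustrative only and are not part of the five-star scheme itself.

```python
def open_data_stars(
    on_web_with_open_licence: bool,
    machine_readable: bool,
    non_proprietary_format: bool,
    uses_w3c_standards: bool,
    linked_to_other_data: bool,
) -> int:
    """Count stars in order, stopping at the first criterion that is not met."""
    criteria = [
        on_web_with_open_licence,
        machine_readable,
        non_proprietary_format,
        uses_w3c_standards,
        linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # later criteria cannot count without the earlier ones
        stars += 1
    return stars


# An openly licensed Excel file: on the web and machine-readable, but proprietary.
excel_rating = open_data_stars(True, True, False, False, False)
```

Under this reading, an openly licensed Excel spreadsheet earns two stars, and the same data published as CSV would earn three.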
While Linked Data is in many respects a set of principles, it is most commonly associated with a particular approach to representing data in a machine-readable way: namely the 'Resource Description Framework' (RDF). This is a set of specifications and standards, developed and maintained by the W3C and its members.
RDF is based around the concept of representing data using 'triples'. Each triple has a subject, a property and an object. The basic principles of RDF are explained and illustrated in the W3C 'RDF Primer' document.
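As a rough illustration of the triple structure, the sketch below models triples in plain Python and writes them out in the line-based N-Triples syntax. The example.org URIs are hypothetical; the label property is the standard RDFS one. The property is called `predicate` in the code because that is the term the RDF specifications use for it. Escaping of special characters in literals is deliberately left out, so this is not a complete serialiser.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str          # URI of the thing being described
    predicate: str        # URI of the property
    object: str           # either another URI or a literal value
    object_is_uri: bool   # True when `object` is a link to another thing


def to_ntriples(triple: Triple) -> str:
    """Render one triple as an N-Triples line (no literal escaping)."""
    if triple.object_is_uri:
        obj = f"<{triple.object}>"
    else:
        obj = f'"{triple.object}"'
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
WITHIN = "http://example.org/def/within"  # hypothetical property

triples = [
    Triple("http://example.org/id/region/north", RDFS_LABEL, "North Region", False),
    Triple("http://example.org/id/town/t1", WITHIN, "http://example.org/id/region/north", True),
]

for t in triples:
    print(to_ntriples(t))
```

The second triple shows the linking idea: its object is itself a URI, so a consumer can follow it to find out more about the region.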
3.2 Why is linked data useful?
Linked data is primarily a data integration technology. Data integration - combining different sources of data together to achieve a particular objective - requires shared identifiers, a shared approach to representing data and its structure and a way of transporting that data.
Because linked data is based around the architecture of the web, and the web is the easiest and most powerful way to distribute open data, linked open data is a particularly powerful concept. Like the web of documents, it is extremely scalable and enables a distributed model of data publication, while providing a way of connecting up and using information from diverse sources.
Note, however, that linked data can also be applied in a closed 'inside the firewall' environment, where access to the information is controlled. Many large organisations, such as governments or large corporations, face significant challenges in exchanging data between different parts of the organisation, and linked data is growing in popularity as a tool for tackling this well-known data silo problem.
As explained above, linked data typically uses RDF as a standard way of representing and exchanging data. Few people want to use RDF in its native form, but RDF enables a precise representation of the meaning and structure of data using a standard data model and syntax. Once different data sources have been converted to this lingua franca, they can be shared and combined. The RDF data can then be converted where required into various other formats that work well with popular software tools, as illustrated in the following diagram.
Figure 1 Use of Linked Open Data as an interchange method
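The conversion step can be sketched as follows: a handful of triples are pivoted into one row per subject and one column per property, then written as CSV using the Python standard library. All URIs and values are made up for illustration. The sketch assumes each subject has at most one value per property and that the last path segment of each property URI is unique; real data would need more care on both points.

```python
import csv
import io

# Hypothetical triples: (subject URI, property URI, literal value).
triples = [
    ("http://example.org/id/area/A", "http://example.org/def/name", "Area A"),
    ("http://example.org/id/area/A", "http://example.org/def/population", "1200"),
    ("http://example.org/id/area/B", "http://example.org/def/name", "Area B"),
    ("http://example.org/id/area/B", "http://example.org/def/population", "3400"),
]


def triples_to_csv(triples) -> str:
    """Pivot triples into a table with one row per subject."""
    rows = {}      # subject URI -> {column name: value}
    columns = []   # column names in order of first appearance
    for subject, prop, value in triples:
        column = prop.rsplit("/", 1)[-1]  # short name taken from the URI
        if column not in columns:
            columns.append(column)
        rows.setdefault(subject, {})[column] = value
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["uri"] + columns, lineterminator="\n")
    writer.writeheader()
    for subject, values in rows.items():
        writer.writerow({"uri": subject, **values})
    return buffer.getvalue()


csv_text = triples_to_csv(triples)
print(csv_text)
```

Keeping the subject URI as a column means the spreadsheet user can still trace each row back to the identifier it came from.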
3.3 When is linked data a good option?
Linked data is suitable for all kinds of open data publishing, but requires more effort than simply making a spreadsheet or CSV file available for download. Therefore it is worth concentrating, at least in the early stages of an open data strategy, on those datasets which are of greatest interest or value, or which have a role to play as reference data for many other datasets.
In some cases, providing a file download is not a viable approach to making data available and some form of API access is required:
- when data changes frequently
- when the data files are very large
- when you want to select interconnected data from multiple sources
Linked data is a good solution to the first two of these cases and in effect the only feasible solution for satisfying the third case in a web context.
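The third case relies on shared identifiers. In the sketch below, two hypothetical publishers describe the same bathing waters using the same URIs, so their triples can simply be pooled and queried together. The in-memory pattern matching stands in for the kind of join that a SPARQL query would express; all URIs, property names and values are invented for illustration.

```python
NAME = "http://example.org/def/name"                # hypothetical property
QUALITY = "http://example.org/def/waterQuality"     # hypothetical property

# Publisher 1: reference data naming each bathing water.
source_a = {
    ("http://example.org/id/bathing-water/1", NAME, "North Beach"),
    ("http://example.org/id/bathing-water/2", NAME, "South Bay"),
}

# Publisher 2: measurements that reuse the same identifiers.
source_b = {
    ("http://example.org/id/bathing-water/1", QUALITY, "excellent"),
    ("http://example.org/id/bathing-water/2", QUALITY, "good"),
}

# Because both sources are sets of triples, combining them is a set union.
merged = source_a | source_b


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def names_with_quality(graph):
    """Join names and quality ratings on the shared subject URI."""
    results = {}
    for subject, _, name in match(graph, p=NAME):
        for _, _, quality in match(graph, s=subject, p=QUALITY):
            results[name] = quality
    return results


print(names_with_quality(merged))
```

Neither source alone can answer the question "which named bathing water has which rating"; the join only works because both publishers used the same identifier for the same thing.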
The fact that linked data is a natural approach to maintaining a list of authoritative identifiers and descriptive information about those identifiers makes it an ideal solution to publishing reference data (for example lists of local authorities, or geographical regions) and for controlled lists (e.g. definitive categories for dividing a population into age groups or ethnicities).
3.4 Examples of linked data in practice
The use of Linked Data is becoming more widespread in the public sector in the UK. Some of the most important examples of linked data publication are:
- Ordnance Survey (http://data.ordnancesurvey.co.uk/): linked data versions of the OS BoundaryLine, CodePoint Open and 50k Gazetteer data products
- Department for Communities and Local Government (http://opendatacommunities.org): statistics on housing, deprivation, planning and local government finance
- Environment Agency (http://environment.data.gov.uk): measurements of bathing water quality and catchment management data
- Office for National Statistics (http://statistics.data.gov.uk): reference information on statistical geography
- Scottish Environment Protection Agency (http://data.sepa.org.uk/): reference information on water bodies and catchments
- Land Registry (http://landregistry.data.gov.uk/): price paid data for housing transactions
- Companies House (for example, http://data.companieshouse.gov.uk/doc/company/SC337356): reference information on registered companies
- British Library (http://bnb.data.bl.uk/): bibliographic metadata on British Library holdings
Email: Sara Grainger