This report is part of a larger set of publications released as Experimental Statistics and called the 'Scottish Crop Map'. It uses novel modelling techniques to develop a map of agricultural fields in Scotland, categorised by the main crop type or grassland most likely to have been grown in each field in 2019. The summary statistics provide the areas grown for the main crop types as well as the areas used for grassland.
To produce the map, spatial data collected from satellites and from real-world observations are fed into a machine learning model, which provides probabilities for the most likely crop growing in every field in Scotland. This document explains how the model was developed, how the data are collected and how the summary statistics are produced from the model.
The Objectives of the Scottish Crop Map
Currently, the official statistics publications for cereal production and harvest rely on industry intelligence for initial estimates, followed by a large-scale survey of farms with significant arable farming output to produce final estimates.
The main objective of the modelling used to create the Scottish Crop Map is to reduce the number of surveys and the associated survey burden for farmers. The modelling will allow earlier predictions of crop production values and also improve the initial estimates by reducing the reliance on industry intelligence.
The initial estimates are verified against the anticipated growing values recorded in the agricultural 'Single Application Form' (SAF), which farmers submit to Scottish Government in their annual application for support payments. However, both initial and final estimates are susceptible to human biases and often rely on best estimates rather than recorded data.
The development of the Scottish Crop Map can introduce a more robust method of collecting, verifying and producing estimates as well as a systematic way of collecting data.
Other objectives of the Scottish Crop Map include:
- Develop the data science techniques to use satellite images to produce other agricultural statistics; and
- Explore other uses of the techniques, data and model to develop new metrics on land-use and the positive and negative impacts of agricultural land-use.
Approach to Development
The Scottish Crop Map has been in development for a number of years. It began with a feasibility study which examined the possibility of creating a crop map using machine learning. The feasibility study found that:
- A crop map could be developed using radar images and could be augmented with visible images
- Using field boundaries was critical to the mapping product, and a hexagonal 'mosaic' style map would not provide a high enough resolution
- Developing a crop map required computing power beyond what is available from desktop computing
- Developing the crop map requires a multi-skilled team including statisticians and Geographic Information Systems (GIS) specialists
- The coding required to develop the model was within the skills of specialist statisticians
A further study was developed to look at the models and the data required to conduct the modelling. This study concluded:
- Either a 'neural network' or a 'random forest classification' (RFC) model would be suitable for conducting the large-scale modelling
- The first iteration of the map should concentrate on the RFC model
- The model should concentrate on identifying the major crop types as there is not enough information to create a 'training dataset' for minor crops
The study also recommended a series of variables to be trialled in the model and suitable timeframes to be included.
The first model that was trialled was a collaboration with EDINA, the specialist data and analysis service at the University of Edinburgh. This collaboration included trialling a model which had been developed for the Department for Environment, Food and Rural Affairs (DEFRA) and used the 'analysis ready dataset' of Sentinel-1 images for Scotland provided by the Joint Nature Conservation Committee (JNCC).
This iteration of the model is regarded as the 'Alpha' version. This means that the data and the model have been developed to a high enough standard for full deployment, but the methods and results should be used with caution, and revisions will be made in future publications.
The satellite images used in the Scottish Crop Map come from the Sentinel-1 earth observation satellite, which is part of the European Space Agency (ESA) Copernicus Programme. The data from these images was provided by JNCC.
More information is available from ESA and from the ESA Copernicus Programme.
The statistics from the map are designated as 'experimental' because the methods used to assign the crop types are novel and are under review.
Experimental Statistics is a UK Statistics Authority classification for Official Statistics released by an Official Statistics producer, in this case the Scottish Government. This classification is used for statistics which are still in the testing phase and not yet fully developed. The reasons for publishing them ahead of a finalised publication are:
- Consultation – to get informed feedback from potential users
- Acclimatisation – this is an alternative version of the existing series of Cereal Harvest Production Statistics and is released in the current state to help users adapt to the method and presentation of the data
- Use – as an experimental series, these statistics can provide useful information for users; however, caution should be exercised when re-using the statistics provided.
The crop map statistics and associated publications are being produced partway through a well-defined development programme. The statistics are new and still subject to testing in terms of their volatility and ability to meet customer needs. Before moving on to the next stages of development, the development team is using this publication as an opportunity to align the current product with those needs.
The modelling used to produce these statistics has been developed in conjunction with a wide range of experts, to a standard where there is a fairly good degree of confidence in the accuracy of the results.
Before these statistics can be released as National or Official Statistics, feedback from this publication and from the consultation period that follows it will be used to help design future iterations and to fully validate the measures to the standard expected of National Statistics.
Publishing the data in this form may also help users prepare their own uses of the statistics and data ahead of future publications.
As a new measure of cereal harvest statistics, they may have a component that has immediate value to users. Users should be aware of the statistics' theoretical quality but can use them before all operational testing has been conducted.
Users should be aware that the current format in which the data is presented may be changed in future iterations. Future releases may also have additional information on agricultural crops in Scotland. Improvements to the model in future releases may include data on harvest yields, more timely data as the model is refined and improvements to the mapping and summary statistics that are presented.
National Statistics Designation
Statistics classified as "Experimental Statistics" are only made National Statistics following assessment by the Office for Statistics Regulation.
For this to happen, there are four stages that must be followed:
- Stage 1: self-assessment by business area
- Stage 2: methodological review by Strategy and Standards Directorate methodologists and/or business area methodologist
- Stage 3: recommendation by the Strategy and Standards Directorate, ONS Director General's office and the Statistical Policy Committee (SPC)
- Stage 4: assessment by the Office for Statistics Regulation
For Official Statistics, removing the Experimental Statistics label does not require Office for Statistics Regulation involvement.
This document explains the techniques and data used to develop these statistics and the methods that are being tested. This is a novel approach to developing statistics: instead of relying on traditional methods such as survey forms sent to farmers, these statistics rely on satellite images from the European Space Agency Copernicus Programme.
Statisticians, mapping specialists and data scientists have developed and tested a new methodology to interpret the vast data from the satellite images using machine learning run on high-powered computing.
The machine learning technique uses statistics and computer algorithms to analyse a dataset of radar backscatter time series for fields in Scotland where it is known what crop was grown in each field. By learning key characteristics of this satellite-derived dataset, a computer algorithm was written to predict the main crop growing in all other agricultural fields (those with unknown crop types). These predictions are then used to develop statistics to estimate the total growing area for all main crop types in Scotland.
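The final step described above, turning per-field predictions into area statistics, can be illustrated with a minimal sketch. It is not the code used for the Scottish Crop Map. The field identifiers, areas, crop class names and probabilities are all hypothetical, and the probabilities are assumed to have already been produced by a classification model. The sketch only shows how the most likely crop might be picked for each field and the field areas summed by crop.

```python
from collections import defaultdict

# Hypothetical model output: one record per field, with its area in hectares
# and a probability for each candidate crop class. All values are invented
# for illustration and do not come from the Scottish Crop Map.
fields = [
    {"field_id": "F001", "area_ha": 12.5,
     "probs": {"spring_barley": 0.70, "winter_wheat": 0.20, "grassland": 0.10}},
    {"field_id": "F002", "area_ha": 8.0,
     "probs": {"spring_barley": 0.15, "winter_wheat": 0.25, "grassland": 0.60}},
    {"field_id": "F003", "area_ha": 20.0,
     "probs": {"spring_barley": 0.30, "winter_wheat": 0.65, "grassland": 0.05}},
    {"field_id": "F004", "area_ha": 5.5,
     "probs": {"spring_barley": 0.55, "winter_wheat": 0.35, "grassland": 0.10}},
]


def most_likely_crop(probs):
    """Return the crop class with the highest predicted probability."""
    return max(probs, key=probs.get)


def area_by_crop(field_records):
    """Sum field areas (hectares) grouped by each field's most likely crop."""
    totals = defaultdict(float)
    for record in field_records:
        totals[most_likely_crop(record["probs"])] += record["area_ha"]
    return dict(totals)


totals = area_by_crop(fields)
for crop, area in sorted(totals.items()):
    print(f"{crop}: {area:.1f} ha")
```

Assigning each field wholly to its single most likely crop is one simple design choice. An alternative would be to weight each field's area by the class probabilities, which spreads uncertain fields across several crops. The source does not say which approach is used.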
All information used to develop this map and associated products has been made available online, including:
- An interactive map of all crop types in Scotland
- The computing code used to build the crop map (access via GitHub)
- The satellite images used to develop the crop map.
Other data used in the development of the map, including the field boundaries, are available on request and are released under licence from Ordnance Survey.