Annex B - Data Quality, Data Processing and Data Confidentiality
Data quality: Data processing system
B.1 The Criminal History System (CHS) is an administrative system used to track individuals through the criminal justice system and, as such, was not designed purely for statistical purposes. However, actions and processes have been put in place to ensure that Scottish Government statisticians understand the data.
B.2 Annex A describes how information is entered on the CHS and explains that extracts are sent from Police Scotland to the Scottish Government on a monthly basis. The data requirements for these extracts are set out in a joint specification document agreed between Police Scotland and the Scottish Government.
B.3 Monthly extracts are uploaded onto a Scottish Government database, at which point validation checks are carried out to ensure that a realistic number of records is added. Checks are also made to ensure that values for charges, court locations and disposal types are recognised. If any unexplained patterns or unrecognised codes are identified at the upload stage, they are investigated further, and it is sometimes necessary to go back to Police Scotland to verify the data.
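The upload checks in B.3 can be sketched as follows. This is a minimal illustration only: the `Record` fields, the `validate_extract` function, the codes and the expected record-count range are all hypothetical assumptions, not the Scottish Government's actual implementation. It flags an unrealistic record count and any charge, court location or disposal code that is not in the look-up tables, so that flagged items can be investigated further.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical fields; the real extract layout is set by the joint specification.
    charge_code: str
    court_location: str
    disposal_type: str


def validate_extract(records, expected_range, known_charges, known_courts, known_disposals):
    """Return a list of issues to investigate; an empty list means nothing was flagged."""
    issues = []
    low, high = expected_range  # assumed plausible record count for one monthly extract
    if not low <= len(records) <= high:
        issues.append(f"record count {len(records)} outside expected range {low}-{high}")
    for i, rec in enumerate(records):
        if rec.charge_code not in known_charges:
            issues.append(f"record {i}: unrecognised charge code {rec.charge_code!r}")
        if rec.court_location not in known_courts:
            issues.append(f"record {i}: unrecognised court location {rec.court_location!r}")
        if rec.disposal_type not in known_disposals:
            issues.append(f"record {i}: unrecognised disposal type {rec.disposal_type!r}")
    return issues


# Illustrative (made-up) codes only.
extract = [
    Record("CH001", "COURT_A", "FINE"),
    Record("CH999", "COURT_A", "CUSTODY"),
]
for issue in validate_extract(extract, (1, 10), {"CH001"}, {"COURT_A"}, {"FINE", "CUSTODY"}):
    print(issue)  # record 1: unrecognised charge code 'CH999'
```

Returning a list of issues, rather than stopping at the first problem, mirrors the idea that every unrecognised code or unexpected pattern is followed up.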
B.4 Charge codes are the operational codes used to identify a crime or offence and are linked to legislation. New charge codes for crimes and offences under new legislation are created by the Crown Office and Procurator Fiscal Service (COPFS) on a monthly basis and shared with the Scottish Government. When new codes are identified at the data upload stage, they are verified and then added to a look-up table of recognised codes.
B.5 The Scottish Government is responsible for mapping each charge code to a crime code, which forms the basis of the crime code classification (see Annex D). Around 5,300 active charge codes are mapped to around 400 crime types. The mapping is agreed with representatives of Police Scotland and COPFS. Once any updates or amendments have been agreed, the Scottish Government publishes the updated charge code list together with the mapped crime codes. The latest version of the charge code list is available on the Scottish Government website.
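The charge-to-crime mapping described in B.4 and B.5 behaves like a look-up table. A minimal sketch, assuming a plain dictionary with made-up codes (`CH001`, `CRIME_A`, `CHARGE_TO_CRIME` and `count_by_crime` are all hypothetical): each charge is counted under its mapped crime code, and any charge code missing from the table is set aside for verification before it is added.

```python
from collections import Counter

# Hypothetical excerpt of the look-up table: charge code -> crime code.
# The real list maps around 5,300 active charge codes to around 400 crime types.
CHARGE_TO_CRIME = {
    "CH001": "CRIME_A",
    "CH002": "CRIME_A",
    "CH003": "CRIME_B",
}


def count_by_crime(charge_codes, mapping):
    """Count charges by crime code; also return codes not yet in the mapping."""
    counts = Counter()
    unmapped = set()
    for code in charge_codes:
        crime = mapping.get(code)
        if crime is None:
            unmapped.add(code)  # new code: to be verified, then added to the table
        else:
            counts[crime] += 1
    return counts, unmapped


counts, unmapped = count_by_crime(["CH001", "CH002", "CH003", "CH004"], CHARGE_TO_CRIME)
print(dict(counts), unmapped)  # {'CRIME_A': 2, 'CRIME_B': 1} {'CH004'}
```

Because many charge codes share one crime code, changing a single mapping entry reclassifies every record carrying that charge code, which is why the mapping is agreed with Police Scotland and COPFS before publication.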
Data quality: Data processing system update
B.6 When we receive data from the CHS as described above, the monthly extracts mentioned in B.3 are processed into our local database. In preparation for the annual Criminal Proceedings publication, a process is run which collates the year's data into a format that allows us to validate and analyse it. These processes have been updated, and during the update a number of errors were found in the existing processes.
B.7 These errors ranged from the extremely rare (for example, records of proceedings with more than 99 charges were truncated at 99 charges) to the more common but still rare (in particular circumstances, records were assigned an incorrect crime classification).
B.8 It is difficult to isolate the effect of each individual fix on the 2017-18 data, because the errors interact with each other and because the revised data also include a small number of new cases whose sentence date was too late for them to be captured last year.
Data quality: Validation of CHS data
B.9 During the processing of the 2018-19 data, it was discovered that, because additional notes had been provided on a number of items in the CHS, many of these items were being filtered out automatically and were therefore incorrectly excluded from the published tables. This has now been remedied, and around 400 additional data items have been included this year. The majority of these relate to Community Payback Orders, although the direction of the trend in these figures was not affected.
Data quality: Data validation during production of the statistical bulletin
B.10 As a court proceeding or a police/COPFS non-court disposal can be made up of more than one offence, producing the statistics at 'persons' level requires an intermediate processing stage to be carried out on the CHS data. Where a person is proceeded against for more than one crime or offence in a single proceeding, only the main charge is counted. The main charge is the one receiving the most severe penalty (or disposal) where one or more charges are proved, and it is identified using a look-up table that ranks disposal types in order of severity.
B.11 For example, custody is ranked higher than a monetary fine, so for a proceeding with a mixture of these two types of disposal, the main charge counted for this record would be the charge associated with the custody disposal rather than the charge associated with the monetary disposal. Once this dataset is created, the following types of validation are carried out:
- Automated validation procedures and manual checks to identify any unrealistic data values, e.g. long custodial sentences for petty crimes or short sentences for the most serious crimes. Effort is also made to clean up records with missing key information, e.g. missing court locations or missing age/gender of the offender. These are referred back to Police Scotland, the Scottish Courts and Tribunals Service (SCTS) or COPFS for correction or for an explanation of any unusual circumstances.
- Other checks are carried out as necessary in response to changes in the justice system. For example, when new legislation is implemented, checks are undertaken to ensure that cases are coming through the system at a realistic rate.
- Trends in the statistics are compared against case processing information published by COPFS and management information provided by SCTS to ensure that the volumes of court proceedings are consistent. Information is compared by court type (e.g. High Court, sheriff court) to identify any differences.
- Further checks are undertaken by crime type, sentence type and other characteristics to identify any errors. As an extra level of assurance, policy experts within the Scottish Government are consulted to identify why any significant changes may have occurred. Any relevant contextual information is then added to the bulletin.
- Similar consultation is undertaken with COPFS, SCTS and Police Scotland, in which results are shared purely for quality assurance purposes. Insight at an operational level provides invaluable feedback and helps determine whether further investigation of statistical quality is required.
- Further quality assurance and checking is carried out by Scottish Government Justice Analytical Services support staff when preparing the tables, for example ensuring that the same totals match across different tables. Scottish Government statisticians who have not been involved in the production process also check the results and highlight any issues that may have gone unnoticed.
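The main-charge rule in B.10 and B.11 can be sketched as a minimum over a severity ranking. This is a minimal sketch under stated assumptions: only "custody ranks above a monetary fine" comes from the text; the other disposal names, their order, the `main_charge` function and the tuple layout are hypothetical. The sketch also assumes that unproved charges are ignored, that a proceeding with no proved charge returns `None`, and that ties keep the first charge listed; the bulletin's own handling of those cases is not described here.

```python
# Hypothetical severity ranking: a lower number means a more severe disposal.
# Only "custody outranks a monetary fine" is taken from the text.
SEVERITY_RANK = {
    "custody": 1,
    "community_sentence": 2,
    "monetary_fine": 3,
    "admonishment": 4,
}


def main_charge(charges):
    """Return the charge code of the main charge for one proceeding.

    `charges` is a list of (charge_code, disposal_type, proved) tuples.
    Assumptions: only proved charges are ranked; None is returned if no
    charge was proved; ties keep the first charge listed (min() is stable).
    """
    proved = [c for c in charges if c[2]]
    if not proved:
        return None
    code, _, _ = min(proved, key=lambda c: SEVERITY_RANK[c[1]])
    return code


proceeding = [
    ("CH010", "monetary_fine", True),
    ("CH020", "custody", True),
    ("CH030", None, False),  # not proved, so no disposal to rank
]
print(main_charge(proceeding))  # CH020
```

Keeping the ranking in a table rather than in code means the order of severity can be reviewed and changed without touching the selection logic.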
Data quality: Double counting
B.12 In recent years, we have carried out much more extensive quality assurance with external agencies to ensure the accuracy and quality of the published statistics. COPFS have identified that a small number of court proceedings (often complex ones involving multiple charges) may be recorded as separate court cases when they should be reported as one. This would lead to an over-estimate of the true number of court proceedings.
B.13 Initial investigations suggest that this affects all crime types, though to varying degrees. Further work will be carried out with a view to quantifying the extent of the problem and identifying whether a change in processing methodology is required.
Data confidentiality
B.14 Court proceedings are held in public and may be reported on by the media unless the court orders otherwise, for example where children are involved.
B.15 While our aim is for the statistics in this bulletin to be detailed enough to be of practical use, care has been taken to ensure that it is not possible to identify an individual or organisation and obtain any private information about them.
B.16 We have assessed the risk of individuals being identified in the tables in this bulletin and have concluded that no private information can be disclosed. Where demographic information is provided, it is given either in broad age categories (for example tables 6, 21 and 22) or as rates per 1,000 population (table 5). This ensures that individuals cannot be identified where numbers are small.
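The two disclosure-control techniques in B.16, broad age categories and rates per 1,000 population, can be illustrated as follows. This is a minimal sketch: the band boundaries, labels, counts and population figures are hypothetical, and the bands actually used in tables 6, 21 and 22 may differ.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds for broad age bands; the bands
# actually used in the published tables may differ.
BAND_UPPER = [20, 30, 40]
BAND_LABELS = ["under 21", "21-30", "31-40", "over 40"]


def age_band(age):
    """Map an exact age to a broad band so that exact ages are not published."""
    return BAND_LABELS[bisect_left(BAND_UPPER, age)]


def rate_per_1000(count, population):
    """Number of people per 1,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 1000


print(age_band(19), age_band(35))           # under 21 31-40
print(round(rate_per_1000(45, 9_000), 2))   # 5.0
```

Using `bisect_left` with inclusive upper bounds means an age equal to a boundary (for example 30) falls into the band that ends at that boundary.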
B.17 Some of the additional data tables provided alongside this publication include local authority information relating to the offender. The local authority tables provide either demographic information or offence-level information, but never a combination of both. As in the main publication tables, demographic information is grouped into broad age categories to further restrict the ability to identify individuals.
B.18 In terms of the security and confidentiality of the data received from the data suppliers, only a small number of Scottish Government employees in the IT and Justice Analytical Services divisions have access to the datasets at the various stages of processing described above. The only personal details received by the Scottish Government in the data extract are those essential for the analyses in this bulletin.
B.19 The data presented in this publication are drawn from an administrative IT system. Although care is taken when processing and analysing the data, they are subject to the inaccuracies inherent in any large-scale recording system. While the figures shown have been checked as far as practicable, they should be regarded as approximate and not necessarily accurate to the last whole number shown in the tables. They are also updated and quality assured on an ongoing basis, so the figures shown here may differ slightly from those published previously. Where substantive revisions have been made to improve the quality of the data, these are indicated in the footnotes.
B.20 From the 2015-16 publication onwards, tables based on the postcode of the accused replaced the tables on Criminal Justice Authority (CJA) areas. CJAs are groups of local authorities, and the CJA tables were based on court location rather than the home location of the accused. Users can still request information based on court location.
B.21 The CHS is not designed for statistical purposes and depends on receiving timely information from criminal justice organisations. A pending case on the CHS should be updated promptly, but there are occasions when slight delays occur. Recording delays of this sort generally affect High Court disposals more than those of other courts, as High Court trials are the most complex and lengthy. In addition, the court may await reports before passing sentence, so there may be a gap between the defendant being found guilty and the sentence being given. We only receive the data once the sentence details are available.
B.22 The figures in this bulletin reflect the details of court proceedings, as recorded on the CHS, that concluded on or before 31 March 2019 and were provided to the Scottish Government by the end of August 2019. Any subsequent updates to court disposals will be incorporated into future bulletins, so some figures for 2018-19 (in particular those relating to the High Court) are likely to be subject to minor revisions.
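The inclusion rule in B.22 combines a reference period with a data cut-off, which is why late-recorded cases appear only as revisions in later bulletins. A minimal sketch, assuming each record carries a conclusion (sentence) date and a date on which it reached the Scottish Government; both field names and the `in_bulletin` function are hypothetical.

```python
from datetime import date

PERIOD_START = date(2018, 4, 1)   # start of the 2018-19 financial year
PERIOD_END = date(2019, 3, 31)    # proceedings concluded on or before 31 March 2019
DATA_CUT_OFF = date(2019, 8, 31)  # data provided up to the end of August 2019


def in_bulletin(concluded, received):
    """True if a proceeding concluded in 2018-19 and was received by the cut-off.

    `concluded` and `received` are hypothetical per-record dates.
    """
    return PERIOD_START <= concluded <= PERIOD_END and received <= DATA_CUT_OFF


# A case concluded in March 2019 but recorded in October 2019 is excluded
# now and would be picked up as a revision in a later bulletin.
print(in_bulletin(date(2019, 3, 20), date(2019, 10, 2)))  # False
```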
B.23 These recording delays mean that figures for 2018-19 should be considered provisional as future bulletins may provide updates. We estimate that the 2017-18 bulletin contained a small undercount of people convicted in 2017-18, less than 1% of all people convicted.
B.24 No revisions, other than the changes described in B.6 and B.9 above, have been made to the Criminal Proceedings statistics. When revisions are required, they comply with Scotland's Chief Statistician's current revisions policy.