Is the dataset fit for purpose?
It is important to become familiar with your data and to use it correctly. There will always be occasions where statistics are either misused or quoted out of context, but in striving to “meet customer needs by producing relevant and reliable information, analysis and advice, free from any political interference” the chances of this happening are considerably reduced.
Before carrying out a specific type of analysis, it is important to ensure that the data collected is suitable for that analysis. The following points should help ensure that the data we publish is not vulnerable to misinterpretation:
Consult colleagues, stakeholders, users etc. to gauge the need for, and the practicality of, the planned analysis. It may be that, without their realising it, their needs can be met with data already collected.
Scrutinise the data currently collected in order to be aware of what it can and cannot be used for. Key questions to ask when studying data include:
- What is the nature of the data? For example, does it measure performance, or is it context- or outcome-based? The Scottish Policing Performance Framework is a good example of statistical indicators being grouped in terms of the information they capture.
- What is the source of the data, and how has it been collated?
- Is the dataset complete? For example, in Justice, analysis at a Scottish level requires data from all 8 police forces. If a complete dataset cannot be obtained, what is the best method for estimating or forecasting the missing data? If the data is weighted, should the focus be placed on data quality for the components with the largest weights?
- What timescale does the data cover and has the collection process been consistent throughout this period?
- Will any outliers affect or skew the analysis being carried out, and if so, is it appropriate to remove them?
- What does a plot of the data show? Is a transformation of the data needed before the analysis can be carried out? Are there any seasonal patterns that will require seasonal adjustment or smoothing?
- Is the sample size big enough to give meaningful results, especially when the dataset is broken down into sub-groups?
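Some of the questions above (completeness, outliers and sub-group sample sizes) can be checked with a few lines of code before any analysis starts. The following is a minimal sketch only, using the Python standard library. The record layout, the field names, the 1.5 × IQR outlier rule and the minimum sub-group size are all illustrative assumptions rather than recommended standards, and the function only flags values for review; whether to remove an outlier remains a judgement for the analyst.

```python
from collections import Counter
from statistics import quantiles


def missing_groups(records, key, expected):
    """Return the expected group labels that have no records at all."""
    present = {r[key] for r in records}
    return sorted(set(expected) - present)


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    The 1.5 * IQR fence is a common rule of thumb, assumed here for
    illustration; it flags values for review and does not remove them.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def small_subgroups(records, key, min_size):
    """Return sub-groups with fewer than min_size records.

    Only groups that appear in the data are counted; groups that are
    absent entirely are reported by missing_groups instead.
    """
    counts = Counter(r[key] for r in records)
    return {group: n for group, n in counts.items() if n < min_size}


# Hypothetical example data: the labels and values are invented.
records = [
    {"force": "Force A", "value": 10},
    {"force": "Force A", "value": 12},
    {"force": "Force B", "value": 11},
    {"force": "Force B", "value": 95},
]
expected_forces = ["Force A", "Force B", "Force C"]

print(missing_groups(records, "force", expected_forces))
print(iqr_outliers([r["value"] for r in records]))
print(small_subgroups(records, "force", min_size=3))
```

A plot of the data is still needed to judge transformations and seasonal patterns; checks like these complement a visual inspection rather than replace it.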
If the analysis requires data from more than one source, it is important to make sure that the sources are compatible with one another. For example, the data should cover the same time period.
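The time-period check for multiple sources can be sketched as follows. This is an illustrative example only: the source names and dates are hypothetical, and it assumes each source can be summarised by a single start and end date, which will not hold if a source has gaps in its coverage.

```python
from datetime import date


def common_period(periods):
    """Return the (start, end) period covered by every source, or None.

    `periods` maps a source name to a (start, end) pair of dates.
    The overlap runs from the latest start to the earliest end.
    """
    start = max(s for s, _ in periods.values())
    end = min(e for _, e in periods.values())
    return (start, end) if start <= end else None


# Hypothetical sources and coverage dates, invented for illustration.
sources = {
    "source_one": (date(2005, 4, 1), date(2010, 3, 31)),
    "source_two": (date(2007, 4, 1), date(2011, 3, 31)),
}

print(common_period(sources))
```

If the function returns None, the sources share no common period and cannot be compared directly without further data collection or estimation.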
Is the data source conducive to numerical output? For example, how should responses to survey questions be presented?
If the current data collection process is not conducive to the type of analysis planned, consult your data provider (see Steps 4 and 5) to discuss the possibility of collecting suitable data that meets your needs.