Publication - Research and analysis

Evaluation for policy makers - A straightforward guide

Published: 11 December 2018
ISBN: 9781787814028

Evaluation for policy makers. A straightforward, user-friendly and practical guide to evaluation in the policy making cycle.


Chapter 5: What type of evaluation should I use?

The type of evaluation will be determined by what you and others want to know about your policy, the funds available, the time available and the feasibility of conducting the evaluation. Analysts can advise you and might suggest a range of possible evaluation methods. The table below shows one possible categorisation of the main types of evaluation that are used in the Scottish Government. They can be used in combination, and all contribute in their own right to better policy making. Notice how the type of questions you have will determine the type of evaluation you use. The strengths and limitations of each evaluation approach are also set out in the table.

Table 1 - How policy questions determine the type of evaluation used

Your Questions about the Policy	Type of Evaluation	Strengths	Limitations
Process evaluations
Has the policy been delivered as intended? Has the policy reached the intended target group? What are the lessons learned and what could be improved?	Process evaluation/ case studies	Useful for identifying lessons learned and highlighting areas for improvement. Can be used to inform new policies where outcomes haven't had a chance to materialise.	Doesn't tell you about outcomes or the impact of the policy.
How can we improve working practice or fix known problems on the ground e.g. in a health or education setting?	Action or Participatory research	Often lends itself to small-scale studies. It can generate solutions to practical problems and can empower practitioners by getting them to engage with research and the subsequent activities.	Can be very time-consuming and biases can creep in. Difficult to generalise findings to other work contexts.
Impact evaluations
To what extent have short term outcomes been achieved? How many people have made progress from where they started (from a baseline)? Has the policy achieved its own objectives?	Outcome or 'Before and After' evaluation (a data-driven approach that uses a baseline measure as a basis for comparison but not a control group)	Useful if you want to know about the achievement of short term outcomes as it tells you how many participants have made progress on these.	You need to wait until outcomes have had a chance to materialise and there are a few things to watch out for if you are considering this option. Attributing any pre-post changes to the policy is tricky, which is why this design is recommended for shorter term outcomes only e.g. changes in knowledge before and after a test. Even if there is a difference in outcomes before and after the policy or intervention, without a control group we cannot be sure if the difference was caused by the policy or by other 'extraneous' factors. For example, the reoffending rate for participants after a domestic violence intervention may be lower than before simply because some of them were sent to prison after breaching the terms. The way in which participants are policed and willingness to report incidents may also be different before and after the intervention. You also need fairly large sample sizes if you want to test whether 'before and after' differences are statistically significant and also if you want to look at outcomes for subgroups of people e.g. older people or women. The interpretation of 'snapshot' outcome data can be challenging. For example, is 26% of people making progress on a particular outcome a positive or a negative finding?
Did the policy work as it was expected to work? How well has the policy been delivered? To what extent have the short- to medium-term outcomes been achieved? What aspects of the policy delivered those outcomes? Did certain types of people achieve better outcomes than other types of people?	Logic model evaluation (this approach is based on testing your policy's 'theory of change')	Useful if you want to combine a process evaluation with a basic outcome evaluation. It can tell you how well the policy was delivered and also whether there was progress made on achieving outcomes. It can also measure the extent to which a single policy is making a contribution to longer term outcomes.	Same as an outcome evaluation.
To what extent have the longer term outcomes been achieved? Have those outcomes been achieved because of the intervention or because of other factors? Was the policy initiative better than doing nothing or something else?	Impact evaluation (a randomised control trial or a matched comparison group) or a Natural Experiment (where the experimental groups occur naturally, such as men and women)	Useful if you want to know if the policy made a real difference.	Randomised Control Trials are regarded as the 'Gold Standard' approach, but can be very difficult to do well in practice - especially if participant numbers are small, and/or a large enough control group is difficult to identify. The characteristics of the control group *must* be matched with the experimental group or any differences in outcomes could be due to unmatched samples. Unless an impact evaluation is combined with a process evaluation, it often doesn't tell you why or how the policy worked, just if it worked.
Economic evaluation
Were the costs invested in the policy justified by the benefits? Has there been an optimal use of resources to produce the outcomes?	Economic evaluation or Value for money assessment	Benefits and costs valued in monetary terms so they can be compared. Can tell you if the impacts are sufficiently valuable to justify further funding. Can answer questions such as: Have you minimised the cost of resources required to deliver services and achieve outcomes? Is the policy cost-effective?	Needs to be based on a robust impact evaluation. Limitations are the same as impact evaluation. It can take a while for benefits to be realised. Some benefits are difficult to value in monetary terms, e.g. social or environmental benefits.

Measuring impact

If your main policy question is 'Does the policy work better than doing nothing, or better than what was there before?', you should speak to analysts about doing an impact evaluation. This compares a control group (who did not experience the policy) with people who did experience the policy. Establishing causation (i.e. what causes what) is also a good reason to consider an impact evaluation - even if you discover that outcomes have changed over time, it doesn't necessarily mean the policy has caused those changes. This is another reason some analysts suggest a control group: it can help determine whether you can attribute outcomes to the policy or whether other factors may have had an influence.

A matched comparison or control group should be discussed very early in the policy development process to see if it is feasible because it requires sufficiently large sample sizes. A Randomised Control Trial is considered to be the 'Gold Standard' and is often used when the consequences of policy failure are serious.

Top Tip: Be aware of self-selection bias

Randomisation can avoid the self-selection bias that is introduced when participants choose whether or not to participate in the evaluation. This is also known as volunteer bias. Put simply, a group that chooses to participate is not equivalent to the group that opts out, so the results of the evaluation can't be generalised to the wider group of interest. Ask analysts how this can be avoided or dealt with.
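For readers curious about what random allocation looks like in practice, here is a minimal sketch in Python. It assumes a list of participants who have already agreed to take part (the IDs are hypothetical) and uses a seeded random number generator to split them into a policy group and a control group. Because chance, rather than the participants themselves, decides who experiences the policy, the two groups should on average be comparable. The function name `randomly_assign` and the seed value are illustrative choices, not a Scottish Government standard.

```python
import random


def randomly_assign(participant_ids, seed=2018):
    """Split participants into two groups of (near) equal size at random.

    A fixed seed makes the allocation reproducible, so analysts can
    document and audit exactly how each person was assigned.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "policy": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


if __name__ == "__main__":
    # Hypothetical participant IDs, for illustration only
    ids = [f"P{n:03d}" for n in range(1, 11)]
    groups = randomly_assign(ids)
    print("Policy group: ", groups["policy"])
    print("Control group:", groups["control"])
```

Recording the seed lets analysts reproduce and check the allocation later. Real trials often use more elaborate schemes (for example, randomising separately within subgroups), which analysts can advise on.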

Impact evaluations normally compare quantitative (numerical) data collected from the control group with the group that experienced the policy. However, if the sample is too small for numbers to be meaningful, you could speak to analysts about whether it's worth using a control group to compare people's views and attitudes (qualitative data) with the group that experienced the policy. For example, if people who experience support from a policy initiative mention feeling satisfied more often compared with a control group who did not receive support, then it could indicate that your policy is achieving its objectives, even without numerical data. It is fairly unorthodox to use a control group in qualitative research, but it can be a partial solution to provide more meaningful information on how a policy is affecting its target group if numbers are small.

Are control groups ethical?

Some people may strongly object to the idea that some participants in the evaluation will not experience the policy or intervention just so that outcomes can be compared with those who do. This is often the case if the policy or intervention seems beneficial on the face of it. However, it is worth bearing in mind that until the policy is tested we have no way of knowing if it is beneficial or not. It could make no difference at all (a waste of valuable resources) or even be harmful. Some would go further to argue that it is more unethical to roll out an untested policy, especially if the consequences of policy failure were serious.

However, do not use a Randomised Control Trial or a matched comparison group if your sample sizes are too small. Your available sample size is often determined by the design and scale of the policy. An analyst can advise on whether a sample is large enough for Randomised Control Trials. If numbers in each group are too small the difference between your 'policy' group and your control group would have to be unrealistically large for the difference to be statistically significant. If you are unable to do an impact evaluation, this would mean that you cannot answer certain questions about the policy, for example, are the policy outcomes better than doing nothing.
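To illustrate why small samples are a problem, the sketch below uses the standard normal-approximation formula for comparing two proportions to estimate the smallest difference between a 'policy' group and a control group that a study could reliably detect. It assumes a two-sided test at the 5% significance level, 80% power, and an outcome (for example, the share of people in work) running at about 50% in both groups. These are common planning conventions rather than fixed rules, the function names are hypothetical, and analysts will use calculations tailored to the actual design.

```python
import math
from statistics import NormalDist


def _z_total(alpha, power):
    """Sum of the z-values for a two-sided significance level and the desired power."""
    return NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)


def minimum_detectable_difference(n_per_group, baseline_rate=0.5,
                                  alpha=0.05, power=0.8):
    """Smallest difference in proportions a study with n_per_group people
    in each group could reliably detect.

    Assumes both groups have a similar underlying rate, which is a
    simplification suitable for early planning only.
    """
    std_error = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_group)
    return _z_total(alpha, power) * std_error


def sample_size_per_group(difference, baseline_rate=0.5, alpha=0.05, power=0.8):
    """Approximate number of people needed in EACH group to detect `difference`."""
    n = (_z_total(alpha, power) ** 2
         * 2 * baseline_rate * (1 - baseline_rate) / difference ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    for n in (50, 500, 2000):
        mdd = minimum_detectable_difference(n)
        print(f"{n:>5} per group -> detectable difference of about {mdd:.0%}")
    print("People per group to detect a 10-point difference:",
          sample_size_per_group(0.10))
```

Under these assumptions, 50 people per group could only reliably detect a difference of roughly 28 percentage points, whereas about 393 people per group would be needed to detect a 10-point difference. This is the 'unrealistically large' difference problem in practice.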

Natural Experiments

A natural experiment is sometimes used if randomisation is considered unethical or difficult to implement. It is where individuals are 'naturally' assigned to the experimental and control conditions but the process governing the exposures arguably resembles random assignment e.g. families with 2 male children or 2 female children. Thus, natural experiments are observational studies and are not controlled in the traditional sense of a randomised experiment.

Measuring outcomes and logic model evaluations

If an impact evaluation is not feasible, then outcome or logic model evaluations can be very useful alternatives, but you still need sufficient sample sizes to quantify outcomes (see Table 1). The beauty of logic model evaluations is that they combine a process evaluation (which explores whether a policy is working as intended) with an outcome evaluation (which measures the extent to which outcomes have been achieved).

Top Tip: Logic model evaluations can test if your policy is achieving its own objectives

Every policy has a 'theory of change' that underpins it: in other words, an assumption that doing X will achieve Y. If it is impossible to obtain a control group or you have small sample sizes, discuss whether a logic model evaluation would be useful. This type of evaluation will assess whether outcomes emerged as predicted when the policy was formulated. It will also assess the extent to which your policy is contributing to longer term goals by measuring the achievement of associated shorter term objectives.
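As a rough illustration of the idea, a theory of change can be written down as a set of short-term outcomes, each with a baseline, the level the policy was predicted to reach, and the level actually observed. The Python sketch below (the `Outcome` class, the outcome names and the figures are all hypothetical) reports what share of each predicted improvement has been achieved. It does not establish that the policy caused the changes; as with outcome evaluations, other factors could be at play.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One link in a hypothetical theory of change: 'doing X should achieve Y'."""
    name: str
    baseline: float   # level before the policy
    predicted: float  # level the policy was expected to reach
    observed: float   # level actually measured

    def progress(self):
        """Share of the predicted improvement that has actually been achieved."""
        expected_gain = self.predicted - self.baseline
        if expected_gain == 0:
            raise ValueError(f"{self.name}: predicted level equals baseline")
        return (self.observed - self.baseline) / expected_gain


def review_logic_model(outcomes):
    """For each short-term outcome, how far it moved towards its predicted level."""
    return {o.name: round(o.progress(), 2) for o in outcomes}


if __name__ == "__main__":
    # Hypothetical short-term outcomes, for illustration only
    outcomes = [
        Outcome("Aware of support services (%)", baseline=30, predicted=60, observed=54),
        Outcome("Attending sessions (%)", baseline=10, predicted=40, observed=16),
    ]
    for name, share in review_logic_model(outcomes).items():
        print(f"{name}: {share:.0%} of predicted improvement achieved")
```

Here awareness has moved 80% of the way towards its predicted level while attendance has moved only 20%, which would prompt questions for the process side of the evaluation about why attendance is lagging.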

What if I need to evaluate a new policy?

In the early stages of a policy, a process evaluation (see Table 1) should tell you how well it's being implemented and delivered, which will highlight where improvements may have to be made before the policy is rolled out or given more funding. Only when outcomes or benefits have had a chance to materialise should you consider an outcome, logic model or an impact evaluation - but this should build on data collected in the early stages about implementation and delivery.

Are evaluations all about outcomes?

That's a good question. There has been a lot of focus on outcomes and impact lately, but the fact is, benefits or outcomes are unlikely to materialise unless your policy is evidence-based and is implemented and delivered to a high standard. This is why it is worth conducting a process evaluation or a logic model evaluation to make sure your policy is running as intended. Another issue is efficiency. For example, you might be happy to discover that your policy is achieving some outcomes, only for a value for money evaluation (see Table 1) to find that the costs of the policy exceed the benefits achieved. On balance, you might conclude that those resources would be better spent on something else.
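Behind a value for money assessment sits some fairly simple arithmetic. The sketch below discounts hypothetical yearly costs and benefits to present values and then reports the net present value (benefits minus costs) and the benefit-cost ratio (benefits divided by costs). It assumes the benefits have already been valued in monetary terms, which Table 1 notes can be difficult, and uses a 3.5% annual discount rate, the standard rate in HM Treasury's Green Book; analysts should confirm which rate applies. The function names and all figures are invented for illustration.

```python
def present_value(yearly_amounts, discount_rate=0.035):
    """Discount yearly amounts (year 0 first) back to today's money."""
    return sum(amount / (1 + discount_rate) ** year
               for year, amount in enumerate(yearly_amounts))


def cost_benefit_summary(yearly_benefits, yearly_costs, discount_rate=0.035):
    """Net present value and benefit-cost ratio of a policy."""
    pv_benefits = present_value(yearly_benefits, discount_rate)
    pv_costs = present_value(yearly_costs, discount_rate)
    if pv_costs == 0:
        raise ValueError("costs must be greater than zero to compute a ratio")
    return {
        "net_present_value": pv_benefits - pv_costs,
        "benefit_cost_ratio": pv_benefits / pv_costs,
    }


if __name__ == "__main__":
    # Invented figures in pounds: set-up costs in year 0, benefits from year 1
    costs = [100_000, 50_000, 50_000]
    benefits = [0, 80_000, 90_000]
    summary = cost_benefit_summary(benefits, costs)
    print(f"Net present value (£): {summary['net_present_value']:,.0f}")
    print(f"Benefit-cost ratio: {summary['benefit_cost_ratio']:.2f}")
```

With these invented figures the benefit-cost ratio is about 0.83, meaning each pound spent returns roughly 83p in benefits: a policy that achieves some outcomes but whose costs exceed its benefits.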

Are evaluations all about statistics?

No, not only about statistics. Once you've settled on which general method is best suited to your purposes, you can refine your specific questions and decide on the balance between quantitative and qualitative data you need to answer them.

What is meant by quantitative and qualitative?

Data can be quantitative, meaning it is measured as a number, or qualitative, meaning it expresses ideas or opinions in words. For example, suppose you want to know whether an intervention helped people to feel more confident. Here are some examples of approaches you might use:

Quantitative:

Use a validated scale or inventory to score their confidence levels before the intervention and after. Compare the scores over time, and ideally with a control group as well.
Ask participants a question like "Do you feel that the programme has helped to increase your personal confidence? yes/no/unsure" and compare the number of people who select each answer.
This is particularly helpful if you want to be able to quantify how much people's confidence levels changed.
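The two quantitative approaches above can be sketched in a few lines of Python. `mean_change` pairs each person's before and after scores; `difference_in_differences` subtracts the control group's average change from the policy group's, a widely used technique (not named in this guide) for allowing for changes that would have happened anyway; and `tally_answers` counts responses to the yes/no/unsure question. The function names and scores are hypothetical, and the sketch only describes the data: it does not test whether any difference is statistically significant.

```python
from collections import Counter
from statistics import mean


def mean_change(before, after):
    """Average change per person, pairing each person's before and after score."""
    if len(before) != len(after):
        raise ValueError("each participant needs both a before and an after score")
    return mean(a - b for b, a in zip(before, after))


def difference_in_differences(policy_before, policy_after,
                              control_before, control_after):
    """Change in the policy group over and above the change in the control group."""
    return (mean_change(policy_before, policy_after)
            - mean_change(control_before, control_after))


def tally_answers(answers):
    """Count how many people gave each answer, ignoring case and spacing."""
    return Counter(answer.strip().lower() for answer in answers)


if __name__ == "__main__":
    # Hypothetical confidence scores on a validated scale, for illustration only
    policy_before, policy_after = [18, 22, 15, 30], [24, 25, 21, 31]
    control_before, control_after = [20, 17, 25, 28], [21, 18, 25, 30]
    print("Policy group change:", mean_change(policy_before, policy_after))
    print("Difference-in-differences:",
          difference_in_differences(policy_before, policy_after,
                                    control_before, control_after))
    print(tally_answers(["Yes", "no", "yes", "Unsure", "yes"]))
```

With these made-up scores, the policy group's confidence rose by 4 points on average and the control group's by 1 point, so the difference-in-differences estimate is 3 points.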

Qualitative:

Ask a survey question like "how do you feel about your self-confidence since participating in the programme?" then analyse the answers people give to identify common themes.
Use semi-structured interviews with programme participants to ask questions about their confidence and explore their answers in more depth with follow-up questions.
This is particularly helpful if you want to understand how and why people's confidence levels did or did not change, and what they did and didn't find useful about the intervention.

Top Tip: Statistics aren't the be all and end all

It is easy to be seduced by the apparent "certainty" and persuasiveness of statistics, but remember - they are only as good as the methods used to collect them and isolated statistics are sometimes prone to misinterpretation. Talk to your analysts about how data will be collected to make sure possible bias is minimised, and how they should be presented, explained and contextualised.

As explained above, if you want to be confident that a policy or intervention has had an impact on outcomes you need to collect quantitative data, use a control group who didn't experience the policy, compare their outcomes with the group that did experience the policy and find a statistically significant difference.
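One common way analysts check for a statistically significant difference in a yes/no outcome (for example, whether someone reports feeling more confident) is a two-proportion z-test. The minimal Python sketch below implements the textbook pooled version using only the standard library. The function name and all counts are hypothetical, it relies on a normal approximation that is only reasonable when each group has a decent number of 'yes' and 'no' answers, and analysts may well prefer other tests depending on the design.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_policy, n_policy,
                          successes_control, n_control):
    """Two-sided z-test for a difference between two proportions.

    Returns (difference, z, p_value), where difference is the policy
    group's proportion minus the control group's.
    """
    p_policy = successes_policy / n_policy
    p_control = successes_control / n_control
    # Pooled proportion assumes no true difference (the null hypothesis)
    pooled = (successes_policy + successes_control) / (n_policy + n_control)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_policy + 1 / n_control))
    z = (p_policy - p_control) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_policy - p_control, z, p_value


if __name__ == "__main__":
    # Invented counts: (yes in policy group, policy n, yes in control group, control n)
    examples = {
        "200 per group": (120, 200, 90, 200),
        "20 per group": (12, 20, 9, 20),
    }
    for label, counts in examples.items():
        diff, z, p = two_proportion_z_test(*counts)
        print(f"{label}: difference {diff:.0%}, z = {z:.2f}, p = {p:.3f}")
```

With these invented figures, the same 15-percentage-point gap (60% versus 45%) is statistically significant with 200 people per group (p is roughly 0.003) but not with 20 per group (p is roughly 0.34), which is why sample size matters so much.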

However, good evaluations are not only about numbers. Often the richness that comes from good quality case studies gives us a much deeper understanding of a situation's context and complexities, which might be even more useful to policy formation and improvement. They can tell us how a policy works or doesn't work.

To balance out their strengths and weaknesses, most evaluations will use a combination of quantitative and qualitative methods. This lets us build a robust descriptive picture through quantitative statistics, and explore the "how and why" questions with more nuance through richer qualitative data.

What is monitoring data?

Monitoring data is information collected routinely, often as part of delivering a policy or service, that tracks how it is operating over time. Without access to good data and evidence, we cannot make informed policy decisions or invest limited resources where they will have the biggest impact. Access to an existing database of relevant information can save precious time and money, as we may not need to commission a full evaluation. Conversely, a lack of reliable monitoring data can severely hamper standalone evaluations.

There are sometimes opportunities to change or add questions to national surveys. So it is worth considering whether existing data capture tools could ask the type of questions that would shed light on how your policy is performing.

Top Tip: Make the best use of the data we have

'Snapshot' evaluations assess a policy at a single point in time. By contrast, analysing relevant information from existing datasets, such as national survey data or administrative data, can be highly cost-effective, and you can revisit the data to see whether attitudes and behaviours are changing over time.

It is worth being aware of some challenges with existing data. If we can resolve these issues with analysts and data custodians, then the evidence on how policies are working and for whom would be greatly improved.

Some common issues with existing datasets:

Some data custodians experience technical issues in making their data easy to access - for example, if they are still running paper-based systems or older-style databases that have limited functionality.
Data is often collected in local silos so it can be tricky to get a national picture or compare one area with another.
Some data is hard to access especially if we don't own it. Data sharing agreements may need to be drawn up.
Some of the data collected can't answer important policy questions. For example without unique identifiers and joined up systems, it's hard to track people through the education system, the health system or the justice system.
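To show why unique identifiers matter, here is a minimal sketch of linking two extracts, say from education and health systems, on a shared identifier. The field names, records and the `link_records` function are hypothetical; in reality such linkage would also need data sharing agreements and appropriate privacy safeguards. Records with no identifier, or with no matching identifier in the other system, simply cannot be joined.

```python
def link_records(education, health, id_field="person_id"):
    """Join two extracts on a shared unique identifier.

    Returns the linked records plus a count of education records that could
    not be linked because the identifier was missing or had no match.
    """
    # Index health records by identifier, skipping any without one
    health_by_id = {row[id_field]: row for row in health if row.get(id_field)}
    linked, unlinked = [], 0
    for row in education:
        match = health_by_id.get(row.get(id_field))
        if match is None:
            unlinked += 1
        else:
            linked.append({**row, **match})
    return linked, unlinked


if __name__ == "__main__":
    # Hypothetical extracts from two separate systems
    education = [
        {"person_id": "A1", "school": "X"},
        {"person_id": None, "school": "Y"},  # identifier not recorded
        {"person_id": "C3", "school": "Z"},  # no matching health record
    ]
    health = [
        {"person_id": "A1", "gp_visits": 2},
        {"person_id": "B2", "gp_visits": 5},
    ]
    linked, unlinked = link_records(education, health)
    print("Linked:", linked)
    print("Could not link:", unlinked)
```

Here only one of the three education records can be linked: one has no identifier and the other has no match in the health extract.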

Contact

Email: Social Research
