Evaluating evolving and complex programmes: learning and reflections from the child poverty pathfinders' evaluation

This paper presents learning from the evaluation of the child poverty pathfinders in Dundee and Glasgow about evaluating evolving and complex programmes. It is intended to support policy makers, programme delivery teams and evaluators to get the best out of future evaluations of similar initiatives.


Suggestions for future evaluation of complex, place-based initiatives

1. Allow for (much) greater flexibility between formative, exploratory evaluation and summative, impact evaluation

A key suggestion based on the experience of the pathfinders’ evaluation is that when projects themselves are continuing to evolve and the timescales for delivery are not completely fixed, there may be a need for a much more flexible approach to the types of evaluation that are feasible and appropriate at different points in time.

Table 2, below, indicates how different evaluation approaches and elements fit with different delivery stages. Evaluation is often conceived as taking place in progressive stages, from evaluability assessment, through formative and developmental evaluation, to impact evaluation. However, this relies on delivery also progressing in a relatively linear way to agreed timescales. It may not apply when, as with the pathfinders, the programme takes a deliberately iterative approach to design and delivery based on ongoing learning. Timings may change, with projects moving from piloting to established delivery more slowly than initially expected, or different elements of a programme may be at different stages at the same time. This needs to be considered and reflected in evaluation approaches; there may be a need to move between, or incorporate elements of, evaluability assessment, formative evaluation and impact evaluation at the same time, or to spend longer than initially expected in the evaluability or formative stages.

Table 2: Evolving evaluation alongside delivery

Stage    | Delivery             | Evaluation
---------|----------------------|--------------------------------------
Planning | Needs assessment     | Evaluability assessment
Learning | Small-scale piloting | Developmental / formative evaluation
Scaling  | Established delivery | Impact evaluation

Building in and managing this kind of flexibility in practice is much easier said than done, especially when working to fixed evaluation budgets and timetables. In practice, it might involve:

  • Agreeing a much smaller number of core evaluation questions with all partners at the outset to guide activities, but holding off on specifying more detailed impact and value for money evaluation questions while there remains uncertainty over outcomes, timescales or what data will be available and when. This may also include reflecting on which aims are fixed or unlikely to change (for example, the long-term aim of reducing child poverty) and which outcomes might evolve (in terms of how and when they might be achieved) as the project develops.
  • Accepting that it may be necessary to spend longer than initially planned in the formative / process / early evidence of potential impact stage of evaluation, until the design and timescales for a programme have reached a more settled phase where more definitive impact assessment is possible. This may involve managing stakeholder expectations about when it will be possible to say with confidence whether a programme is having the desired impacts, and when formal value for money analysis (i.e. social cost-benefit analysis, SCBA) is likely to be feasible. When interventions are more iterative, a particularly relevant question is at what point it is appropriate to stop considering a project a ‘pilot’ and treat it as more established delivery, where impact evaluation should be more feasible.
  • Revisiting outcomes and timelines with all key partners at regular intervals, including before any new evaluation activities are commissioned or agreed, so that the balance of formative and impact evaluation activities at different stages reflects, as far as possible, what is collectively felt to be appropriate and achievable at that point in time. This should arguably include robust discussions around ultimate deadlines – in other words, what is the maximum timescale partners consider reasonable before specific outcomes should be achieved, even if this is very long-term? Otherwise there is a risk that timescales continue to be delayed with each iteration of a programme.
  • Involving local partners in the design of place-based evaluations from the outset. Linked to all the points above is the increased importance of designing evaluations collaboratively, particularly when programmes are continuing to evolve. Involving local partners in discussions around evaluation focus and approach will help identify where there are potential tensions, for example between delivery and evaluation timelines. It will also strengthen partnerships between delivery teams, commissioners and evaluators, making it more likely that such tensions can be navigated effectively. Effective engagement is likely to involve some degree of compromise, recognising that the evidence priorities of commissioners, evaluators and delivery teams may not always dovetail completely.
  • Building more allowance into evaluation budgets and timescales for redesign and adaptation as programmes themselves evolve. While the need for a flexible approach is often specified in calls for evaluation (as indeed it was for the child poverty pathfinders’ evaluation), in practice, adequate time for external evaluators, the Scottish Government and the projects themselves to navigate this is rarely built in fully at the outset. A key reflection from this evaluation is that the level of change and adaptation required when projects are intentionally iterative can be extremely resource-intensive, and that underestimating this can limit what is subsequently possible.

2. Use theories of change thoughtfully, recognising both their benefits and their limitations when a project is still evolving

Theory-based evaluation approaches are commonly used to evaluate complex programmes. Such approaches have particular value for these interventions in aiming to unpack the various assumptions and complex links underpinning programme design, and to ensure that the wider context of programme development, implementation and impact is considered. Theories of change, which are used to structure and support theory-based evaluations, have therefore become an almost ubiquitous feature of evaluations in the UK.

Their use in evaluation can have many benefits, including:

  • Creating a shared understanding among project designers, managers and evaluators of what a project is trying to achieve, and of why and how it plans to achieve it.
  • Helping evaluators assess whether implementation and outcomes have been delivered as planned, and if not, why this might be (for example, because a particular input was not delivered as expected).
  • Providing a clear structure for data collection, analysis and reporting (particularly important for evaluating complex programmes).

The process of developing theories of change (typically involving workshops with key stakeholders) can also help to highlight areas where there are uncertainties or differences of opinion between stakeholders that may be important to explore in subsequent evaluation activities.

However, there can also be important limitations to their usefulness:

  • Identifying agreed timelines for short, medium and long-term outcomes can be very challenging, particularly for projects like the pathfinders that are continuing to evolve. There can also be a perception among stakeholders that splitting outcomes into short-, medium- and long-term gives an unrealistically linear picture of how outcomes are achieved in a complex, multi-faceted intervention.
  • The language of ‘theories of change’ can be off-putting for some: an early workshop for the pathfinders evaluation, for example, suggested that other terminology, such as ‘a plan on a page’, might be less intimidating and avoid ‘evaluation-speak’.
  • It can be very difficult to develop a single theory of change over which everyone feels shared ownership, particularly where evaluators have not been involved from the outset. Where projects are long-term and involve multiple partners, there can be a tendency for theories of change to proliferate, creating tensions around which version should be referred to in which context. Theories of change may also need to be updated to ensure they remain accurate and relevant.

There is no straightforward answer to whether, and how, to use theories of change on projects that are still evolving. However, the experience of using them on the pathfinders’ evaluation suggests a number of questions that partners and evaluators may wish to discuss at the outset, when thinking about their use on future, similar projects, as set out below.

  • Why do you want to develop a theory of change for this project? Is it primarily to support theory-based evaluation? Or is it also about project design and delivery? Theories of change can play both roles, but being clear about which role(s) they are expected to play (or whether different roles take precedence at different points) can help determine who needs to be involved in developing them and what they need to include. If it is intended to support both delivery and evaluation, responsibility for developing (and updating) theories of change should sit more formally with a team that combines people in both roles (evaluation/analysis and programme design/delivery).
  • When should theories of change be developed – and revisited? One approach is that they are developed at the outset of a project and remain a point of reference throughout, with only relatively minor amends. However, as discussed, this is unlikely to be feasible when the programme itself is taking a more iterative approach. There may well be a need for more significant amends to the theory of change as the programme develops.

Again, there is no easy answer as to how often theories of change should be reviewed, or the process for doing so. Repeated theory of change workshops can easily lead to over-burdening and fatigue, especially when the same people are involved. However, if there is a desire to use theories of change to support delivery and/or theory-based evaluation, funders and partners may wish to consider building in an agreed process for reviewing and updating theories of change at the outset. They may also wish to identify any elements of theories of change that are not open to future amendment (for example, the ultimate, long-term goals of a project).

This could include, for example, a regular (e.g. annual) ‘light touch’ review, where all partners are asked to consider whether the theory of change is still valid or whether changes are needed. This might be supplemented by less frequent workshops where a fuller review is needed (either because ‘light touch’ reviews identify that changes are needed, or because the programme is known to have substantially evolved).

This update process should also include capturing the reasons for any changes to theories of change, since this in itself may reflect important learning about what does and does not work.

  • What are the limits of theories of change? There are competing views on the use of theories of change in evaluation, with some evaluators arguing that they impose too rigid a structure. In reality, it is argued, progress for individuals may not be linear. Meanwhile, although the wider context in which programmes try to effect system change may be referenced in the ‘assumptions and risks’, it is arguable that this underplays the significance of context for the entire enterprise.

The pathfinder external evaluation team’s view is that theories of change remain a useful tool for helping to surface assumptions, articulate aims and target outcomes, and to structure evaluation, even if a programme is continuing to evolve. Rather than dropping them altogether, an alternative approach is to recognise and consider their limitations, and to avoid too rigid an application of them in evaluation. Striking the right balance in terms of the level of detail included in theories of change is important here – over-burdening a ‘plan on a page’ for a project that is expected to evolve is likely to mean both that it will need to be significantly updated as details change, and that there may be less space for the evaluation to explore issues that are not included in the original model. There needs to remain space within evaluation for elements that are not reflected in the original theory of change to emerge. At the same time, having a theory of change at the outset provides a framework to support policy and delivery colleagues in considering the purpose and potential implications of any proposed changes. It also provides a clear mechanism for documenting the rationale for any changes that are made (via an updated theory of change).

3. Consider evaluation data needs – and trade-offs – upfront

Robust impact evaluation is fundamentally dependent on the availability of reliable baseline data. Being able to assess what outcomes the programme has had and for which groups of participants requires data on who has taken part, what their ‘starting point’ was on key outcomes, and where they got to over time. Ideally, this data also needs to be available for a control group[5]. All this is very often easier said than done, especially when, as with the pathfinders, a project is evolving over time and data collection may need to begin before the primary target outcomes are completely settled. Again, there is no easy solution. However, suggestions as to how to support this process in future include:

  • Consider developing a small ‘core’ of quantitative outcome measures and demographic questions for use across Scottish Government child poverty projects. The precise data to be collected for different child poverty projects will likely need to vary depending on their specific aims and activities. However, there is a case for specifying a small number of ‘core’ measures that could be asked of families participating in Scottish Government funded child poverty projects to enable consistent tracking of changes over time, comparison between projects, and comparison with other national data sources.

The ‘break-even’ analysis included in the main report for the phase 2 pathfinder evaluation includes three such suggested measures, covering employment status, self-assessed financial wellbeing, and broader wellbeing, which could serve as a starting point for this (see Appendix A for details of these measures). Further work, involving a range of partners, might be required to test the relevance and feasibility of these and other potential ‘core’ measures. A small number of demographic questions could also be agreed, to facilitate more consistent analysis of the impact of investments on the Scottish Government’s ‘priority family’ groups (see Tackling child poverty priority families overview).
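
As a purely hypothetical illustration, the sketch below shows one way a small ‘core’ measure set could be specified so that different projects collect comparable data. The field names, question wording, response options and collection points are illustrative assumptions only, not the measures used in the pathfinder evaluation or an agreed Scottish Government specification.

```python
# Hypothetical sketch of a 'core' outcome measure specification. All field
# names, question wording and response options are illustrative assumptions.

CORE_MEASURES = {
    "employment_status": {
        "question": "Which of these best describes your current work situation?",
        "options": ["employed_full_time", "employed_part_time",
                    "not_working_seeking_work", "not_working_other"],
    },
    "financial_wellbeing": {
        "question": "How well would you say you are managing financially these days?",
        "options": ["living_comfortably", "doing_alright", "just_about_getting_by",
                    "finding_it_quite_difficult", "finding_it_very_difficult"],
    },
    "general_wellbeing": {
        "question": "Overall, how satisfied are you with your life nowadays?",
        "options": list(range(0, 11)),  # 0 (not at all) to 10 (completely)
    },
}

# Asking each measure at entry ('baseline') and at fixed follow-up points
# would support consistent tracking over time and comparison between projects.
COLLECTION_POINTS = ["baseline", "6_months", "12_months"]
```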

  • Invest in resource for data collection at the outset, including both administrative resource and software tools. Collection of monitoring data is often hampered by lack of both time and appropriate tools to collect reliable, easily analysable data about participants. The Scottish Government and its partners could consider providing both dedicated administrative resource to support data collection and entry on funded projects, and funding for software tools that can support this, as part of the set-up phase of any new projects. This would not only better support evaluation, but provide information likely to be of wider use to delivery teams in understanding their work.
  • Develop a more explicit process for agreeing any changes to outcomes with programme partners, including additional outcomes, changes to the balance between outcomes, or changes to the expected timelines for outcomes. Where programmes are iterative, new outcomes, or changes in which outcomes are expected when, should arguably be expected (though as noted above, there may also be some outcomes that are non-negotiable, and these should be agreed upfront). Agreeing a process between partners on how such changes are decided and, importantly, what data would need to be collected to measure them and what this means for evaluating impact, should form an explicit part of the set-up phase for similar projects in the future.
  • Consider secondary data sources early, but do not rely on them to fill all gaps in primary data. Secondary data is widely considered to be underused in research and evaluation; the benefits of accessing data already collected for routine purposes are obvious, in terms of coverage, the availability of potential control groups, and minimising the burden of additional data collection on participants. However, it is also important to recognise that secondary data cannot be a panacea. Data collected for other purposes may not always match the aims of specific projects. For example, it might not cover all relevant outcomes, or may define outcomes in ways that differ importantly from programme definitions. As discussed, accessing secondary data from government departments and organisations is also time- and resource-intensive. The experience of the pathfinders’ phase 2 evaluation confirms these challenges. It highlights the importance of early discussions with data controllers and project partners to understand the scope, timelines and costs of secondary data. Ideally, these discussions would start prior to commissioning external evaluators, so that the specification and timelines for the evaluation are based on a realistic timeline for accessing administrative data. Given these challenges are likely to remain, it also underscores the importance of maximising the value and usefulness of primary project monitoring data, alongside the Scottish Government continuing to explore with other national partners how to smooth access to secondary data to support both evaluation and delivery.
  • Involve analysts in discussions about data requirements at the outset. A key challenge in implementing all these suggestions is evaluation capacity to support them. While external evaluators have a key role here, they are rarely in place at the point a project is being conceived, and the same project may often involve different external evaluators at different points. One option for providing more consistent evaluation input would be for the Scottish Government to consider formally embedding an analyst within the planning and delivery teams for similar projects from the outset. They could have an explicit remit to consider the various likely trade-offs and challenges in matching evaluation to policy and delivery priorities, as well as helping to understand and navigate the different evidence needs and capacity constraints of project partners. Again, it is important to reiterate that these trade-offs cannot be avoided; the key is to ensure they are aired and that, as far as possible, an agreed balance is reached. As a key funder of both projects and their evaluations, the Scottish Government arguably has a central role to play in this regard.
  • Build in consents to research and evaluation from the outset. Adding a routine request to be allowed to share contact details for participants with external researchers where needed to support the delivery or evaluation of the programme would expand the evaluation options available (including potentially facilitating surveys), reduce potential bias (by allowing evaluators to select a sample capturing a wide range of characteristics from those who have agreed at the outset to be contacted), and reduce the burden on project delivery teams by removing the need for them to be directly involved in setting up evaluation interviews. This should be routinely discussed with delivery teams at the start of any projects where evaluation is planned, so that they are clear on the reasons for requesting this and can discuss how best to introduce it to their clients. Potential wording for such requests is shown in Box 1, below.

Box 1: Potential wording for requests to share contact details with external researchers

We, or our partners [SPECIFY], may want to carry out research about [NAME OF PROJECT] in the future, to help us understand how it is working. Would you be okay with us sharing your name, email address, and phone number with approved researchers so they can contact you to ask if you would like to take part in such research? You don't have to say yes, and saying no won't affect your involvement in this project.

4. Consider the full toolkit of approaches to assessing value for money – and continue to develop principles and tools to measure the VfM of system change

As discussed above, the kinds of value for money techniques that are feasible are often limited by the available data, while new approaches may be required to assess the value for money of ‘whole system change’. There may also sometimes be differences of opinion among stakeholders over what constitutes ‘value for money’ and when it might be realistic to expect to see this. This may be a particular challenge in contexts where a project is still evolving. More broadly, different stakeholders may place more or less ‘weight’ on different types of evidence – while some want to see quantifiable evidence of value for money, others may favour a more qualitative exploration of how ‘efficient’ a project is, for example. This underlines the need to consider carefully what types of value for money evaluation are most appropriate, at what stage, and how this will be communicated to key stakeholders.

Questions that could be asked to inform decisions about the type of value for money evaluation activities that are feasible at different points, depending on the resources available, are outlined below. Considering VfM at an early stage is particularly important in terms of securing understanding and consensus around the desirability and feasibility of different types of VfM analysis and putting in place processes to capture both costs and outcomes to support this at later stages.

Early design/development

Questions to consider to inform VfM evaluation include:

  • What would ‘VfM’ look like to different stakeholders? This might include considering what the 4Es (economy, efficiency, equity and effectiveness) look like in the context of this project. What would delivering this project efficiently mean, for example?
  • What kind of VfM data do different stakeholders want / need? For what purpose (e.g. to inform a business case, or to feed into future funding decisions)?
  • What is the ‘value’ the intervention is expected to generate? And for whom (participants, partners, society)?
  • What are the key outcomes that would indicate ‘value’ on this basis?
  • How can these be measured?
  • How soon are these outcomes expected to materialise?
  • What are the costs of the project expected to be? What is in and out of scope here (e.g. direct costs vs. indirect costs, like time allocated to a project as part of a general leadership role)?
  • Who is responsible for monitoring / collating cost data (likely to need to involve both delivery/funding and evaluation teams)?

Potential activities to support VfM evaluation include:

  • Workshops / interviews to discuss what ‘VfM’ means to stakeholders
  • If there is a desire for SCBA in the future, activities to enable this that need to be considered at an early stage include:
    • Development of tools to collate costs on an ongoing basis (which define what is and is not considered ‘in scope’ in terms of costs)
    • Identification of key outcome measures to support future VfM analysis
    • Support for implementing outcome measures (baseline and follow-up)
    • Break-even modelling to support future SCBA. Depending on how early this takes place, the costs included in break-even modelling might need to be based on projected rather than actual costs. However, undertaking the modelling at an early stage would both inform business plans and support planning for future SCBA. A minimal illustrative sketch of this kind of model follows this list.
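
The sketch below illustrates the basic logic of a break-even model: how many positive outcomes a project would need to achieve for monetised benefits to equal projected costs. All figures, outcome types and monetary values are hypothetical assumptions, not pathfinder data; in practice, costs would come from the cost-collation tools described above and outcome values from published unit-cost sources.

```python
# Minimal illustrative break-even model. All figures are hypothetical
# placeholders, not pathfinder data.

projected_total_cost = 1_500_000  # projected programme cost over the period (GBP)

# Assumed monetary value of one instance of each key outcome (GBP).
value_per_outcome = {
    "parent_enters_sustained_employment": 15_000,
    "household_avoids_crisis_intervention": 5_000,
}

# Expected mix of achieved outcomes (shares sum to 1).
outcome_mix = {
    "parent_enters_sustained_employment": 0.4,
    "household_avoids_crisis_intervention": 0.6,
}

# Weighted average value of one achieved outcome, given the expected mix.
avg_value = sum(value_per_outcome[k] * share for k, share in outcome_mix.items())

# Number of outcomes needed for monetised benefits to equal projected costs.
break_even_n = projected_total_cost / avg_value

print(f"Average value per outcome: £{avg_value:,.0f}")
print(f"Outcomes needed to break even: {break_even_n:,.1f}")
```

Early modelling of this kind can also make explicit how sensitive the break-even point is to assumptions about costs and outcome values, before actual data is available.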

Early delivery

Questions to consider to inform VfM evaluation include:

  • Is the programme being delivered in line with the 4Es?
  • Have stakeholders’ understandings of VfM on this project changed?
  • Are the outcomes / timelines discussed early on still relevant? If not, what are the implications for future VfM analysis?

Potential activities to support VfM evaluation include:

  • Interviews / workshops exploring perceptions of 4Es and VfM on the project.
  • Revisiting activities above, especially if outcomes / understanding of VfM have changed / evolved – break-even models may need to be updated/revised, for example, and new outcome measures may need to be collected to support SCBA.
  • Collating costs of delivery – and identifying unexpected / unquantifiable costs.

Established delivery

Questions to consider to inform VfM evaluation include:

  • Has the perceived economy, efficiency and equity of the project changed over time?
  • Is robust baseline and follow-up data available for key outcomes in a way that would allow them to be assigned a monetary value (with reference to standard approaches to calculating this, such as those included in the Manchester CBA work[6])? What are the limits of this that might affect conducting or interpreting SCBA?

Potential activities to support VfM evaluation include:

  • Further stakeholder interviews/workshops exploring the 4Es.
  • SCBA (if robust baseline and follow-up data is available), building on the break-even model(s) developed in earlier stages. A minimal sketch of the core calculation follows this list.
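
As a minimal sketch of the core SCBA calculation, assuming annual cost and monetised benefit streams have been collated and a standard discount rate applied. The figures and rate below are illustrative assumptions, not pathfinder values or an agreed method.

```python
# Minimal illustrative social cost-benefit calculation. All figures are
# hypothetical; a real SCBA would monetise measured (baseline vs. follow-up)
# outcome changes, ideally net of a comparison group, following established
# appraisal guidance.

DISCOUNT_RATE = 0.035  # illustrative; in the region of standard UK appraisal rates

# Costs and monetised benefits by year (index 0 = first year of delivery), GBP.
costs_by_year = [800_000, 700_000, 200_000]
benefits_by_year = [100_000, 600_000, 900_000, 700_000]

def present_value(flows, rate):
    """Discount a list of annual flows back to year 0."""
    return sum(flow / (1 + rate) ** year for year, flow in enumerate(flows))

pv_costs = present_value(costs_by_year, DISCOUNT_RATE)
pv_benefits = present_value(benefits_by_year, DISCOUNT_RATE)

print(f"Present value of costs:    £{pv_costs:,.0f}")
print(f"Present value of benefits: £{pv_benefits:,.0f}")
print(f"Benefit-cost ratio:        {pv_benefits / pv_costs:.2f}")
print(f"Net present value:         £{pv_benefits - pv_costs:,.0f}")
```

A benefit-cost ratio above 1 would indicate that discounted benefits exceed discounted costs, though interpretation would still need to reflect the data limits discussed above.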

Assessing the value for money of system change

As discussed above, there is no established process for assessing the value for money of whole system change. Chapter 10 of the main report considers potential principles and approaches that might support this. While it was written with the Glasgow pathfinder in mind, it is likely to be relevant to evaluations of Public Service Reform (PSR) or system change programmes more widely. Those interested in this area are directed to the more detailed commentary included there, particularly the section setting out a) key principles for assessing the value for money of ‘whole system change’ and b) potential ‘whole system change’ value for money indicators (included as Appendix B to this paper). The four suggested key principles for assessing value for money of ‘whole system change’ are likely to be relevant to considering value for money of system change regardless of the specific policy focus. These are:

  • Consider the ‘value for money’ of system change using different outcome indicators across different time frames, recognising that system change may take a long time to fully achieve, but that early indicators of progress are needed to give confidence in the approach being taken (or to identify that changes in approach are needed).
  • Consider broader cost savings over the short, medium and longer term. This again recognises that system change may require investment up front to deliver longer-term savings, and that evaluating short and medium-term impacts on system costs may not be straightforward. For example, demand for some services might be expected to increase in the short term but, if effective, system change should reduce or stabilise overall costs in the long term.
  • Consider the value for money of system change vs. the alternative of leaving the system ‘as is’. Value for money is often assessed by comparing the impact of a project with the status quo prior to its implementation. This is arguably not the right approach to take when assessing the value for money of system change. In the medium to long term, demand is forecast to rise very substantially across public services, including potentially costs for people of working age.[7] This highlights the importance of developing a robust ‘quasi-experimental’ design for measuring the impact of system change against a counterfactual of what would have happened in the future without the system change, rather than comparing to the current situation. This would aid understanding of how to interpret falling costs, should they arise (a minimal sketch of this comparison logic is shown after this list).
  • Use a mixed method approach to measuring benefits. Assessing value for money in the context of system change is likely to require both qualitative and quantitative evidence. Some of the impacts considered essential in driving long-term reform (like culture change) are harder to ‘count’, and there is no obvious way of applying a monetary value to them. However, in the medium to longer term, quantitative evidence of changes in demand, shifts in budgets and spending, and so on will also be needed as ‘harder’ tests of what system change has achieved.
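
As a purely illustrative sketch of the comparison logic in the third point above, a simple difference-in-differences estimate compares the change in the pathfinder area against the change in a comparison area, which stands in for the counterfactual. The figures are hypothetical; a real design would need a credible comparison group and checks on underlying assumptions (such as parallel pre-existing trends).

```python
# Minimal illustrative difference-in-differences estimate: impact is measured
# against what happened in a comparison area (a proxy for the counterfactual),
# not against the pre-reform status quo. All figures are hypothetical.

# Average annual service cost per household (GBP), before and after the reform.
pathfinder_area = {"before": 12_000, "after": 12_600}
comparison_area = {"before": 11_800, "after": 13_100}  # no system change

change_treated = pathfinder_area["after"] - pathfinder_area["before"]     # +600
change_comparison = comparison_area["after"] - comparison_area["before"]  # +1,300

# Costs rose in both areas, but by £700 less in the pathfinder area than the
# counterfactual trend suggests: a saving that a simple before/after
# comparison with the status quo would miss entirely.
estimated_effect = change_treated - change_comparison  # -700

print(f"Estimated effect on annual cost per household: £{estimated_effect:+,.0f}")
```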

Contact

Email: social-justice-analysis@gov.scot
