Tackling child poverty pathfinders: evaluability assessment
An evaluability assessment of the Child Poverty Pathfinders in Glasgow and Dundee to inform the development of an evaluation plan for the Pathfinder approach. Includes an evaluability assessment report and accompanying theories of change and initial monitoring framework to support evaluation.
Appendix 2 – Best Practice Literature Review
This document provides a brief overview of the literature reviewed so far on how programmes similar to the Child Poverty Pathfinders have been evaluated. It highlights considerations about data collection and methodologies that are relevant to the Evaluability Assessment of the Child Poverty Pathfinders. It also gives detailed examples of how other evaluations have employed each of these methods.
Defining child poverty
Several definitions of poverty exist, and the Child Poverty (Scotland) Act 2017 outlined targets relating to relative poverty, absolute poverty, low income and material deprivation, and persistent poverty. However, a full evaluation of the impact of a programme on poverty requires an understanding not only of the prevalence of child poverty, but also of its severity and extent. To evaluate any effects of a Pathfinder on child poverty, a definition and indicators of child poverty need to be agreed.
A Welsh Government evaluation of child poverty used 23 indicators of child poverty across a range of areas including income poverty, worklessness, skills and qualifications, housing services, and health. Indicators around work security are also important, as reflected in Scottish Government experience of evaluating poverty programmes: having paid work is often not enough to lift families out of poverty, creating a need to account for their income, working conditions, transport accessibility, food, and fuel poverty. Furthermore, specifically in the case of evaluating holistic services, learning from the Tackling Child Poverty Delivery Plan highlights that data on the potential longer-term impacts of holistic services addressing poverty, such as health, needs to be collected from the start in order to evidence these outcomes.
Best practice on evaluating child poverty and holistic service programmes emphasises the need for early collection of baseline data, and of monitoring and outcome data that covers every aspect of the operation, including inputs, process, outcomes, and long-term effects. To understand the impact of child poverty programmes on families, evidence highlights the importance of involving those with lived experience to help define the problems they face and the impact similar programmes have had on their lives.
To inform what data needs to be collected, Theories of Change need to be developed for each programme that outline potential impacts and outcomes, so that data can be collected to evidence these. Learning from the Tackling Child Poverty Delivery Plan highlights the need for individual logic models for each of the Pilots in their local context. These models were collated and used to create local monitoring and evaluation plans that allowed the programmes to be evidenced, while also maintaining commonality and a shared language across the Pilots. With regard to maintaining commonality across different programmes, the Local Authority Child Poverty Innovation Pilot evaluation highlighted the importance of having agreed tools for collecting data on participants and families (such as on ethnicity, job status, and gender) and a common outcome framework, so that the data is comparable across programmes.
The amount of data collected needs to be balanced against practicality and the resources available. Learning from the Welsh Child Poverty Strategy Evaluation highlighted the need to manage expectations about the findings from evaluations of child poverty strategies: the evaluators found that, with their population size, changes in rates of child poverty had to be large (at least 3% every year for 3 years) to be statistically significant.
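The point about statistical significance can be illustrated with a simple two-proportion z-test. This is a minimal sketch rather than the method used in the Welsh evaluation: the sample size of 1,000 children per survey wave and the 30% baseline poverty rate are hypothetical, and a real analysis would also need to account for survey design.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for the difference between two
    independent sample proportions, using the pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical survey: 1,000 children sampled in each wave,
# with a baseline child poverty rate of 30%.
BASELINE_RATE = 0.30
SAMPLE_SIZE = 1000

for drop in (0.03, 0.09):
    z, p = two_proportion_z_test(
        BASELINE_RATE, SAMPLE_SIZE, BASELINE_RATE - drop, SAMPLE_SIZE
    )
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"Fall of {drop:.0%}: z = {z:.2f}, p = {p:.4f} ({verdict} at 5%)")
```

Under these assumptions, a fall of three percentage points between two waves is not distinguishable from sampling noise, whereas a cumulative fall of nine percentage points is.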
Overall, the literature and other evaluations highlighted that it is important to use a range of qualitative and quantitative methodologies to evaluate child poverty, holistic services, and systems change programmes. Qualitative, quantitative, quasi-experimental methods, and contribution analysis are discussed below.
Qualitative data is helpful for gaining a deeper understanding of impacts and outcomes, as the administrative data used in large-scale evaluations can often be hard to disaggregate to the local level or to use to understand the impact on small priority groups. Qualitative methods can include focus groups, interviews, participant observation, and case studies. A guide on systems change evaluation by the Center for Evaluation Innovation emphasised how qualitative methods can be a useful tool for understanding the 'how' behind observed outcomes.
To illustrate, a Welsh Government evaluation of the Housing Act (2014), which addressed houselessness, shows the importance of incorporating qualitative methods into policy evaluations. In their evaluation, they interviewed stakeholders, service delivery staff, and service users. They highlighted that these interviews gave crucial insight into implementation and delivery, and also provided a comparative perspective on how these aspects differed from previous programmes. They noted that the interviews with service users were essential for bringing the opinions of those with lived experience of houselessness, and of navigating the service, into the evaluation.
The Local Authority Child Poverty Innovation Pilot evaluation also conducted semi-structured interviews with service delivery staff, stakeholders, partners, children, and families to enrich their evaluation. They noted that the qualitative fieldwork was essential for understanding the depth and complexity of the ways in which child poverty can affect an individual, and the impact of the Innovation Pilots. The findings from the qualitative fieldwork were used in conjunction with analysis of monitoring and outcome data and a cost-effectiveness analysis.
Quantitative methods, including analysis of performance indicators and outcome measures, have also been used in evaluations of similar programmes. Quantitative analysis can be used to examine data trends and determine whether there has been any statistically significant change in performance or outcome indicators.
For example, the evaluation of the Welsh Child Poverty Strategy tracked 23 indicators of child poverty, comparing the indicators in 2005 with the most recent data collected (up to 2014). This analysis allowed the evaluators to track changes in the rate of relative poverty, employment, health, and education, and compare the changes seen in Wales to those seen elsewhere in the UK. Similarly, an evaluation of the Welsh Government's Out of Work Service used quantitative data and methods to track performance against the programme targets and to monitor outcomes for participants, including entering work, obtaining qualifications, and health outcomes.
Quasi-experimental approaches are also useful when evaluating social policies. These approaches use a counterfactual that is not created by randomisation (as compared to a randomised controlled trial) to evaluate the effect of an intervention. These approaches create a control group that is as similar as possible to the group who received the intervention, based on prior characteristics, so that the differences in the outcomes observed can be attributed to the intervention. Therefore, these approaches are useful in instances where individuals cannot be randomly assigned to treatment or control groups, such as when this would be unethical or logistically impractical. These methods can also be used to conduct retrospective evaluations, but this is subject to the availability of suitable data. Experimental approaches are best used to explore the impact of an intervention in a closed system where the relationship between the intervention and the outcome is linear and direct. Wimbush et al (2012) caution that experimental methodologies are inappropriate for complex programmes.
Propensity score matching
Propensity score matching (PSM) is one quasi-experimental method that can be used in programme evaluations. A propensity score is the likelihood that an individual received the treatment, and it is calculated from observable characteristics which are believed to affect participation. Individuals from the treatment and control groups are matched on their propensity score, and then the differences in the outcomes can be calculated.
PSM is beneficial because it can control for the pre-programme characteristics of both the treatment and comparison groups and can measure the impact of a programme. PSM has also been described as suitable for evaluations of antipoverty programmes because it allows differences in the impact of the programme to be examined by pre-programme characteristics. However, PSM is most suited to evaluations where large datasets are available, such as administrative or local authority-level data, that include demographic and outcome data on both participants and non-participants. PSM is also limited because it can only match participants and non-participants on observable characteristics.
In the evaluation of the Troubled Families Programme, a local linear regression was used to match those in the treatment and comparison groups, using a combination of family and individual characteristics. The data for the PSM was obtained by matching 5 years of administrative data with data provided by the Local Authorities, including the National Impact Study and the Family Progress Data. From this combined data, they were able to compare outcomes between the treatment and comparison groups relating to: out-of-work benefits, looked after children or children in need, and instances of adult and juvenile offending. However, it was noted that this approach was limited because the quality of the matches was highly dependent on the quality of data supplied by the Local Authorities, and just over half of Local Authorities were able to supply this data.
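The matching logic described above can be sketched in a few lines. This is an illustrative example only, not the procedure used in any of the evaluations cited: the data are hypothetical, a single covariate (a baseline need score) stands in for the observable characteristics, the propensity score comes from a simple logistic model, and each participant is matched to the nearest non-participant with replacement.

```python
import math

def fit_propensity(xs, ts, lr=0.5, steps=2000):
    """Fit P(treated | x) = sigmoid(a + b * x) by gradient ascent on the
    log-likelihood and return the fitted propensity score for each unit."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, t in zip(xs, ts):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += t - p
            grad_b += (t - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]

def matched_effect(scores, ts, ys):
    """Average outcome difference between each participant and the
    non-participant with the closest propensity score (with replacement)."""
    treated = [i for i, t in enumerate(ts) if t == 1]
    controls = [i for i, t in enumerate(ts) if t == 0]
    diffs = []
    for i in treated:
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        diffs.append(ys[i] - ys[j])
    return sum(diffs) / len(diffs)

# Hypothetical data: (baseline need score, participants, non-participants).
# Families with higher need are more likely to take part.
strata = [(-1, 2, 6), (0, 4, 4), (1, 6, 2)]
TRUE_EFFECT = 5.0  # assumed programme effect built into the toy outcomes

xs, ts, ys = [], [], []
for x, n_treated, n_control in strata:
    for t in [1] * n_treated + [0] * n_control:
        xs.append(x)
        ts.append(t)
        ys.append(2.0 * x + TRUE_EFFECT * t)  # outcome also depends on need

scores = fit_propensity(xs, ts)
treated_ys = [y for y, t in zip(ys, ts) if t == 1]
control_ys = [y for y, t in zip(ys, ts) if t == 0]
naive = sum(treated_ys) / len(treated_ys) - sum(control_ys) / len(control_ys)
att = matched_effect(scores, ts, ys)

print(f"Naive difference in means: {naive:.2f}")
print(f"Matched estimate:          {att:.2f}")
```

In this toy example the naive comparison overstates the effect because participants start with higher need scores, while the matched estimate recovers the assumed effect. With real data, the match quality would also need to be checked, and unobserved characteristics could still bias the result.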
Case Study: Trabajar Programme in Argentina
This anti-poverty programme started in 1997 in Argentina. It was supported by the government and by a loan and technical assistance from the World Bank. The programme provided short-term work for the poor and located social projects in more deprived areas to develop the local community.
The authors of the evaluation chose propensity score matching because pre-intervention data was not available, and a randomised control group was not possible. The main outcome of focus in the evaluation was the participants' income. In their study they used two surveys: one was a national household survey and the other was of programme participants. The participants and non-participants were matched using pre-intervention characteristics and were also matched within their region. They used kernel density estimation techniques to ensure good matches. However, they noted that there is still room for bias because of unobservable characteristics.
To compare the two groups, the observed distribution of household income was compared with the estimated distribution of counterfactual income. The authors estimated that the programme resulted in a 15 percentage-point drop in the occurrence of poverty. These outcomes were then further compared based on the profiles of the participants, including differences for families where female or younger members of the family participated in the programme.
Another quasi-experimental approach commonly used in impact evaluations is difference-in-differences (DiD), which combines before-and-after comparisons with treatment-and-control comparisons. DiD estimates the effect of a programme by subtracting the after-before difference in the control group from the after-before difference in the treatment group. To conduct a DiD, data is needed for both the treatment and control groups before and after the intervention; this can come from panel data or repeated cross-sectional samples. By subtracting the before-after differences, DiD helps to control for unobserved factors that influence the outcomes, provided these factors do not change over time or affect both groups equally.
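The DiD calculation can be shown with a small worked example. The records below are hypothetical (weekly household income in pounds for two groups in two periods); the sketch only illustrates the arithmetic and does not estimate standard errors.

```python
from statistics import mean

# Hypothetical records: (group, period, weekly household income in GBP).
records = [
    ("treatment", "before", 290.0), ("treatment", "before", 310.0),
    ("treatment", "after", 330.0), ("treatment", "after", 350.0),
    ("control", "before", 300.0), ("control", "before", 320.0),
    ("control", "after", 320.0), ("control", "after", 340.0),
]

def group_mean(group, period):
    """Mean outcome for one group in one period."""
    return mean(y for g, p, y in records if g == group and p == period)

def did_estimate():
    """After-before change in the treatment group minus the
    after-before change in the control group."""
    treatment_change = group_mean("treatment", "after") - group_mean("treatment", "before")
    control_change = group_mean("control", "after") - group_mean("control", "before")
    return treatment_change - control_change

effect = did_estimate()
print(f"Treatment group change: {group_mean('treatment', 'after') - group_mean('treatment', 'before'):.1f}")
print(f"Control group change:   {group_mean('control', 'after') - group_mean('control', 'before'):.1f}")
print(f"DiD estimate:           {effect:.1f}")
```

Here the treatment group's income rises by 40 and the control group's by 20, so the DiD estimate of the programme effect is 20. The control group's change stands in for what would have happened to the treatment group without the programme.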
It is also important to note that DiD can accommodate several treatment and control groups. This means that, for example, DiD can account for staggered start dates between treatment groups. The models which include several time periods also offer the possibility of utilising state-level data, rather than individual-level data.
However, there are several assumptions made in DiD calculations. First, it assumes that there is no spill-over between the control and treatment groups and that the control variables are unaffected by the intervention. Additionally, central to DiD is the parallel trends assumption, which states that the treatment group would have followed the same path as the control group if they had not received the intervention. If data is available that allows researchers to look at the trends for several periods before the intervention, this can help to show that the DiD is robust. Another way to check whether this assumption is met is to conduct a DiD on the pre-treatment data: if there is no significant effect, this supports the assumption. The availability of sufficient data to check whether assumptions are being met is a significant consideration for using DiD.
An example of a DiD is What Works for Children's Social Care's evaluation of Strengthening Families, Protecting Children, which took a holistic and whole-systems approach to reduce the number of children in care. For their evaluation, they matched local authorities which were part of the Strengthening Families, Protecting Children programme to similar local authorities which did not take part, based on quarterly data on care outcomes for four years before the programme, to create the treatment and control groups. After the local authorities were matched, data was then matched at the individual level based on: gender, age of child at referral, ethnicity, disability, free school meal eligibility, asylum-seeking status, and whether the child had previously been in care. They then fitted a random effects regression model using generalised least squares estimates.
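The pre-treatment check described above can be sketched as a placebo test: the DiD calculation is applied to two periods before the intervention, where the estimate should be close to zero if the groups were on parallel trends. The period means below are hypothetical, and a real check would test whether the placebo estimate is statistically different from zero.

```python
def did(treat_before, treat_after, control_before, control_after):
    """Difference-in-differences of four group means."""
    return (treat_after - treat_before) - (control_after - control_before)

# Hypothetical outcome means by period. The intervention is assumed to
# start after period 2, so periods 0 to 2 are all pre-treatment.
treated = {0: 20.0, 1: 22.0, 2: 24.0, 3: 31.0}
control = {0: 15.0, 1: 17.0, 2: 19.0, 3: 21.0}
LAST_PRE_PERIOD = 2

# Gap between the groups in each pre-treatment period: a constant gap is
# what the parallel trends assumption would lead us to expect.
pre_gaps = [treated[t] - control[t] for t in range(LAST_PRE_PERIOD + 1)]

# Placebo DiD: pretend the intervention started between periods 1 and 2.
placebo = did(treated[1], treated[2], control[1], control[2])

# Actual DiD: last pre-treatment period against the post-treatment period.
effect = did(treated[2], treated[3], control[2], control[3])

print(f"Pre-treatment gaps: {pre_gaps}")
print(f"Placebo estimate:   {placebo:.1f}")
print(f"DiD estimate:       {effect:.1f}")
```

In this example the gap between the groups is constant before the intervention and the placebo estimate is zero, which is consistent with parallel trends, although it cannot prove that the trends would have stayed parallel afterwards.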
Case Study: Livelihood Empowerment Against Poverty, Ghana
Another example of evaluating child poverty programmes using both propensity score matching and difference-in-differences comes from the Livelihood Empowerment Against Poverty (LEAP) programme in Ghana. The programme attempts to address child poverty through cash transfers and health insurance. Although this approach differs from that of the Pathfinders, the evaluation is still a useful example.
The evaluation used longitudinal propensity score matching and then used DiD to estimate the effects of the programme. The evaluators highlighted that they chose this combination of PSM and DiD because DiD can provide one of the strongest estimates of causal impact. Also, using longitudinal data shows the change in the comparison group, which allows for overall changes to be accounted for. For example, this would be important if, over the length of the programme, there were significant overall changes in conditions which affected both treatment and control groups.
For the PSM, the same survey instruments were also used on households that were eligible but not yet enrolled in the programme, which created the control group. These families were then matched to families in the treatment group based on data on eligibility criteria using a probit model. The propensity score was then used for inverse probability weighting in the further statistical analysis. This means that control households which were more similar to the treatment households received a greater weight. The outcome measure used was changes in food consumption. The analysis also controlled for demographic variables, such as age, gender, and household size, and for community-level effects.
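The weighting step can be illustrated as follows. This is a sketch under stated assumptions, not the LEAP evaluation's actual specification: the propensity scores and outcomes are hypothetical, and the weights used are one common choice for estimating the effect on participants, where each participant has a weight of 1 and each comparison household has a weight of p / (1 - p).

```python
def att_weights(treated, propensity):
    """Weight of 1 for participants and p / (1 - p) for comparison
    households, so comparison households that resemble participants
    (high propensity score) count for more."""
    return [1.0 if t else p / (1.0 - p) for t, p in zip(treated, propensity)]

def weighted_mean(values, weights):
    """Weighted average of a list of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical households: participation flag, estimated propensity score,
# and change in food consumption (arbitrary units).
treated = [1, 1, 0, 0, 0]
propensity = [0.8, 0.5, 0.8, 0.5, 0.2]
outcome = [12.0, 8.0, 9.0, 4.0, 1.0]

weights = att_weights(treated, propensity)

treated_rows = [(y, w) for y, w, t in zip(outcome, weights, treated) if t]
control_rows = [(y, w) for y, w, t in zip(outcome, weights, treated) if not t]

treated_mean = weighted_mean(*zip(*treated_rows))
control_mean_weighted = weighted_mean(*zip(*control_rows))
control_mean_raw = sum(y for y, _ in control_rows) / len(control_rows)

naive = treated_mean - control_mean_raw
att = treated_mean - control_mean_weighted

print(f"Unweighted difference: {naive:.2f}")
print(f"Weighted difference:   {att:.2f}")
```

The comparison household with a propensity score of 0.8 receives sixteen times the weight of the household with a score of 0.2, which pulls the weighted comparison mean towards households that look like participants.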
Regression discontinuity analysis
Another quasi-experimental method is regression discontinuity analysis (RDA). This method is suitable when membership of the control or treatment group is determined by a single cut-off on a continuous scale (e.g., living below a certain income). This threshold creates a discontinuity, and allows researchers to draw a comparison between those just above and just below it. The requirement for a defined eligibility cut-off may make this method unsuitable for the evaluation of the Pathfinders, as there have not been suitably defined eligibility criteria for receiving support. Additionally, one assumption of RDA is that the cut-off is not shared with other programmes. This may complicate the evaluation of a child poverty pathfinder, as eligibility for benefits or other support services may be similar and could therefore confound the results.
Geographic regression discontinuity is a variant which employs geographic or administrative boundaries (such as postcodes) to split groups into treatment and control. However, this method shares the limitations above, as these boundaries are often shared with other programmes. In the case of Glasgow, for example, the eligibility criterion of having a Glasgow postcode also applies to other services offered by that local authority, which would make it difficult to isolate and attribute the outcomes of the Pathfinder. It is also necessary in this approach to track spatial correlations. To do this, the literature highlights that clear geographic data needs to be collected for those inside and outside the treatment area.
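The comparison at the cut-off can be sketched with a simple local linear fit on each side of the threshold. The data, the income cut-off of 200, and the bandwidth of 50 are all hypothetical, and the outcome is generated without noise so that the arithmetic is easy to follow. A real analysis would choose the bandwidth with a data-driven procedure and report standard errors.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rd_estimate(running, outcome, cutoff, bandwidth):
    """Fit a line on each side of the cut-off, using only points within
    the bandwidth, and return the gap between the two fitted values at
    the cut-off (eligible side minus ineligible side)."""
    below = [(x - cutoff, y) for x, y in zip(running, outcome)
             if cutoff - bandwidth <= x < cutoff]
    above = [(x - cutoff, y) for x, y in zip(running, outcome)
             if cutoff <= x <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_below - intercept_above

# Hypothetical set-up: households with weekly income below 200 are
# eligible for support, which raises the outcome by 4 units.
CUTOFF = 200
EFFECT = 4.0

running = list(range(100, 300, 5))
outcome = [50.0 + 0.1 * (x - CUTOFF) + (EFFECT if x < CUTOFF else 0.0)
           for x in running]

estimate = rd_estimate(running, outcome, cutoff=CUTOFF, bandwidth=50)
print(f"Estimated jump at the cut-off: {estimate:.2f}")
```

Because the income variable is centred on the cut-off before fitting, each intercept is the fitted outcome exactly at the threshold, and the difference between them is the estimated effect for households close to the eligibility boundary.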
Contribution analysis can be useful to explain outcomes when quasi-experimental or experimental approaches are not available. Contribution analysis relies on robust Theories of Change and can help evidence a programme's influence on specific outcomes. Literature also highlights that in the Scottish context, contribution analysis has been shown to generate public value. It is also useful for evidencing impact in complex landscapes. The Scottish Government has previously used contribution analysis, including in relation to National Outcome 12, and it has also been used by NHS Scotland.
For a contribution analysis, evidence is gathered about the assumptions made in the Theory of Change, the links, and external factors that could affect outcomes. From this evidence, a contribution story can be synthesised and analysed. The idea is that if a programme's implementation followed a Theory of Change that is supported by the evidence gathered, and external factors have been examined to show they did not significantly impact the outcomes, then the contribution of the programme to the outcomes can be shown. However, it should be noted that problems with this approach include the difficulty of inferring causality and uncertainty about the contribution of the programme to the outcomes observed.
Case Study: Keep Well Programme, Scotland
Contribution analysis is becoming more popular and has recently been used to evaluate the Keep Well programme by NHS Scotland. This programme was aimed at the entire Scottish population and provided health checks in an effort to reduce cardiovascular and associated diseases and reduce health inequalities. The programme could be tailored and implemented across each health board.
NHS Scotland conducted an evaluation of the programme. The authors noted that quantitative research was limited by a lack of adequate data, so contribution analysis was used. The evaluation followed the usual steps of a contribution analysis, including defining a model of change, gathering evidence, and plausibility reporting. To gather evidence for the impact study, they conducted group interviews, and synthesised existing evaluations and mapping studies. The evaluation found that the programme did not have an impact on the diagnosis rate of cardiovascular disease in Scotland. However, the authors also noted that contribution analysis allowed for an understanding of the impact of the programme in its local context.
There is a significant amount of literature on what constitutes systems change and how to evaluate it. First, it is necessary to note that a system can be an entity or a collection of individuals, organisations, and institutions. Because of the variety of actors and movement, change in a complex system is most likely non-linear and difficult to predict. Therefore, in order to have systems change, "public policy and practice managers should avoid highly specified, over-determined and over-monitored approaches that fight with the natural way that change processes work in complex systems."
Defining successful systems change
There is no single definition of systems change in the literature, but rather a wide range of definitions, theories, frameworks, and methods. The exact definition of systems change and what it entails differs for each system, depending on local context, needs, and challenges. Definitions of systems change do, however, highlight a number of key factors which contribute to the success of systems change, and define when systems change is 'completed':
The University of Sheffield and CFE Research defined systems change as the opposite of the 'status quo': "Any change to a system which improves outcomes for the intended beneficiaries of a system, is sustainable in the long-term, and is transformational." This type of change is different to tokenistic changes, changes that rely on the work of individuals rather than services, and one-off developments.
The Lankelly Chase Foundation and New Philanthropy Capital's guide on Systems Change: A guide to what it is and how to do it highlighted systems change as an intentional process which requires buy-in from involved stakeholders and beneficiaries:
"Systems change [is] an intentional process designed to alter the status quo by shifting the function or structure of an identified system with purposeful interventions. It is a journey which can require radical change in people's attitudes as well as in the ways people work. Systems change aims to bring about lasting change by altering underlying structures and supporting mechanisms which make the system operate in a particular way."
Systems change may also occur as an unintentional process, through shifts in one part of the system which have repercussions on another part, whether positive or negative.
Understanding the journey of systems change and its 'completion' requires an understanding of the characteristics of complex systems, which are "comprised of multiple diverse interacting actors, and non-linear and non-proportional interactions between them". Systems do not operate in silos, but rather have fluid boundaries which shift and adjust as the system changes. Defining the change in a system therefore requires a thorough understanding and mapping of what the system looks like, which boundaries are used, and subsequently who is included in the system and who is not.
A final key factor of successful and sustainable systems change is the inclusion of experts, being people with lived experience. As systems change often ultimately results in a change or improvement to how people with lived experience are supported, these people need to be able to directly influence the design and delivery of systems change: "Experts provide a powerful and authentic voice and unique insights that can challenge assumptions, motivate organisations to do things differently and pinpoint areas for change." 
Evaluating systems change
In terms of evaluating systems change, learning from Revolving Doors emphasised the need for an early definition of what would constitute success, what is included in the system, and what data needs to be collected to evidence this. Literature supports that there are three main areas where systems change can be evidenced: strategic learning; changes in the drivers, behaviours, or actors of the system; and changes in the outcomes of the system. Possible indicators of systems change include changes in the scale, quality, and comprehensiveness of pathways or changes in the way these pathways link different steps or are co-ordinated. Methods of determining whether there have been changes in the drivers or behaviours of a system include social network analysis, outcome mapping, and outcome harvesting.
In terms of evaluating the effects of systems change on the outcomes of the programme, methods previously used have been contribution analysis and mixed methods research. However, there are some challenges to systems change evaluations. Principally, because of the nature of systems change, the scope of the evaluation must be bounded, and it is often impossible to establish a counterfactual and attribute outcomes in a systems change evaluation. The literature also highlights that action research is suited to systems change evaluations. This is because conducting action research allows findings to inform the decisions of stakeholders as they are being made. This adaptive research style reflects the need to be flexible when evaluating complex systems.
An example of a systems change evaluation is that of Fulfilling Lives, a holistic support programme that focused on a systems change approach to multiple disadvantage. This evaluation used qualitative research including interviews and focus groups with partnership staff and stakeholders. The audio transcripts from these sessions were coded in an Excel framework according to the themes of the evaluation framework.
In terms of economic assessment and evaluation, there are several approaches including: cost benefit analysis (CBA), social return on investment (SROI), and cost-effectiveness analysis (CEA). CEA is most suited to situations where the full costs can be estimated, compared, and attributed to specific outcomes. The Local Authority Child Poverty Innovation Pilot Evaluation used a CEA to compare the cost effectiveness of the different Pilots. However, the authors noted that there were significant problems with the CEA because of a lack of data on outputs and outcomes. Moreover, there were great differences in the approaches taken by each pilot to data collection, making it difficult to compare outcomes, even when data was available. The final report for this programme was only able to provide partial estimates of the cost-effectiveness of the programme due to the lack of data. The case study highlights the need to collect suitable data and use standard tools to measure and report outputs and outcomes across the programme.
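The basic CEA comparison, and the way missing outcome data undermines it, can be sketched as follows. The pilot names, costs, and outcome counts are hypothetical and are not taken from the Innovation Pilot evaluation; the outcome is assumed to be the number of parents moving into work.

```python
# Hypothetical pilots: total cost (GBP) and number of parents moving into
# work. None marks a pilot that did not record the outcome.
pilots = {
    "Pilot A": {"cost": 120_000, "parents_into_work": 60},
    "Pilot B": {"cost": 90_000, "parents_into_work": 30},
    "Pilot C": {"cost": 150_000, "parents_into_work": None},
}

def cost_per_outcome(cost, outcomes):
    """Cost per unit of outcome, or None when no outcome data exist."""
    if not outcomes:
        return None
    return cost / outcomes

results = {
    name: cost_per_outcome(data["cost"], data["parents_into_work"])
    for name, data in pilots.items()
}

for name, ratio in results.items():
    if ratio is None:
        print(f"{name}: cannot be assessed (no outcome data)")
    else:
        print(f"{name}: GBP {ratio:,.0f} per parent moving into work")
```

The ratios are only comparable because the same outcome measure is assumed for every pilot. Where pilots record different outcomes, or none, only partial estimates are possible, which is the problem the Innovation Pilot evaluation reported.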
Further, in their evaluation of their Child Poverty Strategy, the Welsh Government highlighted how a CBA may be impractical for evaluations of child poverty strategies where monetised impact data is not available.
Case Study: Troubled Families Evaluation
The national evaluation of the Troubled Families Programme conducted a cost benefit analysis based on data from the 124,000 families who joined the programme in 2017/18. The CBA was based on the outputs from the Propensity Score Matching, which included the following outcomes: looked after children, children in need, adult convictions, child convictions, claimant status, and adult employment status. Only the outcomes for which the difference between the treatment and control groups in these models was statistically significant were included in the CBA.
The monetisation values used in the CBA came from the New Economy Manchester Unit Cost Database. The evaluation looked at the economic case, including all economic and social benefits, to examine the public value of the Troubled Families Programme. It also looked at the fiscal case, which estimated the budgetary impacts.
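The structure of this calculation can be sketched as below. All figures are hypothetical: the effect sizes, p-values, unit values, and cost per family are invented for illustration and are not taken from the Troubled Families evaluation or the Unit Cost Database. The sketch only shows how statistically significant outcomes are monetised and compared with cost under an economic case and a fiscal case.

```python
SIGNIFICANCE_LEVEL = 0.05
COST_PER_FAMILY = 2000.0  # hypothetical programme cost per family (GBP)

# Hypothetical matched estimates:
# (outcome, reduction per family, p-value,
#  economic value per case avoided, fiscal value per case avoided)
outcomes = [
    ("looked after children", 0.02, 0.01, 60000.0, 50000.0),
    ("adult convictions", 0.05, 0.03, 10000.0, 6000.0),
    ("out-of-work benefit claims", 0.01, 0.40, 15000.0, 9000.0),
]

# Keep only the outcomes with a statistically significant difference.
significant = [row for row in outcomes if row[2] < SIGNIFICANCE_LEVEL]

# Monetise each retained outcome: reduction per family x unit value.
economic_benefit = sum(reduction * economic
                       for _, reduction, _, economic, _ in significant)
fiscal_benefit = sum(reduction * fiscal
                     for _, reduction, _, _, fiscal in significant)

economic_ratio = economic_benefit / COST_PER_FAMILY
fiscal_ratio = fiscal_benefit / COST_PER_FAMILY

print(f"Outcomes included: {[row[0] for row in significant]}")
print(f"Economic case: GBP {economic_ratio:.2f} of benefit per GBP 1 spent")
print(f"Fiscal case:   GBP {fiscal_ratio:.2f} of benefit per GBP 1 spent")
```

The economic ratio is higher than the fiscal ratio here because the assumed economic values include wider social benefits as well as budgetary savings. Excluding non-significant outcomes is a conservative choice, since any real but undetected effects are valued at zero.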
The evidence reviewed so far has highlighted two main challenges in evaluating programmes concerning child poverty, holistic services, and systems change. The first is the problem of attribution. This is partly because these programmes operate in a larger policy environment. The Tackling Child Poverty Delivery Plan 2022-2026 notes that, especially for policies concerning poverty in Scotland, it is impossible to isolate the impact of a programme as outcomes are inherently tied to macroeconomic conditions in the UK.
The second problem concerns data quality and data lag. The time lag in data reporting and publishing, combined with the dynamic nature of poverty, means that the data only represents a snapshot of the situation. The lag between collection and publication of large administrative datasets also means that evaluations are often using data that is over a year old. In addition, for holistic services, especially those operating across several levels, evaluations rely on locally collected data which can be of variable quality.
The initial key implications that are relevant to the Child Poverty Pathfinders are highlighted below:
Child poverty should be measured using a variety of indicators including relative and income poverty, worklessness, skills and qualifications, housing, and health
The Theories of Change should be used to guide the creation of data collection priorities and tools. These should be continually adapted to local contexts and respond to the experiences of service users
Evaluation methodologies and data collection tools need to be flexible enough to account for unintended outcomes and longer-term impacts that are characteristic of systems change and holistic services. These also should incorporate and respond to learning throughout the evaluation
To have a comprehensive evaluation that is tailored to local contexts and can help account for unintended outcomes, both qualitative and quantitative methods are important for the evaluation
Quasi-experimental methods can be used to show the impact of a programme using a non-randomised counterfactual. However, the robustness of the analysis is reliant on data quality and availability
Contribution analysis can evidence impact and is useful when experimental or quasi-experimental approaches are not possible
Systems change can be evidenced through examining changes in the drivers, behaviours, or outcomes of a system
The economic impact of programmes can be assessed through CEA, CBA, and SROI
Attribution and data quality are of primary concern for evaluations