Annex A: Data cleaning and grouping
This annex summarises the model parameters and data cleaning decisions, and provides characteristics and averages for the 18 groups used. It is important to note that group formulation choices make very little difference to the final results of the CBA. The 18 groups formulated are already more than needed, as is evident from the number of participants in each group, some of which are zero. The number of individuals in each group, along with some of their other characteristics, is detailed below. The analysis was also repeated with different data cleaning decisions and different group formations, and the changes in the results were extremely minor.
For the purposes of this evaluation, the groups are constructed from three parameters: the type of claimant, whether the claimant is under 25, and the health journey element. The first parameter refers to relationship and parental status and takes three values: single, couple, and lone parent. The second parameter takes two values: under 25 years of age, or 25 and older. The third parameter refers to disability status and takes three values: No, Yes Limited Capability for Work (LCW), and Yes Limited Capability for Work and Work-Related Activity (LCWRA). Taking all combinations of these characteristics (3 × 2 × 3) gives a total of 18 groups, which are used to capture the different Universal Credit treatments.
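A minimal Python sketch of the group construction, taking the Cartesian product of the three parameters. The annex does not state how the groups are numbered; the category order below is an assumption, chosen so that groups 7 to 12 match the lone-parent rows in the table at the end of this annex. Whether groups 1 to 6 are single or couple claimants is not stated, so that part of the ordering is illustrative only.

```python
from itertools import product

# Assumed category orders. The annex fixes the values but not the numbering;
# this order makes groups 7-12 the lone-parent groups shown in the table.
CLAIMANT_TYPES = ["Single", "Lone Parent", "Couple"]
HEALTH_JOURNEY = ["No", "LCW", "LCWRA"]
UNDER_25 = ["No", "Yes"]


def build_groups():
    """Return {group_number: (claimant_type, health_journey, under_25)}."""
    # product() varies the last iterable fastest, so the under-25 flag
    # alternates first, then the health journey element, then claimant type.
    combos = product(CLAIMANT_TYPES, HEALTH_JOURNEY, UNDER_25)
    return {i: combo for i, combo in enumerate(combos, start=1)}


groups = build_groups()
for number, (ctype, hje, u25) in groups.items():
    print(f"Group {number}: {ctype}, HJE={hje}, under 25={u25}")
```

With this ordering the lone-parent groups come out as No/No, No/Yes, LCW/No, LCW/Yes, LCWRA/No, LCWRA/Yes, the same pattern as groups 7 to 12 in the table.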
Table A1. Participant characteristics that the DWP SCBA model considers
- Claimant cares for severely disabled person?
- Type of claimant
- Number of children
- Number of children under 14
- Rental costs (monthly)
- Health Journey Element
- Under 25?
- Benefit unit capital
- Non-wage income
- Hours worked per week
- Wage rate
For each group, the average wage and average hours worked are calculated, as well as the mode of the number of dependent children. Benefit unit capital and non-wage income are set at 0, as suggested by the model, because they are unobserved in the data. Rental costs are estimated using the Universal Credit Full-Service guidance on the mean cost of rent in Scotland for different groups.
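The group-level inputs described above could be computed along the following lines. This is an illustrative sketch only: the record layout (keys `group`, `wage`, `hours`, `children`) is hypothetical, and `statistics.mode` returns the first mode encountered when there is a tie, which may differ from how ties were resolved in the actual analysis.

```python
from collections import defaultdict
from statistics import mean, mode


def summarise_groups(participants):
    """Group-level inputs: mean wage, mean hours, modal number of children.

    `participants` is a list of dicts with hypothetical keys 'group',
    'wage', 'hours' and 'children' (cleaned values, one row per person).
    """
    by_group = defaultdict(list)
    for p in participants:
        by_group[p["group"]].append(p)

    summary = {}
    for group, rows in by_group.items():
        summary[group] = {
            "wage": mean(r["wage"] for r in rows),
            "hours": mean(r["hours"] for r in rows),
            "children": mode(r["children"] for r in rows),
            "n": len(rows),
            # Unobserved in the data, so set to 0 as the model suggests.
            "capital": 0,
            "non_wage_income": 0,
        }
    return summary
```

Groups with no participants do not appear in the result of this sketch; in the table they are reported with zero hours and wage.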
The following data cleaning decisions were made. For claimants who care for a severely disabled person, only 11 individuals in the data care for someone with a severe disability, yet reflecting them would require an additional 28 groups (many of which would be empty). This characteristic is therefore set to 'no' for all participants.
For type of claimant, the data distinguishes between lone parents and parents who are not lone parents (coded as couple). For claimants without children, the data cannot distinguish between single claimants and couples, so they were all coded as single.
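The claimant-type rule can be written as a small function. The input flags `has_children` and `is_lone_parent` are hypothetical names for whatever fields the management information actually provides.

```python
def claimant_type(has_children: bool, is_lone_parent: bool) -> str:
    """Code claimant type from the two fields the data can distinguish.

    Parents are either lone parents or, if not lone parents, couples.
    Claimants without children cannot be split into singles and couples,
    so they are all coded as single.
    """
    if not has_children:
        return "Single"
    return "Lone Parent" if is_lone_parent else "Couple"
```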
For both the number of children and the number of children under 14, the number of dependent children is used, because the data does not capture children's ages.
For rental costs, the average rental cost provided by the DWP SCBA model, combining private and social renters, is used. These figures are from 2017, so CPI was used to convert them to real 2021/2022 values.
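The CPI conversion is assumed here to be the standard index ratio, real = nominal × CPI(target year) / CPI(price year). A sketch follows; the index values in the example are placeholders, not the CPI series supplied by the DWP SCBA model.

```python
def to_real(nominal: float, cpi_price_year: float, cpi_target_year: float) -> float:
    """Express a nominal amount in target-year prices.

    real = nominal * CPI(target year) / CPI(price year)
    """
    if cpi_price_year <= 0:
        raise ValueError("CPI index must be positive")
    return nominal * cpi_target_year / cpi_price_year


# Placeholder values for illustration only (not the model's CPI series).
rent_2017 = 380.0
print(to_real(rent_2017, cpi_price_year=100.0, cpi_target_year=110.0))  # 418.0
```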
The health journey element is derived from the ESA WRAG variable. This variable identifies claimants with no health journey element, but it does not fully distinguish between LCW and LCWRA; claimants coded as receiving 'other' are treated as LCWRA.
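A sketch of the recoding is below. The raw code values are hypothetical because the annex does not list them, and mapping work-related activity group membership to LCW is an assumption; the annex itself only states that claimants coded 'other' are treated as LCWRA.

```python
# Hypothetical raw codes for the ESA WRAG variable; the annex does not list them.
HJE_CODING = {
    "none": "No",      # no health journey element (captured in the data)
    "wrag": "LCW",     # assumption: WRAG membership treated as LCW
    "other": "LCWRA",  # annex: claimants coded 'other' are treated as LCWRA
}


def health_journey_element(esa_wrag_code: str) -> str:
    """Map a raw ESA WRAG code to the model's health journey element."""
    try:
        return HJE_CODING[esa_wrag_code.strip().lower()]
    except KeyError:
        raise ValueError(f"unexpected ESA WRAG code: {esa_wrag_code!r}") from None
```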
Benefit unit capital and non-wage income are set at 0 because they are unobserved, as recommended by the DWP SCBA guidance.
For wage and hours worked, some of the cleaning decisions include:
- Choosing the last wage and hours worked for participants with multiple jobs.
- Combining employment and participant data to obtain the most comprehensive hours and wage data.
- Converting yearly wages into hourly wages (dividing by 52 weeks and then by 35 hours).
- Replacing wages that are lower than the minimum wage with the minimum wage of the respective year (based on job start date).
- Capping hourly wages at £20, with any higher value replaced by the minimum wage of the respective year.
- Replacing missing values with the minimum wage of the respective year.
- Converting wage data to real 2021/2022 wages using the CPI provided by the model.
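The wage rules listed above can be sketched as follows, assuming each job record carries a start date, a wage, and a flag saying whether the wage is yearly (all hypothetical field names). The minimum wage and CPI factor are passed in because the annex does not reproduce those series; the values in the example are placeholders. As described, wages above £20 are replaced with the minimum wage rather than set to £20.

```python
import math
from datetime import date
from typing import Optional

HOURLY_CAP = 20.0     # £ per hour, from the annex
WEEKS_PER_YEAR = 52
HOURS_PER_WEEK = 35   # divisor used to turn yearly pay into hourly pay


def latest_job(jobs):
    """Pick the last job (by start date) for participants with several jobs."""
    return max(jobs, key=lambda job: job["start_date"])


def clean_hourly_wage(wage: Optional[float], is_yearly: bool,
                      min_wage: float, cpi_factor: float) -> float:
    """Return a real 2021/22 hourly wage for one job.

    min_wage   -- minimum wage for the year of the job start date
    cpi_factor -- CPI(2021/22) / CPI(job year), from the model's CPI series
    """
    if wage is None or math.isnan(wage):
        hourly = min_wage  # missing -> minimum wage
    else:
        hourly = wage / WEEKS_PER_YEAR / HOURS_PER_WEEK if is_yearly else wage
        if hourly < min_wage or hourly > HOURLY_CAP:
            hourly = min_wage  # below the floor or above the cap
    return hourly * cpi_factor


# Placeholder job records for one participant.
jobs = [
    {"start_date": date(2019, 5, 1), "wage": 18200.0, "is_yearly": True},
    {"start_date": date(2020, 2, 1), "wage": 9.0, "is_yearly": False},
]
job = latest_job(jobs)
real_wage = clean_hourly_wage(job["wage"], job["is_yearly"],
                              min_wage=8.0, cpi_factor=1.05)
```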
Finally, the cost data include forecasts up to 2023/2024 and combine provider costs, support costs, and non-supplier costs, disaggregated by participant group, Lot, and year where appropriate. Because costs were accrued over six years, all figures are expressed as real 2021/2022 costs using the CPI provided by the model.
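The cost treatment can be sketched as summing the three components in each row and deflating by that row's year before aggregating by participant group and Lot. The row layout, year labels, and CPI factors are hypothetical.

```python
from collections import defaultdict

COMPONENTS = ("provider", "support", "non_supplier")


def real_costs_by_group_lot(cost_rows, cpi_factor):
    """Sum cost components into real 2021/22 totals per (group, Lot).

    cost_rows  -- dicts with hypothetical keys 'group', 'lot', 'year' and one
                  key per cost component (nominal £, outturn or forecast)
    cpi_factor -- {year: CPI(2021/22) / CPI(year)}
    """
    totals = defaultdict(float)
    for row in cost_rows:
        # A component missing from a row is treated as zero.
        nominal = sum(row.get(c, 0.0) for c in COMPONENTS)
        totals[(row["group"], row["lot"])] += nominal * cpi_factor[row["year"]]
    return dict(totals)
```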
| Group | Type of claimant | Number of children (mode) | Rental costs (£/month) | Health Journey Element | Under 25? | Hours worked (mean, per week) | Wage rate (mean, £/hour) | Number of participants |
|---|---|---|---|---|---|---|---|---|
| Group 7 | Lone Parent | 1 | 403 | No | No | 27 | 8.8 | 830 |
| Group 8 | Lone Parent | 1 | 403 | No | Yes | 27 | 8.8 | 75 |
| Group 9 | Lone Parent | 1 | 403 | LCW | No | 28 | 8.4 | 23 |
| Group 10 | Lone Parent | 1 | 403 | LCW | Yes | 28 | 8.9 | 1 |
| Group 11 | Lone Parent | 1 | 403 | LCWRA | No | 25 | 8.5 | 3 |
| Group 12 | Lone Parent | 1 | 403 | LCWRA | Yes | 0 | 0.0 | 0 |
Source: Analysis of management information, Wave 3 survey data, and cost data.