Appendix D: Evaluating the quality of VR to IDBR matches
This appendix provides more detail on the outcome of the Scottish Government's matching of properties in the non-domestic rates data. As we outlined in the body of the report, we were able to compare the groupings of properties into enterprises in the non-domestic rates data with the matching of properties to enterprises in the IDBR. To do so, we primarily compared the resulting site count from each – that is, the number of properties attributed to a business in the non-domestic rates data, and the number attributed to an enterprise in the IDBR.
This first required us to define a measure of consistency between the two with which to make such comparisons. We were then also able to evaluate the extent to which mismatches would result in differences in estimates of total business RV between the two ways of grouping properties. Throughout, we use businesses to refer to groupings of properties in the non-domestic rates data, and enterprises to refer to those from matching to the IDBR.
D.1 Defining consistency of matches within enterprises
We categorise a property as having a consistent site count between the business and enterprise to which it was attributed in the following way:
Its site count matches the average site count of all properties within an enterprise and the total number of properties matched to the enterprise.
This is not exhaustive, and can miss special cases in which, for example:
- An enterprise has 5 properties, the site counts average to 5 but are different, and one property has a site count of 5
- An enterprise has 3 total properties, site counts average to 3 (for example 5+1+3), and one property has a site count of 3
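The strict rule can be sketched in Python as follows. This is an illustrative sketch only, not the code used in the analysis: the function name `strict_consistent` and the representation of an enterprise as a list of site counts are assumptions made for the example. It also reproduces the second special case above, where one property is flagged as consistent even though the site counts within the enterprise differ.

```python
def strict_consistent(site_counts):
    """Flag each property in one matched enterprise under the strict rule.

    site_counts: the Valuation Roll site counts of the properties matched
    to a single enterprise (hypothetical input format).
    """
    n = len(site_counts)          # total properties matched to the enterprise
    total = sum(site_counts)
    # sc * n == total is an integer-safe way of writing "sc equals the average".
    return [sc * n == total and sc == n for sc in site_counts]


# Three properties that all report a site count of 3: all consistent.
print(strict_consistent([3, 3, 3]))   # [True, True, True]

# Special case: counts of 5, 1 and 3 average to 3, so the last property passes.
print(strict_consistent([5, 1, 3]))   # [False, False, True]
```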
There is also a way in which this is overly strict for multi-sites. To be classified as having a good match between Valuation Roll and IDBR site counts, it implicitly requires:
1. All properties were used in the matching exercise by SG (however the sample used was restricted)
2. All matches were adequate and retained (only those with a match score above 84% were kept)
3. All matches regardless of business/enterprise size were to the correct enterprise.
For example, there could be properties that:
- have a site count of, for example, 36 but have only had n < 36 properties matched and retained, meaning site count does equal their average site count, but not their total number of properties;
- have a site count of 36, all 36 properties are matched, but there is 1 additional matched property, meaning their site count differs from the average site count or the total property count, but only marginally;
- or have some combination of both.
We therefore also relax our definition of consistency for properties to require:
Its site count matches the average site count of all properties within an enterprise and, for every property in the enterprise, the site count multiplied by the number of properties in the enterprise is equal to the sum of all site counts within the enterprise.
This is again not entirely exhaustive, but it captures the scenarios in which, for example:
- as above, an enterprise has 3 total properties, site counts average to 3 (for example 5+1+3), and one property has a site count of 3;
- an enterprise has 5 properties in the non-domestic rates data, 2 properties are excluded from the matching exercise, but the three remaining properties all match to the same enterprise. In the initial definition of consistency these properties would be classed as inconsistent.
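A minimal Python sketch of the relaxed rule, under the same assumptions as before (the function name `relaxed_consistent` and the list-of-site-counts input are hypothetical, not taken from the analysis). Requiring the condition to hold for every property amounts to requiring that all properties matched to the enterprise report the same site count, whatever the number of properties matched.

```python
def relaxed_consistent(site_counts):
    """Flag each property in one matched enterprise under the relaxed rule."""
    n = len(site_counts)
    total = sum(site_counts)
    # The rule must hold for every property, which is only possible when
    # all properties in the enterprise report the same site count.
    all_agree = all(sc * n == total for sc in site_counts)
    return [all_agree] * n


# Counts of 5, 1 and 3 average to 3, but no property is classed as consistent.
print(relaxed_consistent([5, 1, 3]))        # [False, False, False]

# Site count of 5, but only 3 of the 5 properties were matched: consistent.
print(relaxed_consistent([5, 5, 5]))        # [True, True, True]

# 36 properties with a site count of 36 plus one stray match: inconsistent.
print(any(relaxed_consistent([36] * 36 + [1])))   # False
```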
Where necessary, we discuss the implications of relaxing our definition of consistency. Unfortunately, for the reasons surrounding data access outlined in the report, we did not have the opportunity to extend our analysis of multi-site matches any further than is outlined here.
D.2 Site count versus property count within matched enterprises
Overall, 35,597 (57.5%) of the matched properties were classified as single-site businesses in the Valuation Roll. The remaining 42.5% had a site count greater than one, and so were considered as being part of multi-site businesses.
In total, there were 26,220 (42%) matched properties whose site counts were not consistent – for example, 5,853 (16.5%) of the matched properties classed as single-site businesses were matched to an enterprise that was also matched to other properties, suggesting they were part of a larger, multi-site enterprise. Conversely, there were 5,405 properties classed as being in multi-site businesses that were matched to an enterprise with no other matched properties.
We do not know with any certainty the extent to which mismatches based on our strict definition of consistency arise due to the omission of properties from the matching process. From our initial exploration, 9,644 of the mismatched properties have a site count equal to the average site count, but not the total number of properties. These correspond to 3,132 additional enterprises. Because our analysis was cut short, we do not know if these properties satisfy the weaker definition of consistency above, although it is likely that a sizeable portion do – the only reason they would not is if there were examples of different site counts within enterprises averaging to the site count of a property (as discussed above). This would mean a substantial increase in the number of multi-site properties classed as consistently matched.
The remaining 14,962 properties are multi-sites matched to enterprises with a total property count which is not equal to their site count from the Valuation Roll, meaning a total of 20,367 (77.5%) of the 26,305 multi-sites matched had an inconsistent enterprise property count.
Only 35,682 (57%) of the properties matched therefore had a site count that corresponded with the property count in their enterprise grouping – unsurprisingly given the uncertainty around multi-site groupings, 29,744 (83.4%) of these were single-sites (see the last section on the definition of consistency/mismatch).
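The counts quoted in this section can be reconciled with a few lines of arithmetic. The sketch below is only a cross-check of the published figures; the variable names and the `pct` helper are our own.

```python
# Figures quoted in this section (matched properties).
matched = 61_902
single_site, multi_site = 35_597, 26_305
single_inconsistent = 5_853
multi_single_enterprise, multi_other_mismatch = 5_405, 14_962

multi_inconsistent = multi_single_enterprise + multi_other_mismatch
inconsistent = single_inconsistent + multi_inconsistent
consistent = matched - inconsistent
consistent_single = single_site - single_inconsistent


def pct(part, whole):
    """Percentage share rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(single_site + multi_site == matched)    # True
print(multi_inconsistent)                     # 20367
print(inconsistent, consistent)               # 26220 35682
print(consistent_single)                      # 29744
print(pct(single_site, matched))              # 57.5
print(pct(consistent_single, consistent))     # 83.4
```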
D.3 Implications of the inconsistencies in matching
The purpose of the econometric analysis is to evaluate the effect of SBBS eligibility on enterprise outcomes. The 61,902 properties successfully matched translate to 42,663 unique enterprises. Of these, 32,173 (75.4%) comprise properties whose site count matches the property count within the enterprise, 29,744 (92.5%) of which were single-property enterprises (and so single-site properties in the VR, since only these properties can be consistently matched to a single-property enterprise).
We would like to maximise the likelihood that we use the appropriate enterprise level information from the Valuation Roll. For example, it is desirable to ensure that the site count (for eligibility), business RV, and SBBS levels we use to evaluate enterprise level outcomes actually reflect those of the enterprise.
This is difficult to do, however, given the uncertainty surrounding groupings of properties into enterprises (there are now actually two sources of uncertainty, since there is the initial uncertainty from grouping by addresses plus the additional uncertainty added by matching to the IDBR by addresses). To illustrate the difficulty in obtaining enterprise level Valuation Roll information from multi-property enterprises, we can examine the extent to which RV and SBBS relief levels vary within the enterprise groupings.
We have already outlined above that 16.5% (5,853 of 35,597) and 77.5% (20,367 of 26,305) of single- and multi-site properties respectively were matched to enterprises with inconsistent property counts. The median difference between a property's site count and the total number of properties in its matched enterprise is extremely low. The mean difference is 48.5, which is driven by a small number of extremely large multi-site/property businesses/enterprises having site counts that are inconsistent with the property count within their matched enterprise (the standard deviation is 166.9).
For total business/enterprise RV, there are 35,803 properties with a total business RV that matches the sum of the RVs of the individual properties within their enterprise. This is of course true for all of the 29,744 properties classified as single-site that were also matched to an enterprise with only one property. There are also 92 that were classed as multi-site in the Valuation Roll but have been matched to a single-property enterprise. It is not clear why the business RV of these properties matches their property RV, and so their enterprise RV. The remaining 5,967 are properties that were grouped together in both the Valuation Roll and when matching to IDBR enterprises.
This amounts to 32,269 enterprises (29,744 single-property and 2,525 multi-property) with an enterprise RV which is consistent with their total business RV calculated from the VR: 75.6% of the total number of enterprises. For the enterprises comprising properties that did not have a consistent RV when calculated from the VR versus the IDBR (26,099 properties), the median difference between the two RVs within the enterprise was below £6,000. As in the case of property counts above, this is much lower than the mean difference of £3,552,514, which is driven by discrepancies among properties in large enterprises.
Given that a complete analysis of the effect of SBBS on enterprise outcomes would focus on enterprises with at most a total RV of £35,000 (the multi-site eligibility threshold), this variation in business RV within multi-property enterprises is large. In fact, almost one-third of the overall variation in business RV is within as opposed to between enterprises. Again, we must stress that comparing total RV as estimated from the NDR data to that among properties within a matched enterprise does not offer a perfect assessment of the match quality, even if all properties were included in the matching exercise. This is because, again, total business RV in the NDR data is estimated by the Scottish Government.
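One common way to express a within- versus between-group share of variation is a sum-of-squares decomposition. The sketch below illustrates that idea only: we do not know the exact measure used in the analysis, and the function name `within_share`, the input format and the example RVs are all hypothetical.

```python
from statistics import fmean


def within_share(groups):
    """Share of total variation that lies within groups.

    groups: mapping of enterprise id -> list of business RVs recorded for
    the properties matched to that enterprise (hypothetical format).
    Assumes the values are not all identical, so total variation is non-zero.
    """
    values = [v for vs in groups.values() for v in vs]
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    within_ss = 0.0
    for vs in groups.values():
        group_mean = fmean(vs)
        within_ss += sum((v - group_mean) ** 2 for v in vs)
    return within_ss / total_ss


# Made-up business RVs for two enterprises with two properties each.
example = {"A": [10_000, 20_000], "B": [30_000, 50_000]}
print(round(within_share(example), 3))   # 0.286
```

A share of 0 would mean every property within an enterprise carries the same business RV; the closer the share is to 1, the less the enterprise grouping explains.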
There are also inconsistencies in the SBBS relief levels – which, in theory, should be accurate at the property level – among multi-property enterprises. Here, we note that businesses should not be in receipt of different rates of relief on different properties. The eligibility rules are such that if a business's total RV is less than (or equal to) £15,000, it receives 100% relief on all properties. If its total RV is greater than £15,000 but at most £35,000, however, then it receives 25% relief on all properties with an RV of at most £18,000. Again, however, there is substantial variation in the relief levels of properties within an enterprise. We define consistent relief levels for a property in an analogous manner to the way in which we defined consistency of site count and RV: if all properties within an enterprise receive the same level of relief, each property's relief level should match the average relief level within the enterprise.
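The eligibility rule as stated here can be written as a small function. This is a sketch of the two thresholds described in this paragraph only: the function name `sbbs_relief` is hypothetical, and returning 0 in all other cases is our assumption, since the text does not spell out what happens outside these bands.

```python
def sbbs_relief(total_business_rv, property_rv):
    """Relief percentage for one property under the rule described above."""
    if total_business_rv <= 15_000:
        return 100                      # full relief on all properties
    if total_business_rv <= 35_000 and property_rv <= 18_000:
        return 25                       # partial relief on smaller properties
    return 0                            # assumption: no relief otherwise


print(sbbs_relief(12_000, 7_000))    # 100
print(sbbs_relief(30_000, 10_000))   # 25
print(sbbs_relief(40_000, 5_000))    # 0
```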
As was the case with RV within enterprises, the relief level of properties that were classed as single-site and matched to a single-property enterprise cannot be inconsistent. For the multi-property enterprises, there were a total of 16,882 (64%) properties whose relief level was consistent within their matched enterprise. This amounted to 5,932 multi-property enterprises with consistent relief levels, 79% of the total number of such enterprises.
This is a relatively high level of consistency given the extent of the uncertainty around groupings, and is not primarily driven by multi-property enterprises that have no relief: 75% of the 5,932 multi-property enterprises with consistent relief levels actually receive a positive amount of SBBS. In fact, almost half of these enterprises had an inconsistent site count but consistent relief levels, again highlighting the discrepancies that arose when grouping properties within the Valuation Roll versus the IDBR.
As an example of the within-enterprise discrepancies in relief levels, 1,011 (63.7%) of the enterprises with inconsistent relief levels have at least one property with 0 recorded relief and at least one property with 100. The total number of multi-property enterprises is 7,514, which is 6 fewer than the sum of the two groups with consistent (5,932) and inconsistent (1,588) relief levels. This is because some enterprises appear in both groups – again we note that the rule we use for consistency is not completely exhaustive: for example, it could be the case that (50+25+25+100)/4=50, and so this enterprise has properties in both groups.
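The (50+25+25+100)/4=50 case can be reproduced with a short Python sketch of the property-level rule. The function name `relief_consistent` and the list of integer relief percentages are assumptions made for illustration.

```python
def relief_consistent(relief_levels):
    """Flag each property whose relief level equals the enterprise average."""
    n = len(relief_levels)
    total = sum(relief_levels)
    # r * n == total is an integer-safe form of "r equals the average".
    return [r * n == total for r in relief_levels]


flags = relief_consistent([50, 25, 25, 100])
print(flags)                           # [True, False, False, False]

# The enterprise contributes to both groups when its flags are mixed.
print(any(flags) and not all(flags))   # True
```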
D.4 Summing up the description of matches
Drawing together the main characteristics of matches shows that for properties:
1. 70,819 of 130,861 (54%) properties were matched to a record in IDBR;
2. 61,902 of these matches (47.3% of 130,861) were to live records;
3. 35,597 (57.5%) of the 61,902 were single-sites, and 26,305 (42.5%) were multi-sites;
4. 5,853 (16.5%) of the single-sites were matched to a multi-property enterprise, and 20,367 (77.5%) multi-sites were matched to an enterprise that had a property count inconsistent with their VR site count;
5. leaving only 29,744 (83.6%) of single-sites and 5,938 (22.65%) of multi-site properties matched consistently by our definition based on their site count (note again the strictness of consistency for multi-sites).
At the enterprise level, this means:
1. the 61,902 properties matched to live records translate to 42,663 enterprises;
2. 32,173 of these enterprises comprised properties whose site count matched their property count: the 29,744 single-property enterprises and 2,429 multi-property enterprises;
3. 32,269 (75.6%) of enterprises have consistent RVs across properties: the 29,744 single-property enterprises and 2,525 (20.3%) of the multi-property enterprises;
4. 5,932 (79%) of multi-property enterprises did have consistent relief levels;
5. meaning there is sizeable variation within enterprises in the key variables required for the econometric analysis.