2 Literature Review
2.1 Comparative Studies
Many empirical studies in economics depend on estimating the effects of policy interventions or regime changes. In these studies, researchers estimate the change in aggregate outcomes (such as GDP per capita) for a unit affected by a particular policy intervention of interest and compare it to the change in the same aggregates estimated for some unaffected unit (or units).
For example, Card and Krueger (1994) compare the change in employment in fast-food restaurants in New Jersey and its neighbouring state Pennsylvania around the time of an increase in New Jersey's minimum wage. In this study – and many others – information at the aggregate level is not available. In these cases, a sample of disaggregated units (a handful of fast-food restaurants, rather than all minimum-wage employers) may be used to estimate the aggregate outcome of interest (the change in employment amongst minimum-wage employers when the minimum wage is increased).
Often, these comparative studies lend themselves well to difference-in-differences (DID) methods, as in Card and Krueger's case. Other examples include Meyer, Viscusi, and Durbin's (1995) article examining the effect of an increase in benefits for injured workers on time out of work, and Card (1990), who examined the impact of a sudden increase in Miami's labour force due to immigration ('the Mariel Boatlift') on the Miami labour market in 1980.
DID is particularly useful when one or more 'treatment' groups of individual units (persons, firms, households) have been exposed to some policy intervention, and one or more 'control' groups have not. This allows researchers to estimate the effect of the policy intervention in the form of the interaction between time and treatment.
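As a sketch of this interaction term, consider a simulated two-group, two-period example (the data, variable names, and effect size below are invented purely for illustration; this is not the thesis data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated panel: 100 treated and 100 control observations, split between
# pre- and post-intervention periods (all numbers are hypothetical).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "treated": np.repeat([0, 1], n // 2),   # group indicator
    "post": np.tile([0, 1], n // 2),        # time indicator
})
true_effect = 2.0
df["y"] = (1.0 + 0.5 * df["treated"] + 1.5 * df["post"]
           + true_effect * df["treated"] * df["post"]
           + rng.normal(scale=0.5, size=n))

# The DID estimate is the coefficient on the time x treatment interaction.
fit = smf.ols("y ~ treated + post + treated:post", data=df).fit()
did_estimate = fit.params["treated:post"]
```

With group and time differences absorbed by the `treated` and `post` dummies, the interaction coefficient recovers the (simulated) treatment effect.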
The most critical assumption in the DID method is perhaps the parallel trends assumption: the difference between the treated group(s) and the control group(s) needs to be constant over time in the absence of the policy intervention. This is fundamentally untestable, so its plausibility depends heavily on the chosen control group(s). However, the group structure of the errors and serial correlation are two other commonly overlooked pitfalls, as highlighted by Moulton (1990) and Bertrand, Duflo, and Mullainathan (2004), respectively. These pitfalls are explained further below.
- The treatment and control 'groups' are chosen by the researcher, so assumptions about their similarity both before and after the policy intervention are crucial to estimating the effect of the intervention.
- Serial correlation is often not accounted for, which may lead to severely understated standard errors and thus an overstatement of the significance of the estimated treatment effect. This is discussed by Bertrand, Duflo, and Mullainathan (2004) and can in some cases be remedied. Inference (or placebo) testing is the most favoured remedy: placebo treatments are generated for the control groups and their effects estimated. The estimate for the actual treatment can then be compared to the distribution of placebo estimates to gauge its precision and accuracy.
- Bertrand, Duflo, and Mullainathan (2004) highlight further drawbacks of the DID method, such as the assumption that the intervention is 'as good as' random (conditional on time and group fixed effects) which otherwise could signal potential treatment endogeneity.
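The placebo testing mentioned above can be sketched as follows. The effect estimates are hypothetical stand-ins for estimates obtained by re-running the same model with treatment assigned to each control unit in turn:

```python
import numpy as np

# Hypothetical placebo effect estimates for 19 control units, each obtained
# by pretending that unit was treated (the numbers are invented).
placebo_effects = np.array([0.3, -0.8, 1.1, -0.2, 0.5, 2.1, -1.4, 0.9,
                            -0.6, 0.1, 1.8, -2.3, 0.7, -0.4, 1.2, 0.0,
                            -1.0, 0.6, -0.9])
actual_effect = 3.0  # hypothetical estimate for the actually treated unit

# Permutation-style p-value: the rank of the actual estimate (in absolute
# value) among all 20 estimates.
all_effects = np.append(placebo_effects, actual_effect)
rank = int(np.sum(np.abs(all_effects) >= abs(actual_effect)))
p_value = rank / all_effects.size   # smallest attainable value: 1/20 = 0.05
```

If the actual estimate is extreme relative to the placebo distribution (rank 1 of 20 here), the effect is unlikely to be an artefact of the chosen control units.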
In our case, the assumption of random treatment assignment would be relevant if the United States Trade Representative chose to implement tariffs on single malt from the UK because they deemed the US to be importing too much single malt compared to countries without a (change in) tariff.
Although the decision was likely based on a multitude of factors (e.g., the relative importance of single malt in total drink exports to the US and the value of the spirits industry in the US), it is unlikely this was one of them. Add to this the fact that the tariffs were introduced as a result of a separate trade dispute over state aid in the aircraft industry, and this issue becomes even less worrisome.
The DID method has a further drawback which is especially relevant in our case:
- The majority of the uncertainty reflected in the DID estimator will be the uncertainty associated with not knowing the true population value (due to only having a sample of the 'population' control and treatment groups).
If we do know the aggregate data, this uncertainty can be disregarded. However, the uncertainty regarding the control group's counterfactual trend remains (uncertainty which is not expressed within the regression framework's standard errors).
Since, in our case, we observe the total monthly UK export figures for single malt (we are not sampling whisky importers in the United States and Canada), the standard DID framework may not be the most appropriate. This is not to say that the estimate of the treatment effect will be biased or otherwise incorrect; however, the standard errors will not reflect the 'true' uncertainty.
The synthetic control methods described by Abadie and Gardeazabal (2003) may be more appropriate in these cases. Abadie and Gardeazabal study the impact of political instability on economic prosperity in the context of terrorist activity in the Spanish Basque Country. They do this using aggregate (region-level) observations. This political instability coincided with an economic downturn, which is in some ways similar to the Covid-19 outbreak coinciding with tariff effects in our case.
2.1.2 Synthetic Control
The synthetic control method is an extension of the DID method in which a so-called 'synthetic control' is constructed using a combination of unaffected (control) units, rather than a single unit (as with DID). This combination can take the form of a simple average or a weighted average. The synthetic control approach can limit the risk of a bad choice of control unit or group resulting in overly large (or small) treatment effect estimates.
Abadie, Diamond, and Hainmueller (2010) (ADH hereafter) describe the synthetic control method in more detail and use it to estimate the effect of California's tobacco control program. They advocate for the use of data-driven procedures to construct suitable control groups. It may be challenging to find the single unexposed unit that approximates the most relevant characteristics of the exposed unit(s). ADH argue that a combination of units often provides a better comparison for the unit exposed to the intervention than any single unit alone.
In our case, this may involve a combination of single malt exports to Canada, as well as single malt exports to other nations – or UK exports of goods to the US unaffected by the tariff(s). This is not a novel approach; for example, Card (1990) uses a combination of cities to construct a control unit.
Because a synthetic control is a weighted average of the available control units, the synthetic control method makes explicit:
- the relative contribution of each control unit to the counterfactual; and
- the similarities (or lack thereof) between the unit affected by the policy intervention and the synthetic control in terms of pre-intervention outcomes and predictors.
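A minimal sketch of how such weights can be chosen, matching only pre-intervention outcomes (ADH additionally match other predictors; all numbers here are invented):

```python
import numpy as np
from scipy.optimize import minimize

# Pre-intervention outcomes: rows are periods, columns are control units
# (hypothetical values).
X_controls = np.array([[10.0, 14.0,  8.0],
                       [11.0, 15.0,  9.0],
                       [12.0, 16.0, 10.0],
                       [13.0, 17.0, 11.0]])
y_treated = np.array([12.0, 13.0, 14.0, 15.0])  # treated unit, same periods

# Find non-negative weights summing to one that best reproduce the treated
# unit's pre-intervention path.
def loss(w):
    return float(np.sum((y_treated - X_controls @ w) ** 2))

n_controls = X_controls.shape[1]
res = minimize(loss, np.full(n_controls, 1.0 / n_controls),
               method="SLSQP",
               bounds=[(0.0, 1.0)] * n_controls,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
weights = res.x                      # explicit contribution of each control
synthetic = X_controls @ weights     # the synthetic control's pre-period path
```

The estimated treatment effect is then the post-intervention gap between the treated unit's outcomes and the same weighted combination evaluated on post-period data.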
However, exports of single malt are highly seasonal, potentially complicating the factor model ADH describe (this is less of an issue if the seasonality is of a similar magnitude between treatment and control units).
Additionally, the data requirements for this method are higher than for conventional DID, since the same data are needed not only for the treated unit (the United States) and one control unit (e.g. Canada), but also for a wide variety of control units.
The method described by ADH has been used by many others, both in econometrics and other fields. For example, Donohue, Aneja, and Weber (2018) use the method to study the impact of right-to-carry laws in the United States, while Cunningham and Shah study the impacts of the decriminalisation of prostitution in Rhode Island, and Kleven et al. (2013) study the impacts of changes in tax legislation.
The use of synthetic control methods in trade economics is limited. Generally, analyses of trade agreements and legislation focus on macroeconomic aspects – changes in trade flows, employment, economic output, income – and less on the trade in one specific good. Hannan (2016, 2017) uses the method in two IMF working papers to study the impacts of trade agreements in the 1980s and 1990s; more specifically, she studies the change in the value of exports/imports to/from countries entering a trade agreement.
Slaughter (2001) uses conventional DID with many control groups to gauge whether trade liberalisation has led to income convergence or divergence, and Fotopoulus and Psallidas (2009) use an augmented DID approach in which they match the most similar country pairs to estimate the effect of the introduction of the Euro on bilateral trade. Fotopoulus and Psallidas use a country-level deflator in a gravity model for each country pair, and use the resultant deflated real GDP as well as the real exchange rate, distance, common language, areas, and common border as confounders in this model. They show that the adoption of the Euro increased trade significantly and find no evidence of trade diversion.
2.1.3 Extensions and Other Methods
Conventional time series methods were also considered – structural break time series models, in particular. Investigating the presence of a structural break around October 2019 could provide some indication of a change in the data-generating process. However, a pure time series approach may not be able to properly account for the potential decreases seen because of Covid-19, which could lead to an overestimation of the effect of the tariffs.
Investigating a cointegrating relationship between, for example, Canada and the US – and the change in this relationship – is another potential way of investigating a change seen in the US, but not Canada. However, this results in some of the same issues as conventional DID (namely, how to choose the 'control' unit).
The synthetic control method, as advocated by ADH, uses placebo studies to draw inferences (instead of large-sample inferential techniques). As a result, no standard errors or confidence intervals are obtained. These outputs are particularly useful when the impact is close to zero: a narrow confidence interval could, for example, limit the impact to only negative values. Additionally, a large number of controls is beneficial: since the smallest attainable placebo p-value is 1/(J+1) for J control units, a p-value of 0.05 or smaller can only be generated when there are roughly 20 or more control units.
Linden (2018) argues that standard errors and confidence intervals are useful in gauging the causal impact and suggests using Newey-West standard errors to overcome some of the problems ADH highlight, without resorting to placebo studies. This is done by using the synthetic control method in conjunction with interrupted time series analysis (ITSA, also known as quasi-experimental time series analysis).
Lastly, as noted by Ferman et al. (2020), there is little guidance available on the choice of predictor variables. They note that some authors use all pre-treatment outcomes (export values, in our case) as predictors, while others use a subset or an average in addition to other predictors. Leaving the choice of predictors open until the results are known creates scope for cherry-picking specifications that show statistically significant results ('specification searching'). Abadie and Gardeazabal (2003), for example, use the mean of all pre-treatment outcome values (plus additional covariates), while others use various selected lagged values.
Ferman et al provide two recommendations to researchers in choosing a lagged dependent variable specification:
1. Use only specifications in which the number of pre-intervention outcome values used grows without bound as the number of pre-intervention periods increases (i.e. approaches infinity).
This rules out specifications which do not consider the dynamics of time series, such as the mean of all pre-treatment outcomes or a specification which uses a limited number of pre-treatment outcomes (e.g. first, middle, or last outcome).
2. Report results for different specifications.
The second recommendation complicates the reporting of a single, obvious point estimate. Ferman et al. (2020) suggest basing an inference procedure on a new test statistic that is a function of the test statistics of all individual specifications, and either:
a. using a weighted average of the point estimates if the function is a weighted average of the test statistics (with the same weights), or
b. constructing confidence intervals using the set-identification procedure suggested by Firpo and Possebom (2018).
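Option (a) amounts to a simple weighted average across specifications; a toy sketch follows (the specification names, estimates, and equal weights are purely hypothetical):

```python
# Hypothetical point estimates from the same synthetic control model run
# under different lagged-outcome specifications.
estimates_by_spec = {
    "all_pre_treatment_lags": -0.42,
    "even_period_lags": -0.38,
    "first_half_lags": -0.51,
}

# Equal weights, mirroring a test statistic defined as the plain average
# of the per-specification test statistics.
w = 1.0 / len(estimates_by_spec)
combined_estimate = sum(w * est for est in estimates_by_spec.values())
```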