Who knew difference-in-differences could be so complicated?

The following is a guest post by Melissa Garrido, PhD. She is the Associate Director of the Partnered Evidence-based Policy Resource Center (PEPReC) at the Boston VA Healthcare System, U.S. Department of Veterans Affairs, and a Research Associate Professor with the Department of Health Law, Policy, and Management at Boston University School of Public Health. For more methods content, you can follow Melissa on Twitter (@GarridoMelissa) or read her tutorials here.

Several recently published papers describe important considerations in difference-in-differences (DiD)* analyses. A number of these debate the merits of matching in DiD analyses and are summarized below. (Other useful papers tackle standard error estimation and treatment effect heterogeneity due to differential treatment timing; these will be covered in a future post.)

Matching is usually conducted to reduce bias from observed confounders in an observational analysis. In a recent article (available as Early View in Health Services Research), Jamie Daw and Laura Hatfield point out that because DiD measures relative changes in outcomes (rather than outcome levels), a covariate that is a confounder in a cross-sectional analysis is not necessarily a confounder in a DiD analysis:

…variables related only to treatment assignment and outcome level (not trend) do not bias difference-in-differences studies. They are not confounders and therefore not a useful target of matching that is intended to reduce bias due to observable confounders.
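To see why such a variable drops out, here is a minimal simulation sketch (my own illustration, not from the paper; the variable names and numbers are made up). A covariate x shifts both treatment assignment and the level of the outcome, but not its trend, and the unadjusted DiD estimate still recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0

# x affects treatment assignment and the *level* of the outcome, not its trend.
x = rng.normal(size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-x))  # assignment depends on x

# Pre and post outcomes share the same x-driven level; they differ only by a
# common time trend (+1) and the treatment effect.
y_pre = 10 + 3 * x + rng.normal(size=n)
y_post = 10 + 3 * x + 1.0 + true_effect * treated + rng.normal(size=n)

did = (y_post[treated] - y_pre[treated]).mean() - (
    y_post[~treated] - y_pre[~treated]
).mean()
print(f"unadjusted DiD: {did:.3f} (true effect = {true_effect})")
```

Even though x is a textbook confounder in a cross-sectional comparison of y_post, the within-unit differencing removes its level effect entirely.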

What’s the disadvantage of matching on a variable that isn’t a true confounder? In the case highlighted by Daw and Hatfield, matching on variables associated with treatment assignment and outcome level (but not trend) can introduce bias into your treatment effect estimates.

Consider a hypothetical study of the impact of a counseling program on hospital satisfaction scores among bereaved spouses. The treatment group would include spouses who received counseling. The comparison group might include spouses of decedents from nearby hospitals that do not offer the counseling program. You notice that the treatment group has lower mean satisfaction scores than the comparison group and try to correct this by matching the groups on pre-death satisfaction.

This means that your analytic sample now contains the comparison individuals who have below-average pre-death satisfaction (relative to the mean of the original, pre-matched group of comparison individuals) and the treatment individuals who have above-average pre-death satisfaction (relative to the mean of the original group of treatment individuals). Over time, satisfaction levels of the matched comparison individuals may increase towards the overall comparison group mean, and satisfaction levels of the matched treated individuals may decrease towards the overall treatment group mean. Even with a null effect of counseling, it may appear as if counseling decreased satisfaction with care. This is bias due to regression to the mean.
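Here is a rough simulation of that scenario (again my own sketch with made-up means and variances, not Daw and Hatfield's code; it uses a crude with-replacement nearest-neighbor match, so only the comparison side is effectively selected, which is enough to produce the bias). Counseling has no effect, yet matching on noisy pre-death satisfaction scores drawn from two populations with different means manufactures a negative estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Stable satisfaction differs across the two populations; counseling has NO effect.
latent_t = rng.normal(60, 10, n)  # spouses at treatment hospitals
latent_c = rng.normal(70, 10, n)  # spouses at comparison hospitals

# Observed scores = stable level + transient noise (re-drawn each period).
pre_t, post_t = latent_t + rng.normal(0, 10, n), latent_t + rng.normal(0, 10, n)
pre_c, post_c = latent_c + rng.normal(0, 10, n), latent_c + rng.normal(0, 10, n)

# 1:1 nearest-neighbor match (with replacement) on pre-death satisfaction.
idx = np.argsort(pre_c)
sp = pre_c[idx]
pos = np.clip(np.searchsorted(sp, pre_t), 1, n - 1)
near = np.where(pre_t - sp[pos - 1] < sp[pos] - pre_t, pos - 1, pos)
m = idx[near]

unmatched = (post_t - pre_t).mean() - (post_c - pre_c).mean()
matched = (post_t - pre_t).mean() - (post_c[m] - pre_c[m]).mean()
print(f"unmatched DiD: {unmatched:+.2f}")  # ~0: unbiased under the null
print(f"matched DiD:   {matched:+.2f}")    # spuriously negative
```

The matched comparison spouses were selected partly on negative transient noise, so their scores drift back up toward their population mean in the post period, and the matched DiD attributes that rebound to counseling.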

Daw & Hatfield consider the potential benefit (or harm) of matching in a variety of scenarios in which treatment and control groups differ prior to receipt of an intervention. Some highlights, adapted from their flowchart (Figure 4):

Q: What if I have differences in outcome levels?

A: Matching may introduce the risk of bias due to regression to the mean.

Q: What if I have differences in a covariate that’s correlated with the change in outcome?

A: It depends. If the covariate changes over time, you may introduce bias from regression to the mean. If the covariate is relatively stable over time, matching may increase the precision of your estimate (see the sketch below).
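To illustrate that second answer, here is a small Monte Carlo sketch (my own illustrative setup, not the authors'): a stable covariate x predicts the change in the outcome, has the same distribution in both groups (so there is no confounding either way), and matching on it leaves the estimate unbiased while shrinking its variance:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, tau = 500, 1000, 1.0
unmatched, matched = [], []

for _ in range(reps):
    # Stable covariate x, identically distributed in both groups.
    x_t, x_c = rng.normal(size=n), rng.normal(size=n)
    # The pre-to-post *change* in the outcome depends on x; treatment adds tau.
    d_t = tau + 2 * x_t + rng.normal(size=n)
    d_c = 2 * x_c + rng.normal(size=n)

    unmatched.append(d_t.mean() - d_c.mean())

    # Nearest-neighbor match on x, then average the paired differences.
    idx = np.argsort(x_c)
    sx = x_c[idx]
    pos = np.clip(np.searchsorted(sx, x_t), 1, n - 1)
    near = np.where(x_t - sx[pos - 1] < sx[pos] - x_t, pos - 1, pos)
    matched.append((d_t - d_c[idx][near]).mean())

for name, est in [("unmatched", unmatched), ("matched", matched)]:
    print(f"{name}: mean {np.mean(est):+.3f}, SD {np.std(est):.3f}")
```

Both estimators center on the true effect of 1.0, but matching cancels the x-driven portion of each pair's trend, which shows up as a noticeably smaller Monte Carlo SD.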

In an accompanying editorial, Andrew Ryan highlights seemingly contradictory results, in which matching on pre-intervention outcome levels appears to reduce bias. In response, Daw and Hatfield ran further simulations to illustrate that the differences in results are due to the treatment assignment mechanism.

Consider the case where treatment and comparison groups are derived from the same population, treatment is assigned at a unit level, and treatment assignment is based on pre-intervention outcome levels (e.g., within a given health care system, providers are assigned to receive technical assistance if they are below some performance metric). In this case, the treatment assignment mechanism introduces risk of regression to the mean bias. This is a situation in which matching on pre-intervention outcome levels may reduce bias.
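A quick sketch of this case (my own illustration, not the authors' simulation; the performance numbers are arbitrary): in a single population where providers with worse pre-period performance are more likely to be treated, the unmatched DiD is biased by regression to the mean, while matching comparison units on pre-period levels largely removes that bias:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# One population of providers; treatment has NO effect on the outcome.
latent = rng.normal(70, 10, n)  # stable performance
pre = latent + rng.normal(0, 10, n)
post = latent + rng.normal(0, 10, n)

# Units with worse pre-period performance are more likely to get assistance.
treated = rng.random(n) < 1 / (1 + np.exp((pre - 70) / 5))
t, c = treated, ~treated

unmatched = (post[t] - pre[t]).mean() - (post[c] - pre[c]).mean()

# Match each treated unit to the comparison unit with the closest pre value.
ci = np.flatnonzero(c)
order = ci[np.argsort(pre[ci])]
sp = pre[order]
pos = np.clip(np.searchsorted(sp, pre[t]), 1, len(sp) - 1)
near = np.where(pre[t] - sp[pos - 1] < sp[pos] - pre[t], pos - 1, pos)
m = order[near]

matched = (post[t] - pre[t]).mean() - (post[m] - pre[m]).mean()
print(f"true effect 0 | unmatched DiD: {unmatched:+.2f} | matched DiD: {matched:+.2f}")
```

Because assignment depends only on the observed pre-period value, a treated unit and a matched comparison unit with the same pre value share the same expected reversion toward the common population mean, and that reversion cancels in the difference.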

However, in the common scenario where treatment is assigned at a population level, so that treatment and comparison groups are derived from different populations (e.g., investigations of policies implemented in a subset of states, or of interventions implemented in a convenience sample of clinics or hospitals, such as the bereavement counseling example above), matching may introduce bias due to regression to the mean.** This is the focus of Daw and Hatfield's original simulations.

What if you were to match on pre-treatment trends in outcomes? Daw and Hatfield, as well as Stephan Lindner and John McConnell, caution against matching on pre-treatment outcome trends, particularly when it is uncertain whether those trends are stable. For instance, in our bereavement counseling example, we may see a pre-death trend towards decreasing satisfaction among treated individuals. When matching, it may not be clear whether similar pre-death trends among comparison individuals reflect a stable decline (perhaps due to anticipatory grief) or chance variation. To the extent they are due to chance, regression-to-the-mean bias can creep into the analysis, as the sketch below illustrates.
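Here is one last sketch of this trend-matching trap (illustrative numbers, not from either paper): treated units follow a real, persistent downward trend, comparison units are truly flat, and matching comparisons on their noisy observed pre-trends selects units whose "trends" revert in the post period:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sd = 100_000, 2.0

# Three periods: 0 and 1 are pre, 2 is post. No treatment effect anywhere.
# Treated units have a REAL, persistent downward trend of -2 per period.
y_t = np.arange(3)[:, None] * -2.0 + rng.normal(0, sd, (3, n))
# Comparison units are truly flat; any observed pre-trend is transient noise.
y_c = rng.normal(0, sd, (3, n))

# Match comparisons to treated on the observed (noisy) pre-period trend.
trend_t, trend_c = y_t[1] - y_t[0], y_c[1] - y_c[0]
idx = np.argsort(trend_c)
st = trend_c[idx]
pos = np.clip(np.searchsorted(st, trend_t), 1, n - 1)
near = np.where(trend_t - st[pos - 1] < st[pos] - trend_t, pos - 1, pos)
m = idx[near]

change_t = (y_t[2] - y_t[1]).mean()     # ~ -2: the real trend continues
change_c = (y_c[2] - y_c[1])[m].mean()  # ~ +1: chance "trends" revert
print(f"matched DiD (true effect = 0): {change_t - change_c:+.2f}")
```

If the matched comparisons' downward pre-trends were stable, their post-period change would also be about -2 and the DiD would correctly return zero; because those "trends" were noise, they revert, and the analysis manufactures an effect of roughly -3.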

Matching is sometimes a useful tool for reducing bias in DiD analyses, but it should be used with caution. In any matching analysis, you’re deliberately choosing a subset of comparison observations. It’s important to consider whether this choice may introduce bias (as in regression to the mean) or exacerbate bias due to unobserved confounders. If matching might increase bias, an alternate analytic strategy might be needed.

*If you’re new to DiD analyses, you may wish to first read this brief primer by Justin Dimick and Andrew Ryan.

** They raise the possibility that similar biases from regression to the mean may exist in synthetic control groups, but the extent of any such bias is left for future work.
