Does repeated use of instruments reveal problems with IV?

The NBER working paper by Randall Morck and Bernard Yeung, “Economics, History and Causation,” has received some attention in the blogosphere and in the comments on this blog. I’ve now read the paper. It’s a good one, and one I think even non-economists can (mostly) understand. I recommend you try if you’ve got a little interest in how economists think (or don’t) and work (or don’t).

The key issue raised in the paper about instrumental variables (IV) is this, from the abstract: “[E]ach successful use of an instrument potentially creates an additional latent variable bias problem for all other uses of that instrument – past and future. […] Useful instrumental variables are, we fear, going the way of the Atlantic cod.” The key word there is “potentially.”

OK, what’s the problem? In the next three paragraphs I’ll say it in technical terms, though with a silly example. That still may not help some people, but I urge you not to throw up your hands and conclude IV is doomed. If you like, skip my unpacking of the problem and go right to the paragraph that begins, “About this, I agree with Alex Tabarrok.”

Suppose one study exploits a seemingly good instrument Z to estimate a causal effect of X on Y1. That is, Z is an instrument for X and Y1 is the dependent variable. That means, among other things, that Z is correlated with Y1. For example, if the coin flip (Z) turns out heads, the patients get treatment X and all live; otherwise they go untreated and die (treatment X is very, very important to the life/death outcome, Y1).

Now imagine a future study that does the same thing: it uses the instrument Z to estimate a causal effect of something on a new dependent variable, Y2. That means Z is correlated with Y2. For example, it turns out that everyone who did not get the treatment (because the coin flip Z was tails) was also in the ICU but nobody who did get treatment was (it turned out to be a very biased coin; who knew?). Imagine Y2 is an ICU stay indicator.

If Y1 and Y2 are also correlated (which is not certain in general, but obviously is in the made-up example), then the estimates of the first study are called into question if Y2 (ICU stay) was not included among the control variables. If Y2 was not a control variable in the first study, then it is an unobserved factor correlated with both the instrument and the dependent variable. That makes the instrument endogenous (omitted variable bias or incomplete risk adjustment), which exactly violates the assumptions of a good instrument. It invalidates the instrument. The study is flawed. That coin flip was really not as random as we thought!
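
To make the mechanics concrete, here is a minimal simulation sketch in Python with NumPy. Everything in it is an assumption made for illustration and not something taken from the Morck and Yeung paper: the linear model, the true effect of 2.0, the coefficient 1.5 on the omitted factor, the `leak` parameter, and the function names `simulate` and `iv_estimate`. It also treats `y2` as a pre-existing factor that shifts both the instrument and the first outcome, which is one simple way (not the only way) the correlations described above could arise.

```python
import numpy as np

TRUE_EFFECT = 2.0  # assumed causal effect of x on y1 in this toy model


def simulate(leak, n=200_000, seed=0):
    """Draw toy data; `leak` sets how strongly z tracks the omitted y2."""
    rng = np.random.default_rng(seed)
    y2 = rng.normal(size=n)             # factor the first study never observed
    u = rng.normal(size=n)              # ordinary confounder of x and y1
    z = leak * y2 + rng.normal(size=n)  # instrument; clean only when leak == 0
    x = z + u + rng.normal(size=n)      # treatment, driven partly by z
    y1 = TRUE_EFFECT * x + u + 1.5 * y2 + rng.normal(size=n)
    return z, x, y1, y2


def iv_estimate(z, x, y):
    """Simple one-instrument IV estimate: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


z, x, y1, _ = simulate(leak=0.0)
clean = iv_estimate(z, x, y1)

z, x, y1, _ = simulate(leak=0.5)
leaky = iv_estimate(z, x, y1)

print(f"valid instrument:                      {clean:.2f}")
print(f"instrument correlated with omitted y2: {leaky:.2f}")
```

With these assumed numbers the first estimate should land close to the true 2.0, while the second should drift toward roughly 2.6, because the instrument now carries some of the omitted factor's influence on the outcome. The size of the drift depends entirely on the made-up coefficients; in a real study it could be large or negligible.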

About this, I agree with Alex Tabarrok, who wrote, “I don’t see this as a fundamentally new problem or one specific to IVs.” I view this as the natural and welcome march of scientific progress. We discover all kinds of things that invalidate prior theory or call into question prior observation. One can always say, “Maybe the instrument is flawed!” (This applies to physics as well as economics, only more so in the latter.)

Yet, if one is going to make a claim of instrument failure, one had better explain how. Well, a second study that shows the prior use of the instrument was flawed is even better. It’s not just a story of how the instrument could be flawed (in the first study), but a demonstration of exactly how it is. Still, it doesn’t prove that the results are wrong, at least qualitatively.

What one should then do, if one wants to repair the reputation of the first study’s results, is redo it, addressing head-on the issue raised by the second by including the omitted variable it suggests. That’s not a fundamental problem. Maybe the conclusions of the first study turn out to be qualitatively right anyway, and the omitted variable wasn’t a big deal.
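
As a rough sketch of what such a redo could look like, the Python snippet below runs two-stage least squares by hand with NumPy, once without and once with the previously omitted variable as a control. The data-generating model, its coefficients, and the helper name `two_stage_ls` are all hypothetical and chosen only for illustration. It assumes `y2` is a pre-existing factor that can legitimately be controlled for; if `y2` were itself caused by the treatment, adding it as a control would raise problems of its own. Only point estimates are computed, since proper 2SLS standard errors need more care than this.

```python
import numpy as np


def two_stage_ls(y, x, z, controls):
    """2SLS coefficient on x, instrumenting x with z.

    `controls` is a list of 1-D arrays included in both stages.
    """
    exog = np.column_stack([np.ones(len(y))] + list(controls))
    # First stage: project x on the controls plus the instrument.
    first = np.column_stack([exog, z])
    x_hat = first @ np.linalg.lstsq(first, x, rcond=None)[0]
    # Second stage: regress y on the controls plus the fitted x.
    second = np.column_stack([exog, x_hat])
    beta = np.linalg.lstsq(second, y, rcond=None)[0]
    return beta[-1]  # coefficient on x_hat


# Toy data (assumed model): z is contaminated by y2, and y2 also moves y1.
rng = np.random.default_rng(1)
n = 200_000
y2 = rng.normal(size=n)
u = rng.normal(size=n)                 # confounder that makes plain OLS unreliable
z = 0.5 * y2 + rng.normal(size=n)
x = z + u + rng.normal(size=n)
y1 = 2.0 * x + u + 1.5 * y2 + rng.normal(size=n)  # true effect of x is 2.0

naive = two_stage_ls(y1, x, z, controls=[])        # first study: y2 omitted
repaired = two_stage_ls(y1, x, z, controls=[y2])   # redo: y2 included

print(f"y2 omitted:  {naive:.2f}")
print(f"y2 included: {repaired:.2f}")
```

In this toy setup the repaired estimate should return to roughly the true 2.0. Had the coefficient on `y2` been small, the two estimates would have been close, which is the case where the first study turns out to be qualitatively right anyway.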

There is always doubt in science, and more so with observational studies than randomized experiments. But we need observational studies. Our life is based on them. We can’t randomize everything, nor should we. So, march on we must. When one result begins to look shaky we should revisit that area and try to do better. For some time an instrument may look potentially bad. It may turn out to be bad. Or maybe not. In time, the fisheries can be replenished.
