What Took the “Con” out of “Econometrics”?

This is an uncharacteristically but justifiably long post on some fairly technical aspects of applied economics. If it isn’t for you, skip it and try the next one.

A March 2010 NBER paper by Joshua Angrist and Jörn-Steffen Pischke may be the best economics paper I’ve ever read. The Credibility Revolution in Empirical Economics reviews the extent to which greater emphasis on research design in microeconomic studies has dramatically increased the credibility of empirical work. That credibility was called into question by Edward Leamer in a 1983 American Economic Review paper titled Let’s Take the Con Out of Econometrics. In it Leamer lamented, “[H]ardly anyone takes anyone else’s data analysis seriously.” (Leamer’s paper is itself a fun and worthwhile read.)

Leamer and others in the early 1980s were distressed by how rarely researchers tested the implications of their assumptions about the specification and functional form of econometric models. His proposed solution was to analyze how results change under variations of the model (sensitivity analysis). Angrist and Pischke make a strong case that Leamer was correct in his diagnosis but not necessarily in his prescription. They argue that the “credibility revolution” experienced in empirical microeconomics since Leamer’s critique is due principally to a greater focus on research design, not on sensitivity analysis.

A “research design” is a characterization of the logic that connects the data to the causal inferences the researcher asserts they support. It is essentially an argument as to why someone ought to believe the results. It addresses all reasonable concerns pertaining to such issues as selection bias, reverse causation, and omitted variables bias. In the case of a randomized controlled trial with no significant contamination of or attrition from treatment or control group, there is little room for doubt about the causal effects of treatment, so hardly any argument is necessary. But in the case of a natural experiment or an observational study, causal inferences must be supported with substantial justification of how they are identified. Essentially, one must explain how a random experiment effectively exists where no one explicitly created one.

I view their paper in three parts, one on improvements in research design in microeconomics (which I understand well), one on the degree to which industrial organization (IO) has harnessed those improvements (an area about which I’m learning), and one on macroeconomics (which I am not qualified to judge). Taking them in reverse order, let’s begin with macro. Angrist and Pischke essentially characterize it as too little data and design chasing far too much theory. Whether that is fair or not I will leave to others to evaluate (paging macro bloggers). Nevertheless, they point to “[s]ome rays of sunlight pok[ing] through the grey clouds” of macro and continue by summarizing a few design-based macro studies (quotations © 2010 by Joshua Angrist and Jörn-Steffen Pischke).

If their critique of macro is strong, their attack on IO is withering.

The dominant paradigm for merger analysis in modern academic studies, sometimes called the “new empirical industrial organization,” is an elaborate exercise consisting of three steps: The first estimates a demand system for the product in question … Next, researchers postulate a model of market conduct …  Finally, industry behavior is simulated with and without the merger of interest….

[T]his elaborate superstructure should be of concern. The postulated demand system implicitly imposes restrictions on substitution patterns and other aspects of consumer behavior about which we have little reason to feel strongly. The validity of the instrumental variables used to identify demand equations—prices in other markets—turns on independence assumptions across markets that seem arbitrary. The simulation step typically focuses on a single channel by which mergers affect prices—the reduction in the number of competitors—when at least in theory a merger can lead to other effects like cost reductions that make competition tougher between remaining producers. In this framework, it’s hard to see precisely which features of the data drive the ultimate results.
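To make the three-step structure concrete, here is a deliberately toy sketch: a linear inverse demand curve estimated by OLS, symmetric Cournot conduct assumed, and a three-to-two merger simulated. Every ingredient (the functional form, the parameter values, the conduct model) is an illustrative assumption of mine, far simpler than the random-coefficients demand systems used in actual merger studies.

```python
import numpy as np

rng = np.random.default_rng(2)

# Step 1: "estimate" a demand system. Here a toy linear inverse demand,
# P = a - b*Q, fit by OLS on simulated market data. (Real merger analyses
# use far richer demand systems and worry about price endogeneity.)
true_a, true_b, cost = 100.0, 1.0, 20.0
Q = rng.uniform(20, 60, size=200)
P = true_a - true_b * Q + rng.normal(scale=2.0, size=200)
X = np.column_stack([np.ones_like(Q), Q])
(a_hat, neg_b_hat), *_ = np.linalg.lstsq(X, P, rcond=None)
b_hat = -neg_b_hat

# Step 2: postulate a model of market conduct. With symmetric Cournot
# competition, n firms, and constant marginal cost c, the equilibrium
# price is P* = (a + n*c) / (n + 1).
def cournot_price(n_firms: int, a: float, c: float) -> float:
    """Equilibrium price under symmetric Cournot with linear demand."""
    return (a + n_firms * c) / (n_firms + 1)

# Step 3: simulate industry behavior with and without the merger,
# here modeled solely as a reduction from 3 competitors to 2 -- exactly
# the single channel Angrist and Pischke flag as a limitation.
p_pre = cournot_price(3, a_hat, cost)
p_post = cournot_price(2, a_hat, cost)
```

Note how the predicted price increase flows entirely from the postulated conduct model and the fitted demand intercept; it is hard to point to the feature of the raw data that drives it, which is precisely the authors’ complaint.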

Angrist and Pischke ask whether characteristics of simulated mergers based on this new empirical IO framework match those from other credible design-based merger studies. Their answer based on a survey of comparisons to date is that the evidence is mixed, which in their view diminishes the credibility of the new empirical IO approach.

Finally, turning to domains of empirical microeconomics in which a focus on design has been most prominent, Angrist and Pischke make some superb points. Among them is the notion that the gold standard of the randomized experiment is not without deficiencies. Such experiments are “time consuming, expensive, and may not always be practical.” To this I would add that they are also not always decisive. Even the RAND health insurance experiment (HIE) has been critiqued (and defended). That is not to suggest that it is certainly flawed (or certainly perfect); it is merely to say that variations in interpretation exist for results of randomized experiments just as they do for non-experimental studies.

Indeed, Angrist and Pischke (and I) agree with Leamer that “randomized experiments differ only in degree from nonexperimental evaluations of causal effects.” The authors add that “a well-done observational study can be more credible and persuasive than a poorly executed randomized trial.” It is for this and the other foregoing features of randomized experiments that I believe the half-billion dollars or so that some advocate spending on another RAND HIE would arguably be better spent funding well-conceived observational or natural experiment-based studies. (A half-billion dollars could fund on the order of 1,000 observational studies.)

In perhaps the clearest possible example of why Leamer’s suggested remedy for empirical economics, sensitivity analysis, was not how it regained its credibility, Angrist and Pischke summarize a 1997 American Economic Review paper by Sala-i-Martin that reported results from two million variations of a regression analysis. (The paper is titled I Just Ran Two Million Regressions.) The author chose three fixed control variables and selected three others at random from a set of nearly 60. He obtained some “wonderfully robust” predictors, but Angrist and Pischke are not impressed.

Are these the right controls? Are six controls enough? How are we to understand sources of variation in one variable when the effects of three others, arbitrarily chosen, are partialed out? Wide-net searches of this kind offer little basis for a causal interpretation.
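The wide-net procedure is simple enough to sketch in code. Below is a minimal version with simulated data and a pool of 20 candidate variables standing in for the nearly 60 actual ones; every name and value here is illustrative, not Sala-i-Martin’s data or variable set.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: outcome y, three fixed controls, and a pool of candidate
# variables. (Illustrative stand-ins; the actual paper used ~60 candidates.)
n, n_pool = 500, 20
X_fixed = rng.normal(size=(n, 3))
pool = rng.normal(size=(n, n_pool))
y = X_fixed @ np.array([1.0, 0.5, -0.5]) + 0.8 * pool[:, 0] + rng.normal(size=n)

def coef_on(z, controls):
    """OLS coefficient on variable z with the given controls partialed out."""
    X = np.column_stack([np.ones(n), z, controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Wide-net search: for one candidate variable, rerun the regression with
# three controls drawn from the rest of the pool, over and over.
z = pool[:, 0]
draws = [coef_on(z, np.column_stack([X_fixed, pool[:, list(c)]]))
         for c in itertools.combinations(range(1, n_pool), 3)][:200]

# A "wonderfully robust" predictor is one whose coefficient keeps its sign
# across draws. Robustness to arbitrary controls is not the same as a
# defensible causal design -- which is the authors' point.
share_positive = np.mean(np.array(draws) > 0)
```

The mechanics are trivial; the question Angrist and Pischke press is why sign stability across arbitrarily chosen controls should license a causal reading at all.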

For all that, sensitivity analysis does have a place in the canon of empirical technique. Angrist and Pischke may be correct that it is a focus on design, and not more sensitivity analysis, that deserves the lion’s share of credit for distinguishing econometrics from whimsical alchemy. However, once one is working within a framework of sound design, sensitivity analysis is an important check on the robustness of results. Therefore, Leamer’s advice is valid as an enhancement to, not a substitute for, good design. And that may, in fact, be the sense in which he meant it. That is certainly the sense in which it ought to be interpreted today.
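Within a sound design, that kind of sensitivity check is straightforward. A minimal sketch, on simulated data with a genuinely randomized treatment (all names and values are my illustrative assumptions): the estimated effect should barely move as covariates are added or dropped, and a large swing would be a warning sign.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Illustrative setup: an (as-good-as) randomly assigned binary treatment,
# plus two optional covariates that predict the outcome but not treatment.
treat = rng.integers(0, 2, size=n).astype(float)
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 * treat + 0.7 * x1 - 0.3 * x2 + rng.normal(size=n)

def treatment_effect(controls):
    """OLS coefficient on the treatment indicator, given a list of controls."""
    X = np.column_stack([np.ones(n), treat] + controls)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Sensitivity check: under a sound design (here, true randomization) the
# estimate should be stable across reasonable control sets.
specs = {
    "no controls": [],
    "x1 only": [x1],
    "x1 and x2": [x1, x2],
}
estimates = {name: treatment_effect(c) for name, c in specs.items()}
spread = max(estimates.values()) - min(estimates.values())
```

The contrast with the wide-net search is the order of operations: the identification argument comes first, and the specification variations merely stress-test an estimate the design already justifies.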

In conclusion, Angrist and Pischke’s paper is an excellent review of issues pertaining to causal inference. It cites and summarizes a substantial body of high-quality work across numerous applied economics domains. And it makes a compelling case for how attention to elements of research design has taken the “con” out of “econometrics.” If you’re a student or practitioner of applied economics, consider reading the whole thing. As long as this post is, it hardly does the paper justice.
