Can Big Data Address Causal Inference?

Austin Frakt

February 2, 2015

Rocío Titiunik thinks not, in general.

The relevant question is whether big data has the potential to uncover causal relationships that could not be discovered with “small” data. […] [T]he bottleneck often is the lack of a solid research design and a credible theory, both of which are essential to develop, test, and accumulate causal explanations. […]
The fundamental problem of causal inference is that for every unit, we fail to observe the value that the outcome would have taken if the chosen level of the treatment had been different (Holland 1986). Therefore, the search for causal inferences is a search for assumptions under which we can infer the values of these unobserved counterfactual outcomes from observed data. The question at the center of my argument is whether access to big data fundamentally increases the likelihood that those assumptions will hold.

“Big data” can mean many things. In particular, it could mean a large number of observations and/or a large number of variables. We can dispense with the first one relatively easily:

[N]o increase in the number of observations, no matter how large, will cause the omitted variable bias in a mis-specified [] model to disappear.
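A small simulation shows why more observations don't help. This is a minimal sketch under an assumed, hypothetical linear model (not Titiunik's): an unobserved confounder `u` drives both the treatment `d` and the outcome `y`, and we regress `y` on `d` alone. The function name `naive_slope` and the parameters `beta` and `gamma` are illustrative choices.

```python
import random


def naive_slope(n, beta=1.0, gamma=1.0, seed=0):
    """OLS slope of y on d when the confounder u is left out of the model.

    Hypothetical data-generating process (an assumption for illustration):
        u ~ N(0, 1)                  unobserved confounder
        d = u + v,  v ~ N(0, 1)      treatment partly driven by u
        y = beta*d + gamma*u + e     outcome driven by both d and u
    """
    rng = random.Random(seed)
    ds, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        d = u + rng.gauss(0.0, 1.0)
        y = beta * d + gamma * u + rng.gauss(0.0, 1.0)
        ds.append(d)
        ys.append(y)
    # Simple-regression slope = sample Cov(d, y) / sample Var(d).
    mean_d = sum(ds) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_d) * (b - mean_y) for a, b in zip(ds, ys))
    var = sum((a - mean_d) ** 2 for a in ds)
    return cov / var


if __name__ == "__main__":
    # The true causal effect is beta = 1.0, but the estimate converges to
    # beta + gamma * Cov(d, u) / Var(d) = 1.0 + 1.0 * (1 / 2) = 1.5.
    for n in (1_000, 10_000, 100_000):
        print(n, round(naive_slope(n), 3))
```

As `n` grows, the estimates should tighten around 1.5 rather than the true 1.0: a larger sample shrinks the noise but leaves the omitted-variable bias untouched.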

One argument that a large number of variables may not help is that something important could still be missing.

Without a theory and a research design, it is not possible to know when to stop adding to the list.
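The point about adding variables can be sketched the same way. Under an assumed, hypothetical model with two confounders, `x` (recorded) and `u` (never recorded), controlling for `x` removes only part of the bias, and nothing in the recorded data signals that `u` is missing. The helper names are illustrative; the adjustment uses the Frisch-Waugh-Lovell result, which is a standard property of least squares.

```python
import random


def slope(x, y):
    """Simple OLS slope of y on x: sample Cov(x, y) / sample Var(x)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sum((a - mx) ** 2 for a in x)


def residualize(y, x):
    """Residuals from regressing y on x (with an intercept)."""
    b = slope(x, y)
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def estimates(n, beta=1.0, seed=0):
    """Return (estimate ignoring x, estimate controlling for x).

    Hypothetical model: x (observed) and u (unobserved) both confound d -> y.
        d = x + u + v
        y = beta*d + x + u + e
    """
    rng = random.Random(seed)
    xs, ds, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)  # never recorded, so it cannot be controlled for
        d = x + u + rng.gauss(0.0, 1.0)
        y = beta * d + x + u + rng.gauss(0.0, 1.0)
        xs.append(x)
        ds.append(d)
        ys.append(y)
    naive = slope(ds, ys)
    # Frisch-Waugh-Lovell: the coefficient on d in a regression of y on (d, x)
    # equals the slope of x-residualized y on x-residualized d.
    adjusted = slope(residualize(ds, xs), residualize(ys, xs))
    return naive, adjusted


if __name__ == "__main__":
    print(estimates(100_000))
```

In this setup the naive estimate should settle near 5/3 and the adjusted one near 1.5, while the true effect is 1.0. Controlling for the variable we happen to have helps, but only a theory of how `d` is assigned would tell us that `u` also belongs in the model.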

For all that, big data are still useful, as Titiunik goes on to discuss. They just aren’t a solution to causal inference by themselves. I agree.

@afrakt

Austin Frakt

Austin Frakt, PhD, is co-Editor-in-Chief of The Incidental Economist. His day job is Associate Director of the Partnered Evidence-based Policy Resource Center at the Boston VA Healthcare System, U.S. Department of Veterans Affairs. He is also a Principal Research Scientist with the Department of Health Policy and Management at the Harvard T.H. Chan School of Public Health, and Editor-in-Chief of the journal Health Services Research.
