Causal Speculation: A Meditation on Theory
In a post late last year on correlation and causality I touched on the role theory plays in making causal inferences. Specifically, I indicated that a causal inference cannot be made from the analysis of data from an observational study without a theoretical model. If conclusions of causality rely on theory, where does the theory come from? Fundamentally, how do we know when something causes another?
Before addressing those questions, let’s clarify the issues with a hypothetical example. Suppose physicians begin to notice that males who eat a particular exotic beetle of Zimbabwe also have no hair loss, independent of age. There are precisely four possibilities:
- the observations are coincidental,
- consumption of such beetles (causally) prevents hair loss,
- lack of hair loss (causally) leads to consumption of the beetles,
- beetle consumption and lack of hair loss are both jointly caused by something else.
Each of these possibilities is essentially a theory about the world, and each implies something about the causal relationship (or lack thereof) between beetle consumption and hair loss. Theories 1-3 are relatively simple since the causality implied only runs at most one way. Theory 4 is the source of many problems. Even when there is strong evidence in support of theories 2 or 3, one can never fully rule out theory 4. “Something else” could be anything.
That’s essentially why causal inferences cannot be made from correlations (or statistical analysis) alone. One needs to put a fence around the problem and assert that only the factors one has considered, measured, and included in the analysis are relevant, that there is no “something else” left out. With that assertion one can make causal inferences from statistical models and correlations (assuming the correct application of appropriate technique).
Where does this assertion that all relevant factors have been considered come from? Its origin is outside the data, outside the analysis. It is extra-empirical. Put simply, it is theory, a hypothesis about the nature of causality in the world that can be rejected, but never fully confirmed by the data. Without it no causal inference can be made no matter the quality of the data or what is done with it.
Where do causal theories, the fences around problems, come from? Why do we believe that x causes y and not vice versa or that some other factor z causes both? These questions are puzzling because all our experience is empirical yet theory stands outside the data.
Perhaps theory comes from extrapolation from the subset of our experience that is exactly like or darn near a randomized trial (either explicitly so or due to a natural experiment that makes it close enough)? If the “cause” seems random we’re comfortable inferring that it is responsible for much of what seems to “result.” No doubt this is hard-wired into our brains, a consequence of evolution. For example, “See lion eat chief. See lion again, run!”
But such causal inferences are formed quickly and can easily be wrong. Moreover, we frequently hold mutually exclusive causal ideas in our brains at the same time. The role of theory is to force us to organize our causal ideas, to be explicit, and to iron out logical inconsistencies. Then we go to the data to test the theory.
It is tempting to believe that the world can be understood from data alone, that if x causes y we should not need theory to tell us so. Evidently that is not the case, at least insofar as observational studies are concerned. Observational methods comprise a great deal of science and, in far less rigorous form, most of our experience. This leads to a version of the anthropic principle: we can’t exist without theory (nor, I assert, could many animals). A world in which humans don’t rely on theory would be one in which humans, as we know them, do not exist.
Causation and Evolution
James Bronzan sent me a follow-up e-mail to my post based on his earlier comment. He agreed to let me quote it:
But again, it still seems to me that there’s likely a reason that we look for instances of causation to be associated with instances of correlation precisely because they are. It turned out to be a good evolutionary strategy to do this — we adapted to the way the world is. (Grossly coarse example: it was useful to go beyond “man, every time someone happens to eat that berry, he or she dies,” to “perhaps eating that berry causes death.”)…
Of course it doesn’t require that they be this way, or indeed that there aren’t other effects of causation that we don’t detect whatsoever. But I’d posit that those effects didn’t turn out to be useful in decision making linked to survival, which suggests (to me, anyway) that the causation-correlation link is more prevalent or stronger.
I agree with James’ perspective to a point. I might even be caught making the same or similar arguments some day. But today I’ve decided to differ, if only slightly, for entertainment value. Plus I’d like to think I add an important nuance or two in what follows.
We actually can’t say that we are reacting to correlation when we intuit causation. Correlation is such a precise measure and, therefore, is only meaningful when talking about statistical analysis. I don’t believe our berry-eating ancestors were running regressions in their heads. Few of us do that today.
When it comes to statistical analysis and quantitative studies, one can speak precisely about correlation and causation. Linear models and their Gaussian assumptions reduce everything to correlations. There’s nothing left. The tools are blind to all else, though they’re very helpful when we have a theory upon which we choose to rely. Even in cases where models are not linear and assumptions of Gaussianity are relaxed, our intuition (by which I mean that of researchers) is based on the linear/Gaussian/causal world, so some of its distortions remain.
But back to the non-research, intuitive world we inhabit. On what do we base our causal inferences? It may be correlations, but it could be more or less than that. It is some vague interpretation of sensory data, I know not what. Yes, it is evolved and therefore is (or was, rather) of great utility for reproduction.
Is it of great utility today and for other purposes? Yes, but it also leads to errors, a subset of which we notice. But very few cases in which it is applied casually are in areas relevant to reproduction. So there is this muscle we use in domains beyond that in which it was strengthened by evolution.
What can we say about the degree to which correlation (or whatever we do) is associated with causation in such domains? I think strictly speaking not much. We have only our bias and intuition. Beyond that I’m willing to say, “I don’t know.” Not everyone is comfortable with giving the unknown that much scope. It isn’t necessary that we do. But very often it is important if we do. Noticing this causality bias really can be an eye opener. Though if one goes too far it becomes hard to know anything.
Reader Response: Causation Bias
In a thought-provoking comment on my post Causation without Correlation is Possible, James Bronzan wrote:
To my uneducated eye, I’d say that even though correlation does not mean causation and causation does not mean correlation, they nonetheless travel very closely together. In other words: the instances in which causation results in correlation are far more frequent in the world than those in which it does not. (Maybe that’s why the examples had so much mathyness?)
Surely if X causes Y instances of X are more likely to hang around with instances of Y than not.
There’s a lot going on here. Let me try to unpack it. First of all, causation itself and any claim of it are theories or mental models, if you prefer that terminology. (A subsequent post will go more into this point.) However, causation and causal thinking have proven to be incredibly useful. No doubt causal thinking is an evolved modality of thought. Even if the world is not objectively causal in any sense I’m not ready to abandon causality. Let us say, if only as shorthand, causality exists and is ubiquitous. (As an exercise try for a moment to imagine any time and place in the universe where causality ceases. Good luck.)
I assert that there are far more things that are related by correlation than by causality. To build on an example from the prior post, among children, reading comprehension is correlated with, among other things, shoe size, mathematical ability, height, weight, and age. Yet only one of those plays a causal role (or so a reasonable person can believe). If causation exists and is ubiquitous, what do we say about correlation? It is hyper-ubiquitous. It is over-abundant. There is far too much of it to be useful. If we based inferences on correlation the universe would be over-determined. There’s just far too much of it.
That’s related to the fact that correlation is not as useful as we’d like to think. Or, rather, it is a very blunt tool, especially for causal inference. Using it is like trying to catch water with bare hands. It leaks out all over the place unless one carefully plugs all the holes. That’s not so easy to do, but it isn’t impossible to do a credible job. Sometimes we can hold just enough water to get a drink.
Very often we think correlation is carrying water when it is not. The conclusions of many studies on many subjects and much of what is believed in general (I can’t say how much) about many things are based on very casual causal inferences from correlations. I’d say we have a bias to think this way. It’s a sub-type of confirmation bias. We so adore our causal theories that we search for and believe correlations that support them. Sometimes we learn later how wrong we were. The history of science is full of such stories (flat Earth, the Ptolemaic model, bloodletting, and more recently, though less profound, arthroscopic knee surgery for arthritis, among many others).
But James’ point isn’t that correlation very often implies causation. Rather, his point is that if X really does cause Y, it is far more likely that X and Y are correlated than not. It certainly seems that way. But an honest look at this issue has to account for the bias in our minds and in our tools. We have very good tools for discovering correlation and certain other measures of relatedness. There could very well be (in fact must be) a class of causal phenomena that escape the detection of those tools. That is to say, our minds and tools are biased in favor of contemplation, detection, and study of evidence of correlation in support of causal inference. It is tempting to conclude that causation and correlation are frequently or tightly associated. But I don’t know how one could substantiate such a claim.
Nevertheless, correlations are useful in the study of causal phenomena. Though they do not by themselves confirm or reject causal assertions they do measure degree of relatedness. That is to say, if we take as given X causes Y, the next question is “to what extent?” Correlation provides an answer, though an incomplete one (being only one statistic).
So, to address James’ point directly, I think we do find instances of causation to be associated with correlation. But that’s because that’s what we look for and that’s what we can see. The brightness of the street lamp tells us nothing about the extent of the universe it illuminates.
Causation without Correlation is Possible
It is well known that correlation does not prove causation. What is less well known is that causation can exist when correlation is zero. The upshot of these two facts is that, in general and without additional information, correlation reveals literally nothing about causation. It is neither necessary nor sufficient for it.
Correlation without causation. My favorite hypothetical example of this is a study of thousands of middle and high school kids. The poorly informed investigators measure shoe size and reading comprehension scores. They find that the two are positively correlated. Their manuscript claiming that larger feet cause better reading skills is rejected, of course. Foot size does not cause better reading skills despite the correlation of the two.
Two elements are missing from this study. One is the measurement of age, which is related to both foot size and reading comprehension. The other missing element is a conceptual or theoretical model that provides a basis for causal interpretations of the relationships between age and foot size and between age and reading comprehension. Getting older is correlated with both and we say it is the cause of both because we have a plausible conceptual model of human development that is consistent with such an interpretation.
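The shoe-size story can be sketched as a small simulation. Everything numeric here is an assumption made up for illustration (the age range, the coefficients, the noise levels); the only feature taken from the example above is that age drives both shoe size and reading score while shoe size has no direct effect on reading.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(0)

# Hypothetical data-generating process: age is the common cause.
# Coefficients, noise levels, and units are invented for illustration.
ages = [rng.randint(11, 18) for _ in range(20000)]
shoe = [0.5 * a + rng.gauss(0, 1) for a in ages]
reading = [5.0 * a + rng.gauss(0, 10) for a in ages]

# Ignoring age: shoe size and reading score look related.
overall = pearson(shoe, reading)

# Holding age fixed: correlate within each age group, then average.
within = []
for age in range(11, 19):
    idx = [i for i, a in enumerate(ages) if a == age]
    within.append(pearson([shoe[i] for i in idx], [reading[i] for i in idx]))
within_avg = sum(within) / len(within)

print(f"overall correlation:    {overall:.2f}")
print(f"within-age correlation: {within_avg:.2f}")
```

Under these assumptions the overall correlation is clearly positive, while the correlation within any single age group hovers around zero. Note that the simulation only reproduces the pattern; the decision to treat age as the cause still comes from the conceptual model, not from the numbers.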
Causation without correlation. It is a common misconception that correlation is required for causation. Let’s start with a simple example that reveals this to be a fallacy. Suppose the value of y is known to be caused by x. The true relationship between x and y is mediated by another factor, call it A, that takes values of +1 or -1 with equal probability, independently of x. The true process relating x to y is y = Ax.
It is a simple matter to show that the correlation between x and y is zero. Perhaps the most intuitive way is to imagine many samples (observations) of x, y pairs. Over the sub-sample for which the pairs have the same sign (i.e. for which A happened to be +1) y=x and the correlation is 1. Over the sub-sample for which the pairs have the opposite signs (i.e. for which A happened to be -1) y=-x and the correlation is -1. Since A is +1 and -1 with equal probability, the contributions to the total correlation from the two sub-samples cancel, giving a total correlation of zero.
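The cancellation argument can be checked numerically. This is a minimal sketch, assuming x is drawn from a standard normal distribution and A is drawn independently of x; both are choices made here for illustration, and the sample size is arbitrary.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(1)
n = 100000

x = [rng.gauss(0, 1) for _ in range(n)]        # assumed: zero mean, unit variance
A = [rng.choice((1, -1)) for _ in range(n)]    # +1 or -1 with equal probability
y = [a * xi for a, xi in zip(A, x)]            # the true process y = A x

# Correlation over the whole sample.
total = pearson(x, y)

# Correlation over the two sub-samples, split by the value of A.
plus = pearson([xi for a, xi in zip(A, x) if a == 1],
               [yi for a, yi in zip(A, y) if a == 1])
minus = pearson([xi for a, xi in zip(A, x) if a == -1],
                [yi for a, yi in zip(A, y) if a == -1])

print(f"A = +1 sub-sample: {plus:+.3f}")
print(f"A = -1 sub-sample: {minus:+.3f}")
print(f"whole sample:      {total:+.3f}")
```

The two sub-sample correlations come out at +1 and -1, and the whole-sample correlation is close to zero (not exactly zero, because the sample is finite).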
Since x really does have a causal role in determining the value of y we see that causation can exist without correlation. This result hinges on the precise definition of correlation. It is a specific statistic and reveals only a little bit about how x and y relate. Specifically, if x and y are zero mean and unit variance (which we can assume without loss of generality), correlation is the expected value of their product. That single number can’t possibly tell us everything about how x might relate to y. If we didn’t know the true process y=Ax and the statistics of A in advance we might be tempted to say that x cannot cause y due to a lack of correlation. That would be an incorrect conclusion. Correlation and our lack of understanding of it would be misleading us.
But there are other statistics to consider. In the example above x and y are uncorrelated but their magnitudes are not. That is, there are functions of x and functions of y that are correlated. This must be so because the two relate to each other (causally) somehow. In general, evidence consistent with the causal relationship is found in the probability density of y conditioned on x. If x causes y then that conditional probability, p(y|x), must be a function of (vary with) x. It is possible for p(y|x) to depend on x yet for the correlation of x and y to be zero. But causation cannot exist if p(y|x) is independent of x. Or, put even more simply, though x and y can be both uncorrelated and causally related, they cannot be statistically independent and causally related.
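To make the point about magnitudes concrete, here is a small sketch using the same y = Ax process, again assuming a standard normal x and an A drawn independently of x (assumptions of this illustration). It compares the correlation of x and y with the correlation of their absolute values, and then looks at how the spread of y changes with the size of x.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(3)
n = 100000

x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.choice((1, -1)) * xi for xi in x]   # y = A x, with A = +1 or -1

corr_xy = pearson(x, y)                                        # near zero
corr_abs = pearson([abs(v) for v in x], [abs(v) for v in y])   # magnitudes

# A crude look at p(y|x): the typical size of y when x is small versus large.
small = [abs(yi) for xi, yi in zip(x, y) if abs(xi) < 0.5]
large = [abs(yi) for xi, yi in zip(x, y) if abs(xi) > 1.5]
spread_small = sum(small) / len(small)
spread_large = sum(large) / len(large)

print(f"corr(x, y):       {corr_xy:+.3f}")
print(f"corr(|x|, |y|):   {corr_abs:+.3f}")
print(f"mean |y|, small x: {spread_small:.2f}")
print(f"mean |y|, large x: {spread_large:.2f}")
```

In this process |y| equals |x| exactly, so the magnitudes are perfectly correlated even though x and y are not. The distribution of y is visibly wider when x is large, which is the dependence of p(y|x) on x described above.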
Advanced example. (This is a bit more advanced so some readers may wish to skip it.) I’ll close with a nice real-world-like example offered by my colleague Steve Pizer. Suppose we have good theoretical reasons to believe that illness causes death. Let
y = death (1 if dead, 0 if alive),
x = illness (1 if sick, 0 if not),
t = administration of treatment (1 if treated, 0 if not),
e = other unobservable factors (could be anything).
The true (hypothetical!) model of death is y = (1-t)x + e. That is, if an individual is ill (x=1) and doesn’t get treatment (t=0) they would surely die, apart from the effects of other factors denoted by e. On the other hand, sick individuals who do get treated live, again ignoring e. Assume the correlation of t and x is very high (like 0.99). That is, nearly everyone who is ill gets treatment and almost nobody who is not ill does. Therefore, hardly anyone who contracts the illness actually dies from it.
If we estimate this model without observing t, we would find that illness and death are nearly uncorrelated. Such a finding might tempt us to question our theory that illness causes death. This would be a mistake because we’ve omitted an important factor, treatment t, in the analysis. However, if we can observe t, then the high but imperfect correlation between t and x might make it possible to estimate the true effect of illness on death, using appropriate econometric techniques. We might therefore learn the degree to which illness (untreated) causes death, consistent with our theory.
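A rough simulation of the hypothetical model y = (1-t)x + e shows both halves of this argument. Several things here are assumptions added for illustration: half the population is ill, treatment matches illness status 99.5% of the time (which gives a correlation of about 0.99 between t and x), and e is Gaussian noise independent of x and t, so y is a continuous stand-in rather than a strict 0/1 outcome. The comparison among the untreated is a deliberately simple substitute for the econometric techniques mentioned above, not an implementation of them.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def mean(values):
    return sum(values) / len(values)

rng = random.Random(2)
n = 200000

x, t, y = [], [], []
for _ in range(n):
    xi = 1 if rng.random() < 0.5 else 0           # assumed: half are ill
    # Assumed: treatment matches illness 99.5% of the time.
    ti = xi if rng.random() < 0.995 else 1 - xi
    ei = rng.gauss(0, 0.5)                        # assumed noise term e
    x.append(xi)
    t.append(ti)
    y.append((1 - ti) * xi + ei)                  # the hypothetical model

corr_tx = pearson(t, x)   # treatment tracks illness closely
corr_xy = pearson(x, y)   # illness and death, ignoring treatment

# With t observed: compare ill and healthy people among the untreated only.
untreated_ill = [yi for xi, ti, yi in zip(x, t, y) if ti == 0 and xi == 1]
untreated_well = [yi for xi, ti, yi in zip(x, t, y) if ti == 0 and xi == 0]
effect = mean(untreated_ill) - mean(untreated_well)

print(f"corr(t, x): {corr_tx:.3f}")
print(f"corr(x, y): {corr_xy:.3f}")
print(f"effect of illness among the untreated: {effect:.2f}")
```

Under these assumptions the correlation between illness and death is close to zero when treatment is ignored, while the comparison among the untreated recovers an effect near 1, the coefficient on untreated illness in the model. The estimate rests on the small group of ill people who went untreated, which is why the imperfect correlation between t and x matters.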
The foregoing is an illustration of the type of incorrect conclusions that can result from improper analysis of observational study data (as opposed to a randomized trial). Steve has written a very handy tutorial paper [pdf] on this topic, which I recommend highly to anyone working on observational studies or wishing to better understand them. Additional exploration of the econometric issues is provided in the Background (Section 2) and Set-up (Section 3.1) of a recent NBER paper by Millimet and Tchernis.
Later: For more on this topic, see my follow-up posts.




