Methods (kinda): Rubin on Rubin and Campbell

Yesterday I encouraged you to read at least the paper by Stephen West and Felix Thoemmes, if not all the papers on Campbell’s and Rubin’s causal frameworks, in this 2010 issue of Psychological Methods. I also encourage you to read Rubin’s response. It’s much shorter and so much fun. Here are my highlights.

Because my doctoral thesis was on matched sampling in observational studies under Cochran, I thought that I understood the general context fairly well, and so I was asked by the Educational Testing Service to visit Campbell at Northwestern University in Evanston, Illinois, which is, incidentally, where I grew up. I remember sitting in his office with, I believe, one or two current students or perhaps junior faculty. The topic of matching arose, and my memory is that Campbell referred to it as “sin itself” because of “regression to the mean issues” when matching on fallible test scores rather than “true” scores. I was flabbergasted!

Rubin later showed that he was correct about matching, but also that Campbell was not wrong, because Rubin had misunderstood Campbell’s objection.

Of course, the situation with an unobserved covariate used for treatment assignment is far more complex, and that situation, coupled with the naive view that matching can fix all problems with nonrandomized studies, appears to have been the context for Campbell’s comment on matching.
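A small simulation can make Campbell’s worry concrete. The sketch below is an illustration under assumptions chosen purely for that purpose, not something from Rubin’s paper: two pre-existing groups differ in true ability, only a fallible test score (true score plus noise) is observed, the outcome depends only on true ability, and the treatment effect is zero by construction. The helper names (`nearest_neighbor_match`, `matched_difference`) and all parameter values are hypothetical.

```python
import numpy as np


def nearest_neighbor_match(treated_key, control_key):
    """Index of the closest control (matching with replacement) for each treated unit."""
    # Pairwise absolute distances: rows are treated units, columns are controls.
    dist = np.abs(treated_key[:, None] - control_key[None, :])
    return dist.argmin(axis=1)


def matched_difference(y_t, y_c, key_t, key_c):
    """Mean outcome difference between treated units and their matched controls."""
    idx = nearest_neighbor_match(key_t, key_c)
    return float(np.mean(y_t - y_c[idx]))


rng = np.random.default_rng(0)
n_t, n_c = 1_000, 2_000

# Assumed setup: the two groups differ in true ability.
true_t = rng.normal(1.0, 1.0, n_t)
true_c = rng.normal(0.0, 1.0, n_c)

# Fallible test scores: true score plus measurement error (reliability 0.5 within group).
obs_t = true_t + rng.normal(0.0, 1.0, n_t)
obs_c = true_c + rng.normal(0.0, 1.0, n_c)

# Outcome depends only on true ability; the treatment effect is zero by construction.
y_t = true_t + rng.normal(0.0, 0.5, n_t)
y_c = true_c + rng.normal(0.0, 0.5, n_c)

est_true = matched_difference(y_t, y_c, true_t, true_c)
est_obs = matched_difference(y_t, y_c, obs_t, obs_c)
print(f"matched on true scores:     {est_true:+.2f}")
print(f"matched on fallible scores: {est_obs:+.2f}")
```

Under these assumptions, matching on the true score should give roughly zero, while matching on the fallible score should give roughly +0.5. The reason is regression to the mean: given an observed score, each group’s expected true score is pulled toward that group’s own mean, so matched pairs still differ in true ability.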

(I may put up a methods post on matching at some point, though I haven’t decided.)

The drive for clarity about what one is trying to do, expressed in this passage, resonates deeply with me:

Perhaps because of my physics background, it seemed to me to make no sense to discuss statistical methods and estimators without first having a clear concept of what one is attempting to estimate, which, I agree with Shadish (2010), was a limitation of Campbell’s framework. Nevertheless, Campbell is not alone when implicitly, rather than explicitly, defining what he was trying to estimate. A nontrivial amount of statistical discussion (confused and confusing to me) eschews the explicit definition of estimands. […] My attitude is that it is critical to define quantities carefully before trying to estimate them.
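As an illustration of what an explicitly defined estimand looks like in Rubin’s potential outcomes notation, here is one standard example: the average treatment effect over a finite population of N units. It is stated entirely in terms of the science, before any estimator or data are mentioned; the symbols are the conventional ones, not taken from this particular paper.

```latex
% Science: unit i = 1, ..., N has potential outcomes Y_i(1) (treated) and Y_i(0) (control).
% Estimand: the finite-population average treatment effect, defined before any estimator.
\tau = \frac{1}{N} \sum_{i=1}^{N} \bigl( Y_i(1) - Y_i(0) \bigr)
```

Only one of Y_i(1) and Y_i(0) is ever observed for a given unit, which is why the way units are assigned to treatment matters so much for estimating τ.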

Elsewhere in the paper, Rubin reveals that even Campbell did not think very highly of his own ability to do math. Rubin, by contrast, studied physics with John Wheeler at Princeton, which one can’t do without a lot of math ability and confidence in it.

Later in the paper, Rubin offers a very nice discussion of the stable unit treatment value assumption (SUTVA), which I won’t repeat here. Very roughly, the aspect of it that’s relevant below is that there be one treatment (or at least a clearly defined set of them), not a vague, uncountable cloud of them. (See also Wikipedia.) It is because of this assumption that the problem of, say, the causal effect of gender on wages is “ill defined,” as I raised in my prior post.

For example, is the statement “She did well on that literature test because she is a girl” causal or merely descriptive? If [being assigned to the “control” group] means that this unit remains a girl and [being assigned to the “treatment” group] means that this unit is “converted” to a boy, the factual [the outcome from assignment to “control”]  is well defined and observed, but the counterfactual [outcome due to “treatment”] appears to be hopelessly ill-defined and therefore unstable. Does the hypothetical “converted to a boy” mean an at-birth sex-change operation, or does it mean massive hormone injections at puberty, or does it mean cross-dressing from 2 years of age, and so forth? Only if all such contemplated hypothetical interventions can be argued to have the same hypothetical [outcome] will the requirement of SUTVA that there be no hidden versions of treatments be appropriate for this unit.
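To see why hidden versions of a treatment are a problem, consider a toy simulation with made-up numbers (none from Rubin’s paper): the nominal treatment secretly comes in three versions, each with its own potential outcome. Two studies of the “same” treatment that happen to deliver different mixes of versions estimate different average effects, so “the effect of the treatment” is not a single well-defined quantity until the version is pinned down. The function `average_effect` and the version effects are hypothetical.

```python
import numpy as np

# Hypothetical numbers for illustration only; none come from Rubin's paper.
rng = np.random.default_rng(1)
n = 10_000
y_control = rng.normal(50.0, 10.0, n)  # Y_i(0): a single, well-defined version

# The nominal "treatment" secretly comes in three versions, each shifting the
# outcome by a different amount, so each unit has three different Y_i(1) values.
version_effects = np.array([-5.0, 0.0, 8.0])
y_treated_by_version = y_control[:, None] + version_effects[None, :]  # shape (n, 3)


def average_effect(version_probs, seed):
    """Mean of Y_i(1) - Y_i(0) when version k is delivered with probability version_probs[k]."""
    local_rng = np.random.default_rng(seed)
    versions = local_rng.choice(len(version_effects), size=n, p=version_probs)
    y_treated = y_treated_by_version[np.arange(n), versions]
    return float(np.mean(y_treated - y_control))


# Two studies of the "same" treatment that deliver different mixes of versions.
study_a = average_effect([0.8, 0.2, 0.0], seed=2)
study_b = average_effect([0.0, 0.2, 0.8], seed=3)
print(f"study A: {study_a:+.2f}")
print(f"study B: {study_b:+.2f}")
```

Here study A should come out near -4 (0.8 × -5) and study B near +6.4 (0.8 × 8), even though both would report the “effect of treatment.”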

But this does not mean there can be no well-defined study of the causal effects of gender.

An example of a legitimate causal statement involving an immutable characteristic, such as gender or race, occurs when the unit is a resume of a job applicant sent to a prospective employer, and the treatments are the names attached to the resume, either an obviously Anglo Saxon name [“control”] or an obviously African American name [“treatment”].

The key here is that though you can’t imagine changing the gender (or race) of a person in a reasonably well-defined, unique way, you can imagine changing the gender or race signaled on a person’s resume.

Later still, Rubin explains how, before his work, the “observed outcome notation” that had been the norm made it impossible to be clear about how and why certain designs permit unbiased estimates. You really have to read the paper (at least) to see this. I’m still not sure I get it, but I believe him!

To repeat, using the observed outcome notation entangles the science [all the potential outcomes and observable factors] and the assignments [the mechanism by which observed outcomes are selected among potential ones]—bad! Yet, the reduction to the observed outcome notation is exactly what regression approaches, path analyses, directed acyclic graphs, and so forth essentially compel one to do. For an example of the confusion that regression approaches create, see Holland and Rubin (1983) on Lord’s paradox or the discussion by Mealli and Rubin (2003) on the effects of wealth on health and vice versa. For an example of the bad practical advice that the directed acyclic graph approaches can stimulate, see the Rubin (2009) response to letters in Statistics in Medicine. […]
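One way to see the distinction Rubin draws is to keep the science and the assignment mechanism as separate objects in code. In the toy sketch below (hypothetical numbers, not from the paper), the science is every unit’s covariate x and both potential outcomes; the estimand is computed from the science alone; and observed outcomes exist only once an assignment vector w is applied. The same science yields an approximately unbiased difference in means under completely randomized assignment and a badly biased one when units with higher x are more likely to be treated.

```python
import numpy as np

# Hypothetical numbers for illustration only.
rng = np.random.default_rng(42)
n = 20_000

# The science: each unit's covariate and BOTH potential outcomes.
x = rng.normal(0.0, 1.0, n)
y0 = 2.0 * x + rng.normal(0.0, 1.0, n)  # Y_i(0)
y1 = y0 + 1.0                           # Y_i(1): constant unit-level effect of +1
true_ate = float(np.mean(y1 - y0))      # estimand, defined from the science alone


def difference_in_means(w):
    """Apply assignment vector w to the science, then compare observed group means."""
    y_obs = np.where(w == 1, y1, y0)  # only one potential outcome is revealed per unit
    return float(y_obs[w == 1].mean() - y_obs[w == 0].mean())


# Assignment mechanism 1: completely randomized, exactly half treated.
w_random = rng.permutation(np.repeat([0, 1], n // 2))

# Assignment mechanism 2: units with higher x are more likely to be treated.
p_treat = 1.0 / (1.0 + np.exp(-2.0 * x))
w_confounded = (rng.uniform(size=n) < p_treat).astype(int)

est_random = difference_in_means(w_random)
est_confounded = difference_in_means(w_confounded)
print(f"estimand (ATE):      {true_ate:.2f}")
print(f"randomized estimate: {est_random:.2f}")
print(f"confounded estimate: {est_confounded:.2f}")
```

Under these assumptions the randomized estimate should land close to the true effect of 1, while the confounded estimate is pushed well above it: treated units tend to have higher x, and x also raises both potential outcomes. Nothing about the science changed between the two runs; only the assignment mechanism did.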

To borrow Campbell’s expression, I believe that the greatest threat to the validity of causal inference is ignoring the distinction between the science and what one does to learn about the science, the assignment mechanism—a fundamental lesson learned from classical experimental design but often forgotten. My reading of Campbell’s work on causal inference indicates that he was keenly aware of this distinction.

(I may read and then post on Lord’s paradox. I don’t know what it is yet.)

