• Bias, validity, and terminology

    After posting and editing the following, I realized that I should promote (again) Mostly Harmless Econometrics by Angrist and Pischke. It covers the issues raised below in great detail, and so much more. I’ve clearly forgotten some of its contents, and I could not easily find answers to my questions below in the time available this morning. So, let’s crowdsource them.

    This is why I blog, and why I blog in the style I do. After some back-and-forth in the comments to my post on bias yesterday (go read those comments), Matt offers more precise terminology, distinguishing bias from internal and external validity, within a differently organized discussion. I like what he’s done. My comments and questions are interleaved with his. Let’s keep discussing this!

    (1) What internally valid estimates can we obtain from the Oregon Study?

    We can obtain the effect of winning the lottery (ITT) and the effect on the population that gained insurance due to winning the lottery (LATE).

    We cannot obtain the ATE [average treatment effect] or the TOT [effect of treatment on the treated]; the seemingly natural estimators of these quantities are biased since the populations we are comparing differ due to self-selection. The ITT and LATE avoid this problem because they scrupulously _solely_ compare the full group of lottery winners to the full group of lottery losers.

    I agree that LATE exploits the lottery, but does it really compare the full groups of winners to losers? My understanding is it compares the two groups of compliers, as I wrote. That’s the difference between ITT and LATE.

    (2) What internally valid estimates can we obtain from alternative study designs? How do they differ?

    From a perfect compliance RCT, we can estimate the ATE for the study population. Relative to the group covered by the LATE, the group covered by the ATE also includes: (A) the types of people who still enroll if they lose; and (B) the people who will not enroll even if they win.

    Point of clarification: With perfect compliance, there are no people who still enroll if they lose. There are no people who do not enroll even if they win. However, those groups exist in an RCT without full compliance and, as I wrote above, LATE filters out their effect. Under full compliance, LATE, ATE, and ITT are all the same. The way I’d put this is not that ATE includes these noncompliant groups. I’d say that the ITT and LATE estimates are the ATE in a fully-compliant RCT. They are not in an RCT without full compliance. Continuing with Matt’s section (2):

    From an Oregon-like study in which we forbid enrollment by lottery losers, we can obtain the TOT. Relative to the group covered by the LATE, the group covered by the TOT adds (A) from above but not (B).

    TOT compares treated with untreated. I can think of three ways to do this, and I’m not certain which one we call TOT. Way 1 compares all treated to untreated, regardless of random assignment. Way 2 does so only for those assigned to treatment. Way 3 does so only for those assigned to control. Which one is TOT? Why does it incorporate a group like (A) but not (B)? Of the three versions I suggested, only way 3 is consistent with that. But way 3 is the one that sounds least likely to be what one means by TOT. (I’d order them by likelihood as way 1 > way 2 > way 3.) Still continuing under his section (2):

    The LATE/ATE/TOT difference is not about bias. Each average treatment effect is perfectly valid for the population it pertains to; those populations are just different.

    I agree that the differences among these types of estimates have nothing to do with bias. However, Matt wrote early in his comment (way above), “the seemingly natural estimators of these quantities are biased since the populations we are comparing differ due to self-selection.” I think we need to explore this more. What does “seemingly natural” mean? It appears to be doing a lot of work here. When do we say we have obtained a biased estimate? Can we give a precise example? Does it merely mean we haven’t really computed one of them properly?

    (3) Which estimates have the greatest external validity for the policy questions of current interest?

    This is a hard question. The answer depends on whether the group affected by our proposed policy looks more like the group included in the LATE or the full population. How do we think about that?

    Insert your existing discussion of this point.

    Matt is either referring to my comments to my post or the post itself. Either way, I presume you’ve read them.


    • I have some additional thoughts. I’m going to take things out of order if that’s OK.

      On (2), I agree in the light of morning that what I wrote is confusing. In my head, I was defining everything with reference to the Oregon study design. That is, for the Oregon intervention, there are three types of people: compliers, always takers (lottery losers who take up), and never takers (lottery winners who do not take up).

      Relative to Oregon, a perfect compliance RCT means changing the intervention so that the “Oregon always takers” can never get Medicaid if they lose the lottery and forcing the “Oregon never takers” to always take up Medicaid if they win the lottery.

      With these definitions, the LATE averages over only the “Oregon compliers.” The ATE we obtain from the perfect compliance RCT averages over the “Oregon compliers,” the “Oregon always takers,” and the “Oregon never takers.”
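      To make the "averages over" language concrete, here is a minimal sketch. The population shares and per-type effects are invented purely for illustration (they are assumptions, not numbers from the Oregon study), and `average_effect` is a hypothetical helper name.

```python
# Hypothetical shares and average effects by Oregon compliance type.
# All numbers are invented for illustration; none come from the study.
population = {
    "complier":     {"share": 0.6, "effect": 4.0},
    "always_taker": {"share": 0.2, "effect": 10.0},
    "never_taker":  {"share": 0.2, "effect": 1.0},
}

def average_effect(groups):
    """Share-weighted average treatment effect over the named groups."""
    total_share = sum(population[g]["share"] for g in groups)
    weighted = sum(population[g]["share"] * population[g]["effect"] for g in groups)
    return weighted / total_share

# The LATE averages over the "Oregon compliers" only.
late = average_effect(["complier"])
# The perfect-compliance ATE averages over all three Oregon types.
ate = average_effect(["complier", "always_taker", "never_taker"])

print(f"LATE (Oregon compliers only): {late:.2f}")
print(f"ATE (all three Oregon types): {ate:.2f}")
```

      Both numbers are valid averages; they differ only because they average over different groups of people.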

      On (1), there is one infelicity in my terminology. The ITT/LATE are not estimators (i.e. formulas for getting an estimate from the data), they are estimands (i.e. quantities to be estimated). So ITT/LATE are not “comparing” anything; the corresponding estimators are.

      That issue aside, the IV estimator that leads to the LATE really is only comparing the full groups, in a precise sense. This is easiest to see in the “Wald” estimator form of the IV estimator. This is given by

      (mean health outcome for winners – mean health outcome for losers) / (enrollment rate for winners – enrollment rate for losers)

      Both the numerator (which is also the ITT estimate) and the denominator compare only the full groups. The reason an effect for the compliers comes out of this comparison is that the only differences between the two means are for the compliers; everyone else “differences out.”

      It would be nice (and much lower variance) if we could just directly compare the mean for the compliers in the winning group to the mean for the compliers in the losing group. Sadly, in general, there’s no way of separating the compliers from the “never takers” among the lottery losers or the compliers from the “always takers” among the lottery winners. So we’re stuck comparing everyone and taking advantage of this happy differencing property.
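      The differencing property can be checked on a toy population. The sketch below is an illustration under stated assumptions: the type counts, baseline outcomes, and effects are invented, and each lottery arm is built to contain exactly half of every type, so there is no sampling noise. The names `TYPES`, `enrolls`, and `arm` are hypothetical.

```python
# Hypothetical population, invented for illustration (not Oregon data).
# n: people of that type; y0: outcome without coverage; effect: gain from coverage.
TYPES = {
    "complier":     {"n": 600, "y0": 50.0, "effect": 4.0},
    "always_taker": {"n": 200, "y0": 40.0, "effect": 10.0},
    "never_taker":  {"n": 200, "y0": 60.0, "effect": 1.0},
}

def enrolls(kind, wins):
    """Enrollment rule for each type under an Oregon-style lottery."""
    if kind == "always_taker":
        return True
    if kind == "never_taker":
        return False
    return wins  # compliers enroll only if they win

def arm(wins):
    """(enrolled, outcome) rows for half of each type: an exactly balanced arm."""
    rows = []
    for kind, p in TYPES.items():
        enrolled = enrolls(kind, wins)
        outcome = p["y0"] + (p["effect"] if enrolled else 0.0)
        rows += [(int(enrolled), outcome)] * (p["n"] // 2)
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

winners, losers = arm(True), arm(False)

# Numerator and denominator each compare only the full groups.
itt = mean(y for _, y in winners) - mean(y for _, y in losers)
first_stage = mean(d for d, _ in winners) - mean(d for d, _ in losers)
wald = itt / first_stage

print(f"ITT = {itt:.2f}, first stage = {first_stage:.2f}, Wald = {wald:.2f}")
```

      With these made-up numbers, the always takers and never takers have identical outcomes in both arms, so they cancel in the numerator, and the ratio recovers the complier effect of 4 even though the code never identifies who the compliers are.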

      On (3), when I said “seemingly natural,” I was referring to the simple difference in treatment/control means you mentioned in your original post. Those also seem to be the alternative (and invalid) estimators I’ve seen people most tempted to reach for.

      Regarding the definition of bias, the bottom line is that we cannot say an estimator is biased in the abstract. Bias is always defined relative to the target you are considering. For example, we can say that “the estimator comparing treatment and control means in an RCT with imperfect compliance is biased _for_ the average treatment effect”.
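      As a rough illustration of "biased for a target," the sketch below computes the simple difference in means between enrolled and non-enrolled people and compares it to two candidate targets. Everything here is an assumption made for the example: the type counts, baseline outcomes, and effects are invented, and half of each type wins the lottery exactly, so the gaps shown are what the estimator would converge to rather than estimates from data.

```python
# Hypothetical population, invented for illustration (not Oregon data).
TYPES = {
    "complier":     {"n": 600, "y0": 50.0, "effect": 4.0},
    "always_taker": {"n": 200, "y0": 40.0, "effect": 10.0},
    "never_taker":  {"n": 200, "y0": 60.0, "effect": 1.0},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Build (enrolled, outcome) rows; half of each type wins the lottery.
rows = []
for kind, p in TYPES.items():
    for wins in (True, False):
        enrolled = kind == "always_taker" or (kind == "complier" and wins)
        outcome = p["y0"] + (p["effect"] if enrolled else 0.0)
        rows += [(enrolled, outcome)] * (p["n"] // 2)

# The "seemingly natural" estimator: enrolled mean minus non-enrolled mean.
naive = mean(y for d, y in rows if d) - mean(y for d, y in rows if not d)

# Two possible targets for that estimator.
total_n = sum(p["n"] for p in TYPES.values())
ate = sum(p["n"] * p["effect"] for p in TYPES.values()) / total_n
late = TYPES["complier"]["effect"]

print(f"naive difference in means: {naive:.2f}")
print(f"gap relative to ATE:  {naive - ate:.2f}")
print(f"gap relative to LATE: {naive - late:.2f}")
```

      In this invented population the naive comparison even has the wrong sign, because the enrolled and non-enrolled groups contain different mixes of types with different baseline outcomes. The size of the gap depends on which target it is measured against, which is the point about bias being relative.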

      • Thanks. It mostly all makes sense, with some fuzz around a few edges I’m mostly not going to worry about right now. Two questions: (1) Am I right that ITT is always way #1 that I listed? (2) Your “The ATE we obtain from the perfect compliance RCT averages over the ‘Oregon compliers,’ the ‘Oregon always takers,’ and the ‘Oregon never takers’” is confusing to me because you’re starting with a perfect compliance RCT and then bringing in groups of people that deviate from perfection (as is the case in Oregon) and then saying something I can’t follow about averaging over them. So, a general, huh?

        Anyway, aren’t there really six types of people? (i) Compliers randomized to treatment, (ii) always takers randomized to treatment, (iii) never takers randomized to treatment, and (iv-vi) the same three randomized to control. Even in a perfect RCT some of these groups remain, but not all: compliers of both types plus always takers randomized to treatment and never takers randomized to control. Maybe this “averaging over” business you were mentioning above is exactly this. That makes sense.

        I believe all this is in Mostly Harmless Econometrics, but it’s been a few years since I’ve read it. Not an easy book to use as a quick reference!

        • “Even in a perfect RCT some of these groups remain, but not all: compliers of both types plus always takers randomized to treatment and never takers randomized to control.”

          This is incorrect; a perfect RCT has no noncompliance, period. This is distinct from having no *observable* noncompliance. The distinction is over hypothetical re-randomization; in a perfect RCT, no matter how subjects are allocated to treatment and control there will be no crossover. The intervention is perfectly effective on everyone, hence there are only compliers.

        • I’m going to start with your second paragraph. I don’t think I would say there are six types of people; I would say there are three types of people, and each type of person can win or lose the lottery.

          Turning to the second part of the second paragraph, that’s not right. The key thing that I think is causing confusion here (and the key thing I was sloppy about last night) is that being a complier/always taker/never taker is defined with reference to the particular intervention, institutions, and lottery structure under consideration. Someone who falls in one of these bins in one setting may fall in a different one in another setting. With the terminology “Oregon complier/always taker/never taker” I was trying to emphasize that these categories are defined with reference to the structure of the Oregon study.

          As a concrete example of how these labels are context-dependent, consider an alternative lottery structure that is like Oregon except that losing the lottery also costs people their categorical eligibility. Many of the people who are “Oregon always takers” will be compliers with respect to this new lottery structure.

          As another concrete example, we can consider the “perfect RCT” structure in which we forbid lottery losers from ever enrolling in Medicaid and hunt down every last lottery winner and enroll them. As Jared says below, “Oregon compliers”, “Oregon always takers”, and “Oregon never takers” will all be compliers with respect to this structure.

          I suspect that the discussion above may answer (2) from your first paragraph. Tell me if that’s wrong. Regarding (1) from your first paragraph, though, I’m not sure what you’re asking.

          Switching gears, I also realized that there’s one more thing I wanted to say in response to your discussion of TOT in this morning’s original post. TOT is not an estimator; it’s an estimand (i.e. a target of estimation). Per Jared’s comment, it’s the average effect of treatment for a defined group of people, those who get treated. In the Oregon example, the natural definition is that it’s the average treatment effect among the “Oregon compliers” and the “Oregon always takers”.
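          One way to see how this definition of the TOT lines up with the "forbid enrollment by lottery losers" design from the original post is a toy calculation. The sketch below is an illustration only: the type counts, baseline outcomes, and effects are invented, each arm holds exactly half of every type, and it assumes the "Oregon always takers" simply go uncovered when they lose under the stricter design.

```python
# Hypothetical population, invented for illustration (not Oregon data).
TYPES = {
    "complier":     {"n": 600, "y0": 50.0, "effect": 4.0},
    "always_taker": {"n": 200, "y0": 40.0, "effect": 10.0},
    "never_taker":  {"n": 200, "y0": 60.0, "effect": 1.0},
}

def arm(wins):
    """(enrolled, outcome) rows for half of each type when losers cannot enroll."""
    rows = []
    for kind, p in TYPES.items():
        # Only winners can enroll; "Oregon never takers" still decline.
        enrolled = wins and kind != "never_taker"
        outcome = p["y0"] + (p["effect"] if enrolled else 0.0)
        rows += [(int(enrolled), outcome)] * (p["n"] // 2)
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

winners, losers = arm(True), arm(False)
itt = mean(y for _, y in winners) - mean(y for _, y in losers)
takeup_gap = mean(d for d, _ in winners) - mean(d for d, _ in losers)
wald = itt / takeup_gap

# Target: average effect among "Oregon compliers" and "Oregon always takers".
treated_types = ("complier", "always_taker")
tot = (sum(TYPES[k]["n"] * TYPES[k]["effect"] for k in treated_types)
       / sum(TYPES[k]["n"] for k in treated_types))

print(f"Wald estimate under the stricter design: {wald:.2f}")
print(f"Average effect for compliers + always takers: {tot:.2f}")
```

          Under these assumptions the "Oregon always takers" behave as compliers with respect to the stricter lottery, so the same Wald ratio now averages over both groups and matches the TOT as defined above.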

          PS Mostly Harmless is indeed great on this stuff.

      • “TOT compares treated with untreated.”

        Not in the usual sense. It’s in the name – the effect of treatment *on the treated*.

    • I think one thing that might be missing in this discussion is a precise definition of the target population. In fact the study population here is not “all poor people in Oregon”. It is “all poor people in Oregon who are not otherwise eligible for Medicaid”.

      What Matt calls the always takers are people who are categorically eligible for Medicaid (disabled, pregnant, elderly). In fact they receive a slightly different coverage product (Oregon Health Plan Plus) than the lottery population who could be considered an expansion population–poor adults who are not categorically eligible and get Oregon Health Plan Standard with (slightly) higher premiums and co-pays.

      So if this were an RCT the intent would be to study the effect of offering Medicaid to people who aren’t categorically eligible. People who are categorically eligible for Medicaid would not be enrolled in the study at all and would not be randomized. Since this was a real policy implementation rather than an intentional RCT, those steps were taken out of order: people were randomized and then screened, instead of the more common eligibility screening followed by randomization.

      Bottom line though: this study does not speak to the impact of Medicaid for disabled or pregnant or elderly people. It looks at the impact for a different slice of the population.