*After posting and editing the following, I realized that I should promote (again) Mostly Harmless Econometrics by Angrist and Pischke. It covers the issues raised below in great detail, and so much more. I’ve clearly forgotten some of its contents, and I could not easily find answers to my questions below in the time available this morning. So, let’s crowdsource them.*

This is why I blog, and why I blog in the style I do. After some back-and-forth in the comments to my post on bias yesterday (go read those comments), Matt offers more precise terminology, distinguishing bias from internal and external validity, within a differently organized discussion. I like what he’s done. My comments and questions are interleaved with his. Let’s keep discussing this!

(1) What internally valid estimates can we obtain from the Oregon Study?

We can obtain the effect of winning the lottery (ITT) and the effect on the population that gained insurance due to winning the lottery (LATE).

We cannot obtain the ATE [average treatment effect] or the TOT [effect of treatment on the treated]; the seemingly natural estimators of these quantities are biased since the populations we are comparing differ due to self-selection. The ITT and LATE avoid this problem because they scrupulously _solely_ compare the full group of lottery winners to the full group of lottery losers.

I agree that LATE *exploits* the lottery, but does it really compare the full groups of winners to losers? My understanding is it compares the two groups of compliers, as I wrote. That’s the difference between ITT and LATE.
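One way to reconcile the two descriptions is a toy calculation. The sketch below uses a made-up population (the type names, counts, baseline outcomes, and effects are all assumptions for illustration, not Oregon data) and assumes winning the lottery affects outcomes only through insurance. The numerator of the LATE calculation does compare all winners to all losers, but always-takers and never-takers behave identically in both arms, so they cancel and only the compliers contribute.

```python
# Hypothetical toy population. Randomization gives each arm the same mix.
# type: (count per arm, outcome without insurance, effect of insurance)
TYPES = {
    "complier":     (50, 50, 4),   # insured only if they win
    "always_taker": (20, 40, 10),  # insured whether they win or lose
    "never_taker":  (30, 60, 1),   # uninsured whether they win or lose
}


def insured(kind, won):
    """Insurance status implied by a person's type and lottery result."""
    return kind == "always_taker" or (kind == "complier" and won)


def arm(won):
    """(insured, observed outcome) for everyone in one lottery arm."""
    rows = []
    for kind, (n, y0, effect) in TYPES.items():
        d = insured(kind, won)
        rows += [(d, y0 + effect * d)] * n
    return rows


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


winners, losers = arm(True), arm(False)

# ITT: all winners vs. all losers, ignoring who actually enrolled.
itt = mean(y for _, y in winners) - mean(y for _, y in losers)

# First stage: how much winning raises enrollment (the complier share).
first_stage = mean(d for d, _ in winners) - mean(d for d, _ in losers)

# LATE as the Wald ratio: ITT rescaled by the complier share.
late = itt / first_stage

# ATE over everyone, knowable only because the effects are made up here.
total = sum(n for n, _, _ in TYPES.values())
ate = sum(n * effect for n, _, effect in TYPES.values()) / total

print(f"ITT={itt:.2f} first stage={first_stage:.2f} "
      f"LATE={late:.2f} ATE={ate:.2f}")
```

With these made-up numbers the script prints an ITT of 2.00, a first stage of 0.50, a LATE of 4.00, and an ATE of 4.30. The LATE matches the effect assigned to compliers, not the population average, even though every winner and loser entered the comparison.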

(2) What internally valid estimates can we obtain from alternative study designs? How do they differ?

From a perfect compliance RCT, we can estimate the ATE for the study population. Relative to the group covered by the LATE, the group covered by the ATE also includes: (A) the types of people who still enroll if they lose; and (B) the people who will not enroll even if they win.

Point of clarification: With perfect compliance, no one enrolls after losing and no one declines to enroll after winning. Those groups exist only in an RCT without full compliance and, as I wrote above, LATE filters out their effect. Under full compliance, LATE, ATE, and ITT are all the same. So I would not say that ATE includes these noncompliant groups. I’d say that the ITT and LATE estimates are the ATE in a fully compliant RCT, and that they are not in an RCT without full compliance. Continuing with Matt’s section (2):

From an Oregon-like study in which we forbid enrollment by lottery losers, we can obtain the TOT. Relative to the group covered by the LATE, the group covered by the TOT adds (A) from above but not (B).

TOT compares treated with untreated. I can think of three ways to do this, and I’m not certain which one we call TOT. Way 1 compares all treated to all untreated, regardless of random assignment. Way 2 does so only among those assigned to treatment. Way 3 does so only among those assigned to control. Which one is TOT? Why does it incorporate a group like (A) but not (B)? Of the three, only way 3 is consistent with that. But way 3 sounds least likely to be what one means by TOT. (I’d rank their plausibility as way 1 > way 2 > way 3.) Still continuing under his section (2):

The LATE/ATE/TOT difference is not about bias. Each average treatment effect is perfectly valid for the population it pertains to; those populations are just different.
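A hedged sketch of "the populations are just different": the same made-up population run through three study designs, using the same winners-minus-losers ratio each time. The type names, counts, and effects are hypothetical, and the design labels (`perfect_compliance`, `losers_barred`, `oregon_like`) are names invented here. It assumes the lottery affects outcomes only through insurance.

```python
# Hypothetical toy population. Randomization gives each arm the same mix.
# type: (count per arm, outcome without insurance, effect of insurance)
TYPES = {
    "complier":     (50, 50, 4),   # would enroll only after winning
    "always_taker": (20, 40, 10),  # group (A): would enroll even after losing
    "never_taker":  (30, 60, 1),   # group (B): would not enroll even after winning
}
TOTAL = sum(n for n, _, _ in TYPES.values())


def takes_up(kind, won, design):
    """Who ends up insured under each (hypothetical) study design."""
    if design == "perfect_compliance":  # everyone does what the lottery says
        return won
    if design == "losers_barred":       # winners choose, losers cannot enroll
        return won and kind != "never_taker"
    # "oregon_like": winners and losers both choose for themselves
    return kind == "always_taker" or (kind == "complier" and won)


def arm_means(design, won):
    """Mean outcome and mean take-up in one lottery arm."""
    y = sum(n * (y0 + e * takes_up(k, won, design))
            for k, (n, y0, e) in TYPES.items()) / TOTAL
    d = sum(n * takes_up(k, won, design)
            for k, (n, _, _) in TYPES.items()) / TOTAL
    return y, d


def wald(design):
    """(winner mean - loser mean) / (winner take-up - loser take-up)."""
    y_win, d_win = arm_means(design, True)
    y_lose, d_lose = arm_means(design, False)
    return (y_win - y_lose) / (d_win - d_lose)


def avg_effect(kinds):
    """True average effect over the named types (known only in a simulation)."""
    n = sum(TYPES[k][0] for k in kinds)
    return sum(TYPES[k][0] * TYPES[k][2] for k in kinds) / n


cases = {
    "perfect_compliance": list(TYPES),                  # everyone: ATE
    "losers_barred": ["complier", "always_taker"],      # those treated
    "oregon_like": ["complier"],                        # compliers: LATE
}
for design, kinds in cases.items():
    print(f"{design}: ratio={wald(design):.3f} "
          f"true avg effect for {kinds}={avg_effect(kinds):.3f}")
```

In this sketch the ratio recovers a different average in each design: everyone under perfect compliance, compliers plus group (A) when losers are barred, and compliers alone in the Oregon-like case. None of the three is wrong for its own population.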

I agree that the differences among these types of estimates have nothing to do with bias. However, Matt wrote early in his comment (way above), “the *seemingly natural* estimators of these quantities are biased since the populations we are comparing differ due to self-selection.” I think we need to explore this more. What does “seemingly natural” mean? It appears to be doing a lot of work here. When do we say we have obtained a biased estimate? Can we give a precise example? Does it merely mean we haven’t really computed one of them properly?
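Here is one possible precise example, under two assumptions that should be read as such: that "seemingly natural" means simply comparing the insured to the uninsured, and that TOT means the average effect among those who actually got insurance. With a made-up population (all names and numbers hypothetical), the naive comparison splits exactly into that TOT plus a selection term, the gap in no-insurance outcomes between the two groups.

```python
# Hypothetical toy population, Oregon-like design (winners and losers choose).
# type: (count per arm, outcome without insurance, effect of insurance)
TYPES = {
    "complier":     (50, 50, 4),   # insured only if they win
    "always_taker": (20, 40, 10),  # insured whether they win or lose
    "never_taker":  (30, 60, 1),   # uninsured whether they win or lose
}


def insured(kind, won):
    """Insurance status implied by a person's type and lottery result."""
    return kind == "always_taker" or (kind == "complier" and won)


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


# One row per person, pooled over both arms:
# (insured, outcome without insurance, effect of insurance)
people = []
for won in (True, False):
    for kind, (n, y0, effect) in TYPES.items():
        people += [(insured(kind, won), y0, effect)] * n

treated = [p for p in people if p[0]]
untreated = [p for p in people if not p[0]]

# Naive estimator: observed insured mean minus observed uninsured mean.
naive = (mean(y0 + e for _, y0, e in treated)
         - mean(y0 for _, y0, _ in untreated))

# Average effect among those actually insured (knowable only in a simulation).
tot = mean(e for _, _, e in treated)

# Selection term: how the two groups would differ with no one insured.
selection_bias = (mean(y0 for _, y0, _ in treated)
                  - mean(y0 for _, y0, _ in untreated))

print(f"naive={naive:.2f} TOT={tot:.2f} selection bias={selection_bias:.2f}")
```

With these numbers the naive comparison is about -3.23 while the average effect among the insured is about 6.67. The whole gap is the selection term of about -9.90: people who choose insurance here start out with worse outcomes. In this reading, "biased" means the estimator targets one quantity but its value is shifted by self-selection, which is different from correctly estimating an effect for some other population.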

(3) Which estimates have the greatest external validity for the policy questions of current interest?

This is a hard question. The answer depends on whether the group affected by our proposed policy looks more like the group included in the LATE or the full population. How do we think about that?

Insert your existing discussion of this point.

Matt is referring either to my comments on my post or to the post itself. Either way, I presume you’ve read them.