• Medicaid and saving babies

    As mentioned at the end of my prior post in the Medicaid-IV series, Janet Currie and Jon Gruber published a 1996 paper on the effect of Medicaid expansion on infant mortality and birth weight. Here’s the abstract:

    A key question for health care reform in the United States is whether expanded health insurance eligibility will lead to improvements in health outcomes. We address this question in the context of the dramatic changes in Medicaid eligibility for pregnant women that took place between 1979 and 1992. We build a detailed simulation model of each state’s Medicaid policy during this era and use this model to estimate (1) the effect of changes in the rules on the fraction of women eligible for Medicaid coverage in the event of pregnancy and (2) the effect of Medicaid eligibility changes on birth outcomes in aggregate Vital Statistics data. We have three main findings. First, the changes did dramatically increase the Medicaid eligibility of pregnant women, but did so at quite differential rates across the states. Second, the changes lowered the incidence of infant mortality and low birth weight; we estimate that the 30-percentage-point increase in eligibility among 15-44-year-old women was associated with a decrease in infant mortality of 8.5 percent. Third, earlier, targeted changes in Medicaid eligibility, which were restricted to specific low-income groups, had much larger effects on birth outcomes than broader expansions of eligibility to women with higher income levels. We suggest that the source of this difference is the much lower take-up of Medicaid coverage by individuals who became eligible under the broader eligibility changes. Even the targeted changes cost the Medicaid program $840,000 per infant life saved, however, raising important issues of cost effectiveness.

    This study shares its methodological approach, and many of its strengths and weaknesses, with the Currie and Gruber paper I reviewed previously, so I’m not going to repeat myself. There is one element of this study worth emphasizing, however. As stated in the abstract, the authors examined two types of Medicaid expansions in the 1980s, one targeted and one broad.

    The targeted expansions were essentially modest changes to Medicaid eligibility around the edges of the program’s ties to AFDC (I’m obviously grossly simplifying). The broad expansion began in 1987 and liberalized the income cutoffs for pregnant women. By 1990 all states were required to cover pregnant women with incomes up to 133% of poverty and had the option of extending coverage up to 185% of poverty with federal matching funds.

    Results of the study differ across the two types of expansions. The targeted expansion had much stronger effects:

    [W]e find that a 30-percentage-point increase in eligibility under targeted programs would have been associated with a highly significant 7.8 percent decline in the incidence of low birth weight; a similar increase in eligibility under the broad programs would have decreased the incidence of low birth weight by only 0.2 percent. Similarly, a 30-percentage-point increase in targeted eligibility would have been associated with an 11.5 percent decline in infant mortality, compared to a 2.9 percent decline under the broad policy changes.

    The authors attribute this difference in outcomes across types of expansion to different rates of take-up: lower take-up under the broad expansion attenuated its effect. To the extent that these findings can be generalized, they suggest that the broad Medicaid expansion under the ACA will have relatively small effects on health. However, the ACA’s expansion comes with an individual mandate, so take-up should occur at a much higher rate than under the broad expansions in the 1980s.
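
    To make the size of the gap concrete, here is a small arithmetic sketch using only the four figures quoted above. The function names are mine, and the per-point figures assume the effect scales linearly with eligibility, which is a simplification and not something the paper asserts.

```python
# Percent declines associated with a 30-percentage-point increase in
# eligibility, as quoted from Currie and Gruber above.
ELIGIBILITY_INCREASE_PP = 30.0

declines = {
    "low birth weight": {"targeted": 7.8, "broad": 0.2},
    "infant mortality": {"targeted": 11.5, "broad": 2.9},
}


def per_point(decline_pct, increase_pp=ELIGIBILITY_INCREASE_PP):
    """Percent decline per percentage point of added eligibility.

    Assumes linear scaling, which is an illustrative simplification.
    """
    return decline_pct / increase_pp


def targeted_to_broad_ratio(outcome):
    """How many times larger the targeted effect is than the broad one."""
    d = declines[outcome]
    return d["targeted"] / d["broad"]


if __name__ == "__main__":
    for outcome, d in declines.items():
        print(
            outcome,
            "| targeted per point:", round(per_point(d["targeted"]), 3),
            "| broad per point:", round(per_point(d["broad"]), 3),
            "| ratio:", round(targeted_to_broad_ratio(outcome), 1),
        )
```

    By this reckoning the targeted expansions were roughly four times as effective per point of eligibility for infant mortality, and far more than that for low birth weight.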

  • Medicaid and child health

    Next up in my “Medicaid-IV” series–in which I’m reviewing papers that use instrumental variables techniques to estimate the effects of Medicaid on health outcomes–is the widely-cited 1996 Quarterly Journal of Economics paper by Currie and Gruber on Medicaid and child health (link to ungated version).

    Not surprisingly, the authors do a superb job of explaining their approach and interpreting their results. So, I’m going to liberally quote from the paper. Let’s start with the abstract just to get an overview, then I’ll hit some important issues not fully revealed by such a brief summary.

    We study the effect of public insurance for children on their utilization of medical care and health outcomes by exploiting recent expansions of the Medicaid program to low-income children. These expansions doubled the fraction of children eligible for Medicaid between 1984 and 1992. … [E]ligibility for Medicaid significantly increased the utilization of medical care, particularly care delivered in physicians’ offices. Increased eligibility was also associated with a sizable and significant reduction in child mortality.

    By “exploiting recent expansions of the Medicaid program” the authors mean they use state-year variations in those expansions to construct an instrument that is not correlated with individual characteristics but is correlated with Medicaid eligibility, and therefore with Medicaid enrollment. The instrument and how it works are mind-benders (I didn’t get it upon first encounter). For each state and year, it’s the fraction of children who would be eligible under that state-year’s Medicaid rules, where the fraction is computed over a population of kids that varies by year but not by state. (I know that’s hard to grok. I could spend a whole post explaining it further, but I won’t. You’ll have to trust me that it is a valid instrument and has become a standard technique for instrumenting for Medicaid status. Or you can read the paper. This is advanced material!)
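
    For readers who think better in code, here is a minimal sketch of how such an instrument could be built. Everything in it is hypothetical: the two states, the age and income cutoffs, and the randomly generated national sample are stand-ins I made up, not the rules or data Currie and Gruber used. The point is only the mechanics: the same national sample is run through every state’s rules for a given year, so differences across states reflect policy and not who lives there.

```python
import random


def make_national_sample(year, n=5000):
    """Hypothetical national sample of children for one year.

    Each child is (age in years, family income as a percent of poverty).
    """
    rng = random.Random(year)
    return [(rng.randint(0, 17), rng.uniform(0, 400)) for _ in range(n)]


# Hypothetical rules: (state, year) -> (maximum age, income cutoff as % of poverty)
RULES = {
    ("A", 1984): (5, 50),
    ("A", 1992): (17, 133),
    ("B", 1984): (5, 75),
    ("B", 1992): (17, 185),
}


def eligible(child, rule):
    """Apply one state-year rule to one child."""
    age, income = child
    max_age, cutoff = rule
    return age <= max_age and income <= cutoff


def simulated_eligibility(state, year, samples):
    """Fraction of that year's national sample eligible under one state's rules."""
    rule = RULES[(state, year)]
    sample = samples[year]  # identical for every state in a given year
    return sum(eligible(child, rule) for child in sample) / len(sample)


samples = {year: make_national_sample(year) for year in (1984, 1992)}
instrument = {key: simulated_eligibility(key[0], key[1], samples) for key in RULES}

if __name__ == "__main__":
    for key in sorted(instrument):
        print(key, round(instrument[key], 3))
```

    Because no child in a state’s own population enters the calculation for that state, an individual’s characteristics cannot move the instrument, yet a more generous rule mechanically produces a higher value.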

    A good question is, “Why the focus on kids?” Currie and Gruber have a great answer:

    A potential problem with utilization measures, however, is that they confound access and morbidity. For example, the Medicaid expansions may have increased access to hospitals, but at the same time they could have increased the use of preventive care, improving health status and reducing the demand for hospital care. One way to surmount this problem is to focus on utilization that is explicitly preventative, and therefore unaffected by morbidity. Pediatric guidelines recommend at least one doctor’s visit per year for most children in our sample, so that the absence of a doctor’s visit in the previous year is suggestive of a true access problem, regardless of underlying morbidity.

    The results are well-summarized by the abstract quoted above, but I want to highlight a few things. The authors find that Medicaid eligibility halves the probability that a child will go a year without seeing a physician in any setting. Much of this is due to increased visits to doctors’ offices. They also find that the 15.1 percentage point rise in Medicaid eligibility during the study period reduced child mortality by 5.1 percent. Their sub-analysis of mortality is sharp:

    If Medicaid eligibility reduces deaths by improving the utilization of care, then we would expect deaths due to “internal causes” (such as disease) to fall more than deaths due to “external causes” (such as accidents, homicides, suicides, and other external causes). [The results] show that this is indeed the case: increases in eligibility are correlated with a significant reduction in deaths due to internal causes, but have no significant effect on deaths due to external causes.

    I’ve saved the most puzzling finding for last: Medicaid eligibility was found to increase hospital visits. That sounds bad, and maybe it is. But the mechanism could be benign, as the authors explain.

    [H]ospitals may be better equipped to assist patients in claiming benefits. Potential eligibles for Medicaid must complete lengthy and complex application forms, provide extensive documentation (such as birth certificates, pay stubs, and confirmation of child care costs), and attend several interviews with caseworkers. … In response, many hospitals have established special offices, or contract with private companies, to assist Medicaid eligibles in completing these procedures. … The nontrivial costs of providing these services may be beyond the means of private doctors and clinics, leading them to recommend that potential eligibles seek care in a hospital setting.

    Before closing, it is worth noting two things. First, the control variables in the regressions do not include health status. That’s important since health status could be an outcome of Medicaid enrollment. (Inclusion of an outcome as a control variable leads to bias.) Second, as the authors point out, Medicaid expansions have two effects. They encourage additional Medicaid enrollment and discourage private coverage. Some new Medicaid enrollees had been privately insured, an effect known as “crowding out.” The estimates include all effects of Medicaid expansion on outcomes, including those due to crowding out, but do not distinguish among them.
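
    The parenthetical about controlling for an outcome can be illustrated with a small simulation. This is a sketch on made-up data, not a reanalysis of the paper: `medicaid` is assigned at random, the true effect on `outcome` is set to 1.0, and `health_status` is itself affected by Medicaid and by an unobserved factor. All names and coefficients are assumptions chosen for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def simulate(n=20000, seed=1):
    """Made-up data in which health status is an outcome of Medicaid."""
    rng = random.Random(seed)
    medicaid, health_status, outcome = [], [], []
    for _ in range(n):
        m = rng.randint(0, 1)        # randomly assigned, so no confounding
        unobserved = rng.gauss(0, 1)
        h = m + unobserved + rng.gauss(0, 1)        # affected by Medicaid
        y = 1.0 * m + unobserved + rng.gauss(0, 1)  # true effect is 1.0
        medicaid.append(m)
        health_status.append(h)
        outcome.append(y)
    return medicaid, health_status, outcome


def slope(x, y):
    """Coefficient on x in a regression of y on x (plus intercept)."""
    return cov(x, y) / cov(x, x)


def slope_controlling_for(x, control, y):
    """Coefficient on x in a regression of y on x and control (plus intercept)."""
    sxx, scc, sxc = cov(x, x), cov(control, control), cov(x, control)
    return (scc * cov(x, y) - sxc * cov(control, y)) / (sxx * scc - sxc ** 2)


medicaid, health_status, outcome = simulate()
without_control = slope(medicaid, outcome)
with_bad_control = slope_controlling_for(medicaid, health_status, outcome)

if __name__ == "__main__":
    print("without health status:", round(without_control, 2))
    print("controlling for health status:", round(with_bad_control, 2))
```

    With these made-up coefficients, the regression that leaves health status out recovers roughly the true effect, while the one that includes it reports roughly half of it, even though Medicaid was assigned at random.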

    Finally, there is a question of generality of the findings. This is a study of Medicaid expansions that targeted children about 20 years ago. The ACA’s Medicaid expansion is far broader and will occur four years from now. Can one generalize the findings of Currie and Gruber to other populations and eras? It’s hard to say. The authors published another paper that used the same techniques and focused on the effect of Medicaid expansions for pregnant women, finding they lowered infant mortality and increased birth weight. So, the positive effects of Medicaid expansions on outcomes apply to more than one population, which strengthens claims of generality.

  • Insurance and mortality for HIV patients (Medicaid IV)

    An individual’s health status affects Medicaid enrollment (the ill are more likely to enroll). Medicaid enrollment affects an individual’s health status too (one can argue about which way, for the better or worse). The two are simultaneous. That makes inferring the causal effect of Medicaid on health outcomes difficult.

    A few weeks ago I described the right way to tease out the causal effect of Medicaid enrollment on health outcomes:

    There are undoubtedly studies that consider Medicaid vs. uninsured outcomes using the random variations provided by the natural experiment that is Medicaid. Characteristics of the program vary by state and year, making it a perfect set-up for such an analysis of this issue. This second I can’t point to a study. But I know where to look.

    I’ve started to look and will begin to describe the relevant literature as I read the papers. I’m not going to filter or cherry pick papers based on their findings. All that matters to me is the quality of the methods applied. Feel free to send me links to papers you think qualify (look for peer-reviewed, natural or randomized experiments and/or instrumental variables approaches; the run-of-the-mill observational study that controls for observable individual characteristics won’t do). There may be many posts in this series of paper reviews. They’ll all be under the “Medicaid-IV” tag. When I think I’ve summarized them all, I’ll post a conclusion that reports on the full body of evidence.

    Below I’ll discuss a 2001 paper in the Journal of the American Statistical Association by Dana Goldman et al., Effect of Insurance on Mortality in an HIV-Positive Population in Care. Before I get to the paper, just in case it isn’t clear, by exploiting the variations in state-year Medicaid eligibility I’m talking about instrumental variables (IV) analysis, about which I’ve written considerably.* The sense in which those variations are random is that an individual’s characteristics cannot affect them. As far as an individual is concerned, the Medicaid policy in effect in his state and at a particular time is random. But Medicaid policy does affect Medicaid enrollment (it affects private enrollment too), so it can be exploited to infer the causal effect of Medicaid (or insurance in general) on health outcomes free of the confounding effects of health on Medicaid.
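
    The logic can be seen in a toy simulation. This is a sketch under assumptions I chose for illustration, not the model Goldman and colleagues estimate: a binary `policy` variable stands in for a generous state-year Medicaid rule, `illness` is unobserved, sicker people are more likely to enroll, and enrollment truly lowers a mortality risk index by 1.0. The naive comparison gets the sign wrong; the simple IV (Wald) estimator, cov(policy, risk) / cov(policy, enrolled), does not.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def simulate(n=40000, true_effect=-1.0, seed=0):
    """Made-up data: illness drives both enrollment and mortality risk."""
    rng = random.Random(seed)
    policy, enrolled, risk = [], [], []
    for _ in range(n):
        z = rng.randint(0, 1)    # generous policy; unrelated to the individual
        illness = rng.gauss(0, 1)  # unobserved by the analyst
        m = 1 if 1.5 * z + illness + rng.gauss(0, 1) > 0.75 else 0
        y = true_effect * m + 2.0 * illness + rng.gauss(0, 1)
        policy.append(z)
        enrolled.append(m)
        risk.append(y)
    return policy, enrolled, risk


policy, enrolled, risk = simulate()

# Naive estimate: regress risk on enrollment, ignoring selection.
naive = cov(enrolled, risk) / cov(enrolled, enrolled)

# IV (Wald) estimate: use only the variation in enrollment driven by policy.
iv = cov(policy, risk) / cov(policy, enrolled)

if __name__ == "__main__":
    print("naive:", round(naive, 2))
    print("IV:", round(iv, 2))
```

    In this toy world the naive estimate says insurance raises mortality risk, the same perverse sign the abstract below describes, while the IV estimate lands near the true value of -1.0.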

    Goldman and colleagues do just that using a nationally representative cohort of HIV-infected persons and sound IV methods. The abstract summarizes the highlights. It’s a bit of dense reading so if you wish to skip it just trust me that it communicates that the authors are following standard techniques for causal inference:

    A naïve single-equation model confirms the perverse result found by others in the literature—that insurance increases the probability of death for HIV+ patients. We attribute this finding to a correlation between unobserved health status and insurance status in the mortality equation for two reasons. First, the eligibility rules for Medicaid and Medicare require HIV+ patients to demonstrate a disability, almost always defined as advanced disease, to qualify. Second, if unobserved health status is the cause of the positive correlation, then including measures of HIV+ disease as controls should mitigate the effect. Including measures of immune function (CD4 lymphocyte counts) reduces the effect size by approximately 50%, although it does not change sign. To deal with this correlation, we develop a two-equation parametric model of both insurance and mortality. The effect of insurance on mortality is identified through the judicious use of state policy variables as instruments (variables related to insurance status but not mortality, except through insurance). The results from this model indicate that insurance does have a beneficial effect on outcomes, lowering the probability of 6-month mortality by 71% at baseline and 85% at follow-up. The larger effect at followup can be attributed to the recent introduction of effective therapies for HIV infection, which have magnified the returns to insurance for HIV+ patients (as measured by mortality rates). (Bold mine.)

    The reason to read the paper, or the first few pages of it anyway, is to get a sense of how to do Medicaid-health outcome studies properly. Importantly, the authors used arguably exogenous instruments–features of state Medicaid and AIDS drug assistance programs–and subjected them to power and falsification tests, which they passed. One can still argue that the instruments are not valid, but it would require an argument so contorted I cannot fathom what it could be.

    The reason to take the study with a couple of big grains of salt is that there are a few potential and actual problems, not least of which is that the results I made bold above are not statistically significant. In that sense, the findings are inconclusive about whether or not insurance reduced mortality for HIV patients.

    A second limitation is that it is not specifically a study of Medicaid. It’s a study of insurance, of any type. The authors lump patients with different types of insurance (public, private) together. That’s a big problem because characteristics of state Medicaid programs affect Medicaid enrollment and private coverage rates, but in opposite directions. It is also possible that Medicaid coverage and private insurance have opposite effects on outcomes. Ultimately, it is hard to draw policy conclusions with a study that mixes the two insurance types. If mortality improves is it due to public or private coverage? It’s impossible to tell. They acknowledge this limitation and correctly describe a more complex model that would separately identify the effects of public and private insurance on mortality. They wrote that such a model was a computational challenge. Today it would not be.

    A final critique is that the preferred model specifications include a measure of disease burden, the lowest-ever CD4 count as of the baseline year. To the extent that Medicaid causes poor outcomes (due to, say, the poor-quality care it could plausibly promote), it is possible that the lowest-ever CD4 count is itself an outcome of insurance coverage. It’s a big no-no to include an outcome as a control variable. So, the authors need to make an argument that including lowest-ever CD4 count is OK. They didn’t, and I don’t know enough about AIDS to make the argument for them.

    * If you’re already puzzled, stop right here and go read some of my posts on IV and/or Steve Pizer’s tutorial paper. I am not exaggerating by suggesting that anyone who wants to understand research in social science and particularly anyone who is going to interpret that research for a wider audience really ought to take the time to understand the issues pertaining to IV, why it is used, and why many (though not all) observational studies that do not consider and deal with those issues are potentially flawed.

  • Better than Harmless Econometrics

    Josh Angrist and Jörn-Steffen Pischke sent me a copy of their modestly titled book Mostly Harmless Econometrics: An Empiricist’s Companion (let’s call it MHE for short). If the title sounds slightly familiar then you’re probably a Douglas Adams fan–he wrote a Mostly Harmless book too–and you’d be correct in assuming that MHE is not your ordinary econometrics text.

    Angrist and Pischke claim their style has a “certain lack of gravitas.” With an emphasis on the practical and intuitive use of the most common, widely applicable, and relatively simple econometric models they provide a far less intimidating tour than most texts of techniques for the evaluation of social experiments, whether artificially or naturally randomized. Nevertheless, this book has math, more than I cared to study closely on first read, particularly in later chapters covering more advanced material.

    Yet, the writing style is far less stodgy than typical academic texts. The fun begins in Chapter 1 (Questions about Questions), in which Angrist and Pischke write,

    Research questions that cannot be answered by any experiment are FUQs: fundamentally unidentified questions. What exactly does a FUQ look like? …

    Suppose we are interested in whether children do better in school by virtue of having started school [at age 7 instead of 6]. … To be concrete, let’s look at test scores in first grade.

    The problem with this question … is that the group that started school at age 7 is older. And older kids tend to do better on tests, a pure maturation effect. … The problem here is that for students, start age equals current age minus time in school. … [T]he effect of start age on elementary school test scores is impossible to interpret even in a randomized trial, and therefore, in a word, FUQed.
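
    The identity in that last passage is what makes the question unanswerable, and a few lines of code show why. This is an illustrative sketch with invented ages and coefficients: because start age equals current age minus time in school, a story in which start age matters a lot and a story in which it does not matter at all produce exactly the same predicted test scores.

```python
import math

# Invented students: every combination of a few current ages and times in school.
students = [
    {"current_age": age, "time_in_school": years}
    for age in (6.5, 7.0, 7.5, 8.0)
    for years in (0.5, 1.0)
]
for s in students:
    # The identity from the quoted passage.
    s["start_age"] = s["current_age"] - s["time_in_school"]


def predict(s, b_start, b_age, b_time):
    """Predicted score from a linear model in the three age variables."""
    return (
        b_start * s["start_age"]
        + b_age * s["current_age"]
        + b_time * s["time_in_school"]
    )


# Story A: only start age matters.
story_a = [predict(s, b_start=2.0, b_age=0.0, b_time=0.0) for s in students]

# Story B: start age is irrelevant; current age and time in school do the work.
story_b = [predict(s, b_start=0.0, b_age=2.0, b_time=-2.0) for s in students]

indistinguishable = all(math.isclose(a, b) for a, b in zip(story_a, story_b))

if __name__ == "__main__":
    print("stories give identical predictions:", indistinguishable)
```

    No amount of data, randomized or otherwise, can choose between the two stories, since they never disagree about any student.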

    Putting aside the FUQed, Angrist and Pischke explain the essentials of causal analysis for observational studies, beginning with a gentle introduction to the selection problem and regression in Chapter 2 (The Experimental Ideal). One can gain tremendous insight with little heavy lifting by reading that brief, 12-page chapter alone.

    The real guts of the subject are presented in Chapter 3 (Making Regression Make Sense) and Chapter 4 (Instrumental Variables in Action). Slightly more advanced material is found in the final four chapters, covering fixed effects, differences-in-differences, regression discontinuity, quantile regression, and standard error estimation. I skimmed those final chapters only closely enough to know what’s there, for future reference. My main interest was in improving my understanding of IV basics, for which close reading beyond Chapter 4 is not necessary.

    MHE is not only an econometrics reference and tutorial, it’s also a guide to a subset of the observational study literature that applies sound technique. Every method is motivated and illuminated by reference to or examples from published work. That’s particularly valuable to the publishing practitioner who needs to demonstrate adherence to proven methodology by reference to prior studies.

    Thus, MHE is better than “mostly harmless,” and I recommend it highly, particularly to those who evaluate social programs, clinical trials, or otherwise wish to estimate causal effects from experimental or observational data. Yet I can think of a few small ways MHE could be enhanced. My least important suggestion is an index of stylized facts. There are a handful of main points that the practitioner should carry around in his head, knowing he can look up the details when necessary. These might include, for example, that propensity scores only control for observable differences between treatment and control groups (pp. 86-87); that independence of the instrument from potential outcomes is a different idea than an exclusion restriction (p. 155; this, by the way, is a mind-bender and took me some time to appreciate); that one shouldn’t include an outcome as one of the regressors (pp. 64-68); and that non-linear models are very rarely necessary and very often lead to trouble (p. 190), among others.

    One problem with nonlinear models is that they generate inconsistent estimates with two-stage prediction substitution, a fact Angrist and Pischke discuss in Chapter 4. It deserves mention, though they don’t provide it, that one can obtain consistent estimates of causal effects with nonlinear models using two-stage residual inclusion (2SRI), which is surprisingly simple and easy to implement (Terza, Basu and Rathouz, 2008). This is only important in the small subset of circumstances in which linear models won’t do. One such circumstance arises in my work in which models are put to use for policy simulations. In that case, linear approximations that don’t reproduce crucial nonlinear features of a distribution can be a problem, if only in presentation (which is important).
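
    For reference, the difference between the two approaches can be written in a few lines. The notation below is my own shorthand and is not taken from MHE: x is the endogenous regressor, w the exogenous controls, z the instruments, and M a known nonlinear function (an exponential or a logistic CDF, say).

```latex
% First stage (shared by both approaches)
x = r(z, w; \alpha) + u,
\qquad \hat{x} = r(z, w; \hat{\alpha}),
\qquad \hat{u} = x - \hat{x}

% Two-stage prediction substitution (2SPS): replace x with its prediction
y = M(\hat{x}\beta + w\gamma) + e^{\text{2SPS}}

% Two-stage residual inclusion (2SRI): keep x and add the first-stage residual
y = M(x\beta + w\gamma + \hat{u}\lambda) + e^{\text{2SRI}}
```

    When M is linear the two second stages yield the same estimate of the coefficient on x, namely the familiar two-stage least squares estimate. The approaches part ways only when M is nonlinear.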

    I’ll conclude by noting a large issue lurking in the background to which Angrist and Pischke only allude. That’s theory (by which I mean anything outside the data). What’s it for? Can one really conclude causality from data alone? The answer is “no,” but the reason is subtle. The topic almost arises twice, once in a discussion of how to decide whether a control variable is or is not an outcome variable. When one can’t use time to determine what can be the cause of what, then “clear reasoning about causal channels requires explicit assumptions about what happened first, or the assertion that none of the control variables are themselves caused by the regressor of interest” (p. 68). That’s theory, folks.

    Later, on page 156, the authors write, “There is nothing in IV formulas to explain why [treatment] affects [outcomes]; for that, you need a theory.” OK then! Theory has a role. In fact, its role is larger than implied by these quotes. I assert that one can’t begin to understand if or when selection on observables or unobservables (or endogeneity in general) might occur without theory. Put another way, the model one chooses to estimate and the manner in which one does so come in part from theory, a point stressed by Andrew Gelman in his review of MHE (a review worth reading, by the way).

    In many cases, that theory is our own intuition, not some formal mathematical model. We know something about the world, about what can affect what, that we bring to the data. Without those extra-data notions, we wouldn’t even know what to study or how, let alone how to interpret what we find. I think this is something applied economists should appreciate. The data can reveal the size of causal effects, but only after we have decided what can cause what. Without such ideas, finding potentially valid instruments would be next to impossible. If you don’t believe me, next time you approach your analysis, ask a colleague to rename all your variables v1, v2, v3, etc. (and not provide you with a crosswalk to their actual names). Good luck!

    Later: See also the Mostly Harmless blog.

    References

    Terza JV, Basu A, Rathouz PJ. Two-stage Residual Inclusion Estimation: Addressing Endogeneity in Health Econometric Modeling. J Health Economics 2008; 27: 531-43.

  • Cellphone Use Skews Polls

    In a post today Nate Silver reminds us just how prevalent cellphone use is and how and why it is skewing polls. He brings the data:

    And he brings this geek-o-licious ‘graph that is co-linear with one of this blog’s current themes:

    The pollsters’ usual defense mechanism against this [under-sampling of cellphone-only households] is to weight their polls by demographics, something which they need to do anyway, since polls are subject to many forms of non-response bias (for instance, it’s harder to get men on the phone than women). But this is potentially an inadequate response for several reasons. First, some characteristics that correlate with both cellphone usage and political preferences may not correspond to those that are most commonly used to weight polls. It is somewhat rare, for instance, for pollsters to weight their polls by characteristics like urban/rural location or marital status, which are predictive of both cellphone usage and political beliefs. Being cellphone-dependent also appears to be significantly correlated with media consumption habits (in particular, getting more of one’s news from the Internet and less from television), which also seems to be increasingly important in determining one’s political views. And there are some characteristics that may be even more subtle. For instance, there are some hints in the CDC data (such as the higher prevalence of binge drinking) that cellphone-only adults are less “domestic” and more “bohemian”. I suspect that, in young adults, this is correlated with more liberal political views. (Bold mine.)

    By now readers of this blog can sum all this up much more succinctly. Polls are suffering from cellphone-use selection bias. Silver is telling us that attempts to correct it based on observable characteristics are inadequate. Being in the cellphone-only group (“treatment”) and political preferences (“outcome”) are both related to hard-to-measure factors (“unobservables”). Exclusive cellphone use is endogenous, causing poll results to be biased, even after controlling for observable factors.
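
    A toy simulation makes the point. Every number here is invented for illustration and none comes from Silver’s post or the CDC data: age is the one observable the pollster weights on, “bohemian” is an unobservable that raises both the chance of being cellphone-only and the chance of holding liberal views, and the poll reaches only people with landlines.

```python
import random


def simulate_population(n=60000, seed=2):
    """Invented population: (young, cell_only, liberal) for each person."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        young = rng.random() < 0.4      # observable to the pollster
        bohemian = rng.random() < 0.3   # NOT observable to the pollster
        cell_only = rng.random() < 0.1 + 0.3 * young + 0.4 * bohemian
        liberal = rng.random() < 0.35 + 0.1 * young + 0.3 * bohemian
        people.append((young, cell_only, liberal))
    return people


def liberal_share(rows):
    return sum(liberal for _, _, liberal in rows) / len(rows)


people = simulate_population()
truth = liberal_share(people)

# The poll reaches only people who are not cellphone-only.
reached = [p for p in people if not p[1]]
raw = liberal_share(reached)

# Reweight the poll so its age mix matches the population's.
young_share = sum(p[0] for p in people) / len(people)
weighted = (
    young_share * liberal_share([p for p in reached if p[0]])
    + (1 - young_share) * liberal_share([p for p in reached if not p[0]])
)

if __name__ == "__main__":
    print("true liberal share:", round(truth, 3))
    print("raw landline poll:", round(raw, 3))
    print("age-weighted poll:", round(weighted, 3))
```

    Weighting on age moves the estimate in the right direction, but it stays well short of the truth because, within each age group, the people the poll can reach are still less “bohemian” than the people it cannot.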

    I wish I could say there is an obvious exogenous factor that could be exploited via instrumental variables techniques to address this issue. But I can’t think of anything that affects cellphone use and that is not related to political preferences. It’s awfully hard because it needs to be something that works for local polls (statewide at a minimum), so large scale geographic variations can’t be exploited. I don’t think this is approachable with IV, do you?

  • Instrumental Variables vs. Randomized Trial

    I’ve made the claim that good observational studies of a medical therapy can be as informative as a randomized clinical trial (RCT). By a “good” observational study I mean one that handles the non-random selection of individuals into treatment appropriately, which often means using instrumental variables (IV). (Already lost? Read this.)

    One way to demonstrate that IV studies are as informative as RCTs is to show that results obtained either way are similar. Unfortunately, there are not many examples of health care treatments studied via both RCT and IV methods because use of the latter is rare in the field. Nevertheless, there are a few examples. Steve Pizer wrote about one in his tutorial paper on IV technique.

    A clearer focus on comparing methods was provided by Stukel et al. (2007), who used four different methods to assess the effects of cardiac catheterization on elderly patients hospitalized for acute myocardial infarction. … The investigators compared results from randomized trials to estimates from models using multivariate risk adjustment, propensity score risk adjustment, propensity score matching, and instrumental variables estimation featuring the regional cardiac catheterization rate as the identifying instrument. … Multivariable risk adjustment, propensity score risk adjustment, and propensity score matching all produced estimated reductions in mortality risk between 46 and 49 percentage points. Instrumental variables estimates were starkly different at 16 percentage points and compared more favorably to estimates from clinical trials, which ranged from 8 to 21 points. …

    So, the IV estimates were in the middle of the range found by RCTs. Meanwhile, estimates based on methods that can’t control for unobservable factors that affect selection and outcome (risk adjustment, propensity score techniques) produced results well outside the RCTs range. That’s precisely what one would expect if one understands IV and why it is necessary.

    Stukel et al. (2007) comment on an earlier IV study of cardiac catheterization by McClellan, McNeil, and Newhouse (1994) that found a lower reduction in mortality risk using differential distances to alternative types of hospitals as instruments. Results of the two studies are not reported in the same metric so they are not immediately comparable. However, there is sufficient information to make at least an approximate conversion (hint: see the asterisk footnote of Table 5 of Stukel et al. that provides a formula to approximately convert between an absolute mortality difference and a relative mortality rate). Doing so reveals that McClellan, McNeil, and Newhouse report an 8.5% reduction in mortality risk, nearly half that of Stukel et al., though still within the 8-21% range of RCTs.

    Stukel et al. attribute the difference in results between the two IV studies to differences in the degree to which instruments predict treatment, suggesting that the earlier study’s results may be biased downward due to weak instruments. McClellan, McNeil, and Newhouse note that the mortality reduction they find is “achieved during the first day of hospitalization and therefore appears attributable to treatments other than the procedures.” (See also Newhouse and McClellan 1998.)

    IV and RCT results compare favorably in studies of the effects of smoking by pregnant women on their child’s birth weight. Evans and Ringel (1999) use cigarette taxes as an instrument for smoking and find that birth weight is lower by 353-594 grams, depending on model specification. Results from an RCT on prenatal care that included a smoking cessation component put the figure at 400 grams. Results for indicators of low (< 2,500 grams) and very low (< 1,500 grams) birth weight are also similar between the IV- and RCT-based studies.

    More thorough analyses of randomized vs. observational design results are found outside of health services research. For example, Cook, Shadish, and Wong (2008) compare randomized versus observational results from twelve job training and education program evaluations.

    Of the 12 recent within-study comparisons reviewed here from 10 different research projects … eight of the comparisons produced observational study results that are reasonably close to those of their yoked experiment, and two obtained a close correspondence in some analyses but not others. Only two studies claimed different findings in the experiment and observational study, each involving a particularly weak observational study. Taken as a whole, then, the strong but still imperfect correspondence in causal findings reported here contradicts the monolithic pessimism emerging from past reviews of the within study comparison literature.

    Of the observational studies that did produce results comparable to their experimental counterparts, one involved IV and three exploited quasi-randomness akin to that upon which IV relies (regression discontinuity). The unavoidable conclusion is that observational studies for which sources of exogenous randomness can be identified produce results comparable to those that might be obtained from a randomized controlled experiment.

    References

    Thomas D. Cook, William R. Shadish, Vivian C. Wong. (2008). Three Conditions under Which Experiments and Observational Studies Produce Comparable Causal Estimates: New Findings from Within-Study Comparisons. Journal of Policy Analysis and Management, Volume 27, Issue 4 (p 724-750).

    William N. Evans, Jeanne S. Ringel. (1999). Can Higher Cigarette Taxes Improve Birth Outcomes? Journal of Public Economics, Volume 72 (p 135-154).

    McClellan M, McNeil BJ, Newhouse JP. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA. 1994;272:859-866.

    Newhouse JP, McClellan M. Econometrics in outcomes research: the use of instrumental variables. Annu Rev Public Health. 1998;19:17-34.

    Stukel, T.A., Fisher, E.S., Wennberg, D.E., Alter, D.A., Gottlieb, D.J., Vermeulen, M.J.: Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA 297(3), 278–285 (2007). doi:10.1001/jama.297.3.278

  • Supply, Demand, and the Endogeneity of Prices

    In a comment, steve (not co-blogger Steve) reminded me of a very good post by Scott Sumner that illustrates the endogeneity of prices with respect to quantity. It turns out I had read it, but I’m glad to be reminded of it.

    So what do we know about prices? We know that if the price falls because supply increases, then consumption will increase, and if the price fell because demand fell, then consumption will decrease. In other words we know that if the price (or interest rate or exchange rate) changes, we can predict with 50% confidence that quantity will increase, and 50% confidence that quantity will decrease. So that’s progress, I guess.

    This ambiguity is precisely why Sumner advises us never to reason from a price change. The problem is that price and quantity are both related to supply and demand factors. Thus, knowing only that prices changed, one can’t draw a conclusion about quantity without knowing something about supply or demand. Doing so is like trying to infer reading comprehension from foot size. Both are related to age, among other things. In a word, price is endogenous. Sumner could just as easily have said, “Never reason from a change in an endogenous variable.”

    As Angrist and Krueger described in a Journal of Economic Perspectives paper I summarized recently, the earliest known application of instrumental variables was to address the endogeneity of prices in estimating supply and demand elasticities of flaxseed.

    If the demand and supply curves shift over time, the observed data on quantities and prices reflect a set of equilibrium points on both curves. Consequently, an ordinary least squares regression of quantities on prices fails to identify—that is, trace out—either the supply or demand relationship. P.G. Wright (1928) confronted this issue in the seminal application of instrumental variables: estimating the elasticities of supply and demand for flaxseed, the source of linseed oil. Wright noted the difficulty of obtaining estimates of the elasticities of supply and demand from the relationship between price and quantity alone. He suggested (p. 312), however, that certain “curve shifters”—what we would now call instrumental variables—can be used to address the problem: “Such additional factors may be factors which (A) affect demand conditions without affecting cost conditions or which (B) affect cost conditions without affecting demand conditions.” A variable he used for the demand curve shifter was the price of substitute goods, such as cottonseed, while a variable he used for the supply curve shifter was yield per acre, which can be thought of as primarily determined by the weather. …

    Wright (1928, p. 314) observed: “Success with this method depends on success in discovering factors of the type A and B.” … Wright’s econometric advance went unnoticed by the subsequent literature. Not until the 1940s were instrumental variables and related methods rediscovered and extended.
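
    To make the idea concrete, here is a minimal simulation sketch of a market in which price is endogenous. Everything in it is an assumption for illustration (linear curves, the intercepts 10 and 2, the slopes, the noise, and the function names); it is not Wright's data or his procedure. A supply shifter `z` (think of weather-driven yield) moves the supply curve but not the demand curve, so it can serve as an instrument for price when estimating the demand slope.

```python
import random


def cov(xs, ys):
    """Sample covariance of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate_market(n=20000, demand_slope=-1.0, supply_slope=1.0, seed=0):
    """Simulate equilibrium prices and quantities (hypothetical linear market).

    Demand: q = 10 + demand_slope * p + u_d
    Supply: q =  2 + supply_slope * p + z + u_s
    """
    rng = random.Random(seed)
    prices, quantities, shifters = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)    # supply shifter, unrelated to demand shocks
        u_d = rng.gauss(0, 1)  # unobserved demand shock
        u_s = rng.gauss(0, 1)  # unobserved supply shock
        # The equilibrium price equates quantity demanded and supplied.
        p = (10 - 2 + u_d - z - u_s) / (supply_slope - demand_slope)
        q = 10 + demand_slope * p + u_d
        prices.append(p)
        quantities.append(q)
        shifters.append(z)
    return prices, quantities, shifters


def ols_slope(x, y):
    """Slope from an ordinary least squares regression of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(x, y, z):
    """Instrumental variables slope of y on x, using z as the instrument."""
    return cov(z, y) / cov(z, x)


if __name__ == "__main__":
    p, q, z = simulate_market()
    print("true demand slope:", -1.0)
    print("OLS slope of q on p:", round(ols_slope(p, q), 3))
    print("IV slope using the supply shifter:", round(iv_slope(p, q, z), 3))
```

    In this setup the OLS slope mixes the two curves and lands well away from the demand slope, while the IV slope should be close to it. The result rests entirely on the assumption that the shifter affects supply only.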

    References

    Angrist, Joshua and Alan Krueger. (2001). “Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments,” Journal of Economic Perspectives, 15(4), 69-85.

    Wright, Philip G. (1928). The Tariff on Animal and Vegetable Oils. New York: Macmillan.

  • Limitations of Randomized Trials

    A 1998 paper by Joe Newhouse and Mark McClellan in the Annual Review of Public Health is insightful on the limitations of randomized controlled trials (RCTs) of clinical interventions. They note that a trial can only occur in the

    window of time where there is enough belief that a treatment is efficacious so that it is considered ethical to randomize patients to the treatment group but not sufficient belief in the efficacy of the treatment that it would be considered unethical to withhold the treatment.

    Even when a trial can be conducted, its results may lack generality because it is conducted in a setting (major teaching hospital, say) that differs from the norm. That is,

    the trial may demonstrate that a procedure is efficacious (i.e. obtains desired results under optimal conditions), but it will not necessarily show that it is effective (obtains desired results under typical or standard conditions). Or, somewhat related to this point, in the time since the trial was conducted physicians may have become better at performing a procedure, such that the results of the trial are no longer relevant to current practice. (Bold mine.)

    As if that weren’t enough, sometimes the population included in a trial differs from the one that will be (or is) actually treated in practice. Women, children, the elderly, and individuals with comorbidities are among the sub-populations historically excluded from certain types of trials. That is,

    the results of the trial may have internal validity (comparisons between the treatment and control groups are unbiased for the population being studied) but not external validity (results do not necessarily apply to other populations). (Bold mine.)

    For all these reasons, well-designed observational studies can enhance our understanding of treatment outcomes by examining results over a broader setting and population than might be available in an RCT. Moreover, observational studies are less expensive, can be conducted more quickly, and are applicable in cases for which an RCT would be unethical or impractical, due to problems of recruitment, for example. Thus, as Newhouse and McClellan put it, “[t]he results of a well-designed observational study are useful even if the results of a clinical trial are available.”

  • Instrumental Variable Corrected Randomized Trial

    Perhaps you’re of the mind that the only way to learn anything of value is by randomized controlled trial (RCT). I disagree with that position. For some things, RCTs are the right approach. But I think the topics of study that can’t benefit from some sound observational study, perhaps in preparation for a future RCT, are few. And then there are many topics and questions that can’t be studied by RCT due to methodological, ethical, practical, or financial considerations. So, there is plenty of room for observational studies and, given that, one ought to use the best available techniques, including instrumental variables (IV).

    However, even if one only wishes to do RCTs, IV can assist. Very often the perfect randomization contemplated by the researcher is compromised. Some assigned to the treatment group don’t comply. Some assigned to the control group receive treatment. When randomization breaks down things get messy, and it may not be clear what can be learned. Colleagues and I faced this very problem on a study of the Community Nursing Organization Demonstration years ago. We solved it by analyzing the randomized groups using the notion of “intent to treat” (ITT), essentially ignoring the contamination of treatment and control groups, arguing that it is akin to what would happen in the real world anyway. We supplemented the analysis by comparing those treated to a comparison group not involved in the study.

    But we could have done something else, and I wish we had. We could have considered the random assignment as an instrument for actual receipt of treatment. One has to admit, it is a very good instrument. It is highly correlated with actual receipt of treatment (since most subjects comply) and it is not related to outcomes except through treatment (which is the whole point of random assignment).

    The math and statistics of this approach are very straightforward. It’s all explained in a 2006 paper by Angrist in the Journal of Experimental Criminology (and elsewhere). I won’t go into the details. Suffice it to say, even if you love RCTs and only RCTs, sooner or later you’ll come across one for which randomization has failed. In that case, IV can assist. The method is credible, sensible, and sound. Moreover, it fully exploits the beautiful properties of the randomness with which the RCT was designed.

    What one obtains with such IV-corrected RCT analysis is an unbiased estimate of the causal effect of treatment on those whose treatment status was affected by randomization (called “compliers”). This estimate is known as the local average treatment effect (LATE). In the case of an RCT of a therapy that can’t be obtained outside the experiment and in which no individual in the control group received treatment (so that all individuals who received treatment did so due to randomization and would not have otherwise), one can obtain the IV-corrected treatment effect by dividing the ITT treatment effect by the probability of compliance with treatment assignment. In this simplified (but common) setting, this calculation also provides the average treatment effect on the treated (ATET). It is clear from this example that the ATET differs from an ITT estimate when compliance with treatment assignment is not perfect (in magnitude, ATET > ITT). Also, in this example, but not in general, the LATE is the ATET.
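
    The calculation can be written compactly. The notation is assumed here for illustration rather than taken from Angrist's paper: Z is random assignment (1 = treatment arm), D is actual receipt of treatment, and Y is the outcome.

```latex
\[
\text{ITT} = E[Y \mid Z=1] - E[Y \mid Z=0], \qquad
\text{LATE} = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}
\]
\[
E[D \mid Z=0] = 0 \;\Longrightarrow\;
\text{LATE} = \text{ATET} = \frac{\text{ITT}}{\Pr(D=1 \mid Z=1)}
\]
```

    As a purely hypothetical illustration, if the ITT effect is 6 units and 60% of those assigned to treatment actually receive it, the IV-corrected effect is 6 / 0.6 = 10 units.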

    The key point is that the IV-corrected estimate is as valid, meaningful, and useful as the ITT estimate. It’s just an answer to a different question. ITT examines the effect of the intervention on a population, including that due to lack of compliance. IV techniques provide an estimate of the effect of treatment on those who comply (LATE). In the case for which compliance is one-sided (no one in the control group received treatment and everyone who was treated was randomized to treatment), LATE = ATET.
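
    A small simulation sketch can illustrate the one-sided compliance case. All numbers and names below are hypothetical assumptions (a true effect of 10, a 60% compliance rate, a baseline that differs between compliers and non-compliers); none of it comes from the Community Nursing Organization study or from Angrist's paper. Random assignment serves as the instrument for actual receipt of treatment, and the IV estimate is the ITT effect divided by the difference in treatment rates between arms.

```python
import random


def simulate_trial(n=40000, effect=10.0, compliance=0.6, seed=1):
    """Simulate an RCT with one-sided noncompliance.

    Returns a list of (assigned, treated, outcome) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        assigned = rng.random() < 0.5        # random assignment (the instrument)
        complier = rng.random() < compliance  # would take treatment if assigned
        treated = assigned and complier       # controls can never obtain treatment
        # Compliers have a different baseline, so comparing treated with
        # untreated subjects directly would not isolate the treatment effect.
        baseline = rng.gauss(50, 5) + (3.0 if complier else 0.0)
        outcome = baseline + (effect if treated else 0.0)
        rows.append((assigned, treated, outcome))
    return rows


def mean(values):
    return sum(values) / len(values)


def itt_and_iv(rows):
    """Return (ITT estimate, IV estimate) from (assigned, treated, outcome) rows."""
    y1 = mean([y for z, d, y in rows if z])
    y0 = mean([y for z, d, y in rows if not z])
    d1 = mean([1.0 if d else 0.0 for z, d, y in rows if z])
    d0 = mean([1.0 if d else 0.0 for z, d, y in rows if not z])
    itt = y1 - y0            # effect of being assigned to treatment
    iv = itt / (d1 - d0)     # effect of treatment among compliers
    return itt, iv


if __name__ == "__main__":
    itt, iv = itt_and_iv(simulate_trial())
    print("ITT estimate:", round(itt, 2))
    print("IV (LATE) estimate:", round(iv, 2))
```

    With these assumed settings the ITT estimate should come out near 6 (the true effect diluted by 60% compliance) and the IV estimate near 10. Both are valid; they answer different questions.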

  • IO vs Labor Economics

    The response by Nevo and Whinston to the forthcoming Journal of Economic Perspectives paper by Angrist and Pischke (about which I’ve been writing) is illuminating. It clearly delineates some key differences between industrial organization (IO) and labor economics. Some key passages:

    However, empirical analysis must deal not only with credible inference, but also with what might be called “generalization,” “extrapolation,” or “external validity”. … This is where structural analysis comes in. Structural analysis is not a substitute for credible inference. Quite to the contrary, in general, structural analysis and credible identification are complements. …

    Structural analysis gives us a way to relate observations of responses to changes in the past to predict the responses to different changes in the future.

    It does so in two basic steps. First, it matches observed past behavior with a theoretical model to recover fundamental parameters such as preferences and technology. Then, the theoretical model is used to predict the responses to possible environmental changes, including those that have never happened before, under the assumption that the parameters are unchanged. …

    Empirical work in industrial organization does differ in some striking ways from that in labor (and other fields that emphasize estimation of treatment effects). We have discussed extensively one important difference, the heavier reliance on structural modeling (and greater attention to issues this raises) in industrial organization, but this is not the only difference.

    Empirical papers in industrial organization are also less likely than are papers in labor to focus on pinning down a particular “number”–like an elasticity or a price effect. Many structural papers in industrial organization, for example, are focused on showing that an approach to answering a question is feasible.

    The paper is worth a full read.
