• Biased OLS vs. contaminated IV?

    If you’re in the observational study business, this, by Anirban Basu and Kwun Chan, looks potentially useful:

    In the outcomes research and comparative effectiveness research literature, there are strong cautionary tales on the use of instrumental variables (IVs) that may influence the newly initiated to shun this premier tool for causal inference without properly weighing their advantages. It has been recommended that IV methods should be avoided if the instrument is not econometrically perfect. The fact that IVs can produce better results than naïve regression, even in nonideal circumstances, remains underappreciated. In this paper, we propose a diagnostic criterion and related software that can be used by an applied researcher to determine the plausible superiority of IV over an ordinary least squares (OLS) estimator, which does not address the endogeneity of a covariate in question. Given a reasonable lower bound for the bias arising out of an OLS estimator, the researcher can use our proposed diagnostic tool to confirm whether the IV at hand can produce a better estimate (i.e., with lower mean square error) of the true effect parameter than the OLS, without knowing the true level of contamination in the IV.
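    To make the tradeoff in that abstract concrete, here is a small Monte Carlo sketch. It is not Basu and Chan's diagnostic (their criterion works from a lower bound on OLS bias without knowing the instrument's contamination); it simply simulates an endogenous regressor plus an instrument with a hypothetical direct effect `gamma` on the outcome, then compares the mean square error of OLS and IV. All parameter values and function names are made up for illustration. With these values, OLS bias is roughly (0.5 + 0.5 * gamma) / 1.25 and IV bias roughly gamma / 0.5, so a slightly contaminated instrument still beats OLS while a badly contaminated one does not.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def draw_estimates(rng, n, beta, pi, rho, gamma):
    """One simulated sample; returns (OLS slope, just-identified IV slope)."""
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                 # instrument
        vi = rng.gauss(0, 1)                 # first-stage error
        ui = rho * vi + rng.gauss(0, 1)      # outcome error correlated with vi, so x is endogenous
        xi = pi * zi + vi                    # pi = instrument strength
        yi = beta * xi + gamma * zi + ui     # gamma != 0 means the instrument is "contaminated"
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return cov(x, y) / cov(x, x), cov(z, y) / cov(z, x)


def mse_ols_iv(gamma, beta=1.0, pi=0.5, rho=0.5, n=500, reps=200, seed=0):
    """Monte Carlo mean square error of OLS and IV around the true beta."""
    rng = random.Random(seed)
    sq_ols = sq_iv = 0.0
    for _ in range(reps):
        b_ols, b_iv = draw_estimates(rng, n, beta, pi, rho, gamma)
        sq_ols += (b_ols - beta) ** 2
        sq_iv += (b_iv - beta) ** 2
    return sq_ols / reps, sq_iv / reps


for gamma in (0.05, 0.3):
    m_ols, m_iv = mse_ols_iv(gamma)
    print(f"gamma={gamma}: MSE(OLS)={m_ols:.3f}  MSE(IV)={m_iv:.3f}")
```

Raising `pi` (a stronger instrument) shrinks the IV bias for a given `gamma`, which is one reason instrument strength matters alongside instrument purity.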


  • Bias and the Oregon Medicaid study

    There’s been some chatter about how the Oregon Medicaid study is or might be biased. That’s worth a post!

    There’s a precise way in which the study is not biased. By design it estimated the effect of Medicaid on those who won the lottery and enrolled, relative to those who lost the lottery and did not. This estimate is unbiased for the contrast between precisely these two groups, but not necessarily for others. In econometric jargon, this is known as the “local average treatment effect” (LATE). The “treatment effect” part of “LATE” is clear, but what’s this “local average” business?

    Sigh. I hate this terminology. It’s supposed to evoke the idea that the instrument (the lottery in this case) doesn’t have a “global” effect on study participants, causing all those randomized to Medicaid (lottery winners) to enroll and all those randomized to control (lottery losers) not to. It has a more modest, “localized” effect. The other jargon used for this is that the LATE estimate is an estimate of the effect of treatment on “compliers.” That’s a more meaningful term to me. The compliers are those who do what randomization “tells” them to do: they enroll in Medicaid if randomized to do so and they don’t if not.

    Of course, you can’t expect full compliance in this study (or many other RCTs) because some lottery winners turned out to be ineligible for Medicaid by the time they were permitted to enroll. Some had incomes that were too high. Some moved out of state. Some may have found other sources of coverage. (To be permitted to enroll, you had to have income below 100% FPL, live in the state, and have been uninsured for 6 months.) Also, enrollment wasn’t mandatory. So, if you just decided it wasn’t worth the trouble or didn’t receive or notice the letter inviting enrollment, you might have missed the window (45 days is all they gave you).

    On the flip side, nobody was preventing lottery losers from enrolling on Medicaid if they became eligible in another way. The study pertained only to the expansion of Medicaid beyond the statutory requirements. If people ended up in one of the eligible categories (aged, blind, disabled, pregnant) they could get on Medicaid.

    So, there was considerable “crossover” (lottery losers enrolling in Medicaid, lottery winners not) or “contamination” or “noncompliance,” all jargon for the same thing. This was not a perfect RCT. Few are.

    What to do? The investigators did two things. First, they considered an “intent-to-treat” (ITT) approach, comparing lottery winners to losers regardless of whether they enrolled in Medicaid. These results are in their first-year paper. I’ve forgotten the specifics, though in general the effects are much smaller than the LATE results. The concern with ITT is that all this crossover biases the results toward zero: noncompliance shrinks the contrast between study arms.

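    Next, the investigators provided LATE estimates, about which I wrote above. These are unbiased for the contrast among compliers. In this study, they’re about four times the size of the ITT estimates by virtue of the mathematics (“instrumental variables”) of LATE: the LATE estimate is essentially the ITT estimate divided by the difference in Medicaid enrollment rates between lottery winners and losers, so a ratio of about four implies that difference was roughly 25 percentage points. But LATE estimates need not be the same as one would find in the absence of noncompliance. There may be bias in that sense. Why?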

    • Hypothesis 1: Those who took the trouble to enroll in Medicaid were sicker than those who didn’t. After all, why enroll if you don’t need it? Remember, even some lottery losers (18.5% of them) enrolled in Medicaid. The LATE estimate removes their effect since they are noncompliers. Also, some lottery winners didn’t enroll (most of them didn’t) and the LATE estimate removes their effect too. What’s left under this hypothesis is a comparison of relatively sicker people who did enroll in Medicaid with relatively healthier people who didn’t. The investigators actually found some evidence to suggest that Medicaid enrollees are sicker. Many other studies find that Medicaid enrollees are sicker, to the point that some studies find an association of Medicaid with increased mortality. Under hypothesis 1, results are biased downward relative to what they would be under full compliance. Medicaid looks less effective than it might otherwise be.
    • Hypothesis 2: Those who are more organized, better planners, with higher cognitive function and literacy (including health) skills enroll. It takes some awareness and planning to enroll, so there is some face validity to this argument. I’m aware of no evidence to support it though. (Got any?) Under this hypothesis Medicaid enrollees would do a better job of getting and staying healthy even apart from whatever Medicaid does for them. This would bias results toward showing a larger Medicaid effect than would be true in general (under full compliance).

    There may be other hypothetical sources of bias. The point I’d make about all of them is that we don’t know whether any of these biases actually exist and, if they do, how big an effect they have. It’s all speculation. Still, LATE is an unbiased (and causal) estimate of the effect of Medicaid on compliers. It does filter out some who want to be on Medicaid and can’t enroll (lost lottery, no other route) and filters out some who enroll but weren’t invited (lost lottery but became eligible another way). Some of these noncompliers could be unusually sick. Some noncompliers could be unusually organized and aware. LATE filters some of them out.

    Some might wonder about another type of estimate one could do, the effect of “treatment on the treated.” Here one just compares Medicaid enrollees to non-enrollees, ignoring the lottery draw. Unfortunately, this just exacerbates whatever bias might exist. There is no random assignment at play here. There’s no filtering for selection at all. You get an association, not a causal estimate. This is the problem with many studies of Medicaid and insurance. Randomness is key. The lottery should be exploited in some fashion (either ITT or LATE).
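    A simulation can make the ITT, LATE, and enrollee-versus-non-enrollee comparisons concrete. The sketch below is hypothetical throughout: the population shares, baseline health levels, and effect size `tau` are invented, and it builds in a Hypothesis 1 flavor of selection (people who enroll regardless of the lottery are sicker). The LATE is computed as the ITT difference divided by the difference in enrollment rates between winners and losers (the Wald ratio), so with a first stage near 0.25 it comes out about four times the ITT, while the comparison that ignores the lottery picks up the health differences between groups.

```python
import random


def simulate_lottery(n, tau, seed=0):
    """Hypothetical lottery population of always-takers, compliers, and never-takers.

    Shares and baseline health by type are made-up numbers chosen so that
    people who enroll regardless of the lottery are sicker.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        won = rng.random() < 0.5                 # randomized lottery draw
        r = rng.random()
        if r < 0.15:
            kind, baseline = "always", -1.0      # enroll regardless; sicker
        elif r < 0.40:
            kind, baseline = "complier", 0.0     # enroll only if they win
        else:
            kind, baseline = "never", 0.5        # never enroll; healthier
        enrolled = kind == "always" or (kind == "complier" and won)
        health = baseline + tau * enrolled + rng.gauss(0, 1)
        rows.append((won, enrolled, health))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def estimates(rows):
    winners = [r for r in rows if r[0]]
    losers = [r for r in rows if not r[0]]
    itt = mean([r[2] for r in winners]) - mean([r[2] for r in losers])
    first_stage = mean([r[1] for r in winners]) - mean([r[1] for r in losers])
    late = itt / first_stage                     # Wald ratio: effect on compliers
    enrollees = [r[2] for r in rows if r[1]]
    others = [r[2] for r in rows if not r[1]]
    naive = mean(enrollees) - mean(others)       # ignores the lottery entirely
    return itt, first_stage, late, naive


itt, fs, late, naive = estimates(simulate_lottery(40_000, tau=0.3))
print(f"ITT={itt:.3f}  first stage={fs:.3f}  LATE={late:.3f}  naive={naive:.3f}")
```

Because everyone here gains the same `tau` from enrolling, the complier effect equals the effect for anyone; with heterogeneous effects, the Wald ratio would still target compliers only.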

    Lastly, notice how complicated RCT interpretation is? Yes, it’s the gold standard, but it still has issues. Using an IV approach for a LATE estimate is, in my view, about the best you can do. But there may be bias when considering generalizing the findings outside the “local” effect of the instrument (lottery or random assignment). These concerns arise with any IV study. In this sense, IV and RCT are much closer cousins than one tends to think. Disparage one and you disparage the other.

    Not all that’s gold glitters, but it is still valuable.


  • More instrumental variables studies of cancer treatment

    The study I wrote about earlier this week by Hadley et al. is just one of many to apply instrumental variables (IV) to analysis of cancer treatment (prostate in that case). Zeliadt and colleagues do so as well (also for prostate cancer) and cite several others. Both the Hadley and Zeliadt studies exploit practice pattern variation, specifically differences in prior year(s) rates of treatment across areas, to define IVs.

    For you to buy the results, you have to believe that lagged treatment rates strongly predict actual treatment (this can be shown) and, crucially, are not otherwise correlated with outcomes, controlling for observable factors (this mostly requires faith). I would not believe the IVs valid if there were clear, accepted standards about whether and what treatment is best. If that were so, then treatment rates could be correlated with quality, broadly defined. Higher quality care might be expected in areas that follow the accepted standard more closely. Better outcomes could be due to broadly better care, not just to the particular treatment choice.

    However, in prostate cancer, there is no standard about what treatment is best. I accept the IVs as valid in this case.

    Among the other cancer treatment IV studies I found, some of which Zeliadt cites, several also exploit practice pattern variations:

    • Yu-Lao et al.: Again, prostate cancer, and, notably, appearing in JAMA. Yes, JAMA published an IV study based on practice pattern variation. More on why I am excited about that below.
    • Brooks et al.: Breast cancer
    • Earle et al.: Lung cancer

    I cannot say whether practice patterns make for valid IVs for breast and lung cancer at the time the Brooks and Earle studies were published. I’d have to think about it, and I have not. I merely note that exploiting practice pattern variation for IV studies is not novel, though it is not widely accepted either, particularly in medical journals. I think it should be, though only for cases for which a good argument about validity can be made, as I believe it can be for prostate cancer and, I am sure, some other conditions.

    Of course I would prefer to see more randomized controlled trials (RCTs) in all the areas of medicine in need of additional evidence. But those areas are, collectively, a massive territory. We have neither the time nor a demonstrated willingness to spend the money required to conduct RCTs in all of them. We have to prioritize. For cases in which IV studies are likely to be reasonably valid, we ought to apply the technique, not necessarily instead of an RCT (though with resource constraints, such an argument could be made) but certainly in advance of one.

    IV studies are cheaper, faster, and offer other advantages. They don’t require enrollment of patients. They can exploit the large, secondary data sets already in existence (Medicare, Medicaid, VA, commercial payers, hospital systems, and the like). As such, they permit stratification by key patient demographics that RCTs are often underpowered to support. Even when an RCT is warranted, a good IV study done in advance can help to refine questions and guide hypotheses.

    Given the vast need for evidence that overwhelms our capacity to provide it via RCTs, there isn’t a good argument for not doing IV studies in cases for which they are justifiably valid. However, part of the package of scaling up an IV research agenda is publishing the findings in top journals: not just health economics journals, but also top medical journals like JAMA. This will require more clinical reviewers of manuscripts to gain comfort with the IV approach (start here). It will also require medical journals to solicit reviews by those who can vouch for instruments’ validity or point out when they are unlikely to be valid.

    It’s hard and expensive to create purposeful randomness, as is required in an RCT. Yet, there is so much natural randomness around. We should be exploiting it. Good quasi-randomness is a terrible thing to waste.


  • What drives choice of prostate cancer treatment?

    I thought I had blogged on this paper before, but I can’t find a prior post. So, here are some quotes and brief comments on Zeliadt, S. B., Ramsey, S. D., Penson, D. F., Hall, I. J., Ekwueme, D. U., Stroud, L., & Lee, J. W. (2006). Why do men choose one treatment over another? Cancer, 106(9), 1865-1874.

    In the largest study we reviewed, which involved 1000 patients, approximately 42% of patients defined an effective treatment as one that extended expected survival or delayed disease progression, whereas 45% indicated that effectiveness meant preservation of quality of life (QOL).[5] This is in contrast to physicians, 90% of whom defined effectiveness as extending expected survival. In another study, fewer than 20% of patients ranked either “effect of treatment on length of life” or “chances of dying of cancer” as 1 of the 4 most important factors in making a decision.[26] In 1 study of health state preferences, 2 of 5 men were unconditionally willing to risk side effects for any potential gain in life expectancy.[64] These studies suggest that there is substantial variation in the significance that patients place on cancer eradication, and that treatment efficacy means more than “control” of the tumor for many patients.

    Concerns regarding cancer eradication appear to correlate directly with aggressiveness of therapy, with radical prostatectomy being the choice preferred by the majority of patients who focus on cancer control.

    So, concerns about cancer relate to treatment choice. To the extent that those attitudinal factors also relate to outcomes (e.g., through their relationship with care for other conditions), they are good candidates for unobserved factors that, in part, explain the difference between results of instrumental variables (or RCT) and other observational study techniques.

    Side effects like incontinence and impotence are frequently cited concerns, as reported in the paper. However,

    To our knowledge, there is limited information available regarding how men balance side effects in making their treatment decision. For example, although preservation of sexual function was rated as very important by 90% of men age younger than 60 years, and 79% of men age 75 years and older, in a separate question only 3% of these same men indicated that “having few side effects” was the most important consideration in initiating therapy.[5] Fear of side effects was also stated by only 3% of men in a study in North Carolina, in which the majority of patients were black.[8] Srirangam et al.[45] reported that although 55% of spouses reported that side effects were important, only 6% indicated that side effects were deciding factors. One study comparing surgery and brachytherapy reported that 25% of patients chose between these 2 options based on the side effect profile.[9] In addition, although Holmboe and Concato[10] found that 49% of patients were concerned with incontinence and 38% were concerned with impotence, only 13% reported weighing the risks and benefits of treatment. These studies demonstrate the apparent disconnect between patients’ stated importance of side effects and the role that they actually play in reaching the final treatment decision.

    Ultimately, it’s what patients actually do, not what they say, that matters. Therefore, side effects may be a less relevant factor in treatment decisions than is commonly believed. Put another way, that something is a concern doesn’t imply that it changes one’s decision. That does not take away from the fact that concerns are psychologically important.

    Less frequently discussed are concerns about other potential complications.

    Fear of surgical complications was emphasized by some men who selected watchful waiting.[7] A different study found that complications due to surgery were of concern to 12% of patients when considering surgery.[3] A belief that radiation is harmful rather than therapeutic was offered by some men who selected surgery.[44] When considering radiation therapy, 21% of men indicated concern about skin burns.[3] Long recovery times were cited by 17% of patients.[10] For a small percentage of men, issues such as fear of surgery or radiation appeared to be the primary factor in their decision regarding treatment.

    One reason complications and side effects may play a relatively small role in treatment decisions is that physicians are playing a large role in influencing those decisions.

    The role of the physician recommendation has received considerable attention in prostate cancer decision making due to the widely recognized preferences held by each physician specialty. As might be expected, opinions regarding the optimal treatment for localized prostate cancer vary among urologists, radiation oncologists, oncologists, and general practitioners. Urologists nearly universally indicate that surgery is the optimal treatment strategy, and radiation oncologists similarly indicate that radiation therapy is optimal.78

    To the extent that treatment choice is driven by physicians and is otherwise unrelated to outcomes, it suggests an opportunity for a valid instrument. This is what Hadley et al. exploited.

    The paper continues with an exploration of the role of family members, race, socioeconomic, and psychological factors in treatment choice. Some of these (e.g., family relationships, socioeconomic factors, psychological factors) are likely to be incompletely observed and are, therefore, additional possible reasons why instrumental variables and RCT results differ from naive, observational studies. The key is that they may also be related to outcomes. It’s not too hard to imagine they could be.


  • Methods for comparative effectiveness of prostate cancer treatments

    A research notebook entry on an important paper follows. I’ve left out quite a bit that is more tutorial, so the paper is more accessible than this entry may make it seem.

    All quotes from Hadley, J., Yabroff, K. R., Barrett, M. J., Penson, D. F., Saigal, C. S., & Potosky, A. L. (2010). Comparative effectiveness of prostate cancer treatments: evaluating statistical adjustments for confounding in observational data. Journal of the National Cancer Institute, 102(23), 1780-1793:

    We selected 14,302 early-stage prostate cancer patients who were aged 66–74 years and had been treated with radical prostatectomy or conservative management from linked Surveillance, Epidemiology, and End Results–Medicare data from January 1, 1995, through December 31, 2003. Eligibility criteria were similar to those from a clinical trial used to benchmark our analyses. Survival was measured through December 31, 2007, by use of Cox proportional hazards models. We compared results from the benchmark trial with results from models with observational data by use of traditional multivariable survival analysis, propensity score adjustment, and instrumental variable analysis.

    This is an important exercise. Here’s why:

    The randomized controlled trial is considered the most valid methodology for assessing treatments’ efficacy. However, randomized controlled trials are costly, time consuming, and frequently not feasible because of ethical constraints. Moreover, some randomized controlled trial results have limited generalizability because of differences between randomized controlled trial study populations, who may be screened for eligibility on the basis of age and comorbidities, and community populations, who are likely to be much more heterogeneous with regard to health conditions and socioeconomic characteristics.

    To this, add that RCTs are often under-powered for stratification by key patient characteristics. This is where observational studies shine. Of course, biased selection is the principal concern with observational studies.

    Patient selection into specific treatments is an important consideration in all observational studies, but particularly for those in prostate cancer, because incidence is highest in the elderly who are also most likely to have multiple comorbidities.

    Observational study techniques are not equivalent in their ability to address the selection problem.

    Observational studies (1,11–13) have previously used traditional regression and propensity score methods to evaluate associations between specific prostate cancer treatments with survival. In these studies, the propensity score methods did not completely balance (ie, equalize) important patient characteristics such as tumor grade, size, and comorbidities across treatment groups. Furthermore, patients who received active treatment had better survival for noncancer causes of death than patients who received conservative management, indicating that unobserved differences between groups affected both treatment choice and survival.

    Instrumental variable analysis is a statistical technique that uses an exogenous variable (or variables), referred to as an “instrument,” that is hypothesized to affect treatment choice but not to be related to the health outcome (14–17). Variations in treatment that result from variations in the value of the instrument are considered to be analogous to variations that result from randomization and so address both observed and unobserved confounding. Instrumental variable analysis has been used with observational data to investigate clinical treatment effects among patients with breast cancer (18–20), lung cancer (21), or prostate cancer (5,22).

    The study findings support the use of instrumental variables.

    Propensity score adjustments resulted in similar patient characteristics across treatment groups, and survival was similar to that of traditional multivariable survival analyses. The instrumental variable approach, which theoretically equalizes both observed and unobserved patient characteristics across treatment groups, differed from multivariable and propensity score results but were consistent with findings from a subset of elderly patients with early-stage disease in the randomized trial.

    The authors’ preferred instrument captures practice pattern variation.

    We constructed the primary instrumental variable for treatment received by use of a two-step process. First, we used the entire dataset (n = 17,815) to estimate the probability of receiving conservative management as a function of patients’ clinical characteristics (tumor stage and grade, NCI comorbidity index, and Medicare reimbursements for medical care in the previous year), demographics (age, race, ethnicity, and marital status), year of diagnosis, and all possible interactions among these variables. Second, we calculated the difference between the actual proportion of patients receiving conservative management and the average predicted probability of receiving conservative management (generated from the logistic regression model) in each hospital referral region by year. Areas with relatively large positive differences between the actual and predicted proportions of patients receiving conservative management favor a conservative management treatment pattern, and areas with large negative differences between the actual and predicted proportions of patients receiving conservative management favor a radical prostatectomy treatment pattern. We then lagged this measure of the local area treatment pattern by 1 year and linked it to each patient in the analysis to enhance the instrument’s independence from patients’ current health and unobserved characteristics.
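    The two-step construction quoted above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the region labels and records are made up, and the predicted probabilities (which in the paper come from a logistic regression with many covariates and their interactions) are simply hard-coded. For each region and year, the measure is the actual share receiving conservative management minus the average predicted probability, and each patient is assigned the value from the prior year.

```python
from collections import defaultdict

# Hypothetical patient records: (hospital referral region, diagnosis year,
# received conservative management (1/0), predicted probability from the
# first-step logistic model). Regions and numbers are made up.
records = [
    ("HRR_A", 1995, 1, 0.40),
    ("HRR_A", 1995, 0, 0.30),
    ("HRR_A", 1996, 1, 0.50),
    ("HRR_B", 1995, 0, 0.60),
    ("HRR_B", 1995, 0, 0.40),
    ("HRR_B", 1996, 0, 0.20),
]


def area_treatment_pattern(records):
    """Actual minus average predicted share receiving conservative management, by (region, year)."""
    totals = defaultdict(lambda: [0, 0.0, 0])  # [number treated, sum of predictions, patients]
    for region, year, got_cm, p_hat in records:
        t = totals[(region, year)]
        t[0] += got_cm
        t[1] += p_hat
        t[2] += 1
    # (n_treated / count) - (sum_pred / count), written as one fraction
    return {key: (n_cm - sum_p) / count for key, (n_cm, sum_p, count) in totals.items()}


def lagged_instrument(records, lag=1):
    """Attach each patient's regional pattern from `lag` years earlier (None if unavailable)."""
    pattern = area_treatment_pattern(records)
    return [pattern.get((region, year - lag)) for region, year, _, _ in records]


print(lagged_instrument(records))
```

In this sketch, patients from the earliest year get `None` because no prior-year pattern exists for them; positive values mark regions leaning toward conservative management, negative values regions leaning toward prostatectomy.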

    Statistical methods:

    Treatment propensity (ie, the predicted probability of receiving conservative management) for the propensity score analysis and for constructing the lagged area treatment pattern for the instrumental variable analysis was estimated with logistic regression. The survival models were estimated with Cox proportional hazard models. Visual inspection of the parallelism of the Kaplan–Meier plots of the logarithms of the estimated cumulative survival models by treatment supported the proportionality assumption. The instrumental variable version of the Cox hazard model was estimated with the two-stage residual inclusion method (38), which has been shown to be appropriate for nonlinear outcome models. [...]

    [The instrument's] independence of the survival outcomes was confirmed by its lack of statistical significance as an independent variable in an alternative version (data not shown) of the Cox survival models.
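    Two-stage residual inclusion (2SRI) differs from ordinary two-stage least squares in what gets carried into the second stage: instead of replacing treatment with its predicted value, it keeps actual treatment and adds the first-stage residual as an extra regressor. The paper applies this to a Cox model; to keep the sketch self-contained and checkable, the version below uses a linear outcome model on simulated data (all values hypothetical), a case in which 2SRI reproduces the usual IV ratio exactly. A clearly nonzero coefficient on the residual is a sign that treatment is correlated with unobservables.

```python
import random


def centered_cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


def two_stage_residual_inclusion(z, d, y):
    """2SRI with a linear first stage and a linear outcome model (sketch only)."""
    # Stage 1: regress treatment on the instrument and keep the residual.
    slope = centered_cross(z, d) / centered_cross(z, z)
    intercept = sum(d) / len(d) - slope * sum(z) / len(z)
    v_hat = [di - intercept - slope * zi for zi, di in zip(z, d)]
    # Stage 2: regress the outcome on actual treatment plus the residual
    # (intercept handled by centering); solve the 2x2 normal equations.
    s_dd, s_vv = centered_cross(d, d), centered_cross(v_hat, v_hat)
    s_dv = centered_cross(d, v_hat)
    s_dy, s_vy = centered_cross(d, y), centered_cross(v_hat, y)
    det = s_dd * s_vv - s_dv ** 2
    b_treat = (s_vv * s_dy - s_dv * s_vy) / det
    b_resid = (s_dd * s_vy - s_dv * s_dy) / det
    return b_treat, b_resid


# Simulated data: unobserved health h drives both treatment and outcome.
rng = random.Random(42)
z, d, y = [], [], []
for _ in range(5000):
    h = rng.gauss(0, 1)
    zi = rng.gauss(0, 1)
    di = 0.8 * zi + h + rng.gauss(0, 1)
    z.append(zi)
    d.append(di)
    y.append(1.0 * di - 1.5 * h + rng.gauss(0, 1))  # true treatment effect = 1.0

b_treat, b_resid = two_stage_residual_inclusion(z, d, y)
naive = centered_cross(d, y) / centered_cross(d, d)  # OLS, ignores confounding
wald = centered_cross(z, y) / centered_cross(z, d)   # just-identified IV ratio
print(f"naive={naive:.3f}  2SRI={b_treat:.3f}  IV ratio={wald:.3f}  residual coef={b_resid:.3f}")
```

With a nonlinear second stage such as a Cox or logit model, the 2SRI estimate no longer coincides with a simple ratio, which is precisely why the method is reached for in that setting.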

    One acknowledged limitation, among many, is that PSA values were not available to the researchers. Another is that

    a complete statistical assessment of the Cox hazard model’s proportionality assumption indicated that the effects of some covariates may not be time invariant, especially in the analysis of all-cause mortality. Although a sensitivity analysis of the effects of allowing time-varying covariates did not alter the principal findings with regard to treatment effects, further analysis of time-varying effects may be warranted.

    All in all, a very nice paper. It’s worth a full read by observational researchers.


  • Good quasi-randomness is hard to find

    What now seems like ages ago (but was only in early April), Joe Doyle and colleagues published an NBER paper finding, in their words, that “higher-cost hospitals have significantly lower one-year mortality rates compared to lower-cost hospitals.” The paper has already been discussed in the blogosphere. See Sarah Kliff’s first and second posts, Thom Walsh, and David Dranove, for example.

    Given all that commentary, which you are perfectly capable of reading along with the abstract, there’s little point in me describing or critiquing the results. Instead I’ll focus on some aspects of the methods.

    With acknowledgement of their imperfections, randomized trials are considered the gold standard for causal inference. One of the fundamental problems with studying the spending-outcomes relationship, however, is that we can’t randomize individuals to spending levels or even to hospitals. Instead, we must rely on the data we observe. If we’re clever, we can find something that is almost like randomizing patients to hospitals (or EDs), though. In this regard, Doyle and colleagues were extremely clever.

    We consider two complementary identification strategies to exploit variation in ambulance transports. The first uses the fact that in areas served by multiple ambulance companies, the company dispatched to the patient is effectively random due to rotational assignment or even direct competition between simultaneously dispatched competitors. Moreover, we demonstrate that ambulance companies serving the same small geographic area have preferences in the hospital to which they take patients. These facts suggest that the ambulance company dispatched to emergency patients may serve as a random assignment mechanism across local hospitals.

    Our second strategy localizes the “natural randomization” approach adopted by the Dartmouth researchers by exploiting contiguous areas on opposite sides of ambulance service area boundaries in the state of New York. In New York, each state-certified Emergency Medical Service (EMS) provider is assigned to a territory via a certificate of need process where they are allowed to be “first due” for response. Other areas may be entered when that area’s local provider is busy. We obtained the territories for each EMS provider from the New York State Department of Emergency Medical Services, and we couple these data with unique hospital discharge data that identifies each patient’s exact residential address. This combination allows us to compare those living on either side of an ambulance service area boundary. To the extent that these neighbors are similar to one another, the boundary can generate exogenous variation in the hospitals to which these patients are transported.

    If this doesn’t fill you with admiration you’re probably not an economist. In that case, trust me, they have found an exceptionally good source of quasi-randomness in patient assignment.

    Not long after I read this, I noticed this bit in a post by Jordan Rao about a recent paper by Emily Carrier, Marisa Dowling, and Robert Berenson:

    The paper, published in Health Affairs, found hospitals “wooing” EMS workers that service well-off neighborhoods, even sprucing up the rooms where the workers rest and fill out paperwork.

    This is a new phenomenon and, therefore, doesn’t detract from my admiration for Doyle et al.’s work, which focused on the early-to-mid 2000s. I raise the issue of hospitals trying to attract EMS workers from more affluent areas to suggest that in the future, an approach like Doyle et al.’s may have to address this type of thing. To the extent certain hospitals preferentially choose patients (e.g., more affluent ones) by influencing EMS workers, it is possible ambulance transports do not serve as a random assignment of patients to hospitals. What if higher spending hospitals are also the ones that play this game, attracting a wealthier set of patients? If that were the case, it is likely that there are other unobservable characteristics of those patients that are correlated with outcomes. That would be a source of bias.

    This raises a more general point about the ambulance transport approach. It only addresses demand-side selection. The patients are (quasi-) randomly assigned to hospitals, in a way (potentially) not correlated with hospital spending. But that does not mean there aren’t unobservable (non-random) aspects of hospitals that are correlated with spending and outcomes. The quasi-randomness of ambulance transport does not address this supply-side selection.

    Good quasi-randomness is hard to find. Doyle et al. found some. Still, it doesn’t address every source of bias, nor should anyone expect as much from any study, even randomized experiments.


  • Overidentification tests

    Last week, in Inquiry, my latest paper with Steve Pizer and Roger Feldman was published. An ungated, working paper version is also available. Note also that I wrote a bit about a portion of it in a prior post, though even that does not describe what the paper is about.  I’ll write more about the results in the paper in another post. If you can’t wait, click through for the abstract. For now, I want to focus on another technical detail, which is likely to interest all of five readers. You know who you are from the title of the post.

    Until fairly recently, my colleagues and I thought overidentification tests of instruments were worth doing. We no longer feel that way. Still, to get published, we have little choice but to run them when a reviewer demands them, even though we don’t think they’re very valuable.

    Though these are typically discussed as tests of excludability, they are, in fact, joint tests of excludability and homogeneity of treatment effects (Angrist 2010). Consequently, instruments that are excludable may be rejected when treatment effects are heterogeneous: each instrument identifies a local average treatment effect for its own set of compliers, and those effects need not agree.
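    That point can be demonstrated with a simulation. In the hypothetical setup below, two binary instruments are both randomly assigned and both excludable, but each moves a different group of people into treatment, and the two groups benefit differently (made-up effects of 1 and 3). The just-identified IV estimates therefore recover two different local average treatment effects, and a Sargan-style overidentification test (n times the R-squared from regressing the 2SLS residuals on the instruments, compared with the chi-square critical value for one degree of freedom) rejects even though neither instrument violates the exclusion restriction. Function names and parameters are illustrative assumptions; the Hansen J statistic is the more common heteroskedasticity-robust variant.

```python
import random

CHI2_1DF_95 = 3.841  # 5% critical value, chi-square with 1 df (2 instruments - 1 endogenous regressor)


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def fit(X, b):
    return [sum(xi * bi for xi, bi in zip(row, b)) for row in X]


def simulate(n, effect_a, effect_b, seed=0):
    """Two random, excludable binary instruments; each moves a different complier group."""
    rng = random.Random(seed)
    z1, z2, d, y = [], [], [], []
    for _ in range(n):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        group_a = rng.random() < 0.5
        treated = a if group_a else b           # group A responds only to z1, group B only to z2
        effect = effect_a if group_a else effect_b
        z1.append(a)
        z2.append(b)
        d.append(treated)
        y.append(effect * treated + rng.gauss(0, 1))
    return z1, z2, d, y


def iv_slope(z, d, y):
    """Just-identified IV estimate cov(z, y) / cov(z, d)."""
    mz, md, my = sum(z) / len(z), sum(d) / len(d), sum(y) / len(y)
    num = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    den = sum((zi - mz) * (di - md) for zi, di in zip(z, d))
    return num / den


def sargan(z1, z2, d, y):
    """2SLS with both instruments, then n * R^2 of the 2SLS residuals on the instruments."""
    Z = [[1.0, a, b] for a, b in zip(z1, z2)]
    d_hat = fit(Z, ols(Z, d))                               # first stage
    const, beta = ols([[1.0, dh] for dh in d_hat], y)      # second stage
    u = [yi - const - beta * di for yi, di in zip(y, d)]   # residuals use actual treatment
    u_fit = fit(Z, ols(Z, u))
    u_bar = sum(u) / len(u)
    ssr = sum((ui - fi) ** 2 for ui, fi in zip(u, u_fit))
    sst = sum((ui - u_bar) ** 2 for ui in u)
    return beta, len(u) * (1 - ssr / sst)


z1, z2, d, y = simulate(5000, effect_a=1.0, effect_b=3.0, seed=1)
late_1, late_2 = iv_slope(z1, d, y), iv_slope(z2, d, y)
beta_2sls, stat = sargan(z1, z2, d, y)
print(f"LATE via z1={late_1:.2f}, via z2={late_2:.2f}, 2SLS={beta_2sls:.2f}")
print(f"Sargan={stat:.1f}, reject at 5%: {stat > CHI2_1DF_95}")
```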

    Passing overid tests may convince some reviewers that one’s instruments are excludable from the second stage model, but it shouldn’t. Failing to pass doesn’t prove they are not. This is a rather weak case for their scientific value. Many papers in top economics journals using IV methods do not include overid tests. That’s just fine.

    “Angrist 2010” is a personal communication with Josh Angrist.



  • Practice patterns

    From Aspirin, Angioplasty, and Proton Beam Therapy: The Economics of Smarter Health Care Spending by Baicker and Chandra:

    There is also a substantial literature at the provider level showing that practice pattern norms drive similar care for all of the patients that a provider sees, regardless of individual insurance status – so that changes in the incentives applying to a large share of patients (say, Medicare beneficiaries) can drive changes in the care received by all patients (Glied and Zivin 2002; Baker and Corts 1996; Baker 1999; Frank and Zeckhauser 2007).

    This is of interest because, to the extent that practice patterns are good predictors of the type of treatment a patient receives but are not correlated with unobserved factors that drive outcomes, they make good instruments for observational comparative effectiveness studies. This likely sounds like mumbo-jumbo to some readers, but a lot of money could ride on this type of thing.
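    For the mumbo-jumbo-curious, here is a minimal simulated sketch of the idea, with entirely made-up parameters. Each hypothetical provider has a practice style that shifts how often it treats; each patient has a severity the provider sees but the analyst does not. Sicker patients are both more likely to be treated and more likely to do badly, so a naive regression is confounded. As the instrument, the sketch uses the provider’s treatment rate among its other patients (a leave-one-out rate), which is one way to operationalize a practice pattern; that choice is an assumption of this sketch, not something Baicker and Chandra prescribe. By construction here, the instrument is unrelated to any individual patient’s severity, which is exactly the condition that has to be defended in real data.

```python
import numpy as np

rng = np.random.default_rng(1)

N_PROVIDERS, PER_PROVIDER = 500, 40
TAU = 1.0  # hypothetical true effect of treatment on the outcome

provider = np.repeat(np.arange(N_PROVIDERS), PER_PROVIDER)
n = provider.size

# Each provider has a practice style (propensity to treat) shared by all its patients.
style = rng.normal(size=N_PROVIDERS)[provider]
# Patient severity is seen by the provider but not by the analyst.
severity = rng.normal(size=n)

# Sicker patients are more likely to be treated...
d = (style + severity + rng.normal(size=n) > 0).astype(float)
# ...and have worse outcomes, so naive OLS is confounded.
y = TAU * d - 2.0 * severity + rng.normal(size=n)

# Instrument: the provider's treatment rate among its *other* patients
# (leave-one-out), a stand-in for the provider's practice pattern.
treated_per_provider = np.bincount(provider, weights=d)
z = (treated_per_provider[provider] - d) / (PER_PROVIDER - 1)

ols = np.cov(y, d)[0, 1] / np.var(d, ddof=1)
iv = np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]  # just-identified IV with a constant
print(f"OLS = {ols:.2f}, IV = {iv:.2f}, true effect = {TAU}")
```

    With these settings, OLS is pulled far below the true effect (it even gets the sign wrong), while the IV estimate lands near it. Leaving the patient out of their own provider’s rate matters: including it would build that patient’s severity-driven treatment into the instrument.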


    Baker, L, and K Corts. 1996. HMO Penetration and the Cost of Health Care: Market Discipline or Market Segmentation? American Economic Review 86 (2):389-394.

    Baker, Laurence C. 1999. Association of Managed Care Market Share and Health Expenditures for Fee-for-Service Medicare Patients. Journal of the American Medical Association 281 (5):432-437.

    Glied, S., and J. Zivin. 2002. How Do Doctors Behave When Some (But Not All) of Their Patients Are In Managed Care? Journal of Health Economics 21:337-353.

    Frank, Richard G., and R.P. Zeckhauser. 2007. Custom-Made Versus Ready-to-Wear Treatments: Behavioral Propensities in Physicians’ Choices. Journal of Health Economics 26:1101-1127.


  • In defense of reduced form models

    This post is for economists in the sense that I freely use jargon and technical concepts of the discipline. If you can’t get past the first sentence, don’t worry about it, and just skip this post. (If you need something good to read, here’s a recommended post from the archives. It’s about Medicare Advantage, circa 2009. Though it is slightly dated, the main concepts remain relevant.)

    A slightly different version of the following appears in a forthcoming paper by me, Roger Feldman, and Steve Pizer. If you need to defend reduced form models using HHIs as a measure of market structure, cite this as:

    Frakt AB, Pizer SD, and Feldman R, “The Effects of Market Structure and Payment Rates on Private Medicare Health Plan Entry,” forthcoming in Inquiry.

    A working paper version is available on SSRN.


    There is an ongoing debate about the strengths and limitations of structural and reduced form models in empirical industrial organization (IO) (Angrist and Pischke 2010, Nevo and Whinston 2010). I’m not going to take a side in that debate. However, I raise it to point out that it is by no means settled in the broad economics community that either paradigm is preferred for all applications. In this post I explore the strengths and limitations of both types of models, though I largely defend the reduced form approach while acknowledging that it is not necessarily ideal in all cases.

    Recent work on health care markets has used both structural and reduced form approaches: Maruyama (2011), Starc (2010), and Lustig (2010) using structural models and Dafny et al. (2009), Dafny (2010), Schneider et al. (2008), Shen et al. (2010), Moriya et al. (2010), and Bates and Santerre (2008) using reduced form models. Structural models of entry have been applied in health care (most recently by Maruyama (2011)) and, for decades, to problems in non-health industries as well (e.g., Berry 1992, Seim 2006). Many of the reduced form models employ the Herfindahl-Hirschman Index (HHI) as an independent variable (Dafny et al. 2009, Bates and Santerre 2008, Schneider et al. 2008, Shen et al. 2010, Moriya et al. 2010). Although these models are ad hoc, Gaynor and Town (2011) write that “one can think of them as attempting to capture the impacts of relative bargaining power on price, using buyer and seller HHIs as proxies for bargaining power.”

    Structural models have strengths, which I’ll get to, but one limitation is tractability: for some applications, they simply cannot be used. For instance, Mazzeo (2002) examined entry into motel markets by firms endogenously choosing high, medium, or low quality. This approach requires each firm to choose only one product type in each market, and it becomes intractable with more than three types. If one considers, for example, applying such an approach to the market for Medicare health plans, this intractability becomes a barrier. Although that market has only three main types of products a firm could offer (CCPs, or coordinated care plans such as HMOs and PPOs; PDPs, or stand-alone prescription drug plans; and PFFS, or private fee-for-service plans), Mazzeo’s approach cannot be applied because firms may enter with one of seven configurations (CCP only, PDP only, PFFS only, CCP-PDP, CCP-PFFS, PFFS-PDP, or CCP-PDP-PFFS).
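    The count of seven is simply the number of nonempty subsets of three product types, 2^3 - 1. A short sketch that enumerates them makes the combinatorics explicit; the function name is illustrative, and the point is only that a one-type-per-firm framework like Mazzeo’s sees three options where firms actually face seven, and the gap widens fast as product types are added.

```python
from itertools import combinations

PRODUCT_TYPES = ("CCP", "PDP", "PFFS")


def entry_configurations(products):
    """Every nonempty bundle of product types a firm could offer in a market."""
    return [
        "-".join(bundle)
        for size in range(1, len(products) + 1)
        for bundle in combinations(products, size)
    ]


configs = entry_configurations(PRODUCT_TYPES)
print(len(configs), configs)

# The count is 2**k - 1 for k product types, so it grows quickly.
for k in range(1, 7):
    print(k, "types ->", 2**k - 1, "configurations")
```

    The first line printed is `7 ['CCP', 'PDP', 'PFFS', 'CCP-PDP', 'CCP-PFFS', 'PDP-PFFS', 'CCP-PDP-PFFS']`, matching the list in the text (PDP-PFFS is the same bundle as PFFS-PDP).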

    A fundamental distinction between the structural models espoused by practitioners of new empirical IO and the reduced form models coming from the structure-conduct-performance (SCP) paradigm is in the type of assumptions required. Structural modelers correctly point out that market structure is endogenous. In entry models, for instance, concurrent market structure is, in essence, the dependent variable. Rather than including market structure (even if lagged and/or instrumented) as an independent variable, structural models impose assumptions about the nature of competition between firms. Evaluating the validity of those assumptions requires a substantial research program, sometimes including access to exogenous data on markups (Nevo 2001).

    The reduced form SCP approach, on the other hand, permits one to be agnostic about the underlying game and, thereby, to avoid any game-theoretic assumptions (Gaynor and Town 2011). Naturally, the trade-off is that one is not estimating fundamental parameters associated with a game. In addition, one must justify the use of HHI on the right-hand side; in particular, one must defend one’s instruments for HHI. However, this is not qualitatively different from a problem faced in structural models that include (instrumented) price as an independent variable. In both contexts, instruments must be defended as reflective of exogenous factors and excludable from the second stage.

    When valid instruments for HHI can be found, they may reflect unobserved factors affecting market structure. For example, in the case of health plan entry, such factors might include local marketing organizations or longstanding provider networks. When those factors shift, they affect both HHI and entry. By putting (instrumented) HHI on the right-hand side, one gains some insight into the aggregate effects of those factors. This insight is less precise than what a more detailed structural model might provide, but it comes with weaker assumptions, easier interpretation, and simpler econometrics. In addition, despite its shortcomings, HHI remains an important market measure for policy: the antitrust agencies still use it to inform their review of mergers for potential anticompetitive effects (DoJ and FTC 2010). Consequently, there are both practical empirical and policy-relevance rationales for using HHIs in a reduced form framework.
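    For readers who want the mechanics, here is a sketch of how an HHI is computed from market shares and mapped to the concentration bands in the 2010 DoJ/FTC Horizontal Merger Guidelines (below 1,500 unconcentrated, 1,500 to 2,500 moderately concentrated, above 2,500 highly concentrated). The four-plan market is hypothetical, and the function names are illustrative. In a reduced form model like the one discussed above, an HHI computed this way would enter as an (instrumented) right-hand-side variable; the sketch does not attempt that estimation step.

```python
import math


def hhi(shares):
    """Herfindahl-Hirschman Index from market shares given as fractions.

    Shares are converted to percentage points before squaring, so the index
    runs from near 0 (many tiny firms) to 10,000 (monopoly).
    """
    total = sum(shares)
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"shares should sum to 1, got {total}")
    return sum((100 * s) ** 2 for s in shares)


def concentration_2010(h):
    """Concentration bands from the 2010 DoJ/FTC Horizontal Merger Guidelines."""
    if h < 1500:
        return "unconcentrated"
    if h <= 2500:
        return "moderately concentrated"
    return "highly concentrated"


def merger_delta_hhi(share_a, share_b):
    """Change in HHI when two firms merge: (a + b)^2 - a^2 - b^2 = 2ab."""
    return 2 * (100 * share_a) * (100 * share_b)


# Hypothetical county market with four plans.
shares = [0.4, 0.3, 0.2, 0.1]
h = hhi(shares)
print(round(h), concentration_2010(h))                               # 3000 highly concentrated
print(round(merger_delta_hhi(0.2, 0.1)))                             # 400
print(round(hhi([0.4, 0.3, 0.3])), "after the two smallest merge")  # 3400 after the two smallest merge
```

    Squaring shares is what makes the index sensitive to a few large firms: the 40 percent plan alone contributes 1,600 of the 3,000 points.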

    References


    Angrist J and Pischke JS. The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con out of Econometrics. Journal of Economic Perspectives 2010;24(2).

    Bates L and Santerre R. Do health insurers possess monopsony power? International Journal of Health Care Finance and Economics 2008;8:1-11.

    Berry S. Estimation of a Model of Entry in the Airline Industry. Econometrica 1992;60(4):889-97.

    Dafny L, Duggan M, and Ramanarayanan S. Paying a Premium on Your Premium? Consolidation in the U.S. Health Insurance Industry. National Bureau of Economic Research Working Paper No. 15434; October 2009.

    Dafny L. Are health insurance markets competitive? American Economic Review 2010;100:1399-1431.

    Gaynor M and Town R. Competition in Health Care Markets. Chapter for the Handbook of Health Economics, Volume 2. McGuire T, Pauly MV, and Pita Barros P, editors; 2011.

    Lustig J. Measuring welfare losses from adverse selection and imperfect competition in privatized Medicare. Unpublished manuscript, Boston University; 2010.

    Maruyama S. Socially Optimal Subsidies for Entry: The Case of Medicare Payments to HMOs. International Economic Review 2011;52(1).

    Mazzeo M. Product Choice and Oligopoly Market Structure. RAND Journal of Economics 2002;33(2):221-42.

    Moriya A, Vogt W, and Gaynor M. Hospital prices and market structure in the hospital and insurance industries. Health Economics, Policy and Law 2010;5:459-479.

    Nevo A. Measuring Market Power in the Ready-to-Eat Cereal Industry. Econometrica 2001;69(2):307-342.

    Nevo A and Whinston M. Taking the dogma out of econometrics: Structural modeling and credible inference. Journal of Economic Perspectives 2010;24(2).

    Schneider J, Li P, Klepser D, Peterson N, Brown T, and Scheer R. The effect of physician and health plan market concentration on prices in commercial health insurance markets. International Journal of Health Care Finance and Economics 2008;8:13-26.

    Seim K. An Empirical Model of Firm Entry with Endogenous Product-Type Choices. RAND Journal of Economics 2006;37(3):619-40.

    Shen Y, Wu V, and Melnick G. Trends in hospital cost and revenue, 1994-2005: How are they related to HMO penetration, concentration, and for-profit ownership? Health Services Research 2010;45(1):42-61.

    Starc A. Insurer pricing and consumer welfare: Evidence from Medigap. Unpublished manuscript, Harvard University; 2010.

    U.S. Department of Justice (DoJ) and the Federal Trade Commission (FTC). Horizontal Merger Guidelines. August 19, 2010.

  • Causality in health services research (a must read)

    A new paper in Health Services Research by Bryan Dowd, Separated at Birth: Statisticians, Social Scientists, and Causality in Health Services Research, is a must-read for anyone seeking a deeper understanding of estimation and experimental methods for causal inference. Here’s the abstract:

    Health services research is a field of study that brings together experts from a wide variety of academic disciplines. It also is a field that places a high priority on empirical analysis. Many of the questions posed by health services researchers involve the effects of treatments, patient and provider characteristics, and policy interventions on outcomes of interest. These are causal questions. Yet many health services researchers have been trained in disciplines that are reluctant to use the language of causality, and the approaches to causal questions are discipline specific, often with little overlap. How did this situation arise? This paper traces the roots of the division and some recent attempts to remedy the situation.

    The paper is followed by commentaries by Judea Pearl and A. James O’Malley.

    There are too many excellent paragraphs and points to quote, and I really want you to read the paper. (I’m looking into whether an ungated version can be made available. It’s unlikely, but I’ll try.) Here are just a few of my favorite passages from the introduction and conclusion, strung together:

    Determining whether changing the value of one variable actually has a causal effect on another variable certainly is one of the most important questions faced by the human race. In addition to our common, everyday experiences, all of the work done in the natural and social sciences relies on our ability to learn about causal relationships. [...]

    It is not unusual for a health services research study section (the group of experts who review research proposals and make funding recommendations) to include analysts who maintain that only randomized control trials (RCTs) yield valid causal inference, sitting beside analysts who have never randomized anything to anything. Two analysts debating the virtues of instrumental variables (IV) versus parametric sample selection models might be sitting next to analysts who never have heard of two-stage least squares.

    Academic disciplines routinely take different approaches to the same question, but it is troubling when approaches to the same problem are heterogeneous across departments and homogeneous within departments and remain so for decades, suggesting an unhealthy degree of intellectual balkanization within the modern research university. It is one thing to disagree with your colleagues on topics of common interest. It is another thing to have no idea what they are talking about. [...]

    The challenge for health services research and the health care system in general is to contemplate the physician’s decision problem as she sits across the table from her patient. On what evidence will her treatment decisions be based? A similar case she treated 5 years ago? Results from an RCT only? What if there are not any RCT results or the RCT involved a substantially different form of the treatment applied to patients substantially different from the one sitting across the table? What if the results came from an observational study, but the conditions required for the estimation approach were not fully satisfied?

    Between the introduction and the conclusion is a history of methods for causal inference, tracing how they relate to one another and how they diverged. Many of its points are ones I’ve made on this blog, but Dowd is far more expert than I in many respects and illuminates nuances I’ll probably never approach in a post.
