Observational Studies of Comparative Effectiveness

This post has been cited in the 13 May 2010 edition of Health Wonk Review and is co-authored by Steve Pizer and Austin Frakt. The ideas are based on Steve’s 2009 paper “An intuitive review of methods for observational studies of comparative effectiveness.”

Making causal inferences in observational studies is more challenging than in randomized experiments. But econometric and statistical techniques have now improved to the point that a knowledgeable practitioner can draw causal conclusions from sound observational research. Though these techniques have long been employed in economics, they have not been widely applied or appreciated in health services research. Given their utility and ease of application, that should end.

Randomized clinical trials support causal inferences with relatively uncomplicated statistical methods because they minimize the risk of selection bias. Perfect randomization ensures that treatment and comparison groups do not differ systematically in any dimensions except for receipt of treatment, so differences in outcomes can be cleanly attributed to treatment.

Unfortunately, causal inferences in observational studies are much more difficult to make. The root of the problem is that patients (or their doctors) choose treatment themselves, using information that is often inaccessible to the researcher. Some factors that influence the treatment decision may be observable to the researcher and can be accounted for by including measures of them in the outcome model (e.g. risk adjustment models). Other unobservable factors, if correlated with both the choice of treatment and the outcome, will lead to biased estimates of the treatment effect unless specialized techniques are used. (Note, this post is couched in terms of a clinical study where the effect of treatment is of interest. That’s not critical, however. The techniques described are far more general than the language may suggest. Think of “treatment” as the main independent variable, categorical or continuous. All that’s important is that it and the dependent variable are both correlated with unobservable factors.)
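To make the problem concrete, here is a minimal simulation sketch. The variable names, effect sizes, and selection mechanism are all illustrative assumptions of ours, not drawn from any study: an unobserved "severity" variable makes sicker patients less likely to be treated and more likely to have poor outcomes, so a naive comparison of treated and untreated patients overstates the treatment effect.

```python
# Hypothetical simulation: an unobserved confounder ("severity") biases
# the naive estimate of a treatment effect. All names and numbers here
# are illustrative assumptions, not taken from any cited study.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
severity = rng.normal(size=n)            # unobserved by the researcher
# Sicker patients are less likely to be treated (selection on unobservables).
treat = (rng.normal(size=n) - severity > 0).astype(float)
true_effect = 1.0
# Sicker patients also have worse outcomes.
outcome = true_effect * treat - 2.0 * severity + rng.normal(size=n)

# Naive comparison of means (equivalent to OLS of outcome on treatment)
naive = outcome[treat == 1].mean() - outcome[treat == 0].mean()
print(naive)  # substantially larger than the true effect of 1.0
```

Because the treated group is systematically healthier, the naive difference in means attributes some of the severity effect to the treatment itself.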

Two techniques for mitigating bias when treatment and outcome are both correlated with unobservables are propensity score matching and instrumental variables (IV) estimation. Propensity score matching is especially useful for small studies where extensive and customized data collection is feasible, minimizing the likelihood of important variables remaining unobserved. IV estimation is superior in cases where important variables are known to be unobserved, as is commonly the case in large studies of administrative data, provided that suitable instruments can be identified. A suitable instrument is a variable that has a strong influence on the likelihood of receiving treatment, but, apart from its effect on treatment, has no direct effect on the outcome.
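As a sketch of how a valid instrument helps, consider a simulated binary instrument, a stand-in for something like a regional practice pattern: it shifts the probability of treatment but, by construction, has no direct effect on the outcome. The data-generating process and numbers below are our own illustrative assumptions.

```python
# Sketch of IV estimation with a single binary instrument. The
# instrument shifts treatment but has no direct effect on the outcome
# (the exclusion restriction). All quantities are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
severity = rng.normal(size=n)                    # unobserved confounder
z = rng.binomial(1, 0.5, size=n).astype(float)   # instrument
# Treatment depends on the instrument AND on unobserved severity.
treat = (0.8 * z - severity + rng.normal(size=n) > 0).astype(float)
outcome = 1.0 * treat - 2.0 * severity + rng.normal(size=n)

# Wald/IV estimator: difference in outcomes across instrument values
# divided by difference in treatment rates (2SLS with one instrument).
iv = (outcome[z == 1].mean() - outcome[z == 0].mean()) / \
     (treat[z == 1].mean() - treat[z == 0].mean())
print(round(iv, 2))  # close to the true effect of 1.0
```

Because the instrument is independent of the unobserved confounder, the variation in treatment it induces is as good as randomized, and the IV estimate recovers the true effect that the naive comparison misses.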

Local practice pattern variations make natural instruments and have been successfully applied in several recent health services studies. Stukel and colleagues (2007) assessed the effects of cardiac catheterization on elderly patients hospitalized for acute myocardial infarction. The methodological concern was that patients in poorer health were less likely to receive invasive care, potentially making the effects of treatment look better than they actually were. The investigators used the regional cardiac catheterization rate as the instrument.

Brookhart and Schneeweiss (2007) recommend practice patterns computed at the individual provider level as instruments for observational comparative effectiveness studies. This approach has been used to evaluate changes in risk of gastrointestinal complications and acute myocardial infarction associated with use of COX-2 inhibitors compared to nonselective NSAIDs (Brookhart, et al. 2006; Schneeweiss, et al. 2006).

Certain conditions must be met for IV estimation to produce valid estimates. First, the instrument(s) must be sufficiently strongly associated with the selection of treatment (Bound, et al. 1995). Staiger and Stock (1997) established statistical tests and threshold values for evaluating the strength of instruments. Second, the instruments should not be associated with the outcome except through their effect on treatment. This condition can be tested only if more than one instrument is available. In such cases, tests of overidentifying restrictions can be constructed to determine whether it is valid to exclude the instruments from the outcome model (Davidson and MacKinnon 1993). This is a reason to use both geographic and individual provider-level practice patterns as instruments where possible.
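The first condition can be screened with the first-stage F statistic on the excluded instrument, often compared against the rule-of-thumb threshold (F > 10) associated with Staiger and Stock's analysis. Below is a minimal sketch with simulated data; the instrument strength and sample size are our own illustrative choices.

```python
# Hypothetical check of instrument strength: the first-stage F statistic
# for the excluded instrument, compared with the common rule of thumb
# (F > 10) associated with Staiger and Stock (1997). Data are simulated.
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=n)                   # instrument
treat = 0.2 * z + rng.normal(size=n)     # first-stage relationship

# First-stage OLS of treatment on the instrument (with a constant).
X = np.column_stack([np.ones(n), z])
beta, *_ = np.linalg.lstsq(X, treat, rcond=None)
resid = treat - X @ beta
# F statistic for H0: instrument coefficient = 0 (one restriction).
s2 = resid @ resid / (n - 2)
var_b1 = s2 * np.linalg.inv(X.T @ X)[1, 1]
F = beta[1] ** 2 / var_b1
print(F > 10)  # a weak instrument would fail this screen
```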

IV estimation and associated statistical tests can be performed easily with most statistical software packages (e.g., Stata) when outcomes are modeled by ordinary least squares regression. When outcomes are modeled by nonlinear functions, such as logistic regression or duration models, IV estimates are more difficult to obtain. However, Terza, Basu and Rathouz (2008) derived consistent two-stage instrumental variables estimators for use in these situations.
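One such estimator is two-stage residual inclusion (2SRI): regress treatment on the instrument, then include the first-stage residual as an extra regressor in the nonlinear outcome model. The sketch below is a simplified illustration in that spirit, with a hand-rolled logistic fit, a linear first stage, and simulated data of our own devising; it is not the estimator exactly as derived in the paper.

```python
# Simplified sketch in the spirit of two-stage residual inclusion
# (Terza, Basu and Rathouz 2008) for a binary outcome. The first stage
# here is linear and the data are simulated; names are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
severity = rng.normal(size=n)                    # unobserved confounder
z = rng.normal(size=n)                           # instrument
treat = (0.7 * z - severity + rng.normal(size=n) > 0).astype(float)
# Binary outcome depends on treatment and the unobserved confounder.
y = (1.0 * treat - 1.5 * severity + rng.logistic(size=n) > 0).astype(float)

def logit_fit(X, y, iters=30):
    """Logistic regression fit by Newton's method."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        b += np.linalg.solve((X.T * (p * (1 - p))) @ X, X.T @ (y - p))
    return b

ones = np.ones(n)
# Stage 1: model treatment on the instrument; keep the residual.
Xz = np.column_stack([ones, z])
resid = treat - Xz @ np.linalg.lstsq(Xz, treat, rcond=None)[0]
# Stage 2: logistic outcome model including the first-stage residual.
b_2sri = logit_fit(np.column_stack([ones, treat, resid]), y)
b_naive = logit_fit(np.column_stack([ones, treat]), y)
print(b_naive[1], b_2sri[1])  # the naive coefficient is inflated
```

The residual carries the confounding information, so conditioning on it pulls the treatment coefficient back toward its structural value, while the naive logistic fit absorbs the selection bias into the treatment effect.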

In general, the statistical and computational barriers to IV estimation are low, and the chief challenge is conceptual. Finding good instruments can be difficult and takes some creativity. When they are found, application of the appropriate techniques leads to valid causal inference. Thus, given the cost and challenges of randomized trials, IV estimation is a valuable tool for comparative effectiveness research.


Pizer SD. An intuitive review of methods for observational studies of comparative effectiveness. Health Serv Outcomes Res Method 2009; 9:54–68.

Stukel TA, et al. Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA 2007; 297(3): 278-85.

Brookhart MA, Schneeweiss S. Preference-based instrumental variable methods for the estimation of treatment effects: assessing validity and interpreting results. Int J Biostat 2007; 3(1): Article 14.

Brookhart MA, Wang PS, Solomon DH, Schneeweiss S. Evaluating Short-Term Drug Effects Using a Physician-Specific Prescribing Preference as an Instrumental Variable. Epidemiology 2006; 17(3): 268-75.

Schneeweiss S, et al. Simultaneous Assessment of Short-Term Gastrointestinal Benefits and Cardiovascular Risks of Selective Cyclooxygenase 2 Inhibitors and Nonselective Nonsteroidal Antiinflammatory Drugs: An Instrumental Variable Analysis. Arthritis Rheum 2006; 54 (11); 3390–8.

Bound J, et al. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. J Am Stat Assoc 1995; 90: 443-50.

Staiger D, Stock J. Instrumental variables regression with weak instruments. Econometrica 1997; 65: 557-86.

Davidson R, MacKinnon JG. Estimation and Inference in Econometrics. Oxford University Press, NY, 1993.

Terza JV, Basu A, Rathouz PJ. Two-stage Residual Inclusion Estimation: Addressing Endogeneity in Health Econometric Modeling. J Health Econ 2008; 27: 531-43.
