*This post has been cited in the 13 May 2010 edition of Health Wonk Review and is co-authored by Steve Pizer and Austin Frakt. The ideas are based on Steve’s 2009 paper “An intuitive review of methods for observational studies of comparative effectiveness.”*

Making causal inferences in observational studies is more challenging than in randomized experiments. But econometric and statistical techniques have now improved to the point that a knowledgeable practitioner can draw causal conclusions from sound observational research. Though these techniques have already been employed in economics, they have not been widely applied or appreciated in health services research. Given their utility and ease of application, that should end.

Randomized clinical trials support causal inferences with relatively uncomplicated statistical methods because they minimize the risk of selection bias. Perfect randomization ensures that treatment and comparison groups do not differ systematically along any dimension except receipt of treatment, so differences in outcomes can be cleanly attributed to treatment.

Unfortunately, causal inferences in observational studies are much more difficult to make. The root of the problem is that patients (or their doctors) choose treatment themselves, using information that is often inaccessible to the researcher. Some factors that influence the treatment decision may be observable to the researcher and can be accounted for by including measures of them in the outcome model (e.g. risk adjustment models). Other unobservable factors, if correlated with both the choice of treatment and the outcome, will lead to biased estimates of the treatment effect unless specialized techniques are used. (Note, this post is couched in terms of a clinical study where the effect of treatment is of interest. That’s not critical, however. The techniques described are far more general than the language may suggest. Think of “treatment” as the main independent variable, categorical or continuous. All that’s important is that it and the dependent variable are both correlated with unobservable factors.)
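To make the problem concrete, here is a small simulation (our illustrative numbers, not drawn from any study): an unobserved severity variable pushes sicker patients toward treatment and also worsens outcomes, so a naive comparison of group means badly understates the true treatment effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

severity = rng.normal(size=n)                 # unobserved by the researcher
# sicker patients are more likely to receive treatment
treated = (0.8 * severity + rng.normal(size=n) > 0).astype(float)
# the true treatment effect is +1.0, but severity hurts the outcome
outcome = 1.0 * treated - 1.0 * severity + rng.normal(size=n)

# naive estimate: difference in mean outcomes between treated and untreated
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"true effect: 1.0, naive estimate: {naive:.2f}")
```

Because the treated group is sicker on average, the naive estimate is biased far below the true effect of 1.0, and no amount of additional data fixes this.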

Two techniques for mitigating bias when treatment and outcome are both correlated with unobservables are propensity score matching and instrumental variables (IV) estimation. Propensity score matching is especially useful for small studies where extensive and customized data collection is feasible, minimizing the likelihood of important variables remaining unobserved. IV estimation is superior in cases where important variables are known to be unobserved, as is commonly the case in large studies of administrative data, provided that suitable instruments can be identified. A suitable instrument is a variable that has a strong influence on the likelihood of receiving treatment, but, apart from its effect on treatment, has no direct effect on the outcome.
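As a rough sketch of the propensity score approach (simulated data and illustrative variable names; a real analysis would also check overlap and covariate balance), one can estimate each patient’s probability of treatment from observed covariates and then match treated to untreated patients with similar scores:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=(n, 2))                       # observed covariates
logit = 0.8 * x[:, 0] - 0.5 * x[:, 1]
d = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)
y = 2.0 * d + 1.0 * x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)

# naive comparison is biased: the covariates drive both treatment and outcome
naive = y[d == 1].mean() - y[d == 0].mean()

# step 1: estimate propensity scores with a hand-rolled logistic regression
X = np.column_stack([np.ones(n), x])
b = np.zeros(3)
for _ in range(25):                               # Newton-Raphson iterations
    p = 1 / (1 + np.exp(-X @ b))
    W = p * (1 - p)
    b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (d - p))
ps = 1 / (1 + np.exp(-X @ b))

# step 2: match each treated patient to the nearest control on the score
treated, control = np.where(d == 1)[0], np.where(d == 0)[0]
nearest = control[np.abs(ps[treated][:, None] - ps[control][None, :]).argmin(axis=1)]
att = (y[treated] - y[nearest]).mean()
print(f"naive: {naive:.2f}, matched: {att:.2f}  (true effect: 2.0)")
```

Note that matching only removes bias from *observed* covariates; if an important confounder is missing from `x`, the matched estimate remains biased, which is exactly the case where IV estimation is preferable.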

Local practice pattern variations serve as natural instruments and have been successfully applied in several recent health services studies. Stukel and colleagues (2007) assessed the effects of cardiac catheterization on elderly patients hospitalized for acute myocardial infarction. The methodological concern was that patients in poorer health were less likely to receive invasive care, potentially making the effects of treatment look better than they actually were. The investigators used the regional cardiac catheterization rate as the instrument in this study.

Brookhart and Schneeweiss (2007) recommend practice patterns computed at the individual provider level as instruments for observational comparative effectiveness studies. This approach has been used to evaluate changes in risk of gastrointestinal complications and acute myocardial infarction associated with use of COX-2 inhibitors compared with nonselective NSAIDs (Brookhart, et al. 2006; Schneeweiss, et al. 2006).

Certain conditions must be met for IV estimation to produce valid estimates. First, the instrument(s) must be strongly enough associated with the selection of treatment (Bound et al. 1995). Staiger and Stock (1997) established statistical tests and threshold values for evaluating the strength of instruments. Second, the instruments should not be associated with the outcome except through their effect on treatment. This condition can only be tested if more than one instrument is available. In such cases, tests of overidentifying restrictions can be constructed to determine whether it is valid to exclude the instruments from the outcome model (Davidson and MacKinnon 1993). This is a reason to use both geographic and individual provider-level practice patterns as instruments where possible.
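The first condition is easy to check. Here is a minimal sketch of the first-stage strength test on simulated data (illustrative numbers; a common rule of thumb in the spirit of Staiger and Stock is to worry when the first-stage F statistic falls below roughly 10):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# two candidate instruments, e.g. regional and provider-level practice rates
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)                            # unobserved confounder
d = 0.5 * z1 + 0.3 * z2 + 0.8 * u + rng.normal(size=n)   # treatment intensity

# first-stage regression of treatment on the instruments
Z = np.column_stack([np.ones(n), z1, z2])
coef = np.linalg.lstsq(Z, d, rcond=None)[0]
resid = d - Z @ coef
r2 = 1 - resid.var() / d.var()

# joint F statistic on the k = 2 excluded instruments
k = 2
F = (r2 / k) / ((1 - r2) / (n - k - 1))
print(f"first-stage F = {F:.0f}")
```

With instruments this strong the F statistic is in the thousands; a weak instrument would produce a small F and unreliable IV estimates.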

IV estimation and associated statistical tests can be performed easily with most statistical software packages (e.g., Stata) when outcomes are modeled by ordinary least squares regression. When outcomes are modeled by nonlinear functions such as logistic regression or duration models, IV estimates are more difficult to obtain. However, Terza, Basu and Rathouz (2008) derived consistent two-step instrumental variables estimators for use in these situations.
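For readers who want to see the mechanics, here is a bare-bones two-stage least squares sketch in Python (simulated data; in practice use a packaged routine such as Stata’s ivregress, since the naive second-stage standard errors from this manual procedure would be wrong):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

u = rng.normal(size=n)                           # unobserved severity
z = (rng.uniform(size=n) < 0.5).astype(float)    # e.g. high- vs low-use region
d = 0.5 * z + 0.8 * u + rng.normal(size=n)       # treatment intensity
y = 1.0 * d - 1.0 * u + rng.normal(size=n)       # true treatment effect is 1.0

def ols(X, y):
    """Least-squares coefficients with an intercept prepended."""
    return np.linalg.lstsq(np.column_stack([np.ones(len(y)), X]), y, rcond=None)[0]

naive = ols(d, y)[1]                             # biased: d is correlated with u

# stage 1: project treatment on the instrument
d_hat = np.column_stack([np.ones(n), z]) @ ols(z, d)
# stage 2: regress the outcome on the projected treatment
iv = ols(d_hat, y)[1]
print(f"naive OLS: {naive:.2f}, 2SLS: {iv:.2f}  (true: 1.0)")
```

The naive regression is pulled well away from the true effect by the unobserved confounder, while the 2SLS estimate recovers it, because the instrument supplies variation in treatment that is unrelated to severity.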

In general, the statistical and computational barriers to IV estimation are low, and the chief challenge is conceptual. Finding good instruments can be difficult and take some creativity. When they are found, application of appropriate technique leads to valid causal inference. Thus, given the cost and challenges of randomized trials, IV estimation is a valuable tool for comparative effectiveness research.

**References**

Pizer SD. An intuitive review of methods for observational studies of comparative effectiveness. Health Serv Outcomes Res Method 2009; 9: 54–68.

Stukel TA, et al. Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA 2007; 297(3): 278-85.

Brookhart MA, Schneeweiss S. Preference-based instrumental variable methods for the estimation of treatment effects: Assessing validity and interpreting results. Int J Biostat 2007; 3(1): Article 14.

Brookhart MA, Wang PS, Solomon DH, Schneeweiss S. Evaluating Short-Term Drug Effects Using a Physician-Specific Prescribing Preference as an Instrumental Variable. Epidemiology 2006; 17(3): 268-75.

Schneeweiss S, et al. Simultaneous Assessment of Short-Term Gastrointestinal Benefits and Cardiovascular Risks of Selective Cyclooxygenase 2 Inhibitors and Nonselective Nonsteroidal Antiinflammatory Drugs: An Instrumental Variable Analysis. Arthritis Rheum 2006; 54 (11); 3390–8.

Bound J, et al. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. J Am Stat Assoc 1995; 90: 443-50.

Staiger D, Stock J. Instrumental variables regression with weak instruments. Econometrica 1997; 65: 557-86.

Davidson R, MacKinnon JG. Estimation and Inference in Econometrics. Oxford University Press, NY, 1993.

Terza JV, Basu A, Rathouz PJ. Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. J Health Econ 2008; 27: 531-43.

by Brad F on May 3rd, 2010 at 06:44

Given these techniques, have they been shown to reflect validity/precision better than what was available to us before, when credible RCTs on the same subjects were published subsequent to the observational trials? Or is the availability of CER using IVs and propensity models still too young to really answer the question?

Thanks

Brad

by Austin Frakt on May 3rd, 2010 at 08:21

@Brad F – Great question. I will point to some successes (IV study corresponds to subsequent RCT) in a subsequent post. But there aren’t many for precisely the reason you point out: the use of sound observational design in health services research is relatively new.

by steve on May 3rd, 2010 at 19:52

We are looking to do some of this in my specialty, anesthesiology. Mine is a data rich specialty, but lack of uniform data collection makes it difficult to aggregate. That may be changing.

http://journals.lww.com/anesthesiology/Fulltext/2009/12000/Too_Much_of_a_Good_Thing_Is_Wonderful_.7.aspx

Steve

by Michael Gupta on May 13th, 2010 at 16:09

Hi,

Very interesting post. From my understanding, an instrument is a way to correct for selection bias by adding another factor into your model, a factor that affects the outcome but is not measured or controlled for in a study. But this instrument must be unrelated to the outcome except through its effect on treatment.

It would seem to me that it would be hard to find these instruments, especially if you were looking to correct for a factor that caused increased treatment. For example if an observational study failed to measure or control for a poor prognostic factor and that factor was associated with a higher rate of treatment there would be no way to create an instrument that both reflected the poor prognostic factor and was unrelated to the outcome besides the increased association with treatment.

Do you need a specific situation to use the IV method?

Thanks. Michael

by Austin Frakt on May 13th, 2010 at 18:51

@Michael Gupta – You wrote, “For example if an observational study failed to measure or control for a poor prognostic factor and that factor was associated with a higher rate of treatment there would be no way to create an instrument that both reflected the poor prognostic factor and was unrelated to the outcome besides the increased association with treatment.”

That’s not correct. You only need the instrument to be predictive of treatment but not predictive of potential outcomes other than through its effect on treatment. It isn’t really adding another factor, it’s providing the randomization one needs to draw causal inferences. The randomization in a clinical trial is such an instrument, which is why it works. But notice that randomization (which can be viewed as a coin flip) does not itself reflect or relate to any prognostic factor. In fact, it would be bad if it did.

Finding an instrument is the key, but it isn’t as hard as you think. One can find instruments in many settings, so the range of situations in which IV can be applied is broad. As described in this post, practice patterns can work, for example (related to treatment, not otherwise related to potential outcomes). Others will be described in subsequent posts.