The study I write about below is ungated at this link until Feb. 28, 2015.
A new study suggests that a common medication for type 2 diabetes causes more harm than previously thought, including increasing avoidable hospitalizations and mortality, relative to an alternative medication. The study’s methods are as interesting as its findings.
The study, published in Value in Health, compared two of the most common types of second-line diabetes medications, sulfonylureas (SUs) and thiazolidinediones (TZDs), in a sample of patients already on the standard first-line treatment, metformin. If one accepts the study’s instrumental variable (IV) assumptions (more on these below), the findings imply that, relative to TZDs, SUs cause a 68% increase in risk of avoidable hospitalization and a 50% increase in risk of death. Results for experiencing a heart attack or stroke were not statistically significant.
The study was based on an analysis of merged Veterans Health Administration (VA) and Medicare data for over 80,000 VA patients followed for up to ten years. Full disclosure: it was led by my colleague Julia Prentice, and I work closely with another co-author, Steve Pizer.
As discussed in the paper, as well as the accompanying press release, choosing a second-line treatment for type 2 diabetes has become increasingly complex. Existing studies fail to provide all the information clinicians and patients need to make informed choices. (This was also discussed at the recent New England Comparative Effectiveness Public Advisory Council (CEPAC) meeting on the subject.) For example, existing randomized controlled trials (RCTs) either have follow-up that is too brief or are underpowered to yield statistically significant mortality findings. The Prentice study is based on a sample about 20 times the size of prior type 2 diabetes medication RCTs, offering sufficient power to study mortality and other low-frequency outcomes.
The study is based on an IV analysis in which prescribing patterns are used as a source of random variation. Importantly, for this purpose, VA patients are assigned to primary care physicians at random. For each patient, the instrument was the proportion of second-line prescriptions (SUs or TZDs) that the patient’s provider wrote for SUs over the year before the patient initiated an SU or TZD. Supporting this approach, prescribing patterns have been used as instruments in prior work.
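To make the logic concrete, here is a minimal sketch of why an instrument helps. It is not the authors' model or data: the cohort is simulated, and every name and number in it (`simulate_cohort`, the 0.10 true effect, the severity coefficients) is an assumption I made up for illustration. Unobserved severity pushes patients toward SUs and also worsens outcomes, so a naive comparison overstates the harm, while the single-instrument IV (Wald) estimator, cov(z, y) / cov(z, d), recovers something close to the true effect.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate_cohort(n=20000, true_effect=0.10, seed=0):
    """Simulated (not real) patients: instrument z, treatment d, outcome y."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        su_share = rng.uniform(0.2, 0.9)  # provider's prior-year SU share
        severity = rng.gauss(0, 1)        # unobserved by the analyst
        # Sicker patients, and patients of SU-heavy providers, get SUs more often.
        p_su = min(max(su_share + 0.15 * severity, 0.0), 1.0)
        on_su = 1 if rng.random() < p_su else 0
        # Severity also worsens the outcome directly: this is the confounding.
        risk = 0.20 + true_effect * on_su + 0.10 * severity + rng.gauss(0, 0.1)
        z.append(su_share)
        d.append(on_su)
        y.append(risk)
    return z, d, y


def naive_estimate(d, y):
    """OLS slope of outcome on treatment; ignores unobserved severity."""
    return cov(d, y) / cov(d, d)


def iv_estimate(z, d, y):
    """Single-instrument IV (Wald) estimator: cov(z, y) / cov(z, d)."""
    return cov(z, y) / cov(z, d)


z, d, y = simulate_cohort()
print("naive estimate:", round(naive_estimate(d, y), 3))
print("IV estimate:   ", round(iv_estimate(z, d, y), 3))
```

The IV estimate is only trustworthy here because the simulation builds in the key assumption: the provider's SU share is unrelated to severity and touches the outcome only through treatment. Whether that holds in real data is exactly what has to be checked.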
But is it a good instrument? The key question is whether it is correlated with any unobservable factor that also affects outcomes, which would bias the estimates.
Prentice et al. offer strong evidence that it is not, with several falsification tests. First, they stratify demographic, comorbidity, and provider quality data by above- and below-median prescribing rates, showing that the two groups are balanced. This is analogous to Table 1 in an RCT, which provides evidence that randomization worked and is itself a type of falsification test.
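A balance check of this kind can be sketched as follows. This is my own illustration, not the authors' code: the patient records are simulated, the field names (`su_share`, `age`, `comorbidity_count`, `imbalanced_example`) are hypothetical, and I use the standardized difference in means as the balance measure, which the paper may or may not have used. A common rule of thumb treats an absolute standardized difference under about 0.1 as balanced.

```python
import random
import statistics


def standardized_difference(high, low):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((statistics.variance(high) + statistics.variance(low)) / 2) ** 0.5
    return (statistics.mean(high) - statistics.mean(low)) / pooled_sd


def balance_table(patients, instrument, covariates):
    """Standardized difference per covariate: above-median vs. the rest."""
    cutoff = statistics.median([p[instrument] for p in patients])
    high = [p for p in patients if p[instrument] > cutoff]
    low = [p for p in patients if p[instrument] <= cutoff]
    return {
        c: standardized_difference([p[c] for p in high], [p[c] for p in low])
        for c in covariates
    }


def simulate_patients(n=20000, seed=1):
    """Simulated (not real) patients with hypothetical covariates."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        su_share = rng.uniform(0.2, 0.9)  # provider's prior-year SU share
        patients.append({
            "su_share": su_share,
            "age": rng.gauss(65, 10),                # unrelated to su_share
            "comorbidity_count": rng.gauss(3, 1.5),  # unrelated to su_share
            # Deliberately tied to the instrument, to show what failure looks like.
            "imbalanced_example": su_share + rng.gauss(0, 0.2),
        })
    return patients


covariates = ["age", "comorbidity_count", "imbalanced_example"]
table = balance_table(simulate_patients(), "su_share", covariates)
for name, diff in table.items():
    print(f"{name:20s} {diff:+.3f}")
```

Balance on observables cannot prove balance on unobservables, in an RCT or here, but a covariate that looks like `imbalanced_example` would be a clear warning sign.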
Next, the authors examined two populations that did not receive the treatment under study but should be subject to the same omitted variable bias, if there is any, as the primary sample: (1) a population on metformin but not on a second-line treatment and (2) a population that had initiated metformin and then insulin without any other drug. In neither population was the instrument related to outcomes, indicating that the instrument affects outcomes only through its effect on treatment with SU or TZD, i.e., it is not correlated with an omitted variable that affects outcomes.
What’s particularly nice about these two falsification test populations is that they bracket the study population in disease severity. Population (1) is somewhat healthier, having not moved on to a second-line treatment. Population (2) is somewhat sicker (something the authors confirmed), having moved on to insulin. It’s a rare IV study that includes such a thorough and convincing validation.
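The falsification test on untreated populations amounts to a reduced-form regression of the outcome on the instrument, which should show no relationship if the instrument is valid. Here is a rough sketch under my own assumptions, again with simulated data: `leak` is a hypothetical parameter standing in for any direct path from provider prescribing habits to outcomes (for example, through provider quality), and the simple one-variable regression omits the covariate adjustment a real analysis would include.

```python
import math
import random


def reduced_form(z, y):
    """OLS slope of outcome on instrument, its standard error, and t-statistic."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((a - mz) ** 2 for a in z)
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / szz
    intercept = my - slope * mz
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(z, y))
    se = math.sqrt(rss / (n - 2) / szz)
    return slope, se, slope / se


def simulate_untreated(n=5000, leak=0.0, seed=2):
    """Simulated patients who never receive an SU or TZD.

    With leak=0 the provider's SU share has no path to the outcome;
    with leak>0 it affects the outcome directly, violating the IV assumption.
    """
    rng = random.Random(seed)
    z = [rng.uniform(0.2, 0.9) for _ in range(n)]
    y = [0.30 + leak * share + rng.gauss(0, 0.1) for share in z]
    return z, y


for label, leak in [("valid instrument", 0.0), ("leaky instrument", 0.2)]:
    slope, se, t = reduced_form(*simulate_untreated(leak=leak))
    print(f"{label}: slope={slope:+.3f}, se={se:.3f}, t={t:+.1f}")
```

A slope statistically indistinguishable from zero in both a healthier and a sicker untreated population is what passing looks like; a clearly nonzero slope, as in the leaky case, means the instrument is picking up something other than the treatment.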
What the authors don’t say explicitly, but I will, is that these methods are generalizable to other observational comparative effectiveness studies. Provided certain conditions are met, practice pattern variation can be used as an instrument, though it should be validated with falsification tests each time it’s applied. The IV + falsification test pair strikes me as a powerful and useful tool, though not one applicable to all problems, to be sure.
I’ll conclude with two sets of questions for knowledgeable readers:
- Are the clinical findings from this study convincing? Should they influence clinical practice? Is an RCT required, if one of sufficient size and duration could be accomplished? Is that remotely feasible?
- Do the methods applied in this study offer a scalable approach to the big data causal inference problem? That is, could more analysts be trained in the application of IV and falsification tests? If so, what’s the next step? If not, what’s the reliable alternative approach in situations in which there is reasonable concern about omitted variable bias?
Comments are open for one week for responses to these two sets of questions.