The following is jointly authored by Adrianna and Austin.
Randomized controlled trials are the gold standard in empirical research, but that doesn’t mean they’re the only standard worth paying attention to. If we only find value in RCTs, researchers are wasting an awful lot of time and headspace on alternative methods. So, that recent NYTimes hit piece on the Center for Medicare and Medicaid Innovation strikes us as troubling.
Aaron covered some important technical points yesterday. RCTs can have fantastic internal validity—when they’re conducted well, we can say with relative certainty how treatment did or did not affect the study population—but our capacity to generalize those results is often limited. Dan Diamond has a piece worth reading, too:
CMMI’s approach isn’t totally above reproach; the data that the center is seeing from its pilots could be confounded by secular trends, like changes in population, practices, and so on. That’s why, Harvard’s Jha acknowledged, it’s important to design studies with a contemporary control group and statistical testing.
But under CMMI’s ambitious charter, researchers are attempting to track a range of payment and delivery reforms. And it’s hard to think of how the center could use an RCT for some of its projects.
For example, I asked a half-dozen different researchers to construct a hypothetical RCT to test how accountable care organizations would work. All were stumped.
These aren’t clinical trials where you can pass out pills and placebos and carefully record individual health outcomes; CMMI is all about changing institutional practices. And just because health policy is closer to medicine (and its many RCTs) doesn’t actually make health policy more amenable to this kind of study than any other policy domain.
Take ACOs as an example. What would we randomize: patients, physicians, or entire hospital systems? Can you imagine the backlash if Medicare tried to foist the program on randomized-but-uninterested providers? Patients would be tricky, too. The way ACOs work now, a Medicare beneficiary is passively “assigned” to an ACO if their physician belongs to that ACO. But that beneficiary isn’t required to limit their care to the ACO—an acknowledged wrinkle—and they may not actually realize that they’re taking part in a new delivery paradigm. (That, itself, is a source of natural, if imperfect, randomness that could be exploited.) According to one Health Affairs brief, critics “believe that patients should have a choice about participating in an arrangement that could reward providers for reducing services.” That sort of rhetoric hardly bodes well for implementing randomized trials in the health services delivery setting.
Moreover, CMMI wasn’t designed to focus on cumbersome, time-consuming, and relatively static experiments. This is a good thing.
Having seen firsthand some CMMI projects evolve on the ground, ability to adapt/change care delivery model in early stages has been critical
— John Graves (@johngraves9) February 6, 2014
These demonstrations aim to do one of two things to health services delivery: improve quality while maintaining or decreasing costs, or reduce costs while maintaining or improving quality. The emphasis is on “rapid-cycle” evaluation—collecting and analyzing data in near-real time, providing feedback on the programs. Far from wasting resources, CMMI is actually bound by law to modify or terminate demonstrations that have insufficient evidence of success.
Well-conducted policy trials are important and we can learn a lot from them. That said, they don’t come easy or cheap, so they’re not very common. Nor are they immune to threats to internal validity from contamination, crossover, and attrition—problems that can be addressed by—wait for it—observational study techniques.
The Oregon Medicaid experiment was a terrific empirical exercise. It’s also paradigmatic of the limitations that policy RCTs face—an entire methods course could probably be taught on it. To correct potential biases inherent in the design, the authors employed instrumental variables, an observational technique. A constrained sample size meant power problems. A focus on Portland restricts the results’ external validity. Scholars have debated (and will continue to debate) the study’s findings and their generalizability.
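For readers curious what “instrumental variables” buys you in a setting like Oregon’s, here’s a minimal sketch—not the study’s actual analysis, and all numbers are invented. It simulates a lottery that randomizes the *offer* of coverage while actual enrollment is incomplete, then recovers the enrollment effect with a simple Wald (ratio) estimator: the intent-to-treat difference in outcomes divided by the first-stage difference in enrollment rates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup loosely inspired by the Oregon lottery:
# z = lottery win (randomized), d = actual enrollment (only some
# winners enroll), y = some health/financial outcome.
z = rng.integers(0, 2, n)                    # random lottery draw (0/1)
takes_up = rng.random(n) < 0.6               # winners who actually enroll
d = (z == 1) & takes_up                      # enrollment status
true_effect = 2.0                            # true effect of enrollment on y
y = 1.0 + true_effect * d + rng.normal(0, 1, n)

# Intent-to-treat: outcome difference by lottery status.
itt = y[z == 1].mean() - y[z == 0].mean()

# First stage: how much the lottery moved enrollment.
first_stage = d[z == 1].mean() - d[z == 0].mean()

# Wald/IV estimate: scale ITT by the first stage.
iv_estimate = itt / first_stage
```

With one binary instrument and one binary treatment, this Wald ratio is numerically equivalent to two-stage least squares; the Oregon authors’ actual models were considerably richer. The point is simply that the randomized lottery plus an observational adjustment—not randomized enrollment itself—is what identifies the effect.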
Empirical science, and every technique thereof, is imperfect and incremental. But it’s the best we have. Insisting on only one research modality—the RCT—and overlooking the potential gains and relevance of other approaches is costly, both in dollars and applicable knowledge.