Many researchers and physicians assert that randomized clinical trials (RCTs) are the “gold standard” for evidence about what works in medicine. But many others have pointed to both strengths and limitations of RCTs (see, for example, Austin’s comments on Angus Deaton here). Nancy Cartwright is a major philosopher of science. In this Lancet paper she provides insights into why RCTs are so highly valued and also why they are by themselves insufficient to answer the most important questions in medicine.
RCTs have been taken to be a gold standard because they are, according to Cartwright, “self-validating.” What this means is that an RCT can establish a causal connection between a treatment and an outcome more or less by virtue of the design.
As Cartwright puts it: “all features causally relevant to the outcome other than the treatment (and its downstream effects) are distributed identically between treatment and control groups. If the outcome is more probable in the treatment than the control group, … the only explanation possible is that the treatment caused the outcome in some members of that group.”
An RCT done right means that correlation between treatment and outcome does imply causation. In most sciences, you need strong theories to understand what the data mean and to justify your causal interpretation of an experiment’s results (see here). There are no control groups in astrophysics. In an RCT, the evidence for the causal effect of the treatment comes from the experimental design and doesn’t depend on your theory about what makes the treatment work.
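This logic can be sketched with a toy simulation. All of the numbers below are illustrative assumptions, not from Cartwright’s paper: a single confounder (baseline health) raises both the outcome and, in observational data, the chance of getting treated. Randomization breaks that link, so the simple difference in outcome rates recovers the true effect without any model of how the treatment works:

```python
import random

random.seed(0)

def simulate(n=100_000, randomized=True):
    """Estimate the treatment effect as the difference in outcome rates.
    A confounder (baseline health) raises the outcome by 0.30; the true
    treatment effect is +0.10. (Illustrative numbers.)"""
    successes = {0: 0.0, 1: 0.0}
    count = {0: 0, 1: 0}
    for _ in range(n):
        healthy = random.random() < 0.5           # confounder
        if randomized:
            treated = random.random() < 0.5       # coin flip, independent of health
        else:
            # healthier patients are far more likely to get the treatment
            treated = random.random() < (0.8 if healthy else 0.2)
        p = 0.2 + 0.10 * treated + 0.30 * healthy
        outcome = random.random() < p
        count[treated] += 1
        successes[treated] += outcome
    return successes[1] / count[1] - successes[0] / count[0]

rct = simulate(randomized=True)
obs = simulate(randomized=False)
print(f"randomized estimate:    {rct:.3f}")   # close to the true effect, +0.10
print(f"observational estimate: {obs:.3f}")   # inflated, because health confounds
```

The randomized arm needs no theory about why healthy patients fare better; the coin flip makes the confounder irrelevant to the comparison.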
Because RCTs stand on their own without a superstructure of theory, Cartwright says that RCTs provide “clinchers” to arguments. This is a powerful argument for RCTs. So why does Cartwright argue that RCTs are not the gold standard?
Cartwright thinks that the problem is that an RCT tells us that a treatment “works somewhere,” but it doesn’t always tell us that “it works for us.” In what way might an RCT not work “for us”? Here are two big problems.
First, the RCT may get “us” wrong. That is, the people studied in the RCT may not be the same as the people we need to treat in a clinic. For many reasons, the people who end up in RCTs differ from general clinical populations (they are often healthier and younger, and they tend to live in or near cities with research hospitals). The effects of treatments often vary depending on age, genotype, the presence of other disorders and medications, and so on. The results of the RCT may not generalize to your patient because the trial did not include people like your patient.
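A minimal arithmetic sketch of the point, with made-up subgroup effects: if a treatment helps younger patients much more than older ones, then the average effect measured in a young-skewing trial sample overstates what an older clinic population will experience.

```python
# Subgroup effects (illustrative assumptions, not real data): the treatment
# helps younger patients a lot and older patients barely at all.
effect = {"young": 0.15, "old": 0.02}

def population_effect(share_old):
    """Average effect = subgroup effects weighted by who is in the population."""
    return (1 - share_old) * effect["young"] + share_old * effect["old"]

trial = population_effect(share_old=0.10)    # trial sample skews young
clinic = population_effect(share_old=0.60)   # clinic population skews old
print(f"average effect in the trial:  {trial:.3f}")
print(f"average effect in the clinic: {clinic:.3f}")
```

Nothing about the trial is wrong internally; the estimate is simply an average over the wrong mix of people.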
The second and more interesting reason why an RCT may not “work for us” is that, as a matter of design, an RCT takes human decision making out of the clinical process. It does so in many ways. Medications are provided to participants. Compliance with dosing schedules may be monitored. Most importantly, RCTs are engineered to prevent either clinicians or patients from choosing what happens after the patient agrees to be randomized. This is why studies are “blinded”: otherwise, clinicians and patients would compromise the randomization.
But in the real world, unlike RCT world, treatments happen because physicians choose to prescribe them and patients choose to take them. And in the real world many treatments fail because they do not engage physicians or patients to use them. RCTs elicit engagement in ways that may be infeasible in the real world. This is a major reason why treatment effects estimated in RCTs do not generalize to actual clinical practice.
When apparently superior treatments fail to gain usage, it’s tempting to think that physicians who do not prescribe them or the patients who fail to take them are at best uninformed or at worst stupid. My view is that more often we have failed to make a serious effort to understand the choices that physicians and patients actually face and how they value treatment outcomes. A complete theory of how treatment works must account for the thoughts, attitudes, and behaviors of the human actors in the clinical process.
The upshot is that RCTs have exceptional virtues as a research design, but they are not the gold standard because they cannot address all the questions that we need to answer to improve clinical practice. We need a wide repertoire of studies, including observational research. In some cases, observational designs can provide the basis for strong causal inferences. But we also need observational research because it is essential to have a detailed understanding of how treatment occurs when people freely choose courses of action. That is the process that we need to debug.
We also need to take social scientific theories more seriously. James Heckman argues that a complete understanding of a policy’s effects (and, by extension, a clinical treatment’s effects) requires a social scientific theory that can explain the actions of the people carrying out the policy. When people treat the RCT as a gold standard and then notice that RCTs are in a sense theory-independent, they can be misled into thinking that we do not need theories. But that belief will blind you to important questions.