Austin has been writing about algorithms and the potential for the automation of medical care. One challenge, which I’ll discuss here, is how clinicians should apply statistical estimates of risk to guide their treatment of individual patients.
Empirically based algorithms and guidelines for care attempt to synthesize evidence into specific directions for how to care for an individual patient. The best algorithms tailor care to facts about the patient. Janet Currie and her colleagues showed that physicians who tailor their care of heart attack patients to individual patient characteristics, as indicated by evidence-based norms, have better mortality outcomes than physicians who pursue idiosyncratic practice styles. Findings like this suggest that lives can be saved by making medicine more algorithmic or, if you like, more automated.
One problem is that much of patient care should reflect shared decision making between providers and patients, and this can’t be fully automated. As Peter Ubel and Atul Gawande illustrate, shared decision making is more than just being considerate to patients and respecting their rights, as important as those things are. Many clinical choices trade off multiple outcomes: one treatment may offer improved quality of life but reduced life expectancy relative to an alternative. In such cases, the “best care” isn’t well-defined until the patient decides how she values quality of life relative to life expectancy. In these cases, an algorithm can’t lead to best care unless the doctor and the patient engage in shared decision making.
In an interesting JAMA paper, Allan Sniderman, Ralph D’Agostino, and Michael Pencina raise a different challenge for algorithms. They point out that one of the foundations for algorithms (and predictive analytics generally) is using prognostic models to estimate the risk of a clinical outcome for an individual patient from population data. They call these model-estimated risk probabilities “epidemiological risks”.
However, to apply these risk estimates wisely, a physician must understand that the concept of risk is subtler and more complex than is generally appreciated. Specifically, it is important to distinguish between two meanings of risk: risk in the epidemiological sense (the risk of a group of individuals) versus risk in the clinical sense (the risk for an individual member of the group). How accurately does the overall risk for the group reflect the risk of each of the individuals who make up the group?
The problem is that the epidemiological risk is an average of the risks experienced by the individuals in a population. Suppose 10 patients have clinical characteristics A, B, and C. The epidemiological model says that we should expect one of these patients to die in the next year, and hence we say that each of them has a 10% epidemiological risk. But individual risks within that group can vary, possibly a lot. It could be that one patient has a 50% risk, five patients have a 10% risk, and four patients have no risk at all; the average across the 10 is still 10%.
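To check the arithmetic, here’s a minimal sketch using the hypothetical risks above:

```python
# The 10 hypothetical patients: one with a 50% risk, five with a 10% risk,
# and four with no risk at all.
individual_risks = [0.50] + [0.10] * 5 + [0.00] * 4

# The epidemiological risk is just the average of the individual risks.
epidemiological_risk = sum(individual_risks) / len(individual_risks)
print(round(epidemiological_risk, 2))  # 0.1 -- the group's 10% risk
```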
So if you’re a clinician, it’s very likely that the epidemiological risk is not the individual risk of the patient in front of you. Nevertheless, your task is to optimize care for that individual patient. So, if the epidemiological risk that guides the algorithm is likely incorrect for your patient, then what’s the use of the algorithm?
To answer that question, we need to understand the fundamental problem that creates a difference between the epidemiological risk and the individual risk. Sniderman et al. go badly off the rails here when they assert that
Contrary to what is thought, [the epidemiological] risk level is not that person’s personal risk because probability is not meaningful in an individual context.
This explanation reflects a longstanding but sectarian view in the philosophy of probability. The view is mistaken: it’s perfectly rational to use probabilities to express a consistent set of beliefs about the likelihoods of individual future events. Otherwise, saying “this patient will probably survive the operation” would be a logical absurdity.
More importantly, the view that probabilities don’t apply to individual events is inconsistent with Sniderman et al.’s (excellent) discussion of what the real problem is. The authors repeatedly refer to individual risks in ways that make clear that these “risks” satisfy the probability axioms and are, in short, probabilities.
The real problem is that the epidemiological risk is based on only a subset of the factors that determine the actual risk for a given individual. Because we have not measured and modeled every relevant factor, the individual risk will typically deviate from the epidemiological risk.
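One way to state this formally (my notation, not the authors’): write the individual risk as a function of both the measured factors $A$, $B$, and $C$ and the unmeasured factors, collected in $U$. The model can report only the individual risk averaged over the unmeasured factors:

$$r_{\text{epi}}(A, B, C) = \mathbb{E}\big[\, r_{\text{ind}}(A, B, C, U) \mid A, B, C \,\big].$$

The two coincide only in the special case where $U$ makes no difference within the group defined by $A$, $B$, and $C$.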
This has an important implication for the clinician: before using an epidemiological risk, she has to judge whether her patient’s case is affected by a factor that was not considered in the estimation of that risk. Sniderman et al.’s point, which I wholly endorse, is that clinicians need to be vigilant for cases in which the epidemiological risk omits a significant clinical factor and therefore badly misstates the individual risk. Again, the algorithm can’t run automatically without the engagement of the clinician, this time to evaluate the applicability of the epidemiological risk estimate.
One point that I wish Sniderman et al. had made, however, is that even if the epidemiological risk is likely incorrect, it may still reflect our best current knowledge. Clinicians should worry not just about the applicability of the epidemiological risk but also about their own ability to do better. Consider again Currie’s finding that guideline-following physicians had better outcomes. There are also many studies showing that actuarial predictions are better than clinicians’ judgments (including a study of mine here).
There are two big lessons here. First, “automation” may not be the best descriptor for a clinical algorithm covering a complex care process. Perhaps it would be better to describe a clinical algorithm as a template for shared decision making between the clinician and the patient, with default (but not mandatory) guidance about how to tailor care to the patient, based in part on epidemiological risk estimates.
The second lesson is that the limits of epidemiological risk do not invalidate algorithmic guidelines. We do need guidelines that are good enough that clinicians can usually trust the default risk estimates. And we need to keep improving those guidelines by carrying out large-N prognostic studies that include more of the relevant clinical factors. This will bring the epidemiological risk estimates closer, on average, to the individual risks. Data-driven progress in predictive analytics will reduce, but never eliminate, the challenge of applying an epidemiological risk to an individual patient.
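To illustrate that last claim, here’s a toy simulation; the factors, effect sizes, and population are all hypothetical. A model that conditions on a measured factor gets closer, in mean squared error, to the individual risks than a model that ignores it, but the gap never fully closes while some factor remains unmeasured:

```python
import random

random.seed(0)

# Hypothetical population: each patient's true one-year risk depends on a
# measured factor x and an unmeasured factor u (both invented for illustration).
patients = [{"x": random.choice([0, 1]), "u": random.choice([0, 1])}
            for _ in range(100_000)]

def true_risk(p):
    # The individual risk: a baseline plus contributions from x and u.
    return 0.05 + 0.10 * p["x"] + 0.10 * p["u"]

def mean(values):
    return sum(values) / len(values)

# Coarse model: one epidemiological risk for everyone (ignores x and u).
overall = mean([true_risk(p) for p in patients])

# Finer model: conditions on x; u remains unmeasured.
by_x = {x: mean([true_risk(p) for p in patients if p["x"] == x])
        for x in (0, 1)}

def mse(predict):
    # Mean squared gap between the predicted and the true individual risk.
    return mean([(predict(p) - true_risk(p)) ** 2 for p in patients])

print(mse(lambda p: overall))       # ~0.0050: one risk for everyone
print(mse(lambda p: by_x[p["x"]]))  # ~0.0025: closer, but not zero while u
                                    # stays unmeasured
```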