First, you put the Clipper Skipper out to pasture, because he has the unilateral power to screw things up. You replace him with a teamwork concept—call it Crew Resource Management—that encourages checks and balances and requires pilots to take turns at flying. Now it takes two to screw things up. Next you automate the component systems so they require minimal human intervention, and you integrate them into a self-monitoring robotic whole. You throw in buckets of redundancy. You add flight management computers into which flight paths can be programmed on the ground, and you link them to autopilots capable of handling the airplane from the takeoff through the rollout after landing. […] As intended, the autonomy of pilots has been severely restricted, but the new airplanes deliver smoother, more accurate, and more efficient rides—and safer ones too.
It is natural that some pilots object. […] [A]n Airbus man told me about an encounter between a British pilot and his superior at a Middle Eastern airline, in which the pilot complained that automation had taken the fun out of life. […]
In the privacy of the cockpit and beyond public view, pilots have been relegated to mundane roles as system managers, expected to monitor the computers and sometimes to enter data via keyboards, but to keep their hands off the controls, and to intervene only in the rare event of a failure. […] Since the 1980s, when the shift began, the safety record has improved fivefold, to the current one fatal accident for every five million departures. No one can rationally advocate a return to the glamour of the past. […]
Once you put pilots on automation, their manual abilities degrade and their flight-path awareness is dulled: flying becomes a monitoring task, an abstraction on a screen, a mind-numbing wait for the next hotel […] [a] process known as de-skilling. […]
The automation is simply too compelling. The operational benefits outweigh the costs. The trend is toward more of it, not less. And after throwing away their crutches, many pilots today would lack the wherewithal to walk.
Safer by design yet operationally boring, is this the future of medicine? A meaningful subset of it? Not in a million years?
The following originally appeared on The Upshot (copyright 2015, The New York Times Company).
Two recent studies of Medicare’s new way to pay for health care show that it’s reducing spending and improving quality. The problem is, health care organizations don’t always stick with the program.
Both studies examined Medicare’s 32 Pioneer Accountable Care Organizations. This program, and a related, similar one with a larger number of participants, offers health care organizations the opportunity to earn bonuses in exchange for accepting some financial risk, provided they meet a set of quality targets.
Across a variety of measures, the two studies found that Pioneer A.C.O. quality of care held steady or improved.
Even if the overall savings are modest and assessed only in the first year or two of the program, the studies’ findings are good news for Medicare. Inspired by some of the nation’s most revered health care organizations — like Kaiser Permanente and the Mayo Clinic — Medicare’s A.C.O. program is its flagship reform initiative. It’s intended to promote the delivery of more efficient and effective care, paying more for value than for volume. Medicare has announced it intends to accelerate the transition from volume to value in the coming years. The new studies offer some confidence that it can do so while reducing spending and without harming quality.
However, there is still cause for concern. Thirteen A.C.O.s left the Pioneer program after the first year. Even though those A.C.O.s had saved money too, according to the studies, this is a troubling sign. A program that fails to retain its members cannot succeed in the long term. And, because these two studies cover only the first two years, their encouraging findings tell us nothing about what happened in the longer term.
Because the program is voluntary, an organization that can earn more by leaving, or one that anticipates it cannot recoup investments necessary to succeed, will not participate. One reason organizations may have dropped out is that payments decrease quickly as organizations become more efficient.
Dr. Michael McWilliams, lead author of the New England Journal of Medicine study, suggested that Medicare may achieve greater success over time with a more gradual approach that better balances the goal of achieving savings with the need to retain participants.
“Building on this early success will require greater rewards for A.C.O.s that generate savings,” Dr. McWilliams said.
Dr. McWilliams’s study also found that organizations that consolidated hospitals with physician practices performed no better than those that did not. This suggests that such consolidation — which has been rampant in the industry and drives up prices paid by commercial insurers — is not necessary to reduce Medicare spending and improve care.
“If financial integration between physicians and hospitals fosters more effective responses to new payment models, those efficiencies have not yet manifested among A.C.O.s,” Dr. McWilliams said.
The voluntary nature of the program also challenges study of it. Self-selection invites the possibility that organizations that opt in could be different from those that don’t, perhaps better able to reduce spending and improve quality. Randomizing organizations into the program — akin to a randomized controlled trial of medical therapies — would offer more certain estimates of the program. But it’s not practical to force such a large change on health care organizations, and possibly dangerous to experiment so directly with factors that could affect patient care.
The two studies’ researchers used a different approach to tease out estimates of the program’s effects. They compared changes in cost and quality experienced by beneficiaries in A.C.O.s with those of comparable beneficiaries served by other organizations. Dr. McWilliams’s study also tested whether those changes corresponded to the timing of A.C.O. participation. Since no changes were detected before program initiation, that provides confidence that the findings are attributable to the program itself.
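In spirit, the comparison described above is a difference-in-differences: the change among A.C.O. beneficiaries minus the change among comparable non-A.C.O. beneficiaries. A minimal sketch with invented numbers (the actual studies use risk adjustment and far richer models):

```python
# Difference-in-differences with hypothetical mean annual spending figures ($);
# all numbers here are invented for illustration, not from the studies.
aco_pre, aco_post = 10_000, 10_200
control_pre, control_post = 10_000, 10_500

aco_change = aco_post - aco_pre              # +200: growth among ACO beneficiaries
control_change = control_post - control_pre  # +500: growth among comparison group

# The DiD estimate: ACO spending grew $300 less than the comparison group's.
estimated_effect = aco_change - control_change
print(estimated_effect)  # -300
```

The pre-period comparison Dr. McWilliams’s study ran is the standard check on this design: if the two groups were already diverging before the program began, the estimate would be suspect.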
The good news is that, at least in the first year or two of participation, A.C.O.s seem to spend less than other health care organizations while delivering equivalent or better care. The bad news is that many organizations drop out of the program, even as they’re succeeding.
Recent studies have credited Pioneer ACOs with some savings, though little to none after accounting for program costs. How do those study results square with predictions? I actually don’t know, and I’m not sure anybody does. The Pioneer Program is a demonstration program within the Center for Medicare and Medicaid Innovation. Demos don’t get scored (I’m told).
I can, however, find* predictions of savings from Medicare’s Shared Savings Program, which is a larger and slightly different ACO program than Pioneer. (Learn about the differences here.) Those predictions are below.
When reading the following, keep in mind that today Medicare costs over $500 billion per year. So, when discussing 10-year budget savings, compare figures below to something on the order of $5 trillion. In other words, $5 billion savings over 10 years is roughly 0.1 percent of total Medicare spending.
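The scaling in the paragraph above is just simple arithmetic, but it is worth making explicit:

```python
# Back-of-the-envelope scale of projected ACO savings against Medicare spending,
# using the round figures from the text.
annual_medicare_spending = 500e9                    # ~$500 billion per year
ten_year_spending = 10 * annual_medicare_spending   # ~$5 trillion over a decade
projected_savings = 5e9                             # $5 billion over 10 years

share = projected_savings / ten_year_spending
print(f"{share:.1%}")  # 0.1%
```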
In December 2008, the Congressional Budget Office (CBO) provided some clues to its thinking about ACOs (which, at the time, it called Bonus Eligible Organizations, or BEOs). Under the assumption that 20 percent of beneficiaries participated in such an organization by 2014 and 40 percent by 2019, CBO scored $5.3 billion in savings over 2010-2019. Today, only 7 million beneficiaries are associated with ACOs, or about 14 percent. The savings would decline over that span for several reasons, one of which is that as organizations became more efficient, bonus payments would grow. CBO warned that its prediction was highly uncertain because it was not clear precisely how organizations would respond to financial incentives, and because of the voluntary nature of the program. Regulations are evolving to attract and retain more organizations, which often means paying them more.
My guesses: the CMS actuary is probably wrong on the low side, but possibly not by much. CBO’s estimate could be close to right, but possibly too high. We already see some savings from Pioneer ACOs, for example, but fewer beneficiaries are associated with ACOs than expected and modifications of regulations may increase program costs.
People who think ACOs will definitely turn the tide of health care spending once and for all are overconfident. Maybe ACO savings can grow over future decades, but they’d have to do some significant compounding to reach a substantial portion of total Medicare spending this century. One way they could compound is if ACOs succeeded in controlling the diffusion of expensive, new health care technology.** I am unaware of anyone making an explicit argument that they will do so.
* By “find” I mean that I asked Loren Adler, and he emailed me links.
** If I’m doing the math right, with 0.1% savings relative to all of Medicare spending in the first decade, even if that doubled every decade, we’d only see about 0.8% savings by 2100.
It’s important to promote good research in health care, particularly if it’s relevant to policy or patient care. It’s equally important to disclose limitations of that research, and, if you’re a journalist or act like one, the good folks at Health News Review will let you know when you’ve failed to do so.
A research letter in JAMA Internal Medicine by Michael Wang, Mark Bolland, and Andrew Grey illustrated just how frequently papers, press releases, and news stories mention limitations of observational studies. Spoiler: It’s relatively uncommon that they do so. No wonder the Health News Review crew is so busy!
Wang et al.
collated 81 prospective cohort and case-control studies with clinical outcomes published between January 1, 2013, and June 30, 2013, in the Annals of Internal Medicine, BMJ, JAMA, JAMA Internal Medicine, Lancet, New England Journal of Medicine, and PLoS Medicine; 48 accompanying editorials; 54 journal press releases; and 319 news stories generated within 2 months of publication. […] For each of the resulting 583 documents, [they] determined whether any study limitation was reported and whether there was an explicit statement that causality could not be inferred.
Their figure below illustrates the proportion of the various means of reporting results that discuss limitations. Nowhere does mention of limitations of causality rise above about 17%. Is that a problem?
Maybe not. The reason is that Wang et al. used a very strict definition of “limitation”. Their point of view is that observational studies have a fundamental “inability to attribute causation.” Therefore, they looked for statements that “causality cannot be inferred.” That’s too strong.
All causal inferences rely on assumptions. The assumptions required of observational studies are, in some senses (depending on application), stronger than those for randomized trials, but quite frequently a causal inference is reasonable, especially with enough probing of those assumptions. What I’d look for in describing limitations is not a statement that causality cannot be inferred, but a statement about under what assumptions it can be, and disclosure of threats to those assumptions. (More about this here and here, for instance.)
Nevertheless, limitations should be acknowledged. Sometimes they really are too severe for reasonable interpretations of causality. About that, Stephen Soumerai, Douglas Starr, and Sumit Majumdar have published a worthwhile primer on how to spot inappropriate causal inferences, with examples from studies past and news media reporting about them.
Ultimately, the authors recommend interrupted time series (of which, difference-in-differences is one variant) as a means of more plausible causal inference with observational data. Other approaches, not discussed, are also worthwhile, in my view, but require more training and care. I don’t fault the authors for not including them in what is intended to be a relatively easy read.
IBM is now training Watson to be a cancer specialist. The idea is to use Watson’s increasingly sophisticated artificial intelligence to find personalized treatments for every cancer patient by comparing disease and treatment histories, genetic data, scans and symptoms against the vast universe of medical knowledge. […]
But the Watson project and similar initiatives also have raised speculation — and alarm — that companies are seeking to replace the nation’s approximately 900,000 physicians with software that will have access to everyone’s sensitive personal health information. […]
“I think a lot of folks in medicine, quite frankly, tend to be afraid of technology like this,” said Iltifat Husain, an assistant professor at the Wake Forest School of Medicine.
Husain, who directs the mobile app curriculum at Wake Forest, agrees that computer systems like Watson will probably vastly improve patients’ quality of care. But he is emphatic that computers will never truly replace human doctors for the simple reason that the machines lack instinct and empathy. […]
Unlike a human brain, which can be distracted, confused or inspired by huge volumes of information, Watson is not a creative thinker but a rational one. It looks at known associations among various bits of data, calculates the probability that one provides a better answer to a question than another, and presents the top ideas to the user.
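That description — scoring candidate answers by association strength and surfacing the top few — can be caricatured as a simple ranking step. All names and scores here are invented for illustration; this is not how Watson is actually implemented:

```python
# Toy caricature of candidate-answer ranking; names and confidence scores invented.
candidates = {
    "treatment A": 0.62,   # hypothetical probability-style confidence scores
    "treatment B": 0.55,
    "treatment C": 0.31,
}

# Present the top ideas to the user, highest confidence first.
top_ideas = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:2]
for name, score in top_ideas:
    print(f"{name}: {score:.2f}")
```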
A report by Milliman includes data from which I made the following charts. They show, by condition, how much it costs to treat someone with a given condition with and without a substance use disorder (SUD).
OK, it’s a bit more complex than that, because I’m only showing part of the data and I’m constrained by how Milliman broke it down. Each chart shows the total for people without a mental health (MH) condition or a SUD in blue. In red, the Medicare chart shows the additional amount for people with a SUD, and the Medicaid chart does so for people with a SUD or mental health condition. But we know MH conditions and SUDs co-occur with high frequency, so qualitatively I believe the story that SUDs add a lot to cost.
More controversial, I suspect, is their finding that “algorithms may be one way to improve care, at least for common situations like emergency treatment of heart attack.” This would seem to threaten physician autonomy. If it’s true that algorithms can improve care—even if only in a few, circumscribed ways—is physician autonomy an important consideration nonetheless? (One concern might be that fewer people will want to be doctors. Another concern is that an algorithm might be “right” on average but not capture patients’ physiological or preferential heterogeneity. As Peter Ubel wrote in NEJM today, sometimes in guidelines values masquerade as facts. Are there other concerns?)
“For however much automation has helped the airline passenger by increasing safety it has had some negative consequences,” says Langewiesche. “In this case it’s quite clear that these pilots had had experience stripped away from them for years.” The captain of the Air France flight had logged 346 hours of flying over the past six months. But within those six months, there were only about four hours in which he was actually in control of an airplane—just the take-offs and landings. The rest of the time, auto-pilot was flying the plane. Langewiesche believes this lack of experience left the pilots unprepared to do their jobs. […]
However potentially dangerous it may be to rely too heavily on automation, no one is advocating getting rid of it entirely. It’s agreed upon across the board that automation has made airline travel safer. The accident rate for air travel is very low: about 2.8 accidents for every one million departures.
Medical errors are far more common. As Aaron told us, “the wrong site is operated on in about 1 in 100,000 procedures. Foreign objects are left in the body in about 1 in 10,000 procedures.” These are “never” events. They should not happen. One would think that exceedingly simple algorithms (like checklists) could prevent them. Is there a good reason not to try (or try harder) to implement them?
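A checklist really is an exceedingly simple algorithm: a conjunction of confirmations, any one of which can halt the case. A minimal sketch, with illustrative items I’ve made up rather than any actual surgical-safety protocol:

```python
# Minimal pre-incision checklist gate; the items are invented for illustration.
checklist = {
    "correct patient confirmed": True,
    "surgical site marked and verified": True,
    "instrument and sponge count recorded": False,  # one missed step blocks the case
}

def ready_to_proceed(items: dict) -> bool:
    """Proceed only if every checklist item has been confirmed."""
    return all(items.values())

print(ready_to_proceed(checklist))  # False
```

The point of the structure is that no single person’s attention is the safeguard; the conjunction is.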
The comparison of medical errors to aviation errors is not new. The anesthesiologist community has adapted a system of error tracking and correcting from the aviation industry. To what extent does it include application of algorithms to roles humans used to fill? To what extent is that perceived as going too far, causing a degradation in skill that could be called upon when the algorithm fails?
Aviation is not medicine, of course. A key difference may be that planes are far more similar to one another in response to flight commands than are humans in response to medical treatments. The classic response to any algorithmic medicine (or guideline) is that it’s “one size fits all” and that a good physician knows how to treat an idiosyncratic patient. But what if all physicians are not equally good? What if patient idiosyncrasy invites and is used to justify greater variation in practice than is warranted?
Perhaps one day aviation and medicine will be more similar. Today, all planes of a given model are designed to be equivalent. Today, we don’t (yet) completely know what the different, roughly equivalent models of humans are. Advances in genomics might one day tell us.
Yet, well short of that day there may be areas in which more algorithmic, medical decision support can advance safety and improve outcomes. My guess is that more physicians will be working with algorithms in the future. To the extent it leads to better care, payers and patients might reasonably demand it. It will, however, be a long, long time before robots take docs’ jobs.
I’m going to defer discussion of findings and just make note of some of the referenced claims in the background section, but here’s the abstract if you really must know:
When a patient arrives at the Emergency Room with acute myocardial infarction (AMI), doctors must quickly decide whether the patient should be treated with clot-busting drugs, or with invasive surgery. Using Florida data on all such patients from 1992-2011, we decompose physician practice style into two components: The physician’s probability of conducting invasive surgery on the average patient, and the responsiveness of the physician’s choice of procedure to the patient’s condition. We show that practice style is persistent over time and that physicians whose responsiveness deviates significantly from the norm in teaching hospitals have significantly worse patient outcomes, including a 7% higher probability of death in hospitals among the patients who are least appropriate for the procedure. Our results suggest that a reallocation of invasive procedures from less appropriate to more appropriate patients could improve patient outcomes without increasing costs. Developing protocols to identify more and less appropriate patients could be a first step towards realizing this improvement.
As you can tell from the abstract, the paper is about the extent to which modifying physician practice variation with protocols that are sensitive to patient characteristics (to which some physicians are less so) would improve outcomes. Hence, the following claims from the background, linked to references, are highly relevant:
In a 1954 publication, Meehl “argued that predictions based on these simple models were generally more accurate than those of [clinical psychologists]. A more recent meta-analysis of 136 studies in clinical psychology and medicine also found that algorithms tended to either outperform or to match the experts.”
“The advantage of the algorithms arises mainly because the algorithms are more consistent than the experts.”
There are many possible reasons for mistakes by experts in medicine: (1) defensive medicine, but “Baicker et al. (2007) argue that there is little connection between malpractice liability costs and physician treatment of Medicare patients”; (2) financial incentives, but “a recent national survey of general surgeons which used hypothetical clinical scenarios suggested that the decision to operate was largely independent of malpractice concerns and financial incentives”; (3) patient preferences, but Cutler et al. “conclude that patient demand is a relatively unimportant determinant of regional variations and that instead the main driver is physician beliefs about appropriate treatment that are often unsupported by clinical evidence.” (Bill wrote about this study here.) Work by Finkelstein et al. is consistent with this.
That leaves (4) influence by peers—indeed, “knowledge spillovers are the main theoretical driver of small area variation in procedure use in” the model by Chandra and Staiger; and (5) “Doyle et al. (2010) suggest that some doctors may just be more competent than others.”
The paper by Currie et al. builds on the work cited in (4) and (5). “Our main focus is on identifying doctors who, for whatever reason, are making poor use of the observable data about their patients when making treatment decisions. We will show that patients of these doctors tend to have worse outcomes than other comparable patients.”
I wanted to point you to more recent data from our Health Reform Monitoring Survey that shows an even steeper drop in the share of adults ages 18-64 with problems paying family medical bills when we look at the period between September 2013 and March 2015. This change could be due to both further takeup of coverage in 2015 and to a lower likelihood that those who obtained coverage in 2014 would report problems paying family medical bills in the previous year as more time passes. We also found that adults were more likely to report problems paying family medical bills in March 2015 if they had low incomes, were uninsured for part or all of the previous 12 months, were in fair or poor health, or had high deductibles, and that those with problems paying bills were much more likely to forgo needed care because they could not afford it compared with adults who don’t have medical bill problems.
Pulling up the study, here are two charts I found interesting. The first shows the widening gap in health care affordability between Medicaid expanding and non-expanding states. The second shows the association of affordability with deductible level.
Back to Genevieve’s email:
A separate brief focused on medical debt using December 2014 HRMS data shows some of the limits of insurance coverage in protecting families against large medical bills, as 70% of adults with medical debt reported that they or their family members incurred all of the debt when they were insured.
Here’s one chart from that brief. No doubt because cost sharing is higher, those with private coverage (ESI or nongroup) are more likely to report medical debt due to cost sharing than those with public coverage.