When I was a kid, my parents refused to let me drink coffee because they believed it would “stunt my growth”. It turns out, of course, this is a myth. Studies have failed, again and again, to show that coffee or caffeine consumption is related to reduced bone mass or to how tall people are. But that’s just the tip of the iceberg. Break out your supersized cup of joe, because coffee is the topic of this week’s Healthcare Triage:
For those of you who want to read more, you can find all the supporting links at my Upshot piece on Coffee, from which this episode was adapted. There’s also a follow-up Q&A you might enjoy.
Each circle in that chart is a county in the United States, with bigger circles representing bigger counties. The fitted lines tell us that as the proportion of people receiving mammograms goes up, the rate of people being diagnosed with cancer goes up, pretty dramatically. But the rate of people dying from breast cancer within 10 years is pretty much unaffected.
When analyzed at the county level, the clearest result of mammography screening is the diagnosis of additional small cancers. Furthermore, there is no concomitant decline in the detection of larger cancers, which might explain the absence of any significant difference in the overall rate of death from the disease. Together, these findings suggest widespread overdiagnosis.
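The county-level pattern described above can be checked with a simple population-weighted fit, where bigger counties count more, just as they draw bigger circles. Here’s a minimal sketch with simulated data (all numbers invented; only the qualitative shape matches the chart):

```python
import numpy as np

# Hypothetical county-level data: screening rate (%), diagnoses and
# breast-cancer deaths per 100,000 women, and county population (weights).
rng = np.random.default_rng(0)
screening = rng.uniform(30, 80, size=200)                  # % screened
diagnoses = 60 + 1.5 * screening + rng.normal(0, 10, 200)  # rises with screening
deaths = 25 + 0.0 * screening + rng.normal(0, 3, 200)      # flat: no relationship
population = rng.integers(10_000, 1_000_000, size=200)

# Weighted least squares: bigger counties (bigger circles) count more.
slope_dx, _ = np.polyfit(screening, diagnoses, 1, w=population)
slope_death, _ = np.polyfit(screening, deaths, 1, w=population)

print(f"diagnosis slope: {slope_dx:.2f}")    # clearly positive
print(f"death slope:     {slope_death:.2f}") # near zero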
It’s hard to find someone who’ll argue against the notion that science should be open and transparent. In practice, however, much of what scientists do is not available for public scrutiny and debate. What can we do? A recent issue of Science Magazine provides perspectives on how to move toward open science (see also Austin commenting on Brendan Nyhan).
Brian Nosek and colleagues report on a project to define standards for open science. The standards include:
Standards for transparency about study designs, study materials, data sharing, and analytic methods (that is, standards for sharing the code comprising statistical models or simulations).
Standards for preregistration of studies and data analysis plans.
Replication standards that recognize the value of replication for independent verification of research results and identify the conditions under which replication studies will be published in the journal.
Citation standards that extend current article citation norms to data, code, and research materials.
Over 100 journals have signed on to these standards, including Science. Relatively few medical journals are signatories. Readers with connections to journals should raise these issues with editorial boards and professional societies.
Getting these standards adopted across science will be a struggle. Some fields have huge incentives for priority of discovery. This encourages labs to conceal methods and even details about results for as long as possible. For scientists in any field, open science requires great attention to accurate documentation of methods and curation of data. These are time-intensive tasks. Therefore, practicing open science competes against the expectations that scientists achieve high research productivity.
For this reason, Bruce Alberts and colleagues argue that scientists’ career incentives should be changed so that scholars are rewarded for publishing well rather than often. In tenure cases at universities, as in grant submissions, the candidate should be evaluated on the importance of a select set of work, instead of using the number of publications or impact rating of a journal as a surrogate for quality.
The challenge in implementing evaluation on quality rather than quantity will be finding valid and reliable ways to measure importance.
What’s at stake in open science? Transparency is part of what defines science, so open science is just better science. Period.
But open science is also critical for the applications of science. As Austin and I discussed recently, part of the solution to the troubles concerning conflicts of interest in medical research is to make science more transparent. The more confidence we have in the data and methods of a study, the less it matters that an author has a financial tie to industry. To the degree that we can verify, we don’t have to trust.
The same applies to using science in policy. Getting to empirically-driven policy requires that we gather data and evaluate social programs against benchmarks. Credibility of evidence is everything here. Most of us have strong prior beliefs about the effectiveness of social programs, and we are inclined to distrust the research of those with whom we disagree. We will never completely counter those priors. But what we can do is make the data, design, methods, and analyses in policy-oriented science as reproducible as possible.
I think the vast, vast majority of concerned tweets, emails, phone calls, and texts I get from people (including friends) are panicked concerns that something or other is causing cancer. Everyone is freaked out about it. I always find this odd, because I have this general sense that things are getting better, not worse, when it comes to cancer. If something ubiquitous were causing cancer in a big way, we’d see it.
Introduction: Healthy People 2020 (HP2020) calls for a 10% to 15% reduction in death rates from 2007 to 2020 for selected cancers. Trends in death rates can be used to predict progress toward meeting HP2020 targets.
Methods: We used mortality data from 1975 through 2009 and population estimates and projections to predict deaths for all cancers and the top 23 cancers among men and women by race. We apportioned changes in deaths from population risk and population growth and aging.
The researchers took the mortality data from 1975 through 2009, combined that with population data and other things, and then estimated the changes in death from many cancers over time. Sure, the number of total deaths increased, but that was mostly because more and more people were getting older. When you look at the age-adjusted rates (deaths per population), the risk of dying of cancer went down in all groups. They’re also predicted to continue to decline through 2020. Here’s a chart:
Let me be clear: the number of people dying from cancer in the US may increase in the future. In fact, I think it’s likely. But that’s mostly because of demographics and a growing population. The risk of dying from cancer (i.e., the rate of death per population) is going down. This is a good thing. It means that in general, we’re doing something right here. Pick something else for your panic du jour.
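The distinction between total deaths and the age-adjusted death rate is just arithmetic. Here’s a minimal sketch of direct age standardization with invented numbers (all hypothetical), showing how total cancer deaths can rise even as the risk of dying falls in every age group:

```python
# Direct age standardization against a fixed reference age mix.
age_groups = ["0-44", "45-64", "65+"]
standard_weights = [0.60, 0.25, 0.15]   # fixed reference age mix

# Death rates per 100,000 within each age group, at two time points.
rates_1975 = [10, 150, 900]
rates_2009 = [8, 110, 700]              # risk fell in every age group

# Actual populations in millions: the 2009 population is bigger and older.
pop_1975 = [140, 45, 25]
pop_2009 = [170, 75, 50]

def total_deaths(rates, pop_millions):
    # rate per 100,000 times population in millions = rate * pop * 10 deaths
    return sum(r * p * 10 for r, p in zip(rates, pop_millions))

def age_adjusted_rate(rates, weights):
    # weighted average of age-specific rates, per 100,000
    return sum(r * w for r, w in zip(rates, weights))

print(total_deaths(rates_1975, pop_1975), total_deaths(rates_2009, pop_2009))
# total deaths rise: 306,500 -> 446,100
print(age_adjusted_rate(rates_1975, standard_weights),
      age_adjusted_rate(rates_2009, standard_weights))
# age-adjusted rate falls: 178.5 -> ~137.3
```

More people die in the later year only because there are more people, and more old people; within every age group, the risk fell.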
Here are two fascinating charts from the recent NBER working paper by Mariacristina De Nardi and colleagues. Both show average per-person total health care spending by type of service, the first by age and the second by month prior to death. The second is estimated from a model, which is why it’s so smooth.
Nursing home care is driving the entirety of growth in health spending after age 85 or so. Even before age 85, it’s the main driver. (Yes, the article says “nursing home” specifically, even when referencing the figure that just says “nursing”.)
Hospital care is the biggest source of cost in the last year of life, followed by nursing home care.
Per person spending is not informative about total spending. One would have to multiply by the number of people (and by age, for age-specific averages, per the first chart above). I point this out just so those on Twitter who think I’m not aware of this fact are assured that I am.
In fact, Wachter mentions Flight 447’s fatal crash as he explains how automation can create new vectors for disaster, even as it closes off old ones. Both he and Langewiesche provide evidence that automation—auto-pilots in aviation and clinical decision support and order fulfillment features in electronic medical systems—improves safety on average while courting danger in a subset of cases. This is not a knock on their work or the compelling anecdotes they use to drive their narratives, but it’s a plea for a bit of perspective as you read either story, and I highly recommend both.
At the heart of both is a cascade of errors that begins with a human’s (or humans’) misunderstanding of the mode in which an automated system is operating. Wachter offers a very nice example of such a “mode error,” which I’m certain you can relate to: ACCIDENTALLY TYPING WITH CAPS LOCK ON. The caps lock key toggles the keyboard mode such that all (or most) keys behave differently.
When typing, an inadvertent caps lock toggle can cause annoying mode errors, like failing to properly enter a password. When flying an aircraft or ordering medications for a patient, mode errors can be deadly, even if they’re usually annoyances that get remedied before disaster strikes.
The pilots aboard Flight 447 didn’t recognize that their plane had switched modes, relying less on auto-pilot and ceding more control to them. They misinterpreted this sudden grant of autonomy as a confusing set of malfunctions. Likewise, the physician who initiated the sequence of errors that landed Pablo Garcia in the ICU, and might have killed him, didn’t recognize a mode change: the electronic medication entry system had switched from interpreting entries in milligrams to milligrams per kilogram of patient weight, thus multiplying a 160 mg dose by a factor of 39.
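The arithmetic of that mode error is simple to sketch. A toy illustration (the interface is hypothetical; the 160 mg dose and the factor of 39 come from the account above):

```python
# Toy illustration of the mg vs. mg/kg mode error. The clinician types "160"
# meaning a total dose of 160 mg, but the order-entry system is in
# weight-based mode and reads it as 160 mg per kg of body weight.

patient_weight_kg = 39.0   # weight implied by the factor-of-39 multiplication
entered_value = 160.0      # clinician's intent: a total dose of 160 mg

dose_intended_mg = entered_value                        # "mg" mode
dose_delivered_mg = entered_value * patient_weight_kg   # "mg/kg" mode

print(dose_delivered_mg)                     # 6240.0 mg instead of 160 mg
print(dose_delivered_mg / dose_intended_mg)  # overdose factor: 39.0
```

Same keystrokes, different mode, a 39-fold overdose: the system did exactly what it was told.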
Failure of humans to recognize mode changes, and failure of systems to make those changes more obvious without exacerbating “alarm fatigue,” are among the many ways automation can harm. This failure is compounded by humans’ often well-earned trust in automation. When we ignore the warnings an automated system gives us, it’s partly because that system has served us very well in the past, sparing us from far more errors than it creates. Despite their intent, the vast majority of alarms (car alarms, fire alarms, the flashing “check engine” light, and the like) are not signals of immediate danger, so our learned response is to treat them as nuisances and to ignore them when possible. Occasionally, this will be a mistake. It won’t always lead to disaster (because we have other means of obtaining the right information and correcting our first, false assumption), but it could.
Such assumptions are not unique to automated systems. I’m well aware that not every wail from my children is a signal of deadly distress. Their sounds of alarm don’t always mean what they think they mean. Likewise, the political candidate who warns of the end of America if his opponent is elected is no longer alarming.
Our trust in (or conferring of) authority is not unique to the machine-human relationship either. Though I do trust many machines, I trust a great number of humans too. They’ve earned it. And yet they err, and their errors cause me harm, just as mine cause harm to others. Naturally, we should be aware of the harms of machines, of humans, and of the marriage of the two. We should strive to reduce the potential for grave error, provided we can do so in ways that don’t invite greater costs (by which I do not merely mean money).
A careful read of the accounts of Flight 447 and patient Pablo Garcia reveals the overwhelming benefits of automation in aviation and medicine, as well as the dangers that still remain. There is much more work to do, as both authors expertly document. Humans are highly imperfect. So are our systems designed to protect us from ourselves.
* Also, let me assure you that I understand the differences between aviation and medicine, as I mentioned previously. All recent posts on automation are so tagged.
I write about nutrition far more now than I used to. Part of that is because – as with health policy – the more I’ve learned how little of what we say is based on data and evidence, the more irritated I’ve become (see my Twitter avatar).
I recently came across a Viewpoint in JAMA that is illustrative of how things are changing in nutrition. It’s by Dariush Mozaffarian and David Ludwig, and it talks about the Dietary Guidelines Advisory Committee report. Here’s something I already wrote about at The Upshot:
In the new DGAC report, one widely noticed revision was the elimination of dietary cholesterol as a “nutrient of concern.” This surprised the public, but is concordant with more recent scientific evidence reporting no appreciable relationship between dietary cholesterol and serum cholesterol or clinical cardiovascular events in general populations.
But they want to focus on something else:
A less noticed, but more important, change was the absence of an upper limit on total fat consumption. The DGAC report neither listed total fat as a nutrient of concern nor proposed restricting its consumption. Rather, it concluded, “Reducing total fat (replacing total fat with overall carbohydrates) does not lower CVD [cardiovascular disease] risk.… Dietary advice should put the emphasis on optimizing types of dietary fat and not reducing total fat.” Limiting total fat was also not recommended for obesity prevention; instead, the focus was placed on healthful food-based diet patterns that include more vegetables, fruits, whole grains, seafood, legumes, and dairy products and include less meats, sugar-sweetened foods and drinks, and refined grains.
The complex lipid and lipoprotein effects of saturated fat are now recognized, including evidence for beneficial effects on high-density lipoprotein cholesterol and triglycerides and minimal effects on apolipoprotein B when compared with carbohydrate. These complexities explain why substitution of saturated fat with carbohydrate does not lower cardiovascular risk. Moreover, a global limit on total fat inevitably lowers intake of unsaturated fats, among which nuts, vegetable oils, and fish are particularly healthful. Most importantly, the policy focus on fat reduction did not account for the harms of highly processed carbohydrate (eg, refined grains, potato products, and added sugar)—consumption of which is inversely related to that of dietary fat.
As with other scientific fields from physics to clinical medicine, nutritional science has advanced substantially in recent decades. Randomized trials confirm that diets higher in healthful fats, replacing carbohydrate or protein and exceeding the current 35% fat limit, reduce the risk of cardiovascular disease. The 2015 DGAC report tacitly acknowledges the lack of convincing evidence to recommend low-fat–high-carbohydrate diets for the general public in the prevention or treatment of any major health outcome, including heart disease, stroke, cancer, diabetes, or obesity. This major advance allows nutrition policy to be refocused toward the major dietary drivers of chronic diseases.
As I’ve said repeatedly, I don’t know that the evidence is so clear that we should be making declarative statements telling anyone how to eat. But it’s amazing just how much the tide has turned not just against carbohydrates, but towards fat. I spent decades being told to reduce my fat intake, lower and lower and lower; that may have been the wrong thing to do.
Some ideas for reducing publication bias and increasing the credibility of published scientific findings, from Brendan Nyhan:
Pre-accepted articles, accepted on the basis of a pre-registered protocol before findings are known. Brendan reports that this is already happening at AIMS Neuroscience, Cortex, Perspectives on Psychological Science, Social Psychology, and for a planned special issue of Comparative Political Studies.
Results-blind peer review. A similar idea to pre-accepted articles, this would evaluate submissions on all aspects of a paper (data, methods, import) apart from the actual findings. Brendan notes that this has been attempted at Archives of Internal Medicine.
Verifying replication data and code, basically providing everything necessary to replicate the study. Already standard at the American Journal of Political Science and American Economic Review.
Reward higher-quality and faster reviews with credits, redeemable for faster review of one’s own manuscripts. I’m not aware of any journal that has attempted any program aimed at reducing review times, let alone succeeded in doing so.
Forward reviews of promising manuscripts to section journals. That is, if the flagship journal can’t accept, but recommends publication in an affiliated journal, streamline the process by treating the flagship’s review as the first round. Something like this already happens with JAMA journals and the American Economic Review and its affiliated journals.
Triple blind reviewing would blind the editor from the authors, not just the authors and reviewers from one another. Already standard at Mind, Ethics, and American Business Law Journal.
As Brendan writes, all of these have limitations and none can remove all potential bias or gaming. Yet it’s hard to argue they’re not worth considering.
First, you put the Clipper Skipper out to pasture, because he has the unilateral power to screw things up. You replace him with a teamwork concept—call it Crew Resource Management—that encourages checks and balances and requires pilots to take turns at flying. Now it takes two to screw things up. Next you automate the component systems so they require minimal human intervention, and you integrate them into a self-monitoring robotic whole. You throw in buckets of redundancy. You add flight management computers into which flight paths can be programmed on the ground, and you link them to autopilots capable of handling the airplane from the takeoff through the rollout after landing. […] As intended, the autonomy of pilots has been severely restricted, but the new airplanes deliver smoother, more accurate, and more efficient rides—and safer ones too.
It is natural that some pilots object. […] [A]n Airbus man told me about an encounter between a British pilot and his superior at a Middle Eastern airline, in which the pilot complained that automation had taken the fun out of life. […]
In the privacy of the cockpit and beyond public view, pilots have been relegated to mundane roles as system managers, expected to monitor the computers and sometimes to enter data via keyboards, but to keep their hands off the controls, and to intervene only in the rare event of a failure. […] Since the 1980s, when the shift began, the safety record has improved fivefold, to the current one fatal accident for every five million departures. No one can rationally advocate a return to the glamour of the past. […]
Once you put pilots on automation, their manual abilities degrade and their flight-path awareness is dulled: flying becomes a monitoring task, an abstraction on a screen, a mind-numbing wait for the next hotel […] [a] process known as de-skilling. […]
The automation is simply too compelling. The operational benefits outweigh the costs. The trend is toward more of it, not less. And after throwing away their crutches, many pilots today would lack the wherewithal to walk.
Safer by design yet operationally boring, is this the future of medicine? A meaningful subset of it? Not in a million years?