The following originally appeared on The Upshot (copyright 2018, The New York Times Company).
The medical research grant system in the United States, run through the National Institutes of Health, is intended to fund work that spurs innovation and fosters research careers. In many ways, it may be failing.
It has been getting harder for researchers to obtain grant support. A study published in 2015 in JAMA showed that from 2004 to 2012, research funding in the United States increased only 0.8 percent year to year. It hasn’t kept up with the rate of inflation; officials say the N.I.H. has lost about 23 percent of its purchasing power in a recent 12-year span.
Because the money available for research doesn’t go as far as it used to, it now takes longer for scientists to get funding. The average researcher with an M.D. is 45 years old (for a Ph.D. it’s 42 years old) before she or he obtains that first R01 (think “big” grant).
Given that R01-level funding is necessary to obtain promotion and tenure (not to mention its role in the science itself), this means that more promising researchers are washing out than ever before. Only about 20 percent of postdoctoral candidates who aim to earn a tenured position in a university achieve that goal.
This new reality can be justified only if those who are weeded out really aren’t as good as those who remain. Are we sure that those who make it are better than those who don’t?
A recent study suggests the grant-making system may be unreliable at distinguishing between grants that are funded and those that get nothing, which is its very purpose.
When a health researcher (like me) believes he has a good idea for a research study, he most often submits a proposal to the N.I.H. It’s not easy to do so. Grants are hard to write, take a lot of time, and require a lot of experience to obtain.
After they are submitted, applications are sorted by topic areas and then sent to a group of experts called a study section. If any experts have a conflict of interest, they recuse themselves. Applications are usually first reviewed by three members of the study section and then scored on a number of domains from 1 (best) to 9 (worst).
The scores are averaged. Applications in the bottom half receive written comments and scores from reviewers, but they are not discussed in the study section meetings. The top half are presented in the meeting by the reviewers, and then the entire study section votes using the same nine-point scale. The grants are then ranked by score, and the best are funded based on how much money is available. To be funded, a grant must have a percentile better than the “payline,” which today is usually between 10 and 15 percent.
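For readers who think in code, the ranking step can be sketched roughly as follows. This is a simplified illustration, not the N.I.H.'s actual procedure: the function name `rank_and_fund`, the application labels and the scores are all made up, and the percentile here is just rank divided by the number of applications, which is an assumption.

```python
from statistics import mean

def rank_and_fund(scores_by_app, payline_pct):
    """Average each application's scores (1 = best, 9 = worst), rank them,
    and flag those whose percentile falls within the payline.

    Hypothetical helper: percentile is computed simply as rank / total,
    an assumption made for illustration only.
    """
    averaged = {app: mean(scores) for app, scores in scores_by_app.items()}
    ranked = sorted(averaged, key=averaged.get)  # lowest (best) average first
    total = len(ranked)
    results = []
    for position, app in enumerate(ranked, start=1):
        percentile = 100 * position / total
        results.append((app, averaged[app], percentile, percentile <= payline_pct))
    return results

# Invented scores from three reviewers for ten invented applications.
scores = {
    "app_01": [2, 1, 2], "app_02": [3, 2, 3], "app_03": [1, 2, 1],
    "app_04": [5, 4, 6], "app_05": [2, 3, 2], "app_06": [7, 6, 8],
    "app_07": [4, 4, 3], "app_08": [3, 3, 4], "app_09": [6, 5, 5],
    "app_10": [3, 3, 3],
}

funded = [app for app, _, _, ok in rank_and_fund(scores, payline_pct=15) if ok]
print(funded)
```

With ten applications and a 15 percent payline, only the single best-scoring proposal clears the bar in this toy example, which gives a feel for how little room there is near the cutoff.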
Given that there are far more applications than can be funded, and that only the best ones are even discussed, we hope that the study sections can agree on the grades they receive, especially at the top end of the spectrum.
In this study of the system, researchers obtained 25 funded proposals from the National Cancer Institute. Sixteen of them were considered “excellent,” as they were funded the first time they were submitted. The other nine were funded on resubmission — grant applications can be submitted twice — and so can still be considered “very good.”
They then set up mock study sections. They recruited researchers to serve on them just as they do on actual study sections. They assigned those researchers to grant applications, which were reviewed as they would be for the N.I.H. They brought those researchers together in groups of eight to 10 and had them discuss and then score the proposals as they would if actual funding were at stake.
The intraclass correlation, a statistic that measures how much groups agree, was 0 for the scores assigned. This meant that there was no agreement at all on the quality of any application. Because they were concerned about the reliability of this result, the researchers also computed a Krippendorff’s alpha, another statistic of agreement. A score above 0.7 (range 0 to 1) is considered “acceptable.” None reached that threshold; the values were all very close to zero. A final statistic measured overall similarity scores and found that scores for the same application were no more similar than scores for different applications.
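To get a feel for what an intraclass correlation of 0 means, here is a minimal sketch using the one-way random-effects form of the statistic. That choice is an assumption; the article does not say which variant the study used, and the function name `icc_oneway` and the scores below are invented for illustration.

```python
from statistics import mean

def icc_oneway(ratings):
    """One-way random-effects intraclass correlation, ICC(1).

    `ratings` holds one row per application, each row containing the same
    number of scores. This variant is an assumption made for illustration.
    """
    n = len(ratings)       # number of applications
    k = len(ratings[0])    # scores per application
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Variation between applications versus variation within each application.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Invented scores: reviewers who agree completely on each application...
agree = [[1, 1, 1], [5, 5, 5], [9, 9, 9]]
# ...and reviewers whose disagreement is as large as the differences
# between applications.
noise = [[3, 5], [4, 6], [5, 7]]

print(icc_oneway(agree))  # 1.0
print(icc_oneway(noise))  # 0.0
```

In the second case, knowing which application a score belongs to tells you nothing beyond what reviewer-to-reviewer scatter already explains. That is the situation a value near zero describes.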
There wasn’t even any difference between the scores for those funded immediately and those requiring resubmission.
It would be easy to mistake this study for a death knell for the peer review process. It’s not. A careful reader must note that all of the grants in this study were exceptional. They succeeded, after all. Since the N.C.I. funds only about 10 percent of grants, we’re looking only at proposals in the best decile, and there is likely less variability in scores among those than among grants occupying the full spectrum of quality.
This should still concern us greatly. This system was devised back when more than half of submitted grants were funded. That’s very different from what we see today.
The current system favors low-risk research. If you’re going to fund only a small percentage of proposals, you tend to favor the ones most likely to show positive results. You don’t want to have to defend null findings as a “waste of money.”
The current system favors experienced researchers over new ones. They have thicker curriculum vitae, more preliminary data and name recognition. Moreover, they know how to work the system. At this point in my career, I know how to write multiple grants efficiently. I’m better at it than I used to be.
The current system can also be biased against women and minorities in ways that could keep them out of funding range. The system is not blinded, and many studies have shown that even after controlling for other factors, the ways in which grants are discussed, scored and funded can favor men over women, and whites over minorities.
If researchers are getting into the top 10 percent more than others based on such factors, especially with less and less money available, many great proposals — and many great researchers — are being sidelined inappropriately.
We may be missing out on a lot of excellent, and perhaps novel, work that can’t break into the top 10 percent because of structural problems. There are things we could do to fix that. One might be, of course, to increase funding across the board. John Ioannidis has proposed that we fund researchers, not research. A group of informaticists from Indiana University has suggested a percentage of funding be put to all scientists for a vote.
Other solutions are more radical. One might involve a modified lottery. The current system seems to do reasonably well at discriminating between “bad” and “good” grants. Once those good ones are put aside, we might do better by assigning funding through chance. Ferric Fang and Arturo Casadevall, who are researchers and journal editors, have proposed that such a system could reduce bias and increase diversity among researchers, suggesting that seniority and other factors still play too large a part in funding decisions.
They make the case that we already have a de facto lottery now, except it’s not random, and therefore unfair.
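A modified lottery of the kind described above could be sketched in a few lines. This is only an illustration of the general idea, not the specific scheme Fang and Casadevall proposed: the function name `modified_lottery`, the quality cutoff, the number of awards and the scores are all hypothetical.

```python
import random

def modified_lottery(avg_scores, quality_cutoff, n_awards, seed=None):
    """Screen out weak proposals by score, then fund by chance.

    Hypothetical sketch: `quality_cutoff` is the worst average score
    (1 = best, 9 = worst) that still counts as a "good" grant.
    """
    rng = random.Random(seed)
    # Step 1: peer review separates "good" grants from "bad" ones.
    eligible = sorted(app for app, s in avg_scores.items() if s <= quality_cutoff)
    # Step 2: if the money covers every good grant, fund them all.
    if len(eligible) <= n_awards:
        return eligible
    # Otherwise, draw the winners at random from the eligible pool.
    return sorted(rng.sample(eligible, n_awards))

# Invented average scores for eight invented applications.
avg_scores = {
    "app_A": 1.3, "app_B": 1.7, "app_C": 2.3, "app_D": 2.7,
    "app_E": 3.0, "app_F": 5.0, "app_G": 5.3, "app_H": 7.0,
}

winners = modified_lottery(avg_scores, quality_cutoff=3.0, n_awards=2, seed=2018)
print(winners)
```

Here five proposals clear the quality bar and two are funded by chance, so a reviewer's preference for a familiar name among those five no longer decides the outcome.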
The current granting system doesn’t just fund the researchers of today — it also steers the careers of tomorrow. Should it fail, the repercussions will be felt for decades.