Austin alerted me to the April 2013 issue of Medical Care, which includes an important Point-Counterpoint regarding a central substantive and methodological disagreement within the world of medicine and public health. Ruth Etzioni and colleagues, and Joy Melnikow and colleagues present sharply different views on both the substance and the methodology.
On the substance, the opposing Medical Care pieces present different perspectives on prostate-specific antigen (PSA) screening for prostate cancer. Prostate cancer kills roughly 30,000 American men every year. So this is a critical population health concern.
Differences in methodological perspective are equally important. The U.S. Preventive Services Task Force, like much of the health services research community, holds a dim view of PSA testing. USPSTF’s skepticism reflects disappointing findings in two very large randomized clinical trials (RCTs). Yet some analytic models suggest that PSA screening is more beneficial than standard statistical analysis of RCT data would suggest. These issues may be especially important in reducing long-term mortality—which is inherently difficult to suss out in a realistic trial.
I sat down over Skype with Dr. William Dale to try to make sense of these questions. See Part I here, and Part II here.
William is section chief in geriatric and palliative care medicine here at the University of Chicago Medical Center. He treats many prostate cancer patients. He is also a highly trained practitioner and consumer of comparative effectiveness research. We’ve taught together for years. It was nice to sit down and talk shop in this different format.
Part I presents some basics of PSA screening. We discuss why, on average, PSA screening is overused. In too many cases, over-screening leads to unnecessary, even harmful interventions. There is continued resistance, among both patients and providers, to the judicious use of comparative effectiveness research.
We also discuss why any effort to promote uniform recommendations may be fundamentally the wrong approach in a heterogeneous population. There is, simultaneously, much overuse and much underuse of PSA screening. Low-risk American men and various elderly populations are screened too much. Yet specific subpopulations (some groups of African-American men, as well as men whose first-degree relatives have had prostate cancer) might require more aggressive screening. On average, we almost certainly overscreen. Yet this fact about the overall population doesn't tell the whole story.
Part II may be more interesting to people who've already thought about these issues in detail. We discuss the ironic fact that pervasive (and thus, in the aggregate, costly) therapies such as mammography and PSA testing are among the interventions most difficult to evaluate through a transparent randomized trial. PSA screening has been subject to two very large trials conducted over more than a decade.
Because death is a rare and delayed outcome, you need to follow a really huge sample over a long period of time. Study investigators originally anticipated that 25 percent of the control group would obtain PSA screening anyway. In fact, 52 percent of the control group did so. Moreover, the PSA screening trials were stopped early, given the lack of evidence of patient benefit associated with screening. Even had the trials not been stopped early, it's not always practical or useful to run a trial out for twenty years when we need answers long before then.
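To see why that 52 percent figure matters so much, here is a back-of-the-envelope sketch in Python. The numbers are invented for illustration (they are not the trials' actual effect sizes); the point is simply how an intention-to-treat comparison gets diluted when a large share of the control arm obtains screening anyway:

```python
# Illustrative sketch: how control-arm contamination attenuates the
# observed (intention-to-treat) effect of screening. All numbers are
# hypothetical, not taken from the actual PSA trials.

def observed_risk_reduction(true_rr, contamination):
    """Relative risk reduction seen in an ITT comparison when a
    fraction of the control arm gets screened anyway.

    true_rr: assumed true relative risk reduction from screening
    contamination: share of controls who obtain screening on their own
    """
    base = 1.0                       # baseline mortality risk (normalized)
    screened = base * (1 - true_rr)  # risk among screened men
    # The control arm is a mix of unscreened men and contaminated
    # (screened) men, so its average risk is pulled toward the
    # screened level -- shrinking the apparent treatment effect.
    control = contamination * screened + (1 - contamination) * base
    treated = screened               # assume full adherence in the treated arm
    return 1 - treated / control

for c in (0.25, 0.52):
    print(f"contamination {c:.0%}: observed RRR = "
          f"{observed_risk_reduction(0.20, c):.1%}")
```

Under these toy assumptions, moving from the anticipated 25 percent contamination to the actual 52 percent cuts the observable effect roughly in half, which in turn balloons the sample size a trial would need to detect it.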
An efficient randomized trial includes relatively homogeneous treatment and control groups. Such homogeneity makes it easier to distinguish the impact of a specific treatment. Unfortunately, the real patient population is far more diverse. For example, only four percent of the study population was African-American, even though African-Americans have markedly higher prostate cancer mortality than non-Hispanic whites.
For these reasons and others, even large trials often lack the sample size really required to tease out the differences between treatment and control groups. Analytic simulation models (exemplified by the CEPAC HIV model) could improve on the current situation. If one has a good model and a good set of biomarkers to capture the natural history of disease, such simulations can greatly complement randomized trials.
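For concreteness, here is a toy microsimulation in Python. It is emphatically not the CEPAC model or any published prostate cancer model; every state and transition probability below is invented, purely to show the mechanics of simulating a cohort's natural history of disease under screening versus no screening:

```python
# Toy microsimulation sketch -- hypothetical states and probabilities,
# invented for illustration only. Each simulated man moves through
# health states year by year; screening (in this toy) detects latent
# disease before it becomes clinical.

import random

def simulate_one(screened, years=25, rng=random):
    state = "healthy"
    for _ in range(years):
        if state == "healthy":
            if rng.random() < 0.01:            # hypothetical cancer incidence
                state = "latent_cancer"
            elif rng.random() < 0.02:          # other-cause mortality
                state = "dead_other"
        elif state == "latent_cancer":
            if screened and rng.random() < 0.8:  # detected early and cured
                state = "healthy"
            elif rng.random() < 0.10:          # progresses to clinical disease
                state = "clinical_cancer"
            elif rng.random() < 0.02:
                state = "dead_other"
        elif state == "clinical_cancer":
            if rng.random() < 0.15:            # cancer-specific mortality
                state = "dead_cancer"
            elif rng.random() < 0.02:
                state = "dead_other"
        else:
            break                              # death states are absorbing
    return state

def cancer_mortality(screened, n=20000, seed=0):
    rng = random.Random(seed)                  # fixed seed for reproducibility
    deaths = sum(simulate_one(screened, rng=rng) == "dead_cancer"
                 for _ in range(n))
    return deaths / n

print("25-year cancer mortality, screened:  ", cancer_mortality(True))
print("25-year cancer mortality, unscreened:", cancer_mortality(False))
```

A real model would calibrate these transition probabilities against biomarker, registry, and trial data. The point is only that, once calibrated, such a model can project long-horizon mortality outcomes that no feasible trial can observe directly.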
Despite these advantages, I'm skeptical that USPSTF will move away from RCTs as the central tool to evaluate medical and public health interventions. There is at least an apparent solidity and transparency to RCTs that makes them the gold standard for policymakers and those devising controversial guidelines of care. Micro-simulation models have too many moving parts, often require forbidding complexity, and offer too many portals for special-interest critique and manipulation.
Thus, the randomized trial is king. For the most part, this is a good thing. But it leaves key questions unanswered, too.