It is not unusual for a health services research study section (the group of experts who review research proposals and make funding recommendations) to include analysts who maintain that only randomized control trials (RCTs) yield valid causal inference, sitting beside analysts who have never randomized anything to anything. Two analysts debating the virtues of instrumental variables (IV) versus parametric sample selection models might be sitting next to analysts who never have heard of two-stage least squares.
Brian Dowd is right, and this is a vexing problem. (I quoted more of Dowd’s paper here. You’ll find it ungated here.) When I prepare to review grant proposals, I sometimes face a situation that I’ll stylize and blogify as follows.
This is an important, modifiable clinical issue facing bazillions of people. There’s an expensive pill for it, but it’s not clear how good it is relative to this even more expensive surgical approach some studies find efficacious. There’s no RCT comparing this pill to that surgery. There never will be either because that’d be unethical and/or impractical and/or prohibitively expensive. We propose to fill this void in the literature. But, oh yeah, we can’t do an RCT. Bummer. Sad trombone.
But, lo! Here’s a crap-ton of secondary data. We propose an observational study. Yeah, yeah! One little problem, though: we might have omitted variable bias because we can’t observe all the important factors. (An RCT would fix this, but …)
To address this problem, we propose using propensity scores…
Wait, wait, wait, wait, wait, wait … hold up. You can’t tell me you’re going to fill an RCT void with propensity scores, as if they address confounding from unobservables. They do not do that. But, I totally feel your pain. I mean, you’re right, no RCT will ever be done on this. OK, OK, maybe in a million years, after there are a bunch of very suggestive observational studies—maybe then someone will fund a randomized intervention that gets at this. So, I’m with you, let’s do some good observational studies first.
But, will propensity scores convince anyone that you’re getting at a causal effect? I’m skeptical. Now if you had a good instrument …
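If you want to see why the skeptic balks, here’s a stylized simulation. Everything in it—the variable names, the coefficients, the data—is made up by me, just to illustrate the mechanics: weighting on a propensity score estimated from observed covariates strips out the bias from the observed confounder, but the bias from the unobserved one sails right through.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Observed confounder x and *unobserved* confounder u both drive
# treatment choice and the outcome. (All numbers are invented.)
x = rng.normal(size=n)
u = rng.normal(size=n)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
t = rng.binomial(1, sigmoid(x + u))                    # treatment assignment
y = 2.0 * t + 1.5 * x + 1.5 * u + rng.normal(size=n)   # true effect = 2.0

# Naive comparison of means: biased by both x and u.
naive = y[t == 1].mean() - y[t == 0].mean()

# Propensity score estimated from the *observed* covariate only
# (logistic regression fit by Newton's method).
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = sigmoid(X @ beta)
    W = p * (1 - p)
    beta += np.linalg.solve((X * W[:, None]).T @ X, X.T @ (t - p))
e = sigmoid(X @ beta)

# Inverse-probability weighting on the estimated score.
ipw = (np.sum(t * y / e) / np.sum(t / e)
       - np.sum((1 - t) * y / (1 - e)) / np.sum((1 - t) / (1 - e)))

print(f"true effect 2.0 | naive {naive:.2f} | PS-weighted {ipw:.2f}")
```

The weighted estimate lands between the naive comparison and the truth: the piece of confounding that runs through x is gone, the piece that runs through u is still there. That’s the skeptic’s whole point.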
Huhwut? IV? Come on, that never works. You can’t ever prove causality with IV. You IV guys just make contorted arguments and at the end of the day I never know if I can trust the instrument or to what population it applies. There are so many bad IV studies bamboozling people. Too confusing.
Besides, these guys are proposing to use a rich dataset of demographics merged with a huge dataset of comorbidities mashed up with a big ol’ dataset of environmental factors smacked upside a wicked cool dataset on provider social networks. Propensity scores are totally fine. Get outta here with your IV!
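For contrast, and stepping outside the dialogue for a moment, here’s what the IV camp is selling, again on invented data. In this stylized two-stage least squares sketch, the confounder is completely unobserved, so no amount of covariate richness saves the regression—but a valid instrument (something that shifts treatment and affects the outcome only through treatment) recovers the causal effect anyway. Whether such an instrument exists, and whether anyone believes it, is of course exactly what the skeptic above is fighting about.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

u = rng.normal(size=n)                      # unobserved confounder
z = rng.normal(size=n)                      # instrument: shifts t, not y directly
t = 0.8 * z + u + rng.normal(size=n)        # treatment intensity
y = 2.0 * t + 1.5 * u + rng.normal(size=n)  # true effect = 2.0

ols = lambda X, v: np.linalg.lstsq(X, v, rcond=None)[0]

# OLS of y on t: biased, because u is in the error term and correlated with t.
naive = ols(np.column_stack([np.ones(n), t]), y)[1]

# Stage 1: project t onto the instrument. Stage 2: regress y on fitted t.
X1 = np.column_stack([np.ones(n), z])
t_hat = X1 @ ols(X1, t)
iv = ols(np.column_stack([np.ones(n), t_hat]), y)[1]

print(f"true effect 2.0 | OLS {naive:.2f} | 2SLS {iv:.2f}")
```

In the simulation the 2SLS estimate sits right on top of the truth while OLS is off. The catch, which the dialogue makes vivid, is that in the simulation I *know* the instrument is valid by construction; in a real proposal that’s precisely the contorted argument the reviewer doesn’t trust.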
So, does this study deserve funding or not? There’s no objectively right answer. Maybe we should do the study both ways. Should review panels insist on that? If results conflict, which one is right?