• Please let us know if you see popup ads on TIE

    I saw a popup ad on TIE today (see below). This is unacceptable. We do not allow advertising here. But it has happened before that some third party we rely on (e.g., for traffic monitoring) pushed popups our way. This is bad behavior, and we won’t tolerate it. We’ll dump that third party, once we figure out who it is. My only other worry is that it’s my computer that’s the problem, not the site. (I will run a malware scan.)

    Have you seen an ad here? If so, let me know in the comments, by email, or on Twitter. Please grab a screenshot and share it as well, if you can.



    Comments closed
  • Methods: RCTs’ simplicity advantage

    In “Assessing the case for social experiments,” James Heckman and Jeffrey Smith warn against mistaking the apparent simplicity of randomized controlled trials for actual simplicity. They are not so simple when the assumptions on which they rely are violated. (How often those assumptions are violated, and how much that threatens the validity of findings, is not clear, but it’s plausible that they are violated to some extent in a nontrivial proportion of cases.)

    In an experiment, the counterfactual is represented by the outcomes of a control group generated through the random denial of services to persons who would ordinarily be participants. [... T]wo assumptions must hold. The first assumption requires that randomization not alter the process of selection into the program, so that those who participate during an experiment do not differ from those who would have participated in the absence of an experiment. Put simply, there must be no “randomization bias.” Under the alternative assumption that the impact of the program is the same for everyone (the conventional common-effect model), the assumption of no randomization bias becomes unnecessary, because the mean impact of treatment on participants is then the same for persons participating in the presence and in the absence of an experiment.

    The second assumption is that members of the experimental control group cannot obtain close substitutes for the treatment elsewhere. That is, there is no “substitution bias.” [...]

    It has been argued that experimental evidence on program effectiveness is easier for politicians and policymakers to understand. This argument mistakes apparent for real simplicity. In the presence of randomization bias or substitution bias, the meaning of an experimental impact estimate would be just as difficult to interpret honestly in front of a congressional committee as any nonexperimental study. The hard fact is that some evaluation problems have intrinsic levels of difficulty that render them incapable of expression in sound bites. Delegated expertise must therefore play a role in the formation of public policy in these areas, just as it already does in many other fields. It would be foolish to argue for readily understood but incompetent studies, whether they are experimental or not.

    Moreover, if the preferences and mental capacities of politicians are to guide the selection of an evaluation methodology, then analysts should probably rely on easily understood and still widely used before-after comparisons of the outcomes of program participants. Such comparisons are simpler to explain than experiments, because they require no discussions of selection bias and the rationale for a control group. Furthermore, before-after comparisons are cheaper than experiments. They also have the advantage, or disadvantage, depending on one’s political perspective, that they are more likely to yield positive impact estimates (at least in the case of employment and training programs) due to the well-known preprogram dip in mean earnings for participants in these programs.
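    A toy simulation (mine, not from the paper) shows how substitution bias works: if some members of the control group obtain a close substitute for the treatment elsewhere, the experimental contrast understates the true effect. The 40 percent substitution rate and the effect size below are made-up numbers for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0

treated = rng.random(n) < 0.5
# substitution bias: 40% of controls obtain a close substitute elsewhere
substituted = ~treated & (rng.random(n) < 0.4)

# anyone who gets the treatment (or a substitute) gets the full effect
outcome = rng.normal(0.0, 1.0, n) + true_effect * (treated | substituted)

# the experimental treated-vs-control contrast understates the true effect
measured = outcome[treated].mean() - outcome[~treated].mean()
print(round(measured, 1))  # well below the true effect of 2.0
```

    With 40 percent of controls substituting, the experiment recovers only about 60 percent of the true effect, even though randomization itself worked perfectly.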

    In fact, I frequently see policy arguments made with before-after type evidence. A familiar theme these days is that anything that’s happened in health care since March 2010 is due to Obamacare. Nothing could be more preposterous,* yet this is all a politician needs for a talking point.

    * Well that’s not true. It’d be more preposterous to say that anything that’s happened in health care since March 2010 is due to the 2020 presidential election. That would not fly as a talking point. Not yet, anyway.
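    The preprogram dip Heckman and Smith mention is easy to demonstrate with a toy simulation (my own made-up numbers): people tend to enroll in training programs after an unusually bad earnings year, and they would bounce back toward their typical earnings even without the program, so a before-after comparison overstates the program’s effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 1.0

permanent = rng.normal(10.0, 2.0, n)       # each person's typical earnings
pre = permanent + rng.normal(0.0, 2.0, n)  # earnings the year before the program

# people enroll after an unusually bad year relative to their own norm (the "dip")
enrolled = pre < permanent - 1.0

# the transitory shock washes out in the post year; add the true program effect
post = permanent + rng.normal(0.0, 2.0, n) + true_effect * enrolled

before_after = (post[enrolled] - pre[enrolled]).mean()
print(round(before_after, 1))  # far larger than the true effect of 1.0
```

    Mean reversion alone accounts for most of the measured before-after gain; the program’s actual contribution is a small fraction of it.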


    Comments closed
  • The quality of Medicare Advantage

    The following originally appeared on The Upshot (copyright 2014, The New York Times Company).

    Medicare Advantage plans — private plans that serve as alternatives to the traditional, public program for those who qualify for it — underperform traditional Medicare in one respect: They cost 6 percent more.

    But they outperform traditional Medicare in another way: They offer higher quality. That’s according to research summarized recently by the Harvard health economists Joseph Newhouse and Thomas McGuire, and it raises a difficult question: Is the extra quality worth the extra cost?

    It used to be easier to assess the value of Medicare Advantage. In the early 2000s, Medicare Advantage plans also cost taxpayers more than traditional Medicare. It also seemed that they provided poorer quality, making the case against Medicare Advantage easy. It was a bad deal.

    At that time, Medicare beneficiaries could switch between a Medicare Advantage plan and traditional Medicare each month. (Now, beneficiaries are generally locked into choices for all or most of a year.) In that setting, the Medicare Payment Advisory Commission (MedPAC) found that relatively healthier beneficiaries were switching into Medicare Advantage and relatively sicker ones were switching out.

    This suggested that Medicare Advantage didn’t provide the type of coverage or the access to services that unhealthier beneficiaries wanted or needed. Since the point of insurance is to pay for needed care when one is sick, it was tempting to condemn the program as having poor quality and failing to fulfill a basic requirement of coverage.

    But things have changed. Mr. Newhouse and Mr. McGuire show, for example, that by 2006-2007, health differences between beneficiaries in Medicare Advantage and those in traditional Medicare had narrowed. About the same proportion of beneficiaries in Medicare Advantage as in traditional Medicare rated their health as fair or poor. This suggests that sicker beneficiaries were not switching out of Medicare Advantage and healthier ones were not switching in to the extent they had been in earlier years.

    Also, in contrast to studies in the 1990s, more recent work finds that Medicare Advantage is superior to traditional Medicare on a variety of quality measures. For example, according to a paper in Health Affairs by John Ayanian and colleagues, women enrolled in a Medicare Advantage H.M.O. are more likely to receive mammography screenings; those with diabetes are more likely to receive blood sugar testing and retinal exams; and those with diabetes or cardiovascular disease are more likely to receive cholesterol testing.

    That Health Affairs paper also found that H.M.O. enrollees are more likely to receive flu and pneumonia vaccinations and about as likely to rate their personal doctor and specialists highly.

    There are reasons Medicare Advantage plans might promote higher-quality care. So long as beneficiaries don’t switch among plans too rapidly (and the evidence is that once they select a plan, they tend to stick with it), plans have a financial incentive to keep their enrollees healthy, incurring less downstream cost. It’s possible, therefore, that they may offer incentives to providers to perform preventive services.

    Moreover, in contrast to traditional Medicare, which must reimburse any provider willing to see beneficiaries enrolled in the program, Medicare Advantage plans establish networks of providers. This permits them, if they choose, to disproportionately exclude lower-quality doctors, ones who do not provide preventive services frequently enough, for example.

    Contemplating these more recent findings on quality alongside the higher taxpayer cost of Medicare Advantage plans invites some cognitive dissonance. On the one hand, we shouldn’t pay more than we need to in order to provide the Medicare benefit; we should demand that taxpayer-financed benefits be provided as efficiently as possible. Medicare Advantage doesn’t look so good from this perspective.

    On the other hand, we want Medicare beneficiaries — which we all hope to be someday, if we’re not already — to receive the highest quality of care. Here, as far as we know from research to date, Medicare Advantage shines, at least relative to traditional Medicare.

    Is Medicare Advantage worth its extra cost? A decade ago when quality appeared poor, the answer was easy: No. Today one must think harder and weigh costs against program benefits, including its higher quality. The research base is still too thin to provide an objective answer. Mr. Newhouse and Mr. McGuire hedge but lean favorably toward Medicare Advantage, saying cuts in its “plan payments may be shortsighted.”


    Here’s a bonus chart, not provided in the original post, of some of the quality measure results described in the post.

    MA-FFS quality


    Comments closed
  • The history of the politics and abuse of methodology

    About my post on RCTs’ gold-standard reputation, below is the text of an email from a reader who wishes to remain anonymous. I’m posting it not because of the compliments (I do like them, though) but because I am grateful for the final two paragraphs on the history of the politics and abuse of methodology, about which I know very little.

    The comments are open for one week. Chime in if you know any relevant history. Bring the dirt! References welcome.

    I think when thinking about the role RCTs have in medicine you’re dead-on when saying the conceptual simplicity is really, really important. The people who read economics journals almost all have major quant training. Doctors are supposed to understand medical journals and most of them have very little.

    I’d bring up three related points.

    The first is the FDA. Because medications must be approved with two pivotal trials, we’re all used to seeing RCTs regularly and to seeing them as literally the government’s official imprimatur of success.

    The second is marketing. In the ’90s pharma figured out how to use RCTs to their advantage. Design massive trials in highly selected populations, don’t look hard for side effects, and don’t publish the negative trials. If p<0.05, market to everyone. Gold standard blockbuster! For example, there’s reason to wonder whether SSRIs help much of anyone. http://www.nejm.org/doi/full/10.1056/NEJMsa065779

    The third, supporting your contention, is history. In the ’90s there were really three schools fighting over how clinical data should be used in clinical practice. At Yale, Alvan Feinstein wanted a very detail-oriented, methods-based clinical epidemiology. David Eddy instead envisioned systems of care involving RCTs, decision analyses, and decision support. Finally, at McMaster, Sackett, Guyatt, etc., developed a very simple view of the evidence hierarchy. McMaster won. Their success in marketing a simple approach toward clinicians with books, curricula, and doctor-focused series in JAMA was central to that.


    Comments closed
  • Methods: There is no gold standard

    In “Instruments, Randomization, and Learning about Development,” Angus Deaton pulls no punches. He’s just as brutal, blunt, and precise about the pitfalls and misuse of instrumental variables (IV) as he is about those of randomized controlled trials (RCTs).

    I found insightful his emperor-has-no-clothes argument that the RCT is not deserving of its “gold standard” reputation, despite rhetoric to the contrary. I speculate RCTs have achieved their special status for several reasons:

    1. They are relatively conceptually simple, requiring less mathematical and statistical training than many other methods. (Though the basic explanation of them hides a lot of complexity, which leads to improper use and interpretation, as Deaton shows.)
    2. RCTs address the problem of confounding from unobservables (though this fact is not unique to RCTs), which, historically, has been a major impediment to causal inference in social sciences and in the advancement of medicine. (As Deaton explains, such confounding is not the only problem confronting empirical methods, and RCTs do not necessarily address the others better than nonexperimental methods.)
    3. RCTs lend themselves to a routinized enterprise of evidence-based change (e.g., in medicine) in a way that other strong methods for causal inference do not (or not yet). Equivalently simple approaches that could be easily routinized offer far weaker support for causal inference. It is plausible to me that promotion of RCTs as the methodologically strongest approach to causality has spared us from many more studies of associations that can’t come even close to RCTs’ validity for causal inference, imperfect though it may be. It’s possible association-type studies could do a lot of damage to human welfare. (Evidence-based, pre-RCT medicine was pretty sketchy, for example.) This, perhaps, is the strongest moral justification for claiming that RCTs are “the gold standard,” even if they do not merit that unique standing: a world in which that is less widely believed could be much worse.
    4. Perhaps because of the foregoing features of RCTs, they have been adopted as the method of choice by high-powered professionals and educators in medical science (among other areas). When one is taught and then repeats that RCTs are “the gold standard” and one is a member of a highly respected class, that view carries disproportionate weight, even if there is a very good argument that it is not necessarily the correct view (i.e., Deaton’s, among others). Another way to say this is that the goldenness of RCTs’ hue should be judged on the merits of each application; we should be careful not to attribute to RCTs a goldenness present only in the tint of glasses we’ve been instructed to wear.

    Let me be clear, Deaton is not claiming (nor am I) that some other method is better than RCTs. He is simply saying that there does not exist one method (RCTs, say) that deserves preferential status, superior to all others for all subjects and all questions about them. I agree: there is no gold standard.

    At the same time, applying some standards in judging methodology is necessary. How this ought to be done varies by context. Official bodies charged with guarding the safety of patients (e.g., the FDA or the USPSTF) are probably best served by some fairly hard-and-fast rules about how to judge evidence. Too much room for judgment can also leave too much room for well-financed charlatans to sneak some snake oil through the gate.

    Academics and the merit review boards that judge their research proposals or the referees that comment on their manuscripts have more leeway. My view in this context is that a lot rides on the precise question one is interested in, the theoretical or conceptual model one (or the community of scholars) thinks applies to it, and the data available to address it, among other possible constraints. This is not a setup for a clean grading system; there’s no substitute for expertise, and opinions will vary. These are major limitations of accepting that there is, in general, no hierarchy of methodological quality.

    Below are my highlights from Deaton’s paper, with my emphasis added. Each bullet is a direct quote.

    On IV

    • [Analysts] go immediately to the choice of instrument [], over which a great deal of imagination and ingenuity is often exercised. Such ingenuity is often needed because it is difficult simultaneously to satisfy both of the standard criteria required for an instrument, that it be correlated with [treatment] and uncorrelated with [unobservables affecting outcomes]. [...] Without explicit prior consideration of the effect of the instrument choice on the parameter being estimated, such a procedure is effectively the opposite of standard statistical practice in which a parameter of interest is defined first, followed by an estimator that delivers that parameter. Instead, we have a procedure in which the choice of the instrument, which is guided by criteria designed for a situation in which there is no heterogeneity, is implicitly allowed to determine the parameter of interest. This goes beyond the old story of looking for an object where the light is strong enough to see; rather, we have at least some control over the light but choose to let it fall where it may and then proclaim that whatever it illuminates is what we were looking for all along.
    • Angrist and Jörn-Steffen Pischke (2010) have recently claimed that the explosion of instrumental variables methods [] has led to greater “credibility” in applied econometrics. I am not entirely certain what credibility means, but it is surely undermined if the parameter being estimated is not what we want to know.
    • Passing an overidentification test does not validate instrumentation. [Here's why.]

    On RCTs

    • The value of econometric methods cannot and should not be assessed by how closely they approximate randomized controlled trials. [...] Randomized controlled trials can have no special priority. Randomization is not a gold standard because “there is no gold standard” []. Randomized controlled trials cannot automatically trump other evidence, they do not occupy any special place in some hierarchy of evidence, nor does it make sense to refer to them as “hard” while other methods are “soft.” These rhetorical devices are just that; metaphor is not argument, nor does endless repetition make it so.
    • One immediate consequence of this derivation is a fact that is often quoted by critics of RCTs, but often ignored by practitioners, at least in economics: RCTs are informative about the mean of the treatment effects [] but do not identify other features of the distribution. For example, the median of the difference is not the difference in medians, so an RCT is not, by itself, informative about the median treatment effect, something that could be of as much interest to policymakers as the mean treatment effect. It might also be useful to know the fraction of the population for which the treatment effect is positive, which once again is not identified from a trial. Put differently, the trial might reveal an average positive effect although nearly all of the population is hurt with a few receiving very large benefits, a situation that cannot be revealed by the RCT.
    • How well do actual RCTs approximate the ideal? Are the assumptions generally met in practice? Is the narrowness of scope a price that brings real benefits or is the superiority of RCTs largely rhetorical? RCTs allow the investigator to induce variation that might not arise nonexperimentally, and this variation can reveal responses that could never have been found otherwise. Are these responses the relevant ones? As always, there is no substitute for examining each study in detail, and there is certainly nothing in the RCT methodology itself that grants immunity from problems of implementation.
    • In effect, the selection or omitted variable bias that is a potential problem in nonexperimental studies comes back in a different form and, without an analysis of the two biases, it is impossible to conclude which estimate is better—a biased nonexperimental analysis might do better than a randomized controlled trial if enrollment into the trial is nonrepresentative. 
    • Running RCTs to find out whether a project works is often defended on the grounds that the experimental project is like the policy that it might support. But the “like” is typically argued by an appeal to similar circumstances, or a similar environment, arguments that depend entirely on observable variables. Yet controlling for observables is the key to the matching estimators that are one of the main competitors for RCTs and that are typically rejected by the advocates of RCTs on the grounds that RCTs control not only for the things that we observe but things that we cannot. As Cartwright notes, the validity of evidence-based policy depends on the weakest link in the chain of argument and evidence, so that by the time we seek to use the experimental results, the advantage of RCTs over matching or other econometric methods has evaporated. In the end, there is no substitute for careful evaluation of the chain of evidence and reasoning by people who have the experience and expertise in the field. The demand that experiments be theory-driven is, of course, no guarantee of success, though the lack of it is close to a guarantee of failure. 
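    A quick simulation (my own, not Deaton’s) illustrates the second bullet: an RCT identifies the mean of the treatment effects, and that mean can be positive even when the median person is harmed and only a small fraction benefits. The effect distribution below is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# individual treatment effects: most people slightly hurt,
# a few helped enormously (the scenario in the quote above)
helped = rng.random(n) < 0.1
effects = np.where(helped,
                   rng.normal(10.0, 1.0, n),   # 10% gain a lot
                   rng.normal(-0.5, 0.2, n))   # 90% lose a little

print(round(effects.mean(), 2))        # the mean an RCT identifies: positive
print(round(np.median(effects), 2))    # the median person is harmed
print(round((effects > 0).mean(), 2))  # fraction who benefit: small
```

    The trial’s treated-vs-control comparison recovers only the first number; because each person’s individual effect is unobservable, the median and the fraction helped are not identified by the RCT alone.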

    The paper is very readable, though I skipped (or lightly skimmed) a middle section that did not appear to have a high density of general advice, if any. There’s some math, but it’s simple and, in some places, important for understanding key points, including a few I quoted above. Only in one or two spots did I find the words insufficient to understand the meaning. Perhaps they were a bit too efficient. Find the paper, ungated, here.


    Comments closed
  • Medicare Advantage upcoding

    According to a new paper by Richard Kronick and W. Pete Welch, upcoding by Medicare Advantage plans happens. Big time. This matters because Medicare Advantage (MA) plans are paid more for higher risk score enrollees. Here’s the money chart:

    risk scores

    Maybe the increase in MA risk scores relative to fee for service (FFS) Medicare is due to compositional changes. Do changes in the mix of MA enrollees explain it? Nope.

    On average over the 2004–2013 period, caseload dynamics had virtually no net effect on the difference between MA and FFS in the rate of growth of risk scores.

    If MA enrollees were actually getting sicker, we might expect their death rates to grow relative to FFS. Do they? Nope, the opposite, actually.

    Mortality rates declined somewhat more rapidly in MA than in FFS from 2004 to 2012, providing no support for the hypothesis that the underlying morbidity of MA beneficiaries increased relative to FFS.

    But how is it that MA is able to pull off this upcoding, relative to FFS?

    FFS diagnoses are drawn only from health care claims submitted for payment. MA plans may also review medical records and can report all diagnoses that are supported in the record, including those that were not reported by physicians on any health care claim or encounter record. MA plans can also employ nurses to visit enrollees in their homes to conduct health assessments and report diagnoses that are found. [...]

    FFS coding is known to be both incomplete and variable. Incomplete coding is evidenced by lack of persistence in coding of chronic conditions. For instance, among Medicare beneficiaries diagnosed with quadriplegia in one year, only 61% had a diagnosis of quadriplegia reported in the subsequent year (Medicare Payment Advisory Commission, 1998). Coding intensity also varies geographically; in the Hospital Referral Regions (HRRs) with the most intense practice patterns, the probability that a beneficiary is diagnosed with three or more chronic conditions is double the probability in low-intensity HRRs, with no evidence of differences in underlying prevalence.

    Could upcoding be a good thing? The authors offer that it’s possible, but they don’t believe so.

    Coding more carefully may have real health benefits. Better identification of problems and better documentation of problems that have been identified could improve the quality of treatment provided and may even lower costs—or they may lead to unnecessary treatment and higher costs. In either case, however, the purpose and design of the risk-adjusted payment system is not to improve the quality of coding. It is to ensure that plans are paid according to the health of the patients they enroll. It is unlikely that the increased payments achieved by plans through increased coding intensity are related to substantial health benefits that better coding might produce.

    This is the system. Plans get paid for sicker patients, according to codes. I don’t blame them for trying to code more completely (so long as it’s not outright fraud) and collecting more payment for it. I’d do it, were it my line of work. But, to the extent that it results in inefficient use of taxpayer resources, I also think it’s perfectly fine for the government to try to claw back some of that payment, and it does so.

    How much plans really upcode and how much the government should claw back is a contentious, political issue, but one amenable to evidence. That evidence strongly suggests higher coding intensity with no increase in actual illness burden among MA enrollees. This is not a good reason to pay plans more, even if they are somehow deserving of higher payments for another reason (not that I believe that).


    Comments closed
  • You had one job

    Via You Had One Job:

    one job

    (This was probably planned to look like a failure.)


    Comments closed
  • Why opioid substitution therapy is not just replacing one addiction with another

    The following is the text of my email to Peter Friedmann, Brown University Professor of Medicine and Professor of Health Services, Policy & Practice and an expert on substance use and addiction medicine:

    In response to my NYT piece on treatment for opioid dependence, I received email with a question I thought I answered in the post: why should we replace one addiction with another?

    But now I suspect there’s something else behind this question that I did not address because I didn’t know how. Perhaps people think that longer-term opioid dependents are getting high rather than relieving cravings when they use. And, maybe some also think that by prescribing methadone or buprenorphine, one is just offering a legal way of obtaining that high.

    I can imagine that if one thinks that opioid users are getting high both ways, then using substitution therapy feels somehow like an illegitimate way to treat the chronic condition: replacing one addiction with another, as they say. Is there any truth to this?

    Peter’s response:

    The concern you describe speaks to a common misunderstanding of the definition of addiction.

    Substance addiction is a behavioral disorder characterized by a compulsive drive for the reinforcing euphoria, loss of control over it, and substance use despite adverse consequences leading to the sacrifice of normative life goals, values, coping mechanisms and functioning (including health, family, work, etc.) at the altar of the substance.

    People often confuse physical dependence (tolerance and withdrawal) with addiction (partly because the confusing DSM-IV and earlier definition of substance dependence was synonymous with addiction).  The oral, long-acting opioid agonists currently FDA approved to treat opioid use disorders produce physical dependence, but by diminishing the reinforcing effects of short-acting opioids they function to extinguish addictive behavior.

    what does it feel like

    The chart above gives you an idea of the difference – drug users of short-acting opioids like heroin or oxycodone spend most of their days either “high” or “sick” – and require multiple doses a day.  Over time the development of tolerance often means they are either sick or straight – they no longer get high (more complex discussion – but at this point they need a higher dose, more potent opioid or to change to a more direct route of administration to get high again – but eventually tolerance will occur again).

    The goal of maintenance doses of long-acting opioids like methadone or buprenorphine is to keep them straight (feeling normal) for most of the time, as opposed to feeling either “high” or “sick” most of the time (see chart below).


    The idea is that blocking the euphoria (positive reinforcement) and stopping the withdrawal symptoms (negative reinforcer for cessation) will extinguish the antisocial and dysfunctional behaviors.  Spending more time feeling normal also allows the patient to focus on the difficult process of self-exploration, life re-orientation and relationship re-building necessary for long-term remission and recovery.
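    The contrast Peter describes can be sketched with a toy one-compartment pharmacokinetic model (my own illustration; the half-lives and dosing intervals below are made-up numbers, not clinical values). A short half-life with dosing every few hours means drug levels collapse between doses, while a long half-life with once-daily dosing keeps levels in a narrow band.

```python
import math

def trough_over_peak(half_life_h, dose_interval_h):
    """Steady-state trough/peak concentration ratio for repeated dosing of a
    drug eliminated exponentially (one-compartment sketch)."""
    k = math.log(2) / half_life_h          # elimination rate constant
    return math.exp(-k * dose_interval_h)  # fraction left when the next dose is due

# illustrative numbers only, not clinical parameters
short_acting = trough_over_peak(half_life_h=0.5, dose_interval_h=6.0)
long_acting = trough_over_peak(half_life_h=24.0, dose_interval_h=24.0)

print(round(short_acting, 4))  # levels crash between doses: cycles of "high" and "sick"
print(round(long_acting, 2))   # levels stay in a narrow band: "straight"
```

    In this sketch the short-acting drug retains a tiny fraction of its peak level by the next dose, producing the high/sick oscillation, while the long-acting drug retains about half, which is why a single daily maintenance dose can hold a patient steady.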

    Now I know, and you do too. See also Peter’s interview with Harold Pollack.


    Comments closed
  • How not to compare observational studies to a randomized trial

    Initially, I was excited about this review from The Cochrane Collaboration, which concluded, “[O]n average, there is little evidence for significant effect estimate differences between observational studies and RCTs.”

    Then I read it and, unless I’m confused about something, I think they asked the wrong question and got a useless answer.

    In the literature from 1990–2013, the authors found 14 studies that compared results of observational studies to an RCT. A subset of these 14 studies examines one specific condition or treatment. That’s a worthwhile exercise, provided the observational studies used sound methods. (Some reasonable criteria must be applied.) I’d very much like to know if a collection of observational studies using good methods can be meta-analyzed to yield something close to the result of an RCT. This is basically a variant of the question of whether there’s wisdom in crowds.

    Another subset of the 14 studies identified consists of RCT vs observational comparisons that lump together a swath of conditions/treatments. This is dumb. I don’t care at all if a collection of observational studies looking at different conditions and treatments, on average, is close to corresponding RCTs. In fact, I’d expect, on average, that they would be close since they’d be randomly biased in different directions (effects both smaller and larger than found in RCTs).

    Worse, the Cochrane study took all of these 14 studies and meta-analyzed them. Examining everything in one glop like this, they found exactly what you’d expect. With stuff randomly biased positive and negative, they got close to zero apparent overall bias. This is not useful information. It’s meta-analysis of things that are heterogeneous in a way that guarantees a result one could predict in advance.
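    A stylized calculation shows why pooling is uninformative here (the bias numbers below are made up for illustration): if each topic’s observational literature is biased in its own random direction, the pooled discrepancy sits near zero even though every individual comparison is substantially off.

```python
import numpy as np

true_effect = 1.0

# 14 observational-vs-RCT comparisons, each on a different topic,
# each observational literature biased in its own direction (made-up values)
biases = np.array([0.5, -0.6, 0.3, -0.2, 0.7, -0.5, 0.1,
                   -0.3, 0.4, -0.4, 0.2, -0.1, 0.6, -0.6])
obs_estimates = true_effect + biases

avg_discrepancy = obs_estimates.mean() - true_effect  # pooled "bias": near zero
typical_discrepancy = np.abs(biases).mean()           # per-topic bias: not small

print(round(avg_discrepancy, 2), round(typical_discrepancy, 2))
```

    The pooled average looks reassuring while every one of the 14 comparisons is off by a meaningful amount, which is exactly why the all-in-one meta-analysis answers the wrong question.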

    By the way, it’s not a foregone conclusion that (unbiased) observational studies should match RCT results. It’s reasonable to expect that an RCT focused on a carefully selected subpopulation might produce different results than an observational study (or a collection of them) focused on a broader population. RCTs are awesome at internal validity but not external validity. So, though it’s worthwhile to ask whether a collection of observational studies matches an RCT on the same subject, the question only makes sense if you believe they should match, i.e., that the RCT’s sample is representative of the populations examined by the observational studies.

    Ultimately, there’s just no shortcut to looking carefully at study designs and samples.


    Comments closed
  • Sometimes you have no choice but to fund methods with limitations

    It is not unusual for a health services research study section (the group of experts who review research proposals and make funding recommendations) to include analysts who maintain that only randomized controlled trials (RCTs) yield valid causal inference, sitting beside analysts who have never randomized anything to anything. Two analysts debating the virtues of instrumental variables (IV) versus parametric sample selection models might be sitting next to analysts who never have heard of two-stage least squares.

    Bryan Dowd is right, and this is a vexing problem. (I quoted more of Dowd’s paper here. You’ll find it ungated here.) When I prepare to review grant proposals, I sometimes face a situation which I’ll stylize and blogify as follows.

    Proposal text:

    This is an important, modifiable clinical issue facing bazillions of people. There’s an expensive pill for it, but it’s not clear how good it is relative to this even more expensive surgical approach some studies find efficacious. There’s no RCT comparing these pills to that surgery. There never will be, either, because that’d be unethical and/or impractical and/or prohibitively expensive. We propose to fill this void in the literature. But, oh yeah, we can’t do an RCT. Bummer. Sad trombone.

    But, lo! Here’s a crap-ton of secondary data. We propose an observational study. Yeah, yeah! One little problem, though, we might have omitted variable bias because we can’t observe all important factors. (An RCT would fix this, but …)

    To address this problem, we propose using propensity scores…

    Economist reviewer:

    Wait, wait, wait, wait, wait, wait … hold up. You can’t tell me you’re going to fill an RCT void with propensity scores because they address confounding from unobservables. They do not do that. But, I totally feel your pain. I mean, you’re right, no RCT will ever be done on this. OK, OK, maybe in a million years, after there are a bunch of very suggestive observational studies—maybe then someone will fund a randomized intervention that gets at this. So, I’m with you, let’s do some good observational studies first.

    But, will propensity scores convince anyone that you’re getting at a causal effect? I’m skeptical. Now if you had a good instrument …

    Biostats reviewer:

    Huhwut? IV? Come on, that never works. You can’t ever prove causality with IV. You IV guys just make contorted arguments and at the end of the day I never know if I can trust the instrument or to what population it applies. There are so many bad IV studies bamboozling people. Too confusing.

    Besides, these guys are proposing to use a rich dataset of demographics merged with a huge dataset of comorbidities mashed up with a big ol’ dataset of environmental factors smacked upside a wicked cool dataset on provider social networks. Propensity scores are totally fine. Get outta here with your IV!

    So, does this study deserve funding or not? There’s no objectively right answer. Maybe we should do the study both ways. Should review panels insist on that? If results conflict, which one is right?
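    The disagreement in the stylized exchange above can be made concrete with a toy simulation (entirely my own, not from any real proposal): when an unobserved confounder drives both treatment and outcome, adjusting for observables (which is all propensity-score methods can do) stays biased, while a valid instrument recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_effect = 1.0

u = rng.normal(size=n)                      # unobserved confounder
x = rng.normal(size=n)                      # observed covariate
z = rng.binomial(1, 0.5, n).astype(float)   # instrument: shifts treatment only

# treatment uptake depends on observed x, unobserved u, and the instrument
d = (0.5 * x + u + z + rng.normal(size=n) > 0.5).astype(float)
# outcome depends on treatment, x, and u -- but u is never in the data
y = true_effect * d + x + u + rng.normal(size=n)

# adjusting for observables only (a stand-in for propensity-score adjustment):
# regress y on d and x; u is omitted, so the estimate is biased upward
coefs, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), d, x]), y, rcond=None)
adjusted_effect = coefs[1]

# Wald/IV estimate: cov(z, y) / cov(z, d), valid if z is a good instrument
iv_effect = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(round(adjusted_effect, 1), round(iv_effect, 1))
```

    Of course, this only restates the biostats reviewer’s complaint in reverse: the IV estimate is clean here because the simulation guarantees the instrument is valid, a guarantee no real dataset offers.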


    Comments closed