How much Medicare data are being withheld anyway?

From 2012 forward, Medicare research data (and Medicaid’s from 2010 onward) do not include substance use disorder related claims. How much data are we talking about? This post answers that question.

(Catch up on this ongoing story by reading posts under the CMS-SUD tag.)

An anonymous, credible source with Medicare claims data ran some numbers using diagnosis and procedure codes in the spreadsheet I posted. With them, we can draw some strong inferences about the amount of data that are withheld from Medicare claims, among other things.

The answers seem to be that ~5% of inpatient (MEDPAR) claims, ~1% of outpatient institutional (OP-SAF) claims, and ~0.5% of outpatient physician (Carrier SAF) claims are missing. This was assessed by examining complete 2011 files (or random samples of them) and counting how many claims are caught by the dragnet defined by the codes in the spreadsheet I posted, relative to the total.
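For readers who want to try this kind of count on their own files, here is a minimal sketch of the calculation as I understand it: flag any claim carrying at least one listed code, then divide by the total number of claims. This is not the analyst's actual program. The function name, the code values, and the toy claims are all made up for illustration, and real claims files would need their own parsing.

```python
from typing import Iterable, Set


def flagged_share(claims: Iterable[Iterable[str]], flagged_codes: Set[str]) -> float:
    """Return the fraction of claims carrying at least one flagged code."""
    total = 0
    flagged = 0
    for codes in claims:
        total += 1
        # A claim is caught if ANY of its diagnosis or procedure codes is listed.
        if any(code in flagged_codes for code in codes):
            flagged += 1
    # Avoid dividing by zero on an empty file.
    return flagged / total if total else 0.0


# Hypothetical placeholder codes, NOT real entries from the spreadsheet.
flagged_codes = {"X1", "X2"}

# Each toy claim is the list of codes that appear on it.
claims_sample = [
    ["A1", "X1"],
    ["B2"],
    ["C3", "D4"],
    ["X2"],
]

share = flagged_share(claims_sample, flagged_codes)
print(f"{share:.1%} of claims carry at least one flagged code")
```

Running the same function on a later year's file and getting a share at or near zero would be consistent with the flagged claims having been removed.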

Next, the analyst applied the same methodology to 2012 files and found that they contained no (or very nearly no) such claims. Independently, a separate, credible research group at a different institution, which will also remain anonymous, conducted a similar analysis with similar findings.

Here’s what I can conclude from this information and from speaking and emailing with researchers at these institutions:

  1. Claims with any of the codes in the spreadsheet are being removed. This is new information, as there was some ambiguity about exactly how the codes in the spreadsheet are being used by CMS, which provides the data. Now we know. There’s no reason to believe any additional data are removed, though that cannot be ruled out from the analysis above.
  2. The removed claims are a somewhat small proportion of overall Medicare claims, but their removal would deliver a devastating blow to any analysis of substance use or of diseases that disproportionately affect substance users (e.g., HIV, hepatitis C, among others), and would certainly be detrimental to the comorbidity controls that are standard in analysis of any population.
  3. As this is a nonrandom sample deletion, certain populations, regions, and institutions are likely to be disproportionately affected.
  4. The above only pertains to Medicare data. I hypothesize that Medicaid data would be equally or more severely affected, but one can only know by crunching the data.
  5. None of the senior investigators I spoke with were aware of the data deletion when they received the data or, in one case, before I informed the individual about it. These groups had to look at the data to discover the extent of the problem.
  6. This and other aspects of the story suggest there has been a failure to adequately disclose and communicate this issue and the decisions that led to it. For this reason, TIE has become the clearinghouse on this issue: please send any information you have.

We will continue to post details as they become available to us.

