The following is a guest post from Michael McWilliams, MD, PhD; Michael E. Chernew, PhD; Bruce E. Landon, MD, MBA, MSc; and Aaron L. Schwartz, PhD. It responds to the comments by Dr. Stephen Soumerai and Dr. Ross Koppel.
We appreciate the interest that Soumerai and Koppel have shown in our work and agree with them that organizational selection into the voluntary Pioneer ACO program is the most important potential source of bias to address in an evaluation of the program’s effects. Contrary to Soumerai and Koppel’s assertion, however, we explicitly recognize this potential threat to validity and address it empirically to the extent possible.
The details of the selection process by which CMMI chose organizations from a pool of voluntary applicants are rather immaterial. The bottom line, as we note in our discussion and as noted by Soumerai and Koppel, is that the Pioneer ACOs differ substantially from much of the delivery system in many respects, including their capacity to bear financial risk and manage care. The non-equivalence of the control group has implications for both the external validity (generalizability) and internal validity of our findings, but it is important not to conflate the two.
Clearly, one would not expect the response of Pioneer ACOs to new payment incentives to generalize to less advanced provider organizations. We do not contend that our results generalize in this way. This is emphasized in our paper and well understood by CMS, as evidenced by the availability of one-sided ACO contracts and the Advance Payment Model in the Medicare Shared Savings Program. But the causal effect of ACO contracts on spending among organizations participating in the Pioneer program is nevertheless of interest, particularly because a major goal of ACO models is to encourage advancement of the delivery system to Pioneer levels and beyond. Whether our estimates represent the causal effect of Pioneer ACO contracts on spending in the organizations that participated in the program is a question of internal validity.
For selection to threaten the internal validity of our findings, the participating organizations would have had to select into the program in a manner that was related to changes in spending that would have occurred in the absence of ACO contract incentives. We conducted multiple tests that suggest this scenario is unlikely. The tests are not “Herculean.” Rather, they are simply standard tests of key assumptions made in difference-in-differences comparisons. (Also, as we noted in the paper, results were virtually identical in an analysis without statistical adjustments.) In particular, the spending levels and trends over the 3 pre-contract years were nearly identical for the ACO and control groups. Thus, we would not expect a differential reduction in spending in the ACO group without a differential change in behavior. And we would not expect a differential change in behavior without a differential change in incentives, because lowering spending under fee-for-service reimbursement would run counter to an organization’s financial interest.
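The difference-in-differences logic described above can be sketched in a few lines of Python. All figures below are invented for illustration; they are not the study's data.

```python
# Illustrative difference-in-differences (DiD) estimate on made-up spending
# figures; none of these numbers come from the Pioneer evaluation.

def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """DiD effect = (change in treatment group) - (change in control group)."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical mean annual spending per beneficiary (dollars):
aco_pre, aco_post = 10_000.0, 10_200.0    # ACO group, pre- and post-contract
ctrl_pre, ctrl_post = 10_050.0, 10_500.0  # control group, same periods

effect = did_estimate(aco_pre, aco_post, ctrl_pre, ctrl_post)
print(effect)  # -250.0: ACO spending grew $250 less than control spending
```

The design's key assumption, tested in the paper via the near-identical pre-contract levels and trends, is that absent the contracts, both groups' spending would have changed in parallel, so the subtraction isolates the contract effect.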
Soumerai and Koppel express concern about ACOs delaying “cost saving measures until they are paid for.” But that is not a threat to internal validity. It doesn’t matter if Pioneer ACOs were waiting for an opportunity for financial incentives to be aligned with their capacity (latent or newly developed) to provide more cost-effective care. That would not invalidate the interpretation that the change in payment incentives elicited the savings among the Pioneer ACOs. This is a critical point that Soumerai and colleagues have failed to recognize in their critiques of our evaluations of Medicare and commercial ACO initiatives. Their argument only pertains to the generalizability of our results, which we agree is limited to organizations similar to Pioneer ACOs.
Those who read our paper in its entirety will find this discussion of selection bias summarized in the limitations section. Study limitations are always challenging to cover exhaustively because of the draconian space constraints imposed by clinical journals, but we believe we convey the critical points:
Our study had several limitations. First, although the characteristics of the patients differed minimally between the ACO group and the control group, the Pioneer program is voluntary, and program participants differ from nonparticipating providers in many respects. Baseline differences in spending between ACOs and non-ACO providers were minimal, however, and we adjusted for those differences by means of difference-in-differences comparisons. Second, organizations may have decided to participate in the Pioneer program because of ongoing or planned efforts to constrain spending. Similar spending trends between ACOs and other providers suggested that no such efforts were under way during the precontract period, however, and constraining fee-for-service spending would not have served the financial interests of the organizations without participation in alternative payment models. Thus, although our findings regarding Pioneer ACOs may not be generalizable to other provider organizations, they do suggest that differential spending changes were related to the start of ACO contracts.
Of course, the possibility of residual unmeasured confounding still remains, as it does in any observational study. A randomized trial would be better. But randomizing provider groups to payment systems is infeasible, and in any case CMS did not randomize providers to the ACO programs, so we must evaluate them as implemented. In the absence of random assignment or even a naturally occurring exogenous predictor of changes in payment (e.g., a valid instrument), there is a larger question about our role as researchers at the heart of this discussion. We can allow imperfections to relegate us to the sidelines, or we can acknowledge that the world is messy, stay on the playing field, and use the best available methods to inform policy. Do we let the most significant Medicare payment reform in three decades go without a rigorous, objective academic evaluation because it was not randomized? We think not. The ACO programs are evolving and expanding rapidly, and policymakers and the public need the best available evidence from studies using the best available methods to guide their evolution and to learn from their successes and failures.
Indeed, it was in the spirit of advancing the dialogue and policy that we conducted the study. Soumerai and Koppel focus on the overall 1.2% savings, but arguably the more important findings relate to differences in performance, or lack thereof, between ACOs that differ in key respects. We characterize the 1.2% spending reduction for what it is—modest, and even more modest after netting out CMS bonus payments, as we do. In our concluding paragraph, we also note the importance of accounting for the costs to ACOs of controlling spending (costs that are difficult to measure) in any estimation of the social value of the ACO programs. But the modest spending reductions, coupled with unchanged or improved performance on quality measures (including patient experiences), are nevertheless important because they suggest a promising behavioral response. In some areas of expected waste, particularly post-acute care, we find larger savings that are not so modest.
Will the savings grow over time? Only time will tell, but we would note that savings grew substantially over the first four years of the Blue Cross Blue Shield of Massachusetts Alternative Quality Contract, with initial savings in year 1 similar to those we found in the Pioneer program. Moreover, our findings of higher savings among ACOs with initially higher risk-adjusted spending and of savings among ACOs that dropped out of the program have important implications for the design of ACO payment models now. Policymakers cannot wait four years for evidence to guide regulatory changes, because the changes need to be made now for the ACO programs to have any hope of getting the incentives right and building on their early modest success.
Two other misinformed notions advanced by Soumerai and Koppel merit response. First, the comparison of the Pioneer ACO program to a pay-for-performance program is not useful, because the Pioneer program is not a pay-for-performance program. Second, as a full reading of our methods and appendix would make clear, our finding of greater savings among ACOs with higher initial spending was not due to regression to the mean. We addressed mean reversion by using 2008 values to categorize ACOs, effectively using differences in spending levels before the study period that were driven in part by randomness (the source of regression to the mean) to instrument for stable differences from 2009 to 2011. We demonstrated that after initial mean reversion from 2008 to 2009, there was no evidence of further reversion from 2009 to 2011 (our pre-period). So there is no statistical basis to expect further regression to the mean from 2011 to 2012.
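The mean-reversion check just described can be illustrated with simulated data. Everything below (the variable names, spending figures, and noise parameters) is our own invention for illustration and does not reproduce the study's data or analysis.

```python
# Sketch of the mean-reversion check: categorize units by a baseline year
# ("2008"), then test whether the high- and low-spending groups drift back
# together over later pre-period years. All data here are simulated.
import random

random.seed(0)
n = 200
# Each unit has a persistent spending level plus year-to-year noise.
stable = [random.gauss(10_000, 800) for _ in range(n)]

def year(noise_sd=400):
    """One year's observed spending: stable level plus transient noise."""
    return [s + random.gauss(0, noise_sd) for s in stable]

y2008, y2009, y2011 = year(), year(), year()

# Categorize by the baseline-year (2008) values, as in the paper's design.
order = sorted(range(n), key=lambda i: y2008[i])
low, high = order[: n // 2], order[n // 2 :]

def gap(y):
    """High-group mean minus low-group mean in a given year."""
    return sum(y[i] for i in high) / len(high) - sum(y[i] for i in low) / len(low)

# The gap shrinks from 2008 to 2009 (initial reversion, because 2008 noise
# partly drove the categorization) but then stays stable through 2011.
print(gap(y2008), gap(y2009), gap(y2011))
```

Because the 2008 noise influences which group a unit lands in, some reversion from 2008 to 2009 is expected; the persistence of the 2009–2011 gap is what indicates the categorization captured stable spending differences rather than transient randomness.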
We do not conclude that the Pioneer ACO program is a resounding success, or that Medicare officials should go home and declare victory over modest savings in the first year. We agree with Soumerai and Koppel about the importance of learning from failure, and the withdrawal from the Pioneer program by 13 of 32 organizations is certainly a red flag. For that reason, we used an appropriate quasi-experimental design to produce findings that not only provide an objective estimate of the effect of current policy but also suggest concrete steps to improve the Medicare ACO programs. Although a lot more needs to be learned, we believe our study contributes much “to advance the dialogue.” Those contributions were recognized by the journal that published it and the experts who reviewed it. We also agree with Soumerai and Koppel that as researchers “we have an obligation to consider methodological issues to inform ourselves, our students and healthcare policy.” That is why we are and will remain out on the field.