• Bias In, Bias Out

    Melissa Garrido, PhD (@GarridoMelissa) is the Associate Director of the Partnered Evidence-based Policy Resource Center (PEPReC) at the Boston VA Healthcare System, U.S. Department of Veterans Affairs, and a Research Associate Professor with the Department of Health Law, Policy, and Management at Boston University School of Public Health.

    This is a research notebook entry: I’ve summarized a few recent articles on racial bias in predictive algorithms. The links below are a useful starting point if you are interested in learning more about this topic.

    The saying “garbage in, garbage out” is used to urge investigators to carefully consider the variables and data being fed into a statistical model. The same applies to racial bias. Without considering how structural racism contributes to and is reflected by data, modeling strategies, and interpretation, we risk perpetuating or worsening inequalities.

    Two articles about racial bias in clinical decision-making and predictive algorithms highlight the potential of these tools to reinforce and worsen racial disparities in health care access, quality, and outcomes:

    In Reconsidering the Use of Race Correction in Clinical Algorithms, Darshali Vyas, Leo Eisenstein, and David Jones provide examples of the unintended effects of including measures of race and ethnicity in clinical decision-making algorithms. For each example, they highlight an attendant concern about equity — many of the algorithms systematically produce different risk estimates for people of color or underrepresented groups than for white patients. Differences in risk estimates can lead to systematic differences in further diagnostic testing or treatment. For instance, the Vaginal Birth after Cesarean (VBAC) risk calculator assigns a lower probability of success with VBAC for African American and Hispanic women than for white women — increasing the likelihood that African American and Hispanic women undergo unnecessary Cesarean deliveries.

    In Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations, Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan demonstrate how racial bias can arise in models that do not include race as a variable. They use health system data that include components of an algorithm used by a large academic hospital to predict need for a care management plan, the predicted risk scores, and outcomes. Although the algorithm does not include race as a predictor, the authors find evidence of calibration bias: at the same risk score, Black patients had more severe health needs than white patients. Put differently, white patients with fewer health concerns were scored as higher risk than Black patients with more severe health needs, meaning that Black patients would be less likely than white patients to be referred to care management programs when appropriate.

    In this case, the bias arises because health care costs were used as the outcome, on the assumption that costs are a valid marker of health care need. However, the Black patients in this sample had lower health care costs than white patients with similar levels of health need, reflecting racial inequities in access to health care.
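
    The mechanism described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not a reproduction of the study: the access factor of 0.7, the cost scale, the need index, and the 20% referral cutoff are all assumptions chosen for illustration, and the "score" is simply the observed value of the chosen outcome, standing in for a model that predicts that outcome well.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the synthetic data are reproducible

# Assumed for illustration only: Black patients receive 70% of the care
# that white patients with the same level of need receive.
ACCESS = {"white": 1.0, "Black": 0.7}


def simulate_patient(group):
    need = random.uniform(0, 10)  # hypothetical health-need index
    cost = 1000 * need * ACCESS[group] + random.gauss(0, 300)
    return {"group": group, "need": need, "cost": cost}


patients = [simulate_patient(g) for g in ("white", "Black") for _ in range(5000)]


def flag_top(patients, label, share=0.2):
    """Flag the top `share` of patients when ranked by `label`."""
    ranked = sorted(patients, key=lambda p: p[label], reverse=True)
    return ranked[: int(len(ranked) * share)]


def summarize(flagged):
    """Share of flagged patients who are Black, and mean need by group."""
    by_group = {g: [p["need"] for p in flagged if p["group"] == g] for g in ACCESS}
    return {
        "share_black": len(by_group["Black"]) / len(flagged),
        "mean_need": {g: mean(v) for g, v in by_group.items() if v},
    }


by_cost = summarize(flag_top(patients, "cost"))  # cost used as the outcome
by_need = summarize(flag_top(patients, "need"))  # need used as the outcome

print("ranked by cost:", by_cost)
print("ranked by need:", by_need)
```

    In this synthetic setup, ranking by cost flags far fewer Black patients than ranking by need, and the Black patients who are flagged have higher average need than the white patients who are flagged. Race never enters the ranking; the gap comes entirely from the choice of outcome.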

    Removing race from a model may improve some outcomes but worsen others. For instance, equations used to estimate glomerular filtration rate (eGFR), a measure of kidney function, systematically estimate that Black patients have better kidney function than white patients with otherwise identical inputs. This can lead to systematic delays in referral to specialty care among Black patients. Removing race from the equation may improve specialty care access but reduce eligibility for certain medications among Black patients and reduce the number of Black adults who are eligible to be kidney donors. Creators and users of predictive models and clinical decision-making tools need to think carefully through the unintended consequences of modeling decisions. In the case of eGFR, a joint National Kidney Foundation – American Society of Nephrology task force will be issuing recommendations on potential changes to kidney function estimation in early 2021.
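
    To make the eGFR example concrete, here is a sketch using the 2009 CKD-EPI creatinine equation, one widely used equation that includes a race multiplier. The entry above does not name a specific equation, so choosing this one is an assumption. The patient values, the `use_race_term` switch, and the cutoff of 60 mL/min/1.73 m² are hypothetical and chosen only to show how the multiplier can move an estimate across a decision threshold.

```python
def egfr_ckd_epi_2009(creatinine_mg_dl, age, female, black, use_race_term=True):
    """2009 CKD-EPI creatinine equation, in mL/min/1.73 m^2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = creatinine_mg_dl / kappa
    egfr = 141 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black and use_race_term:
        egfr *= 1.159  # the race multiplier under discussion
    return egfr


# Hypothetical patient: 60-year-old Black man, serum creatinine 1.4 mg/dL.
with_race = egfr_ckd_epi_2009(1.4, 60, female=False, black=True)
without_race = egfr_ckd_epi_2009(1.4, 60, female=False, black=True, use_race_term=False)

THRESHOLD = 60  # illustrative cutoff, not a recommendation

for label, value in (("with race term", with_race), ("without race term", without_race)):
    side = "below" if value < THRESHOLD else "at or above"
    print(f"{label}: eGFR {value:.1f} ({side} {THRESHOLD})")
```

    For this hypothetical patient, the estimate falls below the cutoff without the multiplier and above it with the multiplier, so the same lab value can lead to different decisions depending on whether the race term is applied.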

    The choice to include or exclude race as a covariate is just one of many modeling choices that can influence the degree to which bias is introduced or perpetuated by a model. Predictive models developed using data from a non-representative patient population are unlikely to produce accurate or meaningful estimates for broader groups of patients. In models developed with electronic health record data, inequities in regular access to care may mean there is less data available for patients at risk of adverse outcomes. Where possible, it may be better to incorporate data collected early in an illness and less dependent on patients’ access to regular follow-up visits. Directed acyclic graphs (DAGs) may be helpful for thinking carefully through the ways in which structural inequality influences relationships among variables (such as access to health care and rates of health care use) in a model and the inferences that can be made from the model.
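
    As one way to start thinking with a DAG, the sketch below encodes a small, hypothetical graph as a Python dictionary and lists every directed path from structural racism to health care cost. The nodes and arrows are assumptions made for illustration, not a validated causal model, and `directed_paths` is a helper written for this example.

```python
# Each key points to the variables it is assumed to influence directly.
dag = {
    "structural_racism": ["access_to_care", "health_need"],
    "access_to_care": ["health_care_use"],
    "health_need": ["health_care_use"],
    "health_care_use": ["health_care_cost"],
    "health_care_cost": [],
}


def directed_paths(graph, start, end, path=()):
    """Return every directed path from `start` to `end` as a tuple of nodes."""
    path = path + (start,)
    if start == end:
        return [path]
    paths = []
    for child in graph.get(start, []):
        paths.extend(directed_paths(graph, child, end, path))
    return paths


paths = directed_paths(dag, "structural_racism", "health_care_cost")
for p in paths:
    print(" -> ".join(p))

# Paths that reach cost without passing through health need show how cost
# can differ between groups even when need is the same.
bypass_need = [p for p in paths if "health_need" not in p]
```

    Under these assumed arrows, one path runs through access to care and bypasses health need entirely, which is the kind of pathway that makes cost a questionable stand-in for need.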

    In addition, the goals of the model should be considered when determining whether it is fair. If developers seek to create a statistical model that performs similarly across different groups of patients, the choice of performance metric may lead to unintended consequences. Improved calibration of predicted and observed outcomes across groups may come at the price of increased false negatives or false positives in one group of patients.
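
    The tension between calibration and error rates can be seen in a small constructed example. The counts below are invented for illustration: both hypothetical groups receive scores that are perfectly calibrated, but the groups differ in how many patients fall at each score, and a single referral threshold of 0.5 is assumed.

```python
def build_group(bins):
    """bins: list of (score, n_patients, n_with_outcome)."""
    records = []
    for score, n, positives in bins:
        records += [(score, 1)] * positives + [(score, 0)] * (n - positives)
    return records


def calibration(records):
    """Observed outcome rate at each distinct score."""
    rates = {}
    for score in sorted({s for s, _ in records}):
        outcomes = [y for s, y in records if s == score]
        rates[score] = sum(outcomes) / len(outcomes)
    return rates


def error_rates(records, threshold=0.5):
    """False negative and false positive rates when flagging score >= threshold."""
    positives = sum(y for _, y in records)
    negatives = len(records) - positives
    missed = sum(1 for s, y in records if y == 1 and s < threshold)
    false_alarms = sum(1 for s, y in records if y == 0 and s >= threshold)
    return {
        "false_negative_rate": missed / positives,
        "false_positive_rate": false_alarms / negatives,
    }


# Invented counts: in each group, 20% of patients scored 0.2 and 80% of
# patients scored 0.8 experience the outcome.
group_a = build_group([(0.2, 50, 10), (0.8, 50, 40)])
group_b = build_group([(0.2, 80, 16), (0.8, 20, 16)])

for name, group in (("A", group_a), ("B", group_b)):
    print(name, calibration(group), error_rates(group))
```

    Both groups are equally well calibrated, yet the share of patients with the outcome who are missed at the common threshold differs between them. A developer who checks only calibration would not see that difference.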

    Predictive models play a large role in guiding decisions about treatment and resource allocation — close attention to their development and use is needed to guard against inequities in health care access and outcomes.