AI Recognition of Patient Race in Medical Imaging
By Sam Marie Engle
Can computers figure out your race by looking at your wrist bones or lungs? Yes, according to a study published today by the prestigious scientific journal The Lancet Digital Health. That’s not the whole story, though: the bigger issue is researchers don’t know how the machines do it.
The findings come after months of work by a team of experts in radiology and computer science led by Judy W. Gichoya, MD, MS, assistant professor and director of the Healthcare Innovations and Translational Informatics Lab in the Emory University School of Medicine’s Department of Radiology and Imaging Sciences. Joining her from Emory are Radiology’s Hari Trivedi, MD, assistant professor, Anant Bhimireddy, MS, systems software engineer, and computer science student Zachary Zaiman. The team includes colleagues in the United States from Georgia Tech, MIT, Stanford, Indiana University-Purdue University, and Arizona State, plus experts in Canada, Taiwan, and Australia.
The team used large-scale medical imaging datasets from both public and private sources, containing thousands of chest x-rays, chest CT scans, mammograms, hand x-rays, and cervical spine x-rays from racially diverse patient populations.
They found that standard deep learning models—computer models developed to help speed the task of reading and detecting things like fractures in bones and pneumonia in lungs—could predict with startling accuracy the self-reported race of a patient from a radiologic image, despite the image having no patient information associated with it.
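To make the setup concrete, here is a minimal, hypothetical sketch of the kind of off-the-shelf approach the paper describes: an ordinary ImageNet-pretrained network fine-tuned to predict self-reported race labels from de-identified chest x-rays. The folder layout, label categories, and training settings below are illustrative assumptions, not the team’s published code.

```python
# Minimal, hypothetical sketch (not the study's published code): fine-tune an
# ordinary ImageNet-pretrained network to predict self-reported race from
# de-identified chest x-rays. Folder layout, labels, and settings are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Assumed layout: cxr_train/<self_reported_race_label>/*.png
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # x-rays are single-channel; replicate to 3
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("cxr_train", transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet34(weights="IMAGENET1K_V1")                     # standard, off-the-shelf backbone
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))  # one output per race category

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):                    # illustrative; real training runs far longer
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```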
The immediate question was whether the models, also known as artificial intelligence (AI), were determining race based on what researchers call surrogate covariables. Breast density, for example, tends to be higher in African American women than in white women, and research shows African American patients tend to have higher bone mineral density than white patients, so were the machines reading breast tissue density or bone mineral density as proxies for race? The researchers tested this theory by suppressing the availability of such information to the AI model, and it still predicted patient race with alarming accuracy: more than 90%.
Even more surprising, the AI models could determine race more accurately than complex statistical analyses developed specifically to predict race based on age, sex, gender, body mass, and even disease diagnoses.
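For comparison, a conventional statistical baseline of that kind might look like the following hypothetical sketch: a logistic regression predicting self-reported race from tabular patient covariates. The file name and column names are assumptions for illustration, not the study’s actual analysis.

```python
# Hypothetical baseline sketch (not the study's analysis): a conventional
# statistical model predicting self-reported race from tabular covariates,
# for comparison against the image-based deep learning models.
# The CSV file and column names are assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("patient_covariates.csv")      # assumed columns below
X = df[["age", "sex", "body_mass_index", "diagnosis_code"]]
y = df["self_reported_race"]

pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "diagnosis_code"])],
    remainder="passthrough",                    # keep numeric columns as-is
)
clf = Pipeline([("pre", pre), ("logreg", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf.fit(X_train, y_train)
print("tabular baseline accuracy:", clf.score(X_test, y_test))
```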
The AI models worked just as well on x-rays, mammograms, and CT scans and were effective no matter which body part was imaged. Finally, the deep learning models still correctly predicted self-reported race when images were deliberately degraded to ensure that the quality and age of the imaging equipment weren’t signaling socioeconomic status, which in turn could correlate with race. Fuzzy images, high-resolution images downgraded to low resolution, and scans clipped to remove certain features did not significantly affect the AI models’ ability to predict a patient’s race.
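The degradation experiments can be pictured with a similarly hypothetical sketch: the trained classifier is simply re-scored on blurred and heavily downsampled copies of the test images to see whether its accuracy collapses. The specific blur kernel and resolutions here are illustrative assumptions, not the parameters reported in the paper.

```python
# Hypothetical sketch of the degradation check (settings are assumptions):
# re-evaluate the trained classifier on blurred and low-resolution test images.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def accuracy(model, loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

base = [
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
]
variants = {
    "original": transforms.Compose(base),
    "blurred":  transforms.Compose([transforms.GaussianBlur(kernel_size=15)] + base),
    "low_res":  transforms.Compose([transforms.Resize((32, 32))] + base),  # downsample, then scale back up
}

# 'model' is the image classifier trained in the first sketch above
for name, tfm in variants.items():
    test_set = datasets.ImageFolder("cxr_test", transform=tfm)
    print(name, accuracy(model, DataLoader(test_set, batch_size=32)))
```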
The Real Danger
“The real danger is the potential for reinforcing race-based disparities in the quality of care patients receive,” says Dr. Gichoya.
“In radiology, when we are looking at x-rays and MRIs to determine the presence or absence of disease or injury, a patient’s race is not relevant to that task,” she explains. “We call that being race agnostic: we don’t know, and don’t need to know someone’s race to detect a cancerous tumor in a CT or a bone fracture in an x-ray.”
That’s good because, while unconscious bias in radiologic interpretation is not well understood, unconscious biases about racial groups have resulted in well-documented differences in the quality and kind of health care people receive in other specialties because of their race.
Finding ways to lighten increasing radiology workloads is a major reason why radiologists are turning to deep machine learning or AI: can certain diagnostic tasks be automated so radiologists can spend more time on complex cases? That’s the hope . . . and the danger.
“We don’t know how the AI models are detecting race so we can’t develop an easy solution,” says Dr. Gichoya. “Just as with human behavior, there’s not a simple solution to fixing bias in machine learning. The worst thing you can do is try to simplify a complex problem.”
And yet, that’s what critics are demanding.
The Power and Price of Public Engagement
The team first published their work in July 2021 on arXiv.org, Cornell University’s open-access platform for scientists to post and debate scholarly articles in the quantitative sciences; the site has more than two million articles. While submissions are moderated and screened for scholarly value, they are not peer-reviewed; that is, they have not undergone rigorous, independent analysis by experts in the same field of study and been judged valid, accurate, and original. Nevertheless, publishing on arXiv provided the team with what it needed: guidance on what their next steps should be.
The paper, titled “Reading Race: AI Recognizes Patient’s Racial Identity in Medical Images” (https://arxiv.org/abs/2107.10356), sparked a firestorm.
The debate ranged from uninformed questioning of the carefully explained methodology, to disbelief of the researchers’ claims that they had sufficiently tested potential sources of racial identity in the datasets, to outrage that the paper had been published in the first place.
“At one point, some of the editors were saying how dare we publish this paper without peer review,” says Dr. Gichoya. “The findings were so troublesome, we felt it was important to make the work available so others could work on it, too. We even published the code along with the article because we wanted to see if others could see something we missed.”
The commitment to open access is emblematic of Dr. Gichoya’s overall approach to research. She calls it Hive Mind. “No one person can be the savior for everyone. We’re trying to improve the quality of health care and save lives and we’re working against the clock. The more people you have working on a problem, the faster you will find a solution and the better the results will be.”
The work also received significant media attention. Radiology trade publications covered it in pieces like AuntMinnie.com’s “Is Radiology AI Technology Racist?” Radiology association newsletters picked up the story. Even non-health sciences publications like Vice and Wired jumped into the fray in the US, as did Quillette in Australia. Thousands of tweets and retweets continued the debate.
“This has been the biggest paper of my career,” says Dr. Gichoya, who was named Most Influential Radiology Researcher in October 2021 by AuntMinnie.com.
Dr. Gichoya has given dozens of interviews and grand rounds presentations for universities as well as the Medical Imaging and Data Resource Center, a collaboration among the American College of Radiology, the Radiological Society of North America, and the American Association of Physicists in Medicine. She even gave a well-attended presentation to the U.S. National Institutes of Health, where she is a Fogarty Data Fellow. In every talk, she emphasized the same point: the ability of AI to predict racial identity is not the issue of importance.
“The concern is that AI models can easily learn to predict race without us even knowing it. Human experts cannot similarly identify racial identity from medical images, so we can’t mitigate this problem.”
As the debate moved out of scientific circles, the team received backlash for daring to suggest racial bias was even a problem.
Lauren Oakden-Rayner, a collaborator from the Australian Institute for Machine Learning at the University of Adelaide, responded to one such complaint in a blog post she penned about the research. “We are not arguing for the removal of race from medical decision making, AI or otherwise. What we are saying is AI does something we can’t, and that it does so in a hidden way, so we need to a) be aware of it, and b) make it more explicit so we can choose how and when to use the information, rather than relying on an algorithm trained on biased practice to decide for us.”
Validation and What's Next
Publication by The Lancet Digital Health, which is peer-reviewed, is powerful validation of the research itself and of its implications.
The real fear, Dr. Gichoya says, is that all AI model deployments in medical imaging are at risk of causing serious harm.
“If an AI model starts to rely on its ability to detect racial identity to make medical decisions, but in doing so produces race-specific errors, clinical radiologists will not be able to tell, thereby possibly leading to errors in health-care decision processes. That will worsen the already significant health disparities we see in our health care system.”
And because of that danger, the team already is working on a second study. They will not stop at detecting bias, Dr. Gichoya says. “This ability to read race could be used to develop models that actually mitigate bias, once we understand it. We can harness it for good.”