May 18, 2022 – Imagine walking into the Library of Congress, with its millions of books, and having the goal of reading them all. Impossible, right? Even if you could read every word of every work, you wouldn’t be able to remember or understand everything, even if you spent a lifetime trying.
Now let’s say you somehow had a super-powered brain capable of reading and understanding all that information. You would still have a problem: You wouldn’t know what wasn’t covered in those books – what questions they’d failed to answer, whose experiences they’d left out.
Similarly, today’s researchers have a staggering amount of data to sift through. All the world’s peer-reviewed studies contain more than 34 million citations. Millions more data sets explore how things like bloodwork, medical and family history, genetics, and social and economic traits impact patient outcomes.
Artificial intelligence lets us use more of this material than ever. Emerging models can quickly and accurately organize huge amounts of data, predicting potential patient outcomes and helping doctors make calls about treatments or preventive care.
Advanced mathematics holds great promise. Some algorithms – instructions for solving problems – can diagnose breast cancer with more accuracy than pathologists. Other AI tools are already in use in medical settings, allowing doctors to more quickly look up a patient’s medical history or improve their ability to analyze radiology images.
But some experts in the field of artificial intelligence in medicine suggest that while the benefits seem obvious, lesser-noticed biases can undermine these technologies. In fact, they warn that biases can lead to ineffective or even harmful decision-making in patient care.
New Tools, Same Biases?
Many people associate “bias” with personal, ethnic, or racial prejudice. Broadly defined, though, bias is a tendency to lean in a certain direction, either in favor of or against a particular thing.
In a statistical sense, bias occurs when data does not fully or accurately represent the population it is intended to model. This can happen from having poor data at the start, or it can occur when data from one population is applied to another by mistake.
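To make the statistical definition concrete, here is a minimal sketch in Python. Every number in it is made up for illustration, and the two groups and the measurement are hypothetical. It shows how a sample that leans heavily toward one group produces an average that misses the true value for the whole population.

```python
# Hypothetical numbers for illustration only; nothing here comes from a real study.
# Two equally sized groups whose average measurement differs.
group_a = [120] * 500
group_b = [140] * 500

population = group_a + group_b
true_mean = sum(population) / len(population)

# A sample that over-represents group A (90 of every 100 people sampled).
skewed_sample = group_a[:90] + group_b[:10]
sample_mean = sum(skewed_sample) / len(skewed_sample)

# Statistical bias: how far the sample's answer sits from the true answer.
bias = sample_mean - true_mean

print(f"true population mean: {true_mean}")   # 130.0
print(f"skewed sample mean:   {sample_mean}") # 122.0
print(f"bias:                 {bias}")        # -8.0
```

The sample is not wrong about the people in it. It is wrong about the people it left out, which is why the estimate drifts toward the over-represented group.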
Both types of bias – statistical and racial/ethnic – exist within medical literature. Some populations have been studied more, while others are under-represented. This raises the question: If we build AI models from the existing information, are we just passing old problems on to new technology?
“Well, that is definitely a concern,” says David M. Kent, MD, director of the Predictive Analytics and Comparative Effectiveness Center at Tufts Medical Center.
In a new study, Kent and a team of researchers examined 104 models that predict heart disease – models designed to help doctors decide how to prevent the condition. The researchers wanted to know whether the models, which had performed accurately before, would do as well when tested on a new set of patients.
The models “did worse than people would expect,” Kent says.
They were not always able to tell high-risk from low-risk patients. At times, the tools over- or underestimated the patient’s risk of disease. Alarmingly, most models had the potential to cause harm if used in a real clinical setting.
Why did the models perform so differently from their original tests? Statistical bias.
“Predictive models don’t generalize as well as people think they generalize,” Kent says.
When you move a model from one database to another, or when things change over time (from one decade to another) or space (one city to another), the model fails to capture those differences.
That creates statistical bias. As a result, the model no longer represents the new population of patients, and it may not work as well.
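A toy example can show what that looks like in numbers. The sketch below is not the method used in Kent's study, and the cohorts, event rates, and function names are all assumptions made for illustration. It compares the number of events a model expected with the number that actually happened (an observed-to-expected ratio), first in a group like the one the model was built on, then in a new group where the condition is half as common.

```python
# Hypothetical illustration; the cohorts and rates are invented, not study data.

def observed_to_expected(predicted_risks, outcomes):
    """Observed events divided by the events the model expected.

    A value near 1.0 means the model is right on average; below 1.0 means
    it overestimates risk, above 1.0 means it underestimates risk.
    """
    expected = sum(predicted_risks)
    observed = sum(outcomes)
    return observed / expected


def make_cohort(n_patients, event_rate_percent):
    """Build a deterministic toy cohort: 1 = had the event, 0 = did not."""
    n_events = n_patients * event_rate_percent // 100
    return [1] * n_events + [0] * (n_patients - n_events)


# Assume a very simple model that learned a 10% risk from its original data.
model_risk = 0.10

development = make_cohort(1000, 10)    # matches what the model learned
new_population = make_cohort(1000, 5)  # the event is half as common here

oe_dev = observed_to_expected([model_risk] * len(development), development)
oe_new = observed_to_expected([model_risk] * len(new_population), new_population)

print(f"development O/E:    {oe_dev:.2f}")  # 1.00
print(f"new population O/E: {oe_new:.2f}")  # 0.50
```

The model itself never changed. Only the patients did, and in the new group it predicts twice as many events as actually occur. That is the kind of overestimate that regular auditing and updating are meant to catch.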
That doesn’t mean AI shouldn’t be used in health care, Kent says. But it does show why human oversight is so important.
“The study does not show that these models are especially bad,” he says. “It highlights a general vulnerability of models trying to predict absolute risk. It shows that better auditing and updating of models is needed.”
But even human supervision has its limits, as researchers caution in a new paper arguing in favor of a standardized process. Without such a framework, we can only find the bias we think to look for, they note. Again, we don’t know what we don’t know.
Bias in the ‘Black Box’
Race is a mixture of physical, behavioral, and cultural attributes. It is an essential variable in health care. But race is a complicated concept, and problems can arise when using race in predictive algorithms. While there are health differences among racial groups, it cannot be assumed that all people in a group will have the same health outcome.
David S. Jones, MD, PhD, a professor of culture and medicine at Harvard University, and co-author of Hidden in Plain Sight – Reconsidering the Use of Race Correction in Algorithms, says that “a lot of these tools [analog algorithms] seem to be directing health care resources toward white people.”
Around the same time, similar biases in AI tools were being identified by researchers Ziad Obermeyer, MD, and Eric Topol, MD.
The lack of diversity in clinical studies that influence patient care has long been a concern. The newer worry, Jones says, is that using these studies to build predictive models not only passes on those biases, but also makes them more obscure and harder to detect.
Before the dawn of AI, analog algorithms were the only clinical option. These types of predictive models are hand-calculated instead of automatic.
“When using an analog model,” Jones says, “a person can easily look at the information and know exactly what patient information, like race, has been included or not included.”
Now, with machine learning tools, the algorithm may be proprietary – meaning the data is hidden from the user and can’t be changed. It’s a “black box.” That’s a problem because the user, a care provider, might not know what patient information was included, or how that information might affect the AI’s recommendations.
“If we are using race in medicine, it needs to be totally transparent so we can understand and make reasoned judgments about whether the use is appropriate,” Jones says. “The questions that need to be answered are: How, and where, to use race labels so they do good without doing harm.”
Should You Be Concerned About AI in Clinical Care?
Despite the flood of AI research, most clinical models have yet to be adopted in real-life care. But if you are concerned about your provider’s use of technology or race, Jones suggests being proactive. You can ask the provider: “Are there ways in which your treatment of me is based on your understanding of my race or ethnicity?” This can open up dialogue about how the provider makes decisions.
Meanwhile, the consensus among experts is that problems related to statistical and racial bias within artificial intelligence in medicine do exist and need to be addressed before the tools are put to widespread use.
“The real danger is having tons of money being poured into new companies that are creating prediction models who are under pressure for a good [return on investment],” Kent says. “That could create conflicts to disseminate models that may not be ready or sufficiently tested, which may make the quality of care worse instead of better.”
David M. Kent, MD, director, Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center.
David S. Jones, MD, PhD, professor of culture and medicine, Harvard University.
Journal of the American Medical Association: “Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer.”
Circulation: Cardiovascular Quality and Outcomes: “Generalizability of cardiovascular disease clinical prediction models: 158 independent external validations of 104 unique models.”
ACM Digital Library: “MedKnowts: Unified documentation and information retrieval for electronic health records.”
The Lancet: “Artificial intelligence, bias, and patients' perspectives.”
The Lancet Digital Health: “Artificial intelligence in medical imaging: switching from radiographic pathological data to clinically meaningful endpoints.”
The New England Journal of Medicine: “Hidden in plain sight – reconsidering the use of race correction in clinical algorithms.”