Nov. 10, 2023 – You may have used ChatGPT-4 or one of the other new artificial intelligence chatbots to ask a question about your health. Or perhaps your doctor is using ChatGPT-4 to generate a summary of what happened in your last visit. Maybe your doctor even has a chatbot double-check their diagnosis of your condition.
But at this stage in the development of this new technology, experts said, both consumers and doctors would be wise to proceed with caution. Despite the confidence with which an AI chatbot delivers the requested information, it’s not always accurate.
As the use of AI chatbots rapidly spreads, both in health care and elsewhere, there have been growing calls for the government to regulate the technology to protect the public from AI’s potential unintended consequences.
The federal government recently took a first step in this direction when President Joe Biden issued an executive order that requires government agencies to come up with ways to govern the use of AI. In the world of health care, the order directs the Department of Health and Human Services to advance responsible AI innovation that “promotes the welfare of patients and workers in the health care sector.”
Among other things, the agency is supposed to establish a health care AI task force within a year. This task force will develop a plan to regulate the use of AI and AI-enabled applications in health care delivery, public health, and the research, development, and safety of drugs and medical devices.
The strategic plan will also address “the long-term safety and real-world performance monitoring of AI-enabled technologies.” The department must also develop a way to determine whether AI-enabled technologies “maintain appropriate levels of quality.” And, in partnership with other agencies and patient safety organizations, Health and Human Services must establish a framework to identify errors “resulting from AI deployed in clinical settings.”
Biden’s executive order is “a good first step,” said Ida Sim, MD, PhD, a professor of medicine and computational precision health, and chief research informatics officer at the University of California, San Francisco.
John W. Ayers, PhD, deputy director of informatics at the Altman Clinical and Translational Research Institute at the University of California San Diego, agreed. He said that while the health care industry is subject to stringent oversight, there are no specific regulations on the use of AI in health care.
“This unique situation arises from the fact the AI is fast moving, and regulators can’t keep up,” he said. It’s important to move carefully in this area, however, or new regulations might hinder medical progress, he said.
‘Hallucination’ Issue Haunts AI
In the year since ChatGPT-4 emerged, stunning experts with its human-like conversation and its knowledge of many subjects, the chatbot and others like it have firmly established themselves in health care. Fourteen percent of doctors, according to one survey, are already using these “conversational agents” to help diagnose patients, create treatment plans, and communicate with patients online. The chatbots are also being used to pull together information from patient records before visits and to summarize visit notes for patients.
Consumers have also begun using chatbots to search for health care information, understand insurance benefit notices, and analyze numbers from lab tests.
The main problem with all of this is that the AI chatbots are not always right. Sometimes they invent stuff that isn’t there – they “hallucinate,” as some observers put it. According to a recent study by Vectara, a startup founded by former Google employees, chatbots make up information at least 3% of the time – and as often as 27% of the time, depending on the bot. Another report drew similar conclusions.
That said, the chatbots are remarkably good at arriving at the right answer most of the time. In one trial, 33 doctors in 17 specialties asked chatbots 284 medical questions of varying complexity and graded their answers. More than half of the answers were rated as nearly correct or completely correct. But the answers to 15 questions, about 5% of the total, were scored as completely incorrect.
Google has created a chatbot called Med-PaLM that is tailored to medical knowledge. This chatbot, which passed a medical licensing exam, has an accuracy rate of 92.6% in answering medical questions, roughly the same as that of doctors, according to a Google study.
Ayers and his colleagues did a study comparing the responses of chatbots and doctors to questions that patients asked online. Health professionals evaluated the answers and preferred the chatbot response to the doctors’ response in nearly 80% of the exchanges. The doctors’ answers were rated lower for both quality and empathy. The researchers suggested the doctors might have been less empathetic because of the stress they were under in their practices.
Garbage In, Garbage Out
Chatbots can be used to identify rare diagnoses or explain unusual symptoms, and they can also be consulted to make sure doctors don’t miss obvious diagnostic possibilities. To be available for those purposes, they should be embedded in a clinic’s electronic health record system. Microsoft has already embedded ChatGPT-4 in the most widespread health record system, from Epic Systems.
One challenge for any chatbot is that the records contain some wrong information and are often missing data. Many diagnostic errors are related to poorly taken patient histories and sketchy physical exams documented in the electronic health record. And these records usually include little, if any, information from the records of other practitioners who have seen the patient. Based solely on the inadequate data in the patient record, it may be hard for either a human or an artificial intelligence to draw the right conclusion in a particular case, Ayers said. That’s where a doctor’s experience and knowledge of the patient can be invaluable.
But chatbots are quite good at communicating with patients, as Ayers’s study showed. With human supervision, he said, it seems likely that these conversational agents can help relieve doctors of the burden of online messaging with patients. And, he said, this could improve the quality of care.
“A conversational agent is not just something that can handle your inbox or your inbox burden. It can turn your inbox into an outbox through proactive messages to patients,” Ayers said.
The bots can send patients personal messages, tailored to their records and what the doctors think their needs will be. “What would that do for patients?” Ayers said. “There’s huge potential here to change how patients interact with their health care providers.”
Pluses and Minuses of Chatbots
If chatbots can be used to generate messages to patients, they can also play a key role in the management of chronic diseases, which affect up to 60% of all Americans.
Sim, who is also a primary care doctor, explains it this way: “Chronic disease is something you have 24/7. I see my sickest patients for 20 minutes every month, on average, so I’m not the one doing most of the chronic care management.”
She tells her patients to exercise, manage their weight, and take their medications as directed.
“But I don’t provide any support at home,” Sim said. “AI chatbots, because of their ability to use natural language, can be there with patients in ways that we doctors can’t.”
Besides advising patients and their caregivers, she said, conversational agents can also analyze data from monitoring sensors and can ask questions about a patient’s condition from day to day. While none of this is going to happen in the near future, she said, it represents a “huge opportunity.”
Ayers agreed but warned that randomized controlled trials must be done to establish whether an AI-assisted messaging service can actually improve patient outcomes.
“If we don’t do rigorous public science on these conversational agents, I can see scenarios where they will be implemented and cause harm,” he said.
In general, Ayers said, the national strategy on AI should be patient-focused, rather than focused on how chatbots help doctors or reduce administrative costs.
From the consumer perspective, Ayers said he worried about AI programs giving “universal recommendations to patients that could be immaterial or even bad.”
Sim also emphasized that consumers should not depend on the answers that chatbots give to health care questions.
“It needs to have a lot of caution around it,” she said. “These things are so convincing in the way they use natural language. I think it’s a huge risk. At a minimum, the public should be told, ‘There’s a chatbot behind here, and it could be wrong.’”