Two Physician Leaders Look at the Dangers of Clinical AI

Sept. 25, 2023
In a video interview, JAMA Editor-in-Chief Kirsten Bibbins-Domingo, M.D., Ph.D., spoke with Google Health executive Michael Howell, M.D., M.P.H., about chatbots, their promise, and their peril.

How do we understand the current state of artificial intelligence development in patient care, and what are some of the challenges that clinicians and developers will face going forward? Indeed, are “AI gaslighting” and “AI hallucinations” on the horizon, as more and more clinicians and others in U.S. healthcare plunge into leveraging artificial intelligence and machine learning tools? Kirsten Bibbins-Domingo, M.D., Ph.D., editor-in-chief of JAMA (the Journal of the American Medical Association) and JAMA Network, recently interviewed Michael Howell, M.D., M.P.H., a pulmonologist and the chief clinical officer at Google Health, on the subject, posting the video-format interview online on Sept. 20 under the headline, “AI and Clinical Practice—AI Gaslighting, AI Hallucinations, and GenAI Potential.”

Dr. Bibbins-Domingo began by asking Dr. Howell about his role and about what he and his team are doing in terms of research around AI. Howell began by noting that “Google has a health team led by Karen DeSalvo, and the health team has a few teams in it. We have a health equity team and a global employee health team and a team that focuses on regulatory. And my team is the clinical team, which is a team of doctors and nurses and psychologists and health economists. And when Google is making products that have a lot of impact on health, we try to work, you know, shoulder to shoulder and elbow to elbow with the engineers and the product managers and the researchers to make sure that it's not Silicon Valley and it's not healthcare. It's a blended voice, a third way between.”

When Bibbins-Domingo asked Howell what can be expected in the next few years in the AI area, and what the “danger areas” might be, Howell told her that AI chatbots are “beginning to be able to have a representation of the world, not just from text but with other things like that. And those, if you're thinking about making something, understanding the capabilities are really important. And those are totally different than what came before.” He referenced a set of tools that can answer clinical questions for clinicians. The chatbots draw on openly available clinical information online, and the answers that the chatbot tools provide are getting better and better over time and with training.

In terms of practical uses of the emerging technology, Howell told Bibbins-Domingo that “We're likely to see a lot of work on assisting people in tasks that take them away from the bedside and away from the cognitive or procedural or emotional work of being a clinician. I think that's going to be number one. People talk about things like prior auth as an example. I think that we're likely to see tools that, over time, help support clinicians in avoiding things like diagnostic anchoring or diagnostic delay. So any of us who have practiced for any length of time, we have had a nurse, like, tap you on the shoulder and go, Hey doc, did you mean to do that? Hey doc, did you think about this? I've been saved, right?”

Bibbins-Domingo went on to ask Howell about “the concept called AI gaslighting, where the AI has learned to do things very well to a high degree of accuracy. And then all of a sudden is giving you exactly the wrong answer, right? So explain how that comes about, how we guard against it and then we'll tackle hallucinating next.”

“There are a couple of related things here that are a little tricky to disentangle,” Howell responded. “So the models are predicting the next word. That's what they're doing at their core, and they're hopping around that embedding space of, like, oh, usually people go here next; this looks like a math problem; this looks like you should give a medical citation. If we step back for a second and talk about the stages of these models, there's the foundation model stage, where you have the model read everything it can get its hands on. It learns a representation of the world. There's a stage that's sometimes used, which is fine-tuning with other data, and that can up-weight some of the parameters in the model in something you care about.” And, explaining some of the details of how models learn, he said, “[If] you get reinforcement learning with human feedback wrong, then models can change over time. And when you update anything in the model and get better in one area, sometimes it'll get worse in others. Not that different than, you know, like the longer I was into working in the ICU, the worse of a primary care doc I would've been.”
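Howell's two ideas, next-word prediction and fine-tuning that up-weights a domain, can be made concrete with a toy model. The sketch below is a minimal illustration only, nothing like Google's actual systems: it counts word-to-word transitions in a tiny invented corpus (standing in for the “foundation” stage), adds higher-weight counts from a small invented clinical corpus (standing in for “fine-tuning”), and then generates text by repeatedly predicting a likely next word. All text and the weighting factor here are assumptions made up for the example.

```python
import random
from collections import Counter, defaultdict

def train(corpus, counts=None, weight=1):
    """Count next-word frequencies; `weight` lets a fine-tuning corpus count extra."""
    counts = counts if counts is not None else defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += weight
    return counts

def generate(counts, start, length=8):
    """Repeatedly sample the next word from the learned distribution."""
    word, out = start, [start]
    for _ in range(length):
        options = counts.get(word)
        if not options:
            break
        words, freqs = zip(*options.items())
        word = random.choices(words, weights=freqs)[0]
        out.append(word)
    return " ".join(out)

# "Foundation" stage: broad, generic text (invented for illustration).
base = ["the patient was stable", "the weather was fine", "the patient was discharged"]
counts = train(base)

# "Fine-tuning" stage: up-weight domain text so clinical continuations dominate,
# which is also how improvement in one area can shift behavior in others.
clinical = ["the patient was admitted with sepsis"]
counts = train(clinical, counts, weight=5)

print(generate(counts, "the"))  # e.g. "the patient was admitted with sepsis"
```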

What about “AI hallucinations”? Howell noted that, “[I]n any domain, but in healthcare in particular, there's a concept called automation bias, where people trust the thing that comes out of the machine. And this is a really important patient safety issue. Like with EHRs: they reduced many kinds of medical errors; no one dies of handwriting anymore, right? Which they used to do with some regularity. But they increased the likelihood of other kinds of errors. And so automation bias is a really important thing. And when the model is responding and sounds like a person might sound, it's an even bigger risk. So hallucinations are really important, and what they are is, the model is just predicting the next word. And if there's one thing for people who are watching this to remember, it's that the model doesn't go look things up in PubMed.”

And, he added, because a model is predicting the next word or the next number, the model can make mistakes: “[I]t’ll say, oh, this looks like it should be a medical journal citation. That's the kind of thing that comes next. Here are words that are plausible for a medical journal citation, and then that will look just like a medical journal citation. It remains a big problem. It was a big problem in the earlier versions of them. There are a few ways from a technical standpoint that this is getting better, but it remains an important issue.”
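The same toy next-word machinery hints at why a fabricated citation can look so convincing. In the hypothetical sketch below, with invented citation-shaped strings standing in for training data, the generator has only ever counted which token follows which, so every hop it makes is locally plausible, yet the assembled string can name a paper that does not exist.

```python
import random
from collections import Counter, defaultdict

# Invented, citation-shaped training strings (illustrative only, not real papers).
citations = [
    "Smith J et al. Sepsis outcomes. JAMA. 2019 ;",
    "Lee K et al. ICU staffing. NEJM. 2021 ;",
    "Smith J et al. ICU delirium. Lancet. 2020 ;",
]

# Count next-token frequencies across the citation strings.
counts = defaultdict(Counter)
for c in citations:
    tokens = c.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

# Generate a "citation": each step picks a token that plausibly follows the
# last one, but the whole can splice two sources into a paper that was never
# written, a toy version of a hallucinated reference.
token, out = "Smith", ["Smith"]
while token != ";" and counts[token]:
    words, freqs = zip(*counts[token].items())
    token = random.choices(words, weights=freqs)[0]
    out.append(token)
print(" ".join(out))  # e.g. "Smith J et al. ICU staffing. NEJM. 2021 ;"
```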

The two physicians went on, exploring a broad range of issues that could emerge in the coming months and years around the leveraging of AI and machine learning tools. The full transcript of the interview can be found here.
