
Health practitioners are becoming increasingly uneasy about the medical community making widespread use of error-prone generative AI tools.

The proliferation of the tech has repeatedly been hampered by rampant "hallucinations," a euphemistic term for the bots' made-up facts and convincingly told lies.

One glaring error proved convincing enough that it went uncaught for over a year. In their May 2024 research paper introducing a healthcare AI model, dubbed Med-Gemini, Google researchers showed off the AI analyzing brain scans from a radiology lab for various conditions.

It identified an "old left basilar ganglia infarct," referring to a purported part of the brain — "basilar ganglia" — that simply doesn't exist in the human body. Board-certified neurologist Bryan Moore flagged the issue to The Verge, highlighting that Google fixed its blog post about the AI — but failed to revise the research paper itself.

The AI likely conflated the basal ganglia, an area of the brain that's associated with motor movements and habit formation, and the basilar artery, a major blood vessel at the base of the brainstem. Google blamed the incident on a simple misspelling of "basal ganglia."

It's an embarrassing revelation that underscores the tech's persistent and consequential shortcomings. Even the latest "reasoning" AIs from the likes of Google and OpenAI spread falsehoods dreamed up by large language models trained on vast swathes of the internet.

In Google's search results, this can lead to headaches for users during their research and fact-checking efforts.

But in a hospital setting, those kinds of slip-ups could have devastating consequences. While Google's faux pas more than likely didn't result in any danger to human patients, it sets a worrying precedent, experts argue.

"What you’re talking about is super dangerous," healthcare system Providence's chief medical information officer Maulin Shah told The Verge. "Two letters, but it’s a big deal."

Google touted its healthcare AI last year as having "substantial potential in medicine," arguing it could be used to identify conditions in X-rays, CT scans, and more.

After Moore flagged the mistake in the company's research paper to Google, employees told him it was a typo. In its updated blog post, Google noted that "'basilar' is a common mis-transcription of 'basal' that Med-Gemini has learned from the training data, though the meaning of the report is unchanged."

Yet the research paper still erroneously refers to the "basilar ganglia" at the time of writing.

In a medical context, AI hallucinations could easily lead to confusion and potentially even put lives at risk.

"The problem with these typos or other hallucinations is I don’t trust our humans to review them, or certainly not at every level," Shah told The Verge.

It's not just Med-Gemini. Google's more advanced healthcare model, dubbed MedGemma, also gave varying answers depending on how questions were phrased, sometimes resulting in errors.

"Their nature is that [they] tend to make up things, and it doesn’t say ‘I don’t know,’ which is a big, big problem for high-stakes domains like medicine," Judy Gichoya, Emory University associate professor of radiology and informatics, told The Verge.

Other experts say we're rushing into adopting AI in clinical settings — from AI therapists, radiologists, and nurses to patient interaction transcription services — and that a far more careful approach is warranted.

In the meantime, it will be up to humans to continuously monitor the outputs of hallucinating AIs, which could counterproductively lead to inefficiencies.

And Google is going full steam ahead. In March, Google revealed that its notoriously error-prone AI Overviews search feature would start giving health advice. It also introduced an "AI co-scientist" meant to assist human scientists in discovering new drugs, among other "superpowers."

But if their outputs go unobserved and unverified, human lives could be at stake.

"In my mind, AI has to have a way higher bar of error than a human," Shah told The Verge. "Maybe other people are like, ‘If we can get as high as a human, we’re good enough.’ I don’t buy that for a second."

More on health AI: AI Therapist Goes Haywire, Urges User to Go on Killing Spree
