Top Medical Journal Publishes Searing Article Warning Against Medical AI


A recent survey found that millions of Americans are asking AI chatbots for medical advice, often instead of consulting human doctors.

That’s despite researchers continuing to find severe flaws plaguing large language model-based tools that can purportedly offer summaries of medical records and dole out health advice based on simple text prompts. For one, hallucinations remain a massive unsolved problem, from AI models generating detailed clinical findings based on images they were never provided to falling for fake diseases that were invented by researchers in order to trick them.

In short, it’s no wonder scientists are questioning whether patients, health providers, or health systems should adopt AI at all, especially given the frequently lacking evidence for any real-world benefits. A scathing editorial published on Tuesday by the premier medical journal Nature Medicine makes the case that “evidence that AI tools create value for patients, providers or health systems remains scarce.”

“Nonetheless, in publications, and in product materials, claims about clinical impact are increasingly more common, even though there is no clear agreement on what level of evidence should be required before such claims are considered credible,” the editorial reads. “The result is not only scientific uncertainty but also often premature implementation and adoption.”

The piece therefore calls for the establishment of a “framework for how AI medical technologies should be evaluated, by what metrics and against which benchmarks,” which is “urgently needed.”

AI tools often appear to offer compelling medical advice under perfect experimental conditions, then struggle in the real world. A recent study in the journal JAMA Medicine found that when provided with more ambiguous symptoms, frontier AI models failed to produce the correct diagnosis upward of 80 percent of the time.

The topic of AI use in clinical research also remains contentious. While LLMs specialize in summarizing and analyzing data and answering queries, researchers continue to warn that we’re being blinded to their significant limitations.

“I think that AI can help speed up many of the processes that are tedious and challenging,” said Harvard Medical School assistant professor of surgery Jamie Robertson in a statement last year. “It can help us come up with code to do data analysis and even suggest scenarios.”

“But it’s critical for people who are interacting with AI as part of clinical studies to be knowledgeable about the right and wrong applications, and in the correct context,” she added.

Researchers warn that over-relying on AI tools could result in sacrificing scientific rigor, raising concerns over the proliferation of overgeneralized — and potentially hallucinated — data in the medical field.

In one particularly colorful demonstration, University of Gothenburg medical researcher Almira Osmanovic Thunström uploaded two clearly fake studies to a preprint server to trick large language models into thinking a made-up skin condition was real. It didn’t take long for peer-reviewed journals to publish (since-retracted) papers that cited these preprints, raising glaring questions about their validity.

“The next phase of progress will depend not only on better models and new applications but also on clearer expectations for how clinical impact is defined, evaluated and communicated,” the Nature Medicine editorial concludes. “Without a clear connection between claims and evidence, medical AI risks being adopted faster than its real value can be understood.”
