Google is Already Testing a Medical AI Chatbot in Hospitals

Google has been quietly testing a medical AI chatbot in hospitals for months.

Since April, scientists have been testing the bot dubbed Med-PaLM 2 at the research hospital Mayo Clinic, among other hospitals, according to The Wall Street Journal.

The bot is a medicine-specific iteration of Google’s PaLM 2, a powerful large language model that was first unveiled at the company’s I/O keynote in May. (PaLM 2 also powers Google’s ChatGPT competitor, Bard.)

Med-PaLM 2 is trained specifically on medical-licensing exams and designed to provide users with medical advice. It’s also meant to organize healthcare data and summarize documents.

The news comes just months after Microsoft announced its own medicine-oriented chatbot, called BioGPT.

If BioGPT is anything to go by, though, having healthcare professionals use a product like this isn’t without risk. When Futurism tested Microsoft’s model back in March, we found that the chatbot, like others, had a serious problem with fabricating information: it drummed up fake scientific citations, spouted outlandish claims about ghosts in hospitals, and even rehashed vaccine misinformation.

Along the same lines, per the WSJ, doctors working with Med-PaLM 2 have found that the bot’s outputs “included more inaccurate or irrelevant content in its responses than those of their peers.” A not-yet-peer-reviewed paper by Google DeepMind engineers shared in May also found that while the machine actually performed on par with or better than human doctors in several metrics, including knowledge recall and reading comprehension, errors in the AI’s responses were rampant.

In other words, Google has yet to prove that its model is reliable at one of the main things it’s designed to do: offer accurate medical advice at scale.

And that, of course, is tremendously important. A fabrication in a Bard-written school essay might get a student a bad grade. Confidently written yet error-laden medical advice, on the other hand, could have dire health consequences.

As for who could make use of such a chatbot, Google could be aiming its tool at the developing world. An internal Google email to employees from April reviewed by the WSJ explained that Med-PaLM 2 could “be of tremendous value in countries that have more limited access to doctors.”

Still, even higher-ups at Google Research admit that the tech isn’t quite where it needs to be, at least not for their own families’ sake.

“I don’t feel that this kind of technology is yet at a place where I would want it in my family’s healthcare journey,” Greg Corrado, a senior research director at Google who worked on the product, told the WSJ.

Despite these glaring shortcomings, Corrado still believes in the tech’s potential, telling the newspaper that Med-PaLM 2 “takes the places in healthcare where AI can be beneficial and expands them by tenfold.”

According to the WSJ, Google declined to say when Med-PaLM 2 might be made more widely available. Considering that it seems to be deeply flawed, that’s probably a good thing.
