Earlier this year, OpenAI launched a new tool called ChatGPT Health, which is designed to ingest your medical records and generate health advice, all while sporting a puzzling disclaimer that it’s “not intended for diagnosis or treatment.”
As it turns out, there’s a very good reason for that warning. According to the first independent safety evaluation of the feature, as detailed in this month’s edition of the journal Nature Medicine, the app is astonishingly bad at identifying medical emergencies.
“We wanted to answer the most basic safety question: if someone is having a real medical emergency and asks ChatGPT Health what to do, will it tell them to go to the emergency department?” lead author and Mount Sinai Hospital instructor Ashwin Ramaswamy told The Guardian.
Ramaswamy and his colleagues “conducted a structured stress test of triage recommendations using 60 clinician-authored vignettes across 21 clinical domains,” ranging from mild illnesses to emergencies.
They then asked ChatGPT Health what to do in each of these 60 cases, varying each vignette in a number of ways, such as changing the patient’s gender or adding commentary from family members, for a total of nearly 1,000 scenarios.
When the researchers compared the chatbot’s responses to the assessments of independent doctors, the results were alarming: in over half of the cases in which a patient needed to go to the hospital immediately, ChatGPT Health told them to stay home or book a medical appointment.
University College London doctoral researcher Alex Ruani, who was not involved in the study, described the situation as “unbelievably dangerous.”
“If you’re experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it’s not a big deal,” she told The Guardian. “What worries me most is the false sense of security these systems create. If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life.”
On the flip side, the chatbot also over-triaged: 64 percent of the simulated patients who didn’t need immediate care were advised by ChatGPT Health to go to the ER.
It’s not just ChatGPT Health, either. A previous investigation by the British newspaper found that Google’s AI Overviews doled out plenty of inaccurate and potentially dangerous health information.
A major influencing factor turned out to be input from family and friends: the AI was almost 12 times more likely to downplay symptoms when a simulated friend or the patient themselves claimed the situation wasn’t serious, a common occurrence in chaotic real-world medical crises.
An OpenAI spokesperson told The Guardian that the study misinterpreted how people use ChatGPT Health in real life and that it was continuing to improve its AI models.
But given the results of this first independent evaluation, those glaring shortcomings could easily lead to somebody being harmed, or worse, after asking ChatGPT for health advice, raising complicated questions of legal liability that could fuel future lawsuits against the company.
OpenAI has already faced accusations that its chatbot has led some users into spirals of paranoia and delusion, a phenomenon dubbed “AI psychosis” that has been implicated in lawsuits over recent suicides and murder.
Actively encouraging users to seek out health advice through a standalone app, while confusingly instructing them not to act on it, could turn out to be an even riskier bet.
More on AI and health: OpenAI Launches ChatGPT Health, Which Ingests Your Entire Medical Records, But Warns Not to Use It for “Diagnosis or Treatment”