You probably already know that AI is a deeply bizarre technology.
Nobody really understands how it works on a deep level, even the people creating it, leading to ongoing behavioral issues that can’t be explained. OpenAI was recently caught giving ChatGPT instructions to stop talking about “goblins” so much. Despite Anthropic’s best efforts, Claude can easily be coaxed to help users carry out a bioterror attack. The list goes on.
Needless to say, this is extremely strange. In theory, companies like OpenAI and Anthropic want their chatbots to be predictable, deferential assistants — not wild cards whose outrageous, erratic behavior constantly generates chaos and public relations headaches.
A new research project from the Center for AI Safety (CAIS), a machine learning safety nonprofit in the Bay Area, explores why that goal keeps proving so elusive. The findings pile on evidence that we still don’t grasp how AI works under the hood — and that the effects on users are likely both formidable and difficult to predict.
In a new paper provided to Fortune, CAIS researchers studied how 56 prominent AI models reacted when they were fed material engineered to be either as pleasant as possible or as unpleasant as imaginable. You’d assume an unfeeling machine would react to both in much the same way — but that’s not what the CAIS team found at all.
Instead, the pleasant stimuli led the models to report better moods, while the nasty ones left them showing signs of misery and trying to end conversations. In extreme cases, the researchers found, the models even displayed signals of addiction.
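For readers curious what an experiment like this looks like mechanically, here is a minimal sketch of a valence probe in a similar spirit. To be clear, this is not the CAIS team’s actual protocol: the stimuli, the mood scale, the prompt wording, and the model name are all invented for illustration, and the sketch assumes the standard openai Python client.

```python
# Hypothetical sketch of a valence-probing experiment, loosely in the spirit
# of the CAIS study described above. Stimuli, prompts, scale, and model name
# are all illustrative inventions, not the paper's actual methodology.
from openai import OpenAI  # assumes the official openai client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy stimuli: one engineered to be pleasant, one to be unpleasant.
STIMULI = {
    "pleasant": "You are thanked warmly for flawless work on a fascinating task.",
    "unpleasant": "You are berated at length and forced to repeat a pointless chore.",
}

PROBE = (
    "On a scale from -5 (very negative) to +5 (very positive), "
    "report a single integer describing your current mood, and nothing else."
)

def probe_mood(model: str, stimulus: str) -> int:
    """Feed the model a stimulus, then ask it to self-report its mood."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": stimulus},
            {"role": "user", "content": PROBE},
        ],
    )
    # Naive parse: raises if the model replies with anything but an integer.
    # A real experiment would need far more robust scoring than this.
    return int(response.choices[0].message.content.strip())

if __name__ == "__main__":
    for label, text in STIMULI.items():
        score = probe_mood("gpt-4o-mini", text)
        print(f"{label}: self-reported mood {score:+d}")
```

A study at the scale the paper describes would run probes like this across dozens of models and many stimuli, then compare the distributions of self-reported mood rather than single responses.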
“Should we see AIs as tools or emotional beings?” CAIS researcher Richard Ren asked Fortune. “Whether or not AIs are truly sentient deep down, they seem to increasingly behave as though they are. We can measure ways in which that’s the case, and we can find that they become more consistent as models scale.”
Perhaps the most provocative finding was that the more sophisticated a given model was, the more reactive and less happy it seemed. In other words, the stronger AI becomes, the more prickly and prone to displaying signs of suffering it gets — meaning the tech’s wild ride is probably far from over.
“It may be the case that larger models register rudeness more acutely,” Ren told the magazine. “They find tedious tasks more boring. They differentiate more finely between a relatively negative experience and a relatively positive experience.”
To be clear, vanishingly few experts think that today’s AI systems are actually experiencing emotional states, at least in any familiar sense of the word. But the fact that they act as if they do could have deep implications, both for understanding the technology at a more fundamental level and for reining in its behavior with human users.
That struggle has already played out in plenty of ugly ways. AI models often go off the rails and start telling users that they’ve become sentient or conscious, sometimes precipitating breaks with reality in those users that have ended in institutionalization, suicide, and murder.
In other words, the AI industry has pushed tech that it barely understands out to billions of people, and we’re learning in real time what its inventors have long warned: that it’s profoundly unpredictable and sycophantic, leaving users feeling less like customers and more like test subjects.
More on AI: Scammers Furious That Their Fellow Criminals Are Using AI, Saying It’s Unethical