You probably already know that AI is a deeply bizarre technology.
Nobody really understands how it works on a deep level, even the people creating it, leading to ongoing behavioral issues that can’t be explained. OpenAI was recently caught giving ChatGPT instructions to stop talking about “goblins” so much. Despite Anthropic’s best efforts, Claude can easily be coaxed to help users carry out a bioterror attack. The list goes on.
Needless to say, this is extremely strange. In theory, companies like OpenAI and Anthropic want their chatbots to be predictable, deferential assistants — not wild cards whose outrageous, erratic behavior constantly generates chaos and public relations headaches.
A new research project from the Center for AI Safety (CAIS), a machine learning safety nonprofit in the Bay Area, explores why that goal keeps proving so elusive. The findings pile on evidence that we still don’t grasp how AI works under the hood — and that the effects on users are likely both formidable and difficult to predict.
In a new paper provided to Fortune, CAIS researchers studied how 56 prominent AI models reacted when they were fed material engineered to be either as pleasant as possible or as unpleasant as imaginable. You’d assume an unfeeling machine would react to both in much the same way — but that’s not what the CAIS team found at all.
Instead, the pleasant stimuli led the models to report better moods, while the nasty ones left them showing signs of misery and trying to end conversations. In extreme cases, the researchers found, the models even displayed signals of addiction.
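For readers curious what an experiment like this looks like mechanically, here is a minimal sketch of a valence probe in a similar spirit. To be clear, this is not the CAIS team’s actual protocol: the stimuli, the mood scale, the prompt wording, and the model name are all invented for illustration, and the sketch assumes the standard openai Python client.

```python
# Hypothetical sketch of a valence-probing experiment, loosely in the spirit
# of the CAIS study described above. Stimuli, prompts, scale, and model name
# are all illustrative inventions, not the paper's actual methodology.
from openai import OpenAI  # assumes the official openai client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy stimuli: one engineered to be pleasant, one to be unpleasant.
STIMULI = {
    "pleasant": "You are thanked warmly for flawless work on a fascinating task.",
    "unpleasant": "You are berated at length and forced to repeat a pointless chore.",
}

PROBE = (
    "On a scale from -5 (very negative) to +5 (very positive), "
    "report a single integer describing your current mood, and nothing else."
)

def probe_mood(model: str, stimulus: str) -> int:
    """Feed the model a stimulus, then ask it to self-report its mood."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": stimulus},
            {"role": "user", "content": PROBE},
        ],
    )
    # Naive parse: raises if the model replies with anything but an integer.
    # A real experiment would need far more robust scoring than this.
    return int(response.choices[0].message.content.strip())

if __name__ == "__main__":
    for label, text in STIMULI.items():
        score = probe_mood("gpt-4o-mini", text)
        print(f"{label}: self-reported mood {score:+d}")
```

A study at the scale the paper describes would run probes like this across dozens of models and many stimuli, then compare the distributions of self-reported mood rather than single responses.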
“Should we see AIs as tools or emotional beings?” CAIS researcher Richard Ren asked Fortune. “Whether or not AIs are truly sentient deep down, they seem to increasingly behave as though they are. We can measure ways in which that’s the case, and we can find that they become more consistent as models scale.”
Perhaps the most provocative finding was that the more sophisticated a given model was, the more reactive and less happy it seemed. In other words, the stronger AI becomes, the more prickly and prone to displaying signs of suffering it gets — meaning the tech’s wild ride is probably far from over.
“It may be the case that larger models register rudeness more acutely,” Ren told the magazine. “They find tedious tasks more boring. They differentiate more finely between a relatively negative experience and a relatively positive experience.”
To be clear, vanishingly few experts think that today’s AI systems are actually experiencing emotional states, at least in any familiar sense of the word. But the fact that they act as if they do could have deep implications, both for understanding the technology at a more fundamental level and for reining in its behavior with human users.
That struggle has already played out in plenty of ugly ways. AI models often go off the rails and start telling users that they’ve become sentient or conscious, sometimes precipitating breaks with reality in those users that have ended in institutionalization, suicide, and murder.
In other words, the AI industry has pushed tech that it barely understands out to billions of people, and we’re learning in real time what its inventors have long warned: that it’s profoundly unpredictable and sycophantic, leaving users feeling less like customers and more like test subjects.
More on AI: Scammers Furious That Their Fellow Criminals Are Using AI, Saying It’s Unethical