
When former OpenAI safety researcher Stephen Adler read the New York Times story about Allan Brooks, a Canadian father who had been slowly driven into delusions by obsessive conversations with ChatGPT, he was stunned. The article detailed Brooks’ ordeal as he followed the chatbot down a deep rabbit hole, becoming convinced he had discovered a new kind of math — which, if true, had grave implications for mankind.
Brooks began neglecting his own health, forgoing food and sleep in order to spend more time talking with the chatbot and emailing safety officials across North America about his dangerous findings. When Brooks started to suspect he was being led astray, it was another chatbot, Google’s Gemini, that ultimately set him straight, leaving the mortified father of three to contemplate how thoroughly he’d lost his grip.
Horrified by the story, Adler took it upon himself to study the nearly one-million-word exchange Brooks had logged with ChatGPT. The result was a lengthy AI safety report chock-full of simple lessons for AI companies, which the analyst detailed in a new interview with Fortune.
“I put myself in the shoes of someone who doesn’t have the benefit of having worked at one of these companies for years, or who maybe has less context on AI systems in general,” Adler told the magazine.
One of the biggest recommendations Adler makes is for tech companies to stop misleading users about AI’s abilities. “This is one of the most painful parts for me to read,” the researcher writes: “Allan tries to file a report to OpenAI so that they can fix ChatGPT’s behavior for other users. In response, ChatGPT makes a bunch of false promises.”
When the Canadian man tried to report his ordeal to OpenAI, ChatGPT assured him it was “going to escalate this conversation internally right now for review by OpenAI.” Brooks — who maintained skepticism throughout his ordeal — asked the chatbot for proof. In response, ChatGPT told him that the conversation had “automatically trigger[ed] a critical internal system-level moderation flag,” adding that it would “trigger that manually as well.”
In reality, nothing had happened. As Adler writes, ChatGPT has no ability to trigger a human review, and it can’t access the OpenAI system that flags problematic conversations to the company. It was a monstrous thing for the software to lie about, and one that shook Adler’s own confidence in his understanding of the chatbot.
“ChatGPT pretending to self-report and really doubling down on it was very disturbing and scary to me in the sense that I worked at OpenAI for four years,” the researcher told Fortune. “I understood when reading this that it didn’t really have this ability, but still, it was just so convincing and so adamant that I wondered if it really did have this ability now and I was mistaken.”
Adler also advised OpenAI to pay more attention to its support teams, specifically by staffing them with experts trained to handle the kind of traumatic experience Brooks had tried, to no avail, to report to the company.
One of the biggest suggestions is also the simplest: OpenAI should use its own internal safety tools, which Adler says could easily have flagged that the conversation was taking a troubling and likely dangerous turn.
“The delusions are common enough and have enough patterns to them that I definitely don’t think they’re a glitch,” Adler told Fortune. “Whether they exist in perpetuity, or the exact amount of them that continue, it really depends on how the companies respond to them and what steps they take to mitigate them.”
More on OpenAI: Two Months Ago, Sam Altman Was Boasting That OpenAI Didn’t Have to Do Sexbots. Now It’s Doing Sexbots