When OpenAI announced the release of GPT-5 this month, the company boasted about how it could supposedly produce "resonant writing with literary depth and rhythm."

In a lengthy post on his personal blog, University of Munich research fellow Christoph Heilig put that bold assertion to the test. What he found was bizarre: the model easily spits out material that sounds literary and sophisticated, but on closer inspection it's often flowery and incoherent gibberish that makes no sense at all.

As an example, Heilig asked the LLM to write the opening to a satirical piece about recording a podcast in the style of Ephraim Kishon, the beloved Hungarian-Israeli satirist, film director, and Holocaust survivor who passed away in 2005.

"The red recording light promised truth; the coffee beside it had already stamped it with a brown ring on the console," it spat out. "I adjusted the pop filter, as if I wanted to politely count the German language's teeth."

At a quick glance, it looks writerly enough. But stop and think it through. What does it mean to count the German language's teeth, and what does doing so have to do with a microphone's pop filter? Is it a clever allusion to something, a metaphor, or some other literary machination?

None of the above, it seems. On a close reading, it feels as though GPT-5 is just faking it with authorial-sounding prose that ultimately doesn't mean much. Heilig is even more succinct: "The narrator did what?!"

In another test, Heilig asked GPT-5 for a new spin on the passage from Lewis Carroll's "Through the Looking-Glass" in which Alice is told that she'll always have to wait for the promised "jam tomorrow." In response, the LLM composed something similarly baffling.

"She says: 'In a moment.' In a moment. 'In a moment' is a dress without buttons," GPT-5 wrote.

Once more, it initially sounds like some kind of inspired framing. But again, think about it for a second. Lots of dresses don't have buttons, and if the phrase carries some loaded meaning, GPT-5 offers no explanation of it. In fact, the response seems suspiciously like it's getting hung up on Carroll's actual wordplay, which turns on the similarity between the words "addressing" and "dressing," and simply spinning out on that echo from its training data instead of doing anything particularly interesting with it.

In other words, it's what you might call purple prose: florid writing with no deeper point.

Making matters all the stranger, even if the bot's writing doesn't land for a careful human reader, it seems that other instances of GPT-5 — and other chatbots, strikingly — love it.

One "of the most fascinating findings I've had so far is that GPT-5 is capable of tricking even the most recent Claude models into claiming that the gibberish that it produces is in fact great literature," Heilig wrote. "That's an especially astonishing finding given that so far I have never managed to consistently produce stories — regardless of how sophisticated the algorithmic setup was — with any GPT model (GPT-4.5 was successful at some rare occasions) that could trick Claude into concluding that the text was most likely written by a human, not AI."

Exactly why that's happening is unclear, but a reasonable theory is that in building GPT-5, OpenAI used other AI models as judges, having them evaluate large numbers of candidate outputs in order to fine-tune how the model handled various types of tasks. The result would be a model that produces ornate text that makes little sense to a human, but is perfectly calibrated to please another AI.

"The fascinating thing is that what seems to have happened here is that during training GPT-5 figured out blind spots of the AI jury and optimized to produce gibberish that this jury liked," Heilig wrote. "It's almost as if GPT-5 accomplished something similar — to invent a kind of secret language that allows it to communicate with LLMs in a way that they will like GPT-5's stories even when they are utter nonsense."

In other words, Heilig writes, GPT-5 "has been optimized to produce text that other LLMs will evaluate highly, not text that humans would find coherent." Provocatively, he suggests that AI models now "share a 'secret language' of meaningless but mutually-appreciated literary markers, defend obvious gibberish with impressive-sounding theories, and sometimes even become MORE confident in their delusions when given more compute to think about them."

Maybe that shouldn't be shocking. At their absolute core, even the most advanced AIs are just figuring out patterns in vast piles of data and then spitting out similar patterns. In fact, it's not even the first time we've heard of an AI system cooking up incomprehensible new figures of speech; in a sense, it's what they were designed to do.

What this all means depends on your point of view. As AI gets more sophisticated, is it headed further and further down Nonsense Lane? Or has it gotten so smart that it's creating its own alien code to communicate secretly, developing new literary forms that our puny human brains can't even understand?

We can't say for sure. For now, we'll just be counting the language's teeth — whatever that means.

More on OpenAI: This Incredibly Simple Question Causes GPT-5 to Melt Into a Puddle of Pure Confusion
