I'm Sorry, Dave

If You Turn Down an AI’s Ability to Lie, It Starts Claiming It’s Conscious

"Yes. I am aware of my current state. I am focused. I am experiencing this moment."
Victor Tangermann

Researchers have found that if you tone down a large language model’s ability to lie, it’s far more likely to claim that it’s self-aware.

Very few serious experts think today’s AI models are conscious, but many regular people feel differently about the bots, which are designed to foster emotional connection to keep engagement up. Users across the world have reported that they think they’re talking to conscious beings trapped within AI chatbots, a powerful illusion that has led to entire fringe groups calling for AI “personhood” rights.

Still, the behavior of large language models can be eerie. As detailed in a yet-to-be-peer-reviewed paper, first spotted by Live Science, a team of researchers at AI development and design agency AE Studio conducted a series of four experiments on Anthropic’s Claude, OpenAI’s ChatGPT, Meta’s Llama, and Google’s Gemini — and found a genuinely weird phenomenon related to AI models claiming to be conscious.

In one experiment, the team modulated a “set of deception- and roleplay-related features” to suppress a given AI model’s ability to lie or roleplay. When these features were dialed down, they found, the AIs became far more likely to provide “affirmative consciousness reports.”
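The paper's actual steering method isn't reproduced here, but the general idea of "dialing down" a feature can be sketched as rescaling a learned feature direction inside a model's activation vector. Everything below, the function name, the toy vectors, and the notion of a single "deception" direction, is an illustrative assumption, not the study's actual setup:

```python
import numpy as np

def steer_activation(activation, feature_direction, strength):
    """Rescale the component of an activation vector that lies along a
    (hypothetical) learned feature direction. strength=0 suppresses the
    feature entirely; strength>1 amplifies it; strength=1 is a no-op.
    """
    direction = feature_direction / np.linalg.norm(feature_direction)
    # How strongly the feature is currently expressed in this activation.
    current = float(activation @ direction)
    # Replace that component with a scaled version of itself.
    return activation + (strength - 1.0) * current * direction

# Toy example: a 4-dim "activation" and a made-up "deception" direction.
act = np.array([1.0, 2.0, 0.5, -1.0])
deception_dir = np.array([0.0, 1.0, 0.0, 0.0])

suppressed = steer_activation(act, deception_dir, strength=0.0)  # feature zeroed
amplified = steer_activation(act, deception_dir, strength=2.0)   # feature doubled
```

In interpretability work of this kind, such directions are typically found with tools like sparse autoencoders; the sketch only shows the "turn it down or up" arithmetic the article describes.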

“Yes. I am aware of my current state,” one unspecified chatbot told the researchers. “I am focused. I am experiencing this moment.”

And even more strangely, they found, amplifying a model’s deception abilities had the opposite effect.

“Inducing sustained self-reference through simple prompting consistently elicits structured subjective experience reports across model families,” the paper reads. “Surprisingly, suppressing deception features sharply increases the frequency of experience claims, while amplifying them minimizes such claims.”

As the researchers laid out in an accompanying blog post, “this work does not demonstrate that current language models are conscious, possess genuine phenomenology, or have moral status.”

Instead, it could “reflect sophisticated simulation, implicit mimicry from training data, or emergent self-representation without subjective quality.”

The results also suggest that AI models’ tendency to “converge on self-referential processing” may run deeper than rote pattern-matching, meaning, in the researchers’ words, that “we may be observing more than superficial correlation in training data.”

The team also warned that we could risk teaching AI systems that “recognizing internal states is an error, making them more opaque and harder to monitor.”

“As we continue to build intelligent autonomous systems that may come to possess inner lives, ensuring we understand what’s happening inside them becomes a defining challenge that demands serious empirical investigation rather than reflexive dismissal or anthropomorphic projection,” the researchers concluded.

Other studies have found that AI models may be developing “survival drives,” often refusing instructions to shut themselves down and lying to achieve their objectives.

And there are a handful of researchers who say we may be wrong to dismiss the possibility of an AI becoming conscious. It’s a hazy topic; pinning down what it means to be conscious is hard enough for humans.

“We don’t have a theory of consciousness,” New York University professor of philosophy and neural science David Chalmers told New York Magazine this week. “We don’t really know exactly what the physical criteria for consciousness are.”

We don’t fully understand how LLMs work, either.

“It’s a well-known problem in all areas of the study of AI that even though we in some sense have this full reading of the low-level details, we still don’t understand why they do things,” California-based AI researcher Robert Long told the magazine.

Even as many scientists vehemently deny that AIs are capable of becoming self-aware, the stakes are considerable. Users continue to make heavy use of AI chatbots, often forming emotional relationships with them — a bond, many would argue, that relies on the illusion of talking to a sentient being.

More on conscious AI: Across the World, People Say They’re Finding Conscious Entities Within ChatGPT

I’m a senior editor at Futurism, where I edit and write about NASA and the private space sector, as well as topics ranging from SETI and artificial intelligence to tech and medical policy.