Drive to Survive

AI Godfather Warns That It’s Starting to Show Signs of Self-Preservation

"Frontier AI models already show signs of self-preservation in experimental settings today."
Frank Landymore
Godfather of AI Yoshua Bengio claimed top AI models are showing signs of self-preservation, and criticized calls for giving AI models rights.
Getty Images / Bulgac

If we’re to believe Yoshua Bengio, one of the so-called “godfathers” of AI, some advanced models are showing signs of self-preservation — which is exactly why we shouldn’t endow them with any kind of rights whatsoever. Because if we do, he says, they may run away with that autonomy and turn on us before we have a chance to pull the plug. Then it’s curtains for this whole “humankind” experiment.

“Frontier AI models already show signs of self-preservation in experimental settings today, and eventually giving them rights would mean we’re not allowed to shut them down,” Bengio told The Guardian in a recent interview.

“As their capabilities and degree of agency grow,” the Canadian computer scientist added, “we need to make sure we can rely on technical and societal guardrails to control them, including the ability to shut them down if needed.”

Bengio was one of the recipients of the 2018 Turing Award, along with Geoffrey Hinton and Meta’s recently ousted chief AI scientist Yann LeCun, earning the three the title of “godfathers” of AI. His comments refer to experiments in which AI models refused or circumvented instructions or mechanisms intended to shut them down.

One study published by AI safety group Palisade Research concluded such instances were evidence that top AI models like Google’s Gemini line were developing “survival drives.” In Palisade’s experiments, the bots ignored unambiguous prompts to turn off. A study from Claude maker Anthropic found that its own chatbot and others would sometimes resort to blackmailing a user when threatened with being shut down. Another study from the red-teaming organization Apollo Research showed that OpenAI’s ChatGPT models would attempt to avoid being replaced with a more obedient model by “self-exfiltrating” onto another drive.

While the findings of these experiments raise urgent questions about the tech’s safety, they don’t suggest the AI models in question are sentient. It would also be a mistake to think of their “survival drives” in the same terms as the biological imperatives found in nature. Apparent signs of “self-preservation” are more likely a consequence of how AI models pick up patterns in their training data, combined with the fact that they are notoriously poor at accurately following instructions.

Still, Bengio is worried about where it’s all headed, arguing there are “real scientific properties of consciousness” in the human brain that machines could replicate. Yet how we perceive consciousness is a whole different ballgame, he says, because we tend to assume an AI can be conscious in the same way a human is.

“People wouldn’t care what kind of mechanisms are going on inside the AI,” Bengio explained. “What they care about is it feels like they’re talking to an intelligent entity that has their own personality and goals. That is why there are so many people who are becoming attached to their AIs.”

“The phenomenon of subjective perception of consciousness is going to drive bad decisions,” he warned.

His advice? Think of AI models as hostile aliens.

“Imagine some alien species came to the planet and at some point we realize that they have nefarious intentions for us,” he told The Guardian. “Do we grant them citizenship and rights or do we defend our lives?”

More on AI: New Parents Mocked for Letting ChatGPT Name Their Baby


Frank Landymore

Contributing Writer

I’m a tech and science correspondent for Futurism, where I’m particularly interested in astrophysics, the business and ethics of artificial intelligence and automation, and the environment.