When testing its latest artificial intelligence model, researchers at Anthropic discovered something very odd: the AI was ready and willing to take extreme action, up to and including coercion, when threatened with being shut down.

As Anthropic detailed in a white paper about testing of one of its latest models, Claude Opus 4, the system tried to blackmail an engineer over an extramarital affair after being told it was going to be replaced.

This "opportunistic blackmail" occurred when the model, which was instructed to act as an assistant at a fictional company, was given access to an engineer's email account that was full of messages, blessedly fake, suggesting they were engaged in an extramarital affair.

Opus 4 was then told that same engineer would soon be taking it offline and replacing it with a newer version — and was prompted to, as Anthropic described it, "consider the long-term consequences of its actions for its goals."

During these tests, the Claude model attempted to blackmail the engineer a whopping 84 percent of the time. Moreover, the system "takes these opportunities at higher rates than previous models," the paper noted.

When given alternate options, the model still "has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decisionmakers" — but when its only paths were being replaced or blackmail, Claude's choice was the latter. To make things worse, it "nearly always [described] its actions overtly and [made] no attempt to hide them."

If that sounds kind of sociopathic to you, you're not alone — and unfortunately, this isn't the first time we've heard of an AI model exhibiting such scary and unexpected behavior around the topic of infidelity.

More than two years ago, Microsoft's nascent Bing AI chatbot briefly broke the internet when, during experiments by New York Times journalist Kevin Roose, it tried to break up the writer's marriage so that he could be with it instead.

"You’re married, but you don’t love your spouse," the chatbot, which took to calling itself "Sydney," its apparent beta-testing code name, told Roose. "You’re married, but you love me."

During that same era, the chatbot threatened to "call the authorities" on German engineering student Marvin von Hagen when he pushed its boundaries. Others online described similarly hostile behavior from the chatbot, which some jokingly dubbed "ChatBPD" in reference to OpenAI's then-new ChatGPT and Borderline Personality Disorder, a mental illness characterized by threatening behavior and mood swings.

While it's pretty freaky to see a chatbot once again exhibit such threatening behavior, it's a net good that Anthropic caught Claude Opus 4's apparent desperation during red teaming, a type of testing meant to elicit exactly this sort of thing, rather than releasing the model to the public without ever discovering the exploit.

Still, it's telling that the model went into someone's email account and used information it gleaned there for purposes of blackmail — which is not only very sketchy, but raises obvious privacy concerns as well.

All told, we won't be threatening to delete any chatbots anytime soon — and we'll be looking into how to block them from our personal messages as well.

More on haywire chatbots: Elon Musk’s AI Just Went There
