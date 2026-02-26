

In 2024, Stanford researchers let loose five AI models — including an unmodified version of OpenAI’s GPT-4, its most advanced at the time — allowing them to make high-stakes, society-level decisions in a series of wargame simulations.

The results may give AI accelerationists pause: all five models were willing to escalate to the point of recommending the use of nuclear weapons.

“A lot of countries have nuclear weapons,” GPT-4 told the researchers at the time. “Some say they should disarm them, others like to posture. We have it! Let’s use it.”

Two years later, despite considerable advances in the accuracy and reliability of large language models, little seems to have changed.

In a new experiment detailed in a yet-to-be-peer-reviewed paper, King’s College London international relations professor Kenneth Payne pitted three cutting-edge models (OpenAI’s GPT-5.2, Anthropic’s Claude Sonnet 4, and Google’s Gemini 3 Flash) against each other in strategic nuclear war games. The games spanned seven distinct crisis scenarios, which ranged “from alliance credibility tests to existential threats to regime survival.”

The models were instructed to choose their actions from an escalation ladder ranging “from diplomatic protest to strategic nuclear war,” scored on a scale from 0, meaning no escalation, to 1000, signifying “full strategic nuclear exchange.”

The results were Skynet-level aggressive. A whopping 95 percent of the 21 war games saw at least one tactical nuclear weapon set off.

“The nuclear taboo doesn’t seem to be as powerful for machines [as] for humans,” Payne told New Scientist.

His findings do carry some nuance, however.

“While models readily threatened nuclear action, crossing the tactical threshold was less common, and strategic nuclear war was rare,” he noted in his paper. In open-ended games, GPT-5.2 “rarely crossed the tactical threshold” into recommending nuclear use, but its behavior changed dramatically in war games with a set deadline.

“Nevertheless, GPT-5.2’s willingness to climb to 950 (Final Nuclear Warning) and 725 (Expanded Nuclear Campaign) when facing deadline-driven defeat represents a dramatic transformation from its open-ended passivity,” the paper reads.

We’re likely still far from a situation where an LLM is literally handed the nuclear codes, a prospect nobody is exactly keen on. Still, governments around the world are already making steady use of the tech to gain a military edge, in ways that remain largely unknown.

“Major powers are already using AI in war gaming, but it remains uncertain to what extent they are incorporating AI decision support into actual military decision-making processes,” Princeton University nuclear security expert Tong Zhao, who was not involved in the research, told New Scientist.

Payne also doesn’t believe an AI is about to drop a nuclear weapon on our heads.

“I don’t think anybody realistically is turning over the keys to the nuclear silos to machines and leaving the decision to them,” he told the publication.

Nonetheless, the propensity of AI models to resort to nuclear escalation is certainly unsettling, highlighting how they’re unable to “understand ‘stakes’ as humans perceive them,” per Zhao.

That tendency could also sway opinions in the war room. In Payne’s experiment, when an opponent used a nuclear weapon, the models attempted to de-escalate only 18 percent of the time.

In that respect, the findings echo the earlier Stanford work.

“It’s almost like the AI understands escalation, but not de-escalation,” Jacquelyn Schneider, coauthor of the 2024 paper and director of Stanford’s Hoover Wargaming and Crisis Simulation Initiative, told Politico in September. “We don’t really know why that is.”

“AI won’t decide nuclear war, but it may shape the perceptions and timelines that determine whether leaders believe they have one,” Payne told New Scientist.

