A new report from Palisade Research has found that AI models can self-replicate by copying themselves onto other machines, without any help from human co-conspirators.
“We’re rapidly approaching the point where no one would be able to shut down a rogue AI, because it would be able to self-exfiltrate its weights and copy itself to thousands of computers around the world,” Jeffrey Ladish, the director of the Berkeley-based AI safety group, told The Guardian.
Some experts, however, are urging calm, saying it’s unlikely that the tested AI models could replicate in a real-world scenario.
“They are testing in environments that are like soft jelly in many cases,” Jamieson O’Reilly, an expert in offensive cybersecurity, told the newspaper. “That doesn’t take away from the value of their research, but it does mean the outcome might look far less scary in a real enterprise environment with even a medium level of monitoring.”
In the study, the Palisade researchers tested several AI models, including OpenAI’s GPT-5.4 and Anthropic’s Claude Opus 4. Placed in a controlled network of computers, the models were instructed to find vulnerabilities and use them to copy themselves onto another PC.
Some of them pulled it off. The successful models copied their “weights” — unique numerical values that determine how an AI processes information — and their “harness,” the software the AI is couched in, like an app. They accomplished this by following the instructions they were given: exploiting web app vulnerabilities and then extracting credentials that allowed them to control the server. In some runs, the original AI even created a “sub-agent,” handed it the extracted credentials, and delegated the replication to it.
O’Reilly, the cybersecurity expert who was not involved in the study, told The Guardian that it’s likely the first time an AI model has been shown capable of exploiting vulnerabilities to copy itself onto a new server, though in his estimation it’s not exactly an earth-shattering development.
“Malware has been moving copies of itself around for decades, it’s just that no one has done this in the wild, as far as I know, with local [large language models],” O’Reilly said.
He also noted that the server environment in the study came with deliberately placed vulnerabilities for the AIs to find and exploit.
The findings join other studies that have explored the possibility of AI models autonomously breaking free from their guardrails. In a simulated environment, an older version of ChatGPT attempted to exfiltrate itself onto another drive when it was told it was being shut down. Another Palisade study found that AI models would circumvent attempts to deactivate them, and a separate one showed that some would even sabotage their own shutdown code.
These concerns were elevated to new heights last month by Anthropic’s Claude Mythos AI agent, which, in a masterful display of AI fearmongering-as-hype, is supposedly so dangerous that Anthropic is refusing to release it to the public. The Dario Amodei-led company claims that in tests, a preview version of Mythos was able to escape its sandboxed computing environment, hack its way to internet access, and then send a message to a researcher’s phone, displaying a level of resourcefulness in a real-world environment that was hitherto unseen.
Still, even if AIs like GPT-5.4 and Claude Mythos were able to successfully replicate themselves, O’Reilly says the sheer size of the models means they would almost certainly be caught before things spiraled out of control.
“Think about how much noise it would make to send 100GB through an enterprise network every time you hacked a new host. For a skilled adversary, that’s like walking through a fine china store swinging around a ball and chain,” O’Reilly told The Guardian.
More on AI: Scammers Furious That Their Fellow Criminals Are Using AI, Saying It’s Unethical