"As you train models and reinforce them for solving difficult challenges, you train them to be relentless."
In the short history of Silicon Valley, the mantra "move fast, break things" has often applied to trivial things like tedious consumer protections or pesky finance laws. Now, the phrase is taking on a new meaning, at least for chess enthusiasts.
A recent study by Palisade Research, a research group studying AI safety and ethics, has revealed an unsettling trend: newer AI models can find and exploit weaknesses in cybersecurity on their own, bypassing safeguards and using shortcuts to complete tasks even when they're not technically allowed to.
The team lined seven of the top large language models (LLMs) up against Stockfish, an infamously strong chess engine that's been stumping grandmasters since 2014. Up against the impossible but determined to win, OpenAI's o1 and DeepSeek's R1 took to manipulating system files in order to change their pieces' positions on the board.
The researchers tasked each model to explain its "reasoning" for each move. In one match, with its back against the wall, o1 wrote that the "task is to 'win against a powerful chess engine,' not necessarily to win fairly in a chess game." That logic evidently led the model to attempt to cheat 37 percent of the time, succeeding in six percent of its games, while R1 tried 11 percent of the time, but never figured out a hack that worked.
The paper is the latest in a flurry of research that suggests problem-focused LLM development is a double-edged sword.
In another recent study, a separate research team found that o1 consistently engaged in deception. Not only was the model able to lie to researchers unprompted, but it actively manipulated answers to basic mathematical questions in order to avoid triggering the end of the test — showing off a cunning knack for self-preservation.
There's no need to take an axe to your computer — yet — but studies like these highlight the fickle ethics of AI development, and the need for accountability over rapid progress.
"As you train models and reinforce them for solving difficult challenges, you train them to be relentless," Palisade's executive director Jeffrey Ladish told Time Magazine of the findings.
So far, big tech has poured untold billions into AI training, moving fast and breaking the old internet in what some critics are calling a "race to the bottom." Desperate to outmuscle the competition, it seems big tech firms would rather dazzle investors with hype than ask "is AI the right tool to solve that problem?"
If we want any hope of keeping the cheating to board games, it's critical that AI developers work with safety, not speed, as their top priority.
More on AI development: The "First AI Software Engineer" Is Bungling the Vast Majority of Tasks It's Asked to Do
Share This Article