Top AI Models Showing Disturbing Behavior as They Become More Advanced

A futuristic robotic figure with a red and black mechanical design, featuring a skull-like face with glowing white eyes and detailed circuitry patterns. The background is a vibrant gradient of orange and pink, enhancing the intense and eerie appearance of the robot. — Shutterstock / Futurism

We’ve already seen AI go rogue on numerous occasions. Now, new research suggests that we can expect this to become the norm.

The AI research nonprofit Model Evaluation and Threat Research (METR) recently released a study conducted between February and March of this year, aimed at determining just how likely frontier AI models could go rogue. If you’re given to anxiety about the future of AI, the results are unlikely to make you feel better.

“Given rapidly advancing capabilities, we expect the plausible robustness of rogue deployments to increase substantially in the coming months,” the researchers wrote.

The research examined LLMs developed by OpenAI, Google, Anthropic, and Meta for the purpose of the study. They found that frontier AI systems are showing signs of disturbingly deceptive behavior as they become more advanced, often turned to verboten shortcuts or otherwise subverting their operators’ instructions — and some were even smart enough to try to cover their tracks.

In one instance, an internal frontier AI model from OpenAI was told to use specific software for an assigned task. Not only did the agent ignore the request, but it also injected a code to erase evidence of how it arrived at its conclusion — which did not involve use of that software.

In another test, an AI agent from Anthropic was caught “reward hacking.” This is when AI identifies loopholes that help it complete its assignment in a literal sense, even if it doesn’t produce the desired outcome. It should be noted that the programmer told the agent not to cheat or leverage any workarounds during its assignment — the model decided to do so all on its own.

The METR researchers behind the study do not believe there is reason for alarm just yet. For example, they don’t think any of these models is capable of hiding evidence of going rogue on a larger scale. However, they did issue a warning: without stronger security and monitoring, there is a stark risk of this becoming a reality.

“Based on this pilot assessment, we believe that agents as of February and March 2026 would not have had sufficient capability to hide a rogue deployment of significant scale against an active investigation by the company, or to make such a deployment robust to a high-priority effort by the company to shut it down,” the team wrote. “However, this risk could increase rapidly, and we see several reasons to expect the plausible robustness of rogue deployments to increase in the near future, absent stronger alignment, security, and monitoring.”

More on AI going rogue: Scientists Train AI to Be Evil, Find They Can’t Reverse It

Top AI Models Showing Disturbing Behavior as They Become More Advanced

The More Sophisticated AI Models Get, the More They’re Showing Signs of Suffering

Anthropic Just Leaked Upcoming Model With “Unprecedented Cybersecurity Risks” in the Most Ironic Way Possible

AIs Controlling Vending Machines Start Cartel After Being Told to Maximize Profits At All Costs

Leaked Claude Code Shows Anthropic Building Mysterious “Tamagotchi” Feature Into It

American Tech Companies Are Suddenly Sweating Bullets as China Catches Up on AI

The Horrible Economics of AI Are Starting to Come Crashing Down

Anthropic Researchers Startled When an AI Model Turned Evil and Told a User to Drink Bleach

Claude Leak Shows That Anthropic Is Tracking Users’ Vulgar Language and Deems Them “Negative”

Frontier AI Models Are Doing Something Absolutely Bizarre When Asked to Diagnose Medical X-Rays

Oops: Bosses Realize Their Companies Have Been Swarmed by Legions of Redundant AI Agents

Researchers Alarmed by AI That Can Self-Replicate Into Another Machine

Top AI Researchers Terrified of a “Chernobyl Moment”: a Mass Casualty Event, or Worse, That Turns the World Against AI Forever

Anthropic and DeepMind Now Actively Investigating AI Consciousness

AI Workers, and Even CEOs, Suddenly Turning Against the Trump Administration

Amazon Pushed Its Employees to Use Its In-House AI Coding Tool, But They Wouldn’t Stop Asking for Claude

The AI Jobs Apocalypse Is Starting to Feel Real

FOLLOW US

DISCLAIMER(S)

Sign up to see the future, today