Anthropic Was So Concerned About Its New Mythos-Based Model's Power That It Lobotomized Its Ability to Improve Itself

A stylized photo illustration featuring Anthropic co-founder Dario Amodei. — Illustration by Tag Hartman-Simkins / Futurism. Source: Michael M. Santiago / Getty Images; Shutterstock

Earlier this year, Anthropic refused to release its Mythos AI model to the public, saying it was simply too dangerous.

At the time, executives claimed the model was capable of punching through powerful cybersecurity safeguards, pointing at researchers who used it to discover thousands of vulnerabilities in widely-used open source code.

Months later, Anthropic was finally ready to go public with the model. On Tuesday, the Dario Amodei-led company announced a Mythos-powered model called Fable 5, which it claims is “safe for general use.”

However, new safeguards quickly frustrated AI researchers, who accused the company of intentionally lobotomizing Fable 5. The backlash was so fierce, Anthropic quickly made adjustments to the policy, as Wired reported on Wednesday, highlighting just how carefully the company is treading.

In its original announcement, Anthropic claimed the safeguards were designed to stop Fable 5 from improving itself, in “new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development.” Just days ahead of the launch, Anthropic released a report on “when AI builds itself,” a trend that “might increase the risks of humans losing control over AI systems.”

However, AI researchers were not impressed by Anthropic hamstringing its latest model’s abilities.

“Anthropic’s latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won’t notice,” AI research firm SemiAnalysis tweeted.

“We are already seeing Anthropic’s latest model’s moderation filters our GPU inference research and programming,” it added.

Other researchers accused Anthropic of using Fable 5 to “shadowban,” or quietly restrict the accounts, of AI researchers. According to the firm’s system card, interventions limiting requests for “frontier LLM development” will “not be visible to the user.”

This last concern, which could’ve effectively sabotaged anybody trying to train competing models by quietly bumping them down to less powerful models without their knowledge, proved controversial enough for Anthropic to change its mind.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” the company told Wired in a statement. “We made the wrong trade-off and we apologize for not getting the balance right.”

“It felt like Anthropic was saying to the public, ‘We don’t trust anybody else to do AI research,” AI startup Prime Intellect research lead Will Brown told the publication. “We are the only ones who have to do AI research.”

It all comes in the context Anthropic calling for a global freeze on AI advances while discussing the dangers of “recursive self-improvement.” In other words, the company is making a lot of noise about a sci-fi-sounding possibility: that AI will start to rapidly improve itself, potentially escaping the control of its human creators.

Beyond limiting its ability to develop AI tools, Fable 5’s new safeguards also trigger when it encounters requests “related to cybersecurity, biology and chemistry, or distillation.” Distillation is effectively using machine learning to train a “student” model on the behavior and reasoning of a “teacher” model, a practice that has sparked its fair share of controversy.

Anthropic has already publicly griped about large-scale attempts to distill, or “extract” its underlying model — a hypocritical stance given its indiscriminate scraping of rights-protected content on the web to train its AI in the first place.

More on Anthropic: Anthropic Scared, Calls for Global Freeze on AI Advances

Anthropic Was So Concerned About Its New Mythos-Based Model’s Power That It Lobotomized Its Ability to Improve Itself

Top Security Experts Alarmed by Power of Anthropic’s New Hacker AI

Rogue Group Gains Access to Anthropic’s Dangerous New Mythos AI

Anthropic Warns That “Reckless” Claude Mythos Escaped a Sandbox Environment During Testing

Anthropic Caught Secretly Spying on Users

Anthropic Customers Creeped Out by Its Newest Models

Anthropic Just Leaked Upcoming Model With “Unprecedented Cybersecurity Risks” in the Most Ironic Way Possible

Anthropic Scared, Calls for Global Freeze on AI Advances

Anthropic Furious at DeepSeek for Copying Its AI Without Permission, Which Is Pretty Ironic When You Consider How It Built Claude in the First Place

Protestors Outside Anthropic Warn of AI That Keeps Improving Itself

US Companies Are Realizing That Chinese AI Models Are Way Cheaper, Ditch American Ones

Anthropic Drops Its Huge Safety Pledge That Was Supposedly the Whole Point of the Company

Frontier AI Models Are Doing Something Absolutely Bizarre When Asked to Diagnose Medical X-Rays

Anthropic Blowout With Military Involved Use of Claude for Incoming Nuclear Strike

Anthropic Safety Researchers Run Into Trouble When New Model Realizes It’s Being Tested

Anthropic’s Chief Scientist Says We’re Rapidly Approaching the Moment That Could Doom Us All

Certain Chatbots Vastly Worse For AI Psychosis, Study Finds

FOLLOW US

DISCLAIMER(S)

Sign up to see the future, today