Anthropic is backtracking on a policy that would have covertly limited competitors from using its new AI model, Claude Fable 5, to develop other AI models. The company changed course after the move received significant backlash from the AI research community.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

Anthropic released Claude Fable 5, a version of its latest AI model with additional safety guardrails designed to prevent misuse, earlier this week. Some of the safeguards Anthropic decided on were unsurprising: The company said it would reroute users who asked questions about cybersecurity, biology, or chemistry to a less capable AI model to reduce the chances of someone using the advanced AI to carry out a cyberattack or build a bioweapon.

But for researchers trying to use Claude Fable 5 for frontier AI development, Anthropic outlined a different approach. The firm would deliberately degrade the model’s performance in ways that were invisible to the user. The move would effectively sabotage researchers trying to use Claude to train competing AI models, a use that Anthropic explicitly bans in its terms of service.

Anthropic now says it’s changing course, and that Claude Fable 5’s safeguards for AI development will be visible to users. If the company suspects a user is trying to use Claude to build a highly capable AI, it will alert them that it’s either refusing the request or rerouting them to a less capable model.

Anthropic reversed the policy after it received fierce backlash from the AI research community. Anthropic has already taken steps to limit competitors from using Claude to build closed and open source AI models, but critics say that quietly degrading the model’s performance for certain users went a step too far. Claude’s coding agent has become a favored tool among developers, including those working on open-source AI research projects, and researchers tell WIRED that the company’s latest policy could have led to a troubling future in which only a handful of leading AI labs could perform advanced AI research.

Dean Ball, a senior fellow at the Foundation for American Innovation and a former advisor to the White House on AI, wrote in a post on X that “degrading performance on ML research *without telling the user* is shockingly hostile and a terrible look.” He continued in another post that the “secret sabotage” policy undermines Anthropic’s overall stance, because it limits AI researchers from collaborating on AI safety.

“It felt like Anthropic was saying to the public, ‘We don’t trust anybody else to do AI research. We are the only ones who have to do AI research,’” says Will Brown, research lead at the open-source AI startup Prime Intellect. “It feels a bit like they’re starting to pull the ladder up behind them.”

Brown said the policy would also have left developers in the dark about whether they were violating Anthropic’s rules, since the company wouldn’t alert them when its safeguards were triggered. He added that the restrictions could have had widespread consequences. For example, he pointed to the growing ecosystem of third-party evaluation firms that test frontier models for safety, performance, and reliability—work that could have been hindered if Anthropic secretly degraded its model.

Anthropic said it implemented the measures because Claude has become increasingly effective at accelerating AI research. In a recent blog post, the company said it is concerned that AI could improve its capabilities faster than society can adapt to them. Anthropic argued that it would be “good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up.”

“These safeguards prevent foreign adversaries from using our most capable models in ways that pose severe safety risks. The US and its allies hold an edge in frontier chips and the highly optimized software that runs them at full potential,” the company said in a statement to WIRED. “These safeguards ensure Claude isn't used to erode that advantage—by optimizing chips developed by those adversaries, for example […] In deciding whether to make them visible or invisible we faced a choice. A hidden safeguard is harder to probe and work around. This means the safeguards can be targeted much more narrowly.”

Anthropic says that because this safeguard around AI development is now visible, it needs to cast a wider net, meaning more benign requests may trigger its safeguards. The company says it’s working to make its classifiers more precise as quickly as possible.

Maxwell Zeff is a senior writer at WIRED covering the business of artificial intelligence. He was previously a senior reporter with TechCrunch, where he broke news on startups and leaders driving the AI boom. Before that, Zeff covered AI policy and content moderation for Gizmodo and wrote some of Bloomberg’s ...

Read Original at WIRED