


















The US government has directed Anthropic to shut down access to its most powerful AI models, Fable 5 and Mythos 5, worldwide, citing national security concerns. Anthropic is complying but publicly pushing back.
The export control directive bans all access to Fable 5 and Mythos 5 by foreign nationals, whether they're inside or outside the US. Even Anthropic's own foreign employees are affected.
To comply, Anthropic has to cut off access for all customers worldwide. All other Anthropic models remain available, according to the company's statement. Anthropic calls the move a "misunderstanding" and says it's working to restore access as quickly as possible. The company plans to share more details within 24 hours.
According to Anthropic, the government believes it has found a method to bypass Fable 5's safety measures. The company says it reviewed a demo of the technique and found it identifies only "a small number of previously known, minor vulnerabilities" that other publicly available models could also detect.
The potential jailbreak—so far only described verbally by the government—boils down to asking the model to read a specific codebase and fix software bugs. Anthropic says it reviewed the report behind the directive and concluded that the capabilities shown are "widely available from other models," including OpenAI's GPT-5.5. Security researchers already use these capabilities daily to protect systems.
Before launch, the US government, the UK AI Safety Institute (UK AISI), private third-party organizations, and internal teams tested the model for thousands of hours combined. The safety measures are "substantially more effective than those of any previously deployed model," Anthropic says. Users even complained they were too restrictive.
No tester has found a universal jailbreak, a method that could broadly bypass the model's safety measures and unlock a wide range of cyber capabilities. But Anthropic also says that perfect jailbreak resistance isn't possible for any model provider right now, a fact well-documented given the sheer number of attack vectors LLMs offer. Every safeguard used across the industry is vulnerable to non-universal jailbreaks that can extract some information in specific cases, Anthropic says.
Knowing this, the company pursued a strategy it calls "defense in depth": keep jailbreaks either narrowly scoped or expensive to pull off, combined with broad monitoring to quickly detect and shut down successful attacks. Part of this strategy includes 30-day data retention for customer data, which Anthropic says creates "real costs for us with customers" but enables jailbreak research and mitigation.
Anyone who previously criticized Anthropic for fear-based marketing can see the irony here. The company spent months loudly warning about the cybersecurity risks of Mythos-class models, working hard to show how superior the model is. Now it has to argue that models already on the market have similar capabilities.
Anthropic is complying with the order but making its objections clear. "We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people." If this standard were applied across the industry, it would effectively halt all new model deployments from every frontier model provider, the company says.
In earlier public statements, Anthropic argued that the government should have the power to block unsafe deployments, but through a legal process that is "transparent, fair, clear, and grounded in technical facts." The current action doesn't meet those principles, the company says, hinting that this could become another chapter in the ongoing clash between Anthropic and the US government.
The US government recently issued a new executive order that lets AI developers submit their models for government safety review before release. Anthropic welcomed that approach, but the process apparently wasn't in place yet when the directive came down.
Jailbreaks and the related problem of prompt injections have been an unsolved security problem since the early days of large language models. No LLM maker is immune. The vulnerability has been known since at least GPT-3 and affects all LLM-based systems. ChatGPT and Claude can still be attacked through prompt injection under certain conditions, even though their makers have added countermeasures.
Even targeted security efforts have fallen short. About a year ago, Anthropic built a specialized defense against manipulation attempts and put it through a public jailbreaking challenge. After five days, over 300,000 messages, and roughly 3,700 collective work hours, the system was completely cracked, including a universal jailbreak.
Subscribe to THE DECODER for ad-free reading, a weekly AI newsletter, our exclusive "AI Radar" frontier report six times a year, full archive access, and access to our comment section.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。