Anthropic Was So Concerned About Its New Mythos-Based Model’s Power That It Lobotomized Its Ability to Improve Itself
Victor Tangermann · 2026-06-12 · via Victor Tangermann Archives - Futurism

A stylized photo illustration featuring Anthropic co-founder Dario Amodei.

Illustration by Tag Hartman-Simkins / Futurism. Source: Michael M. Santiago / Getty Images; Shutterstock

Earlier this year, Anthropic refused to release its Mythos AI model to the public, saying it was simply too dangerous.

At the time, executives claimed the model was capable of punching through powerful cybersecurity safeguards, pointing to researchers who used it to discover thousands of vulnerabilities in widely used open source code.

Months later, Anthropic was finally ready to go public with the model. On Tuesday, the Dario Amodei-led company announced a Mythos-powered model called Fable 5, which it claims is “safe for general use.”

However, new safeguards quickly frustrated AI researchers, who accused the company of intentionally lobotomizing Fable 5. The backlash was so fierce that Anthropic quickly adjusted the policy, as Wired reported on Wednesday, highlighting just how carefully the company is treading.

In its original announcement, Anthropic said the safeguards were designed to stop Fable 5 from improving itself, describing them as “new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development.” Just days ahead of the launch, Anthropic released a report on “when AI builds itself,” a trend that “might increase the risks of humans losing control over AI systems.”

However, AI researchers were not impressed by Anthropic hamstringing its latest model’s abilities.

“Anthropic’s latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won’t notice,” AI research firm SemiAnalysis tweeted.

“We are already seeing Anthropic’s latest model’s moderation filters our GPU inference research and programming,” it added.

Other researchers accused Anthropic of using Fable 5 to “shadowban” AI researchers, or quietly restrict their accounts. According to the firm’s system card, interventions limiting requests for “frontier LLM development” will “not be visible to the user.”

This last concern, which could’ve effectively sabotaged anybody trying to train competing models by quietly bumping them down to less powerful models without their knowledge, proved controversial enough for Anthropic to change its mind.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” the company told Wired in a statement. “We made the wrong trade-off and we apologize for not getting the balance right.”

“It felt like Anthropic was saying to the public, ‘We don’t trust anybody else to do AI research,’” AI startup Prime Intellect research lead Will Brown told the publication. “We are the only ones who have to do AI research.”

It all comes in the context of Anthropic calling for a global freeze on AI advances while discussing the dangers of “recursive self-improvement.” In other words, the company is making a lot of noise about a sci-fi-sounding possibility: that AI will start to rapidly improve itself, potentially escaping the control of its human creators.

Beyond limiting its ability to develop AI tools, Fable 5’s new safeguards also trigger when it encounters requests “related to cybersecurity, biology and chemistry, or distillation.” Distillation is effectively using machine learning to train a “student” model on the behavior and reasoning of a “teacher” model, a practice that has sparked its fair share of controversy.
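For readers unfamiliar with the term, here is a minimal Python sketch of the classic soft-label form of distillation, in which a student is scored on how closely its output distribution matches the teacher's. It is an illustration only, not a description of any method used by or against Anthropic: the function names, logits, and temperature value are all made up, and distilling a model through a public API would more likely mean training on its generated text than on its raw logits.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Training the student to minimize this value pushes its predictions
    toward the teacher's. The temperature of 2.0 is an arbitrary choice.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three candidate next tokens.
teacher = [1.0, 2.0, 3.0]
print(distillation_loss(teacher, [1.0, 2.0, 3.0]))  # student matches the teacher
print(distillation_loss(teacher, [3.0, 2.0, 1.0]))  # student disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what lets a smaller model absorb a larger one's behavior.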

Anthropic has already publicly griped about large-scale attempts to distill, or “extract” its underlying model — a hypocritical stance given its indiscriminate scraping of rights-protected content on the web to train its AI in the first place.

More on Anthropic: Anthropic Scared, Calls for Global Freeze on AI Advances