Why did people miss the point on Mythos?
draganover · 2026-04-26 · via LessWrong
I have seen a lot of coverage suggesting that Anthropic’s new model, Claude Mythos, is a vehicle for the company to peddle hype and doom in order to raise money. While some of this is doubtless motivated by people’s unwillingness to stare into the abyss of our AI future, the sheer number of otherwise-reasonable people who have voiced these cynical opinions suggests that part of the blame rests with Anthropic. In this post, I want to briefly unpack how people misinterpreted the evidence, their valid reasons for doing so, and what Anthropic (and the AI safety community more broadly) should learn from this. In particular, since other dangerous capabilities will inevitably emerge spontaneously in the future, we need protocols for announcing them effectively.

To this end, I try to make the following points:

- We should be mindful of the public’s sympathy for AI safety prophecies and orient ourselves towards growing this sympathy rather than expending it. I call this our don't-be-annoying capital.
- People have good reasons to be skeptical of Anthropic’s claims. Overcoming this skepticism requires a particularly high burden of evidence.
- We should acknowledge that Anthropic has a conflict of interest when presenting doomer perspectives, and we should account for this going forward.

1. Mythos criticisms & missing the point

In one podcast from The Guardian, the reporter says that “this Mythos debacle [has led to] heads of state saying ‘this is so dangerous, it could shred our infrastructure, the end of civilization is nigh!’”. She then says that “accepting the companies’ premise that they are creating a machine god” is helping these companies sell their product.

In a different podcast, Cal Newport (a professor of computer science at Georgetown University!) concludes his analysis of Mythos by saying “it was wrong for Mythos to get the amount of dread-coverage that it got; so far we do not have evidence that it represents a significantly larger leap in detecting or exploiting vulnerabilities than we’ve seen in previous releases.” He has since gone on other podcasts to make these points. Similarly, the YouTube channel Internet of Bugs posted a video titled “Anthropic’s $x00 Million Marketing Stunt”, saying that “we’ve been seeing a lot of this particular attention-grabbing technique lately. I’m sure it’s going to only be getting worse as the AI companies get more desperate to keep up the flow of investment dollars. How are we supposed to believe this shit?”

To be very clear, the point is not merely that Anthropic has a model which is good at cybersecurity tasks. The point is that scaling laws are holding and the inevitable acceleration continues. With each new model, we unlock new and mysterious risks which we have to grapple with. This time it was cybersecurity, but it could realistically have been anything. This bigger story was essentially lost amid the hoopla.

2. It feels like people wanted to miss the point?

I understand the instinct to call a company hype-mongering when it says it has built a big, scary thing (e.g., Sam Altman’s Death Star tweet before GPT-5). But I am surprised at how willing people are to latch onto evidence that supports their prior that Anthropic is just engaging in corporate shenanigans. For example, there’s a post from an LLMs-for-cybersecurity organization claiming that other, smaller models were able to find the bugs that Mythos found. They write: “we took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis.” This was retweeted by the Hugging Face CEO and subsequently used as evidence in all three of the podcasts/videos above.
Am I going crazy? Isn’t this pretty weak evidence for dismissing all the claims Anthropic is making? As others have said, it’s the equivalent of isolating the clump of hay with the needle in it, giving this clump to a small child, and then saying “wow they were able to find the needle too”. The point is that locating that clump is the hard part!

3. There must be a lesson here.

If we set aside the whole “accelerating us towards the machine god” thing, I think that Anthropic behaved responsibly with Mythos. I also think there are lessons to be learned about the publicity, as evidenced by the fact that reasonable people consistently missed the point.

Anthropic is in the business of making extremely unpopular things which they claim could ruin society as we know it. They also repeatedly say these things are too dangerous for the public to have access to (I note that caution seems generally warranted). It is reasonable for people to think Anthropic is crying wolf, especially given that they are on track to have the largest IPO of all time and got there, in part, via prophecies.

Unfortunately, I am in the camp of people who think the prophecies are true. This means that, for me, evidence which supports Anthropic’s worldview exists in superposition: it is both hype and responsible, both doomer and appropriate. But as we’ve seen, it is difficult to convince other people that this is all really happening. They will (with good reason) consider this self-serving, and they will (with good reason) not want to confront reality. If people perceive us as crying wolf, they will grow weary of our frenetic anxieties and our cause will go the way of pandemic preparedness.

Consequently, I think of AI safety as operating with some amount of don’t-be-annoying capital.[1] This is the amount of sympathy the general public has towards our concerns. It is amassed when AI causes things to go wrong (in the public’s consciousness, not in ours).
It is expended when we make claims that people don’t want to believe. It is also expended when AI companies make claims which are plausibly self-serving, and even faster when those claims are poorly substantiated. I argue we should frame our public interactions around growing this resource.

4. Some options for next time

Let’s put ourselves in Anthropic’s shoes: you have just made your latest digital nuke. You have to do something with it. What are your options?

- You could sit on it quietly. This is obviously terrible, especially if it leaks.
- You could release it publicly. This risks being catastrophic.
- You could announce it and route it to defensive/government use. This is what Anthropic did with Mythos.
- You could announce that you were able to produce it, use it to harden infrastructure, and then destroy it. This is likely the most altruistic option, but then you couldn’t use it to gain an accelerative advantage over your competitors.
- You could delete it and never tell anyone. This could also be a PR disaster if it leaks.

I have the sense that you need to announce it somehow. But if you announce it, you likely expend capital. If the thing you announce does not live up to the hype, you expend even more capital. And we’ve seen that reasonable people will look for unreasonable excuses to dismiss your claims, draining your capital further.

To this end, Anthropic should have done more due diligence with the Mythos release: their system card (while thorough) did not have the rigor of a comprehensive scientific study. The cybersecurity assessment takes up only 6 pages (pages 46-52) of the 200+ page system card, and it includes four experiments in which they compare Mythos only against other Anthropic models. The red team’s blog post goes into more detail about how they found zero-days but again skips some due diligence.
For example, they could have evaluated other, non-Anthropic models across the capability spectrum to verify that Mythos is uniquely able to do these tasks. They also could have run some controls. Extraordinary claims require extraordinary evidence.

I also believe Anthropic should be more up-front about their conflict of interest when it comes to making statements of doom. However good I believe their intentions to be, this conflict of interest is real and people are right to perceive it. One simple way to counter these perceptions would be to prioritize analysis from independent non-profit evaluators. Credit where it’s due: Anthropic did have the UK AISI provide supporting analysis of Mythos's capabilities, and across the cynical takes I’ve seen, this analysis was treated with more respect. Of course, even this runs into cynicism, since people will start to think the non-profits are in cahoots with the companies, as evidenced by the comments on the recent NYT profile of METR.

Finally, although new dangerous capabilities will keep emerging unpredictably, the point is not any one specific capability. I think letting the narrative center on the cybersecurity elements of Mythos did the public a disservice: I doubt most people internalized how big the overall capability jump was, or that the next such jump will bring new harms into view.

I’m also sure there are considerations I am not privy to, and I recognize that it’s easy to criticize from my cozy corner where nothing I do moves the stock market. Nonetheless, Anthropic’s first-mover advantage in identifying dangerous capabilities also endows them with a first-doomer responsibility.

Thanks to Erin, Steven, Justin, Joseph and Li-lian for comments.

[edit log]: changed some wording and added a sentence about the blog post from the red team.
Also changed the title to be more in line with what I was trying to say; the old title was "Anthropic spent too much don't-be-annoying capital on Mythos", but in retrospect this isn't fully representative of my view.

[1] I call it 'don't-be-annoying capital' because that's the lived experience on the receiving end, and I think it is good to think about this from the perspective of the audience. I'll admit that something like 'warning fatigue' might be a more representative name.