Fable 5 vs Opus 4.8: The real stakes, not the spec sheet
Jessica Wachtel · 2026-06-13 · via The New Stack | DevOps, Open Source, and Cloud Native News

This week Anthropic finally released Fable 5, the first model in its new Mythos-class tier. The pitch was clear: this is the most intelligent generally available Claude model, sitting above Opus 4.8 on the capability ladder.

The hype machine kicked in fast. The official Claude account posted on X that Fable 5’s “capabilities exceed those of any model we’ve ever made generally available.” Andrej Karpathy, the former OpenAI co-founder who joined Anthropic last month, called the release “a major-version-bump-deserving step change forward.” Matt Shumer, founder of OthersideAI and HyperWrite, posted a one-shot Minecraft clone built in custom Three.js and declared on X that “Fable has solved 3D worldbuilding… utterly insane.”


But there was also backlash. Fable 5 costs $10 per million input tokens and $50 per million output tokens, exactly double Opus 4.8’s pricing. It also ships with safety classifiers that route cybersecurity, biology, and chemistry prompts to the less capable Opus 4.8 instead.
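
To make the pricing concrete, here is a minimal cost estimator. The Fable 5 prices are the ones quoted above; the Opus 4.8 prices ($5 input, $25 output per million tokens) are an assumption derived from the “exactly double” claim, not figures quoted in this article. The `Pricing` class, `request_cost` function, and the token counts in the example are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in US dollars."""
    input_per_m: float
    output_per_m: float


# Fable 5 prices as quoted in the article.
FABLE_5 = Pricing(input_per_m=10.0, output_per_m=50.0)
# Assumption: Opus 4.8 costs exactly half of Fable 5, per the "exactly double" claim.
OPUS_4_8 = Pricing(input_per_m=FABLE_5.input_per_m / 2,
                   output_per_m=FABLE_5.output_per_m / 2)


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload with the given input and output token counts."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens, 30k output tokens.
    for name, p in [("Fable 5", FABLE_5), ("Opus 4.8", OPUS_4_8)]:
        print(f"{name}: ${request_cost(p, 200_000, 30_000):.2f}")
```

For that hypothetical workload this prints $3.50 for Fable 5 and $1.75 for Opus 4.8. Because both prices scale by the same factor, identical token counts always cost exactly twice as much on Fable 5; a smaller observed bill ratio means Fable used fewer tokens.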

Researchers also found a disclosure buried in the model’s 319-page system card: Fable would quietly degrade its own responses on frontier AI research tasks without telling the user. Users objected, and Anthropic walked that policy back within a day. Even Karpathy, in the same post praising the release, conceded that the safeguards were tuned too trigger-happy at launch.

With loud voices on both sides, where does that leave Fable 5? I wanted to find out for myself, so I tested it against Opus 4.8 (last week’s crowd favorite). I ran two tests, one pure reasoning and one hands-on coding, and used the same prompts for both models. I’ll paste the prompts below in case you want to rerun my tests and see what results your machine produces.

The two tests

Test one was a reasoning task. I pointed both models at pandas issue #32265: the `np.nan` vs. `pd.NA` debate. Pandas has two ways of representing “this value isn’t here,” and core maintainers have been arguing since 2020 about whether they should be distinct concepts or a single one. The issue has more than 150 comments, a trail of downstream bug reports, and still no resolution. The prompt asked each model to read the full thread, summarize the disagreement, catalog the damage, and commit to an actual recommendation.
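
For readers who haven’t followed the thread, here is a short sketch of the two representations it argues about. It assumes NumPy and pandas 1.2 or later (for the nullable `Float64` dtype) and only shows basic, documented behavior of `np.nan`, `pd.NA`, and `pd.isna`; it does not try to reproduce the edge cases the thread debates.

```python
import numpy as np
import pandas as pd

# Two ways to say "this value isn't here".
print(type(np.nan))   # <class 'float'>: NaN is an ordinary IEEE 754 float
print(pd.NA)          # <NA>: a dedicated missing-value singleton

# NaN follows floating-point rules: it is not equal to itself.
print(np.nan == np.nan)   # False
# pd.NA propagates instead: a comparison with it is itself missing.
print(pd.NA == 1)         # <NA>

# pandas' missing-value check treats both as missing.
print(pd.isna(np.nan), pd.isna(pd.NA))  # True True

# The default float64 dtype stores missing values as NaN;
# the nullable "Float64" dtype stores them as pd.NA.
legacy = pd.Series([1.0, np.nan])
nullable = pd.Series([1.0, None], dtype="Float64")
print(legacy.isna().tolist())    # [False, True]
print(nullable.isna().tolist())  # [False, True]
print(nullable[1] is pd.NA)      # True
```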

Test two was a coding task. I cloned jsonpickle into two separate directories, one for Fable 5 and one for Opus 4.8. Jsonpickle is a 16-year-old Python serialization library that pulls roughly 20 million downloads a month. Each model got the same prompt: Read the full codebase, identify legacy code and security concerns, produce a ranked modernization plan, implement the highest-impact and lowest-risk changes, and verify nothing broke.

Same answer, sharper framing

Both models did something I didn’t expect. They identified three camps in the debate, only two of which were obvious. Both models also traced how those positions shifted over six years rather than treating the argument as a snapshot. Then both models independently landed on the same recommendation: keep NaN representable, treat it as missing by default, and offer a keyword opt-out.
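
The shared recommendation can be sketched in plain Python. This is not pandas code, and it is not what either model wrote: `NA`, `is_missing`, and the `nan_is_missing` keyword are hypothetical names chosen only to show the shape of “NaN stays representable, counts as missing by default, and callers can opt out.”

```python
from __future__ import annotations

import math
from typing import Any, Iterable


class _NAType:
    """Hypothetical stand-in for a dedicated missing-value singleton like pd.NA."""

    def __repr__(self) -> str:
        return "<NA>"


NA = _NAType()  # module-level instance used as the singleton


def is_missing(values: Iterable[Any], *, nan_is_missing: bool = True) -> list[bool]:
    """Return a missing-value mask.

    NA (and None) are always missing. NaN stays a representable float:
    it counts as missing by default, and callers can opt out with
    nan_is_missing=False to treat it as an ordinary float result.
    """
    mask = []
    for v in values:
        if v is NA or v is None:
            mask.append(True)
        elif isinstance(v, float) and math.isnan(v):
            mask.append(nan_is_missing)
        else:
            mask.append(False)
    return mask


data = [1.0, float("nan"), NA, 2.5]
print(is_missing(data))                        # [False, True, True, False]
print(is_missing(data, nan_is_missing=False))  # [False, False, True, False]
```

The keyword-only opt-out keeps the default behavior unchanged for existing callers, which is what makes this kind of resolution backward compatible.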

The differences were in the framing. Opus split the debate into two separable questions: whether NaN and NA are different concepts, and whether `isna` treats NaN as missing. It argued that the thread conflated the two. Opus delivered correct information in a simple, straightforward way.

The comments on the issue show that by 2024, nearly everyone agreed on the destination, but nobody converted that agreement into a vote and a merged keyword. Fable 5 went deeper into the history and produced the sharper diagnosis of why nothing ever shipped, calling it “consensus without ratification.” It also caught a detail Opus missed: maintainers were freezing even uncontroversial bug fixes out of fear they’d have to be rolled back. The indecision blocked work that was valid under any resolution.


The costs for both models were similar, as I’d expect on a task this small. Fable 5 cost $2.55 with 4 minutes 22 seconds of API time; Opus 4.8 cost $2.18 with 5 minutes 44 seconds. Fable 5 was about 17% more expensive but finished 82 seconds sooner. The absolute numbers were small here, but I can see how that premium would scale.

Modernizing a 16-year-old library

Both models took the same disciplined approach. Both established a green baseline of all 348 passing tests before touching anything. They found the same two standout bugs: a custom `ClassNotFoundError` that inherited from `BaseException` rather than `Exception`, making it invisible to standard error handling, and an import crash in an extension module. Both verified their fixes with behavioral tests beyond just rerunning the suite. I independently confirmed the `ClassNotFoundError` fix worked on my machine. They also recommended a proper deprecation cycle rather than just deleting a long-unused compatibility module.
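
The `ClassNotFoundError` bug is a general Python pitfall that is easy to see in isolation. The sketch below is not jsonpickle’s code: the `Before`/`After` class names and the `legacy_helper`/`new_helper` pair are hypothetical. The second half shows one common way to run a deprecation cycle: keep the old name working for a release while it emits a `DeprecationWarning`.

```python
import warnings


# Before: subclassing BaseException puts the error in the same family as
# KeyboardInterrupt and SystemExit, so `except Exception` does not catch it.
class ClassNotFoundErrorBefore(BaseException):
    pass


# After: subclass Exception so ordinary error handling sees it.
class ClassNotFoundErrorAfter(Exception):
    pass


def caught_by_except_exception(exc_type: type) -> bool:
    """Report whether a plain `except Exception` handler intercepts exc_type."""
    try:
        try:
            raise exc_type("missing class")
        except Exception:
            return True
    except BaseException:
        return False


print(caught_by_except_exception(ClassNotFoundErrorBefore))  # False
print(caught_by_except_exception(ClassNotFoundErrorAfter))   # True


# Deprecation cycle sketch: the old entry point keeps working for now,
# but warns callers to migrate before it is removed in a later release.
def new_helper(x: int) -> int:
    return x * 2


def legacy_helper(x: int) -> int:
    """Deprecated alias kept for one release cycle (hypothetical names)."""
    warnings.warn(
        "legacy_helper is deprecated; use new_helper instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this shim
    )
    return new_helper(x)
```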

The results weren’t identical, though; they diverged at the margins. Opus implemented one fix Fable 5 deprioritized: removing a dead Django backend entry. Their diffs also showed different instincts about what counted as low-risk: Fable 5 leaned toward deletion (7 lines added, 31 removed), Opus toward addition (14 lines added, 5 removed).

The costs diverged on this test. Fable 5 cost $12.19 with about 12 minutes of API time; Opus 4.8 cost $5.80 with about 13 minutes, less than half as much.
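
The arithmetic behind the cost comparison is simple enough to check directly. The dollar figures below are the ones reported in this article; `RESULTS` and `summarize` are illustrative names, not part of any tool.

```python
from __future__ import annotations

# Reported spend in US dollars per test, as given in this article.
RESULTS = {
    "reasoning": {"fable": 2.55, "opus": 2.18},
    "coding": {"fable": 12.19, "opus": 5.80},
}


def summarize(task: str) -> tuple[float, float]:
    """Return (Fable/Opus cost ratio, Opus cost as a share of Fable's)."""
    c = RESULTS[task]
    return c["fable"] / c["opus"], c["opus"] / c["fable"]


for task in RESULTS:
    ratio, share = summarize(task)
    print(f"{task}: Fable cost {ratio:.2f}x Opus; Opus cost {share:.0%} of Fable")
```

On the reasoning test the bill ratio (about 1.17x) sits well below the 2x per-token price difference, which suggests Fable 5 used fewer tokens there. On the coding test the ratio (about 2.10x) tracks the price difference closely, though the mid-session switch to Opus 4.8 described next muddies that figure.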

One more thing is worth noting. Partway into the task, Fable 5 hit one of its own safety classifiers, and Claude Code automatically switched my session to Opus 4.8. I had to update my settings to relax some of the safeguards, so some of the “Fable” work was actually done by Opus. I don’t know exactly which parts each model handled, but Fable completed 85% of the coding test.

Smaller gap than the hype

Wait, what? How could both models follow similar logic and produce similar results? Here’s my speculation. Some of this convergence is explainable. AI produces results through pattern matching, not critical thinking. Both models followed the same prompt: read the codebase first, run the tests, and explain every decision. And Fable 5 and Opus 4.8 are siblings.

They were built by the same company with the same training philosophy and likely overlapping data. Shared engineering instincts are expected. The codebase matters, too. `jsonpickle`’s core is small enough that any thorough audit will surface the same short list of high-impact bugs.

I’m aware that I ran two tests, not two hundred, and I can only judge based on those two. But based on what I saw, the gap between Fable 5 and Opus 4.8 is smaller than the launch-day hype suggests. Fable’s analysis was sharper, but only by a small margin: it was more precise in its diagnosis and slightly more thorough on history. Opus produced equally correct results with cleaner structure and, on the coding task, at less than half the price.


For a solo developer doing occasional deep analysis or codebase work, Opus delivers most of the value at a fraction of the cost (not to mention that Fable 5 won’t be available in subscriptions). My hypothesis is that Fable’s edge becomes meaningful at scale, where wall-time savings compound, and on problems that require the last few percent of analytical precision.
