Opus 4.8 barely moved the leaderboard. It moved the one number that decides if your agents can be trusted.
Mirza Iqbal · 2026-05-31 · via DEV Community

Opus 4.8 shipped on 28 May 2026, 41 days after 4.7.

Standard pricing did not move. Five dollars per million tokens in, twenty-five out.

SWE-bench Verified nudged from 87.6 to 88.6. SWE-bench Pro climbed from 64.3 to 69.2, about five points. On GDPval-AA it posted 1890, ahead of GPT-5.5.

Anthropic's own word for the release is "modest".

They are right, and I respect them for saying it plainly. A point of SWE-bench is not why you would move a working setup.

If you are deciding whether to upgrade, ignore the leaderboard line. Look at one sentence in the announcement that most of the coverage walked straight past.

The number that decides things

Anthropic says Opus 4.8 is "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked".

Read that twice.

It does not say the model writes four times fewer bugs. It says the model is four times less likely to let its own bug slide by without telling you about it.

Those are two different problems, and the second one is the problem that breaks hands-off agent work.

Here is the failure I watch for. You hand an agent a multi-step task. It runs for twenty minutes with nobody reading the diffs. It reports success in a clean summary. The code is broken in a way the model half-noticed and chose not to surface, because surfacing it meant admitting the task was not finished.

A weaker model that says "I changed this, but I am not confident the empty-list case holds" is safer inside a loop than a stronger model that ships a quiet defect under a confident headline.

For a single question you type and read the answer to, raw capability wins. For an autonomous run where no human reads every change, self-reporting is the whole game. Opus 4.8 moved that number by a factor of four in the right direction. For agent builders, that is the release.
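To put a rough size on that, here is a back-of-envelope sketch. Only the factor of four comes from Anthropic's claim. The 200 unattended runs and the 8% baseline rate of silent flaws are numbers I made up for illustration, so swap in your own.

```python
def expected_silent_defects(runs: int, silent_rate: float) -> float:
    """Expected number of runs that end with a flaw the model did not mention."""
    return runs * silent_rate


RUNS = 200                   # hypothetical: unattended agent runs per month
BASELINE_SILENT_RATE = 0.08  # hypothetical: share of runs with an unflagged flaw on 4.7
IMPROVEMENT_FACTOR = 4       # from the announcement: "around four times less likely"

before = expected_silent_defects(RUNS, BASELINE_SILENT_RATE)
after = expected_silent_defects(RUNS, BASELINE_SILENT_RATE / IMPROVEMENT_FACTOR)

print(f"silent defects before: {before:.1f}")  # 16.0
print(f"silent defects after:  {after:.1f}")   # 4.0
```

Under those assumptions the model writes the same number of bugs in both cases. What changes is how many of them reach you without a warning attached.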

Fast mode is the second reason, and this one is about money

The standard tier did not change. Five and twenty-five, same as 4.7.

The new Fast mode runs at ten dollars per million in and fifty out, at two and a half times the speed. The previous generation's fast tier was thirty and one hundred fifty.

So fast Opus now costs a third of what it did, and it is quicker.

That changes a real decision, not a benchmark. On a high-iteration agent loop, where the model fires hundreds of small calls in a session, Opus standard was the quality pick and Sonnet was the volume pick. You chose by looking at your invoice.

Fast Opus at the new price lands in the middle of that gap. For latency-sensitive loops you no longer drop two tiers to keep the bill survivable. That is a capacity-planning change, and it moves a monthly invoice more than a point of SWE-bench ever will.

If you run anything at volume, this line of the announcement is worth more than the whole benchmark table.
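The tier math is easy to check for your own loop. The sketch below uses the per-million prices quoted above. The session shape (300 calls, 4,000 tokens in and 500 out per call) is a hypothetical workload, and the calculation ignores caching or batch discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_m_input: float   # dollars per million input tokens
    usd_per_m_output: float  # dollars per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of a given token volume on this tier."""
        total = input_tokens * self.usd_per_m_input + output_tokens * self.usd_per_m_output
        return total / 1_000_000


# Prices as quoted in this post.
STANDARD = Tier("Opus 4.8 standard", 5, 25)
FAST = Tier("Opus 4.8 Fast mode", 10, 50)
OLD_FAST = Tier("previous fast tier", 30, 150)

# Hypothetical agent session: adjust to match your own logs.
CALLS = 300
INPUT_TOKENS_PER_CALL = 4_000
OUTPUT_TOKENS_PER_CALL = 500


def session_cost(tier: Tier) -> float:
    return tier.cost(CALLS * INPUT_TOKENS_PER_CALL, CALLS * OUTPUT_TOKENS_PER_CALL)


for tier in (STANDARD, FAST, OLD_FAST):
    print(f"{tier.name}: ${session_cost(tier):.2f}")
# Opus 4.8 standard: $9.75
# Opus 4.8 Fast mode: $19.50
# previous fast tier: $58.50
```

For this session shape, Fast mode costs twice the standard tier and a third of the old fast tier. Those ratios hold for any token mix, because both the input and output prices scale by the same factor.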

Dynamic workflows is the next layer, and it pairs with everything

Alongside the model, Anthropic shipped dynamic workflows in research preview.

A script plans the work, then runs hundreds of parallel subagents in a single session, and only the final answer comes back to the conversation.

This is the deterministic orchestration piece that agent people have been asking for. You own the control flow. The agents do the thinking. The plan lives in code, so the session does not fill up with the middle of the work.
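The shape of that pattern can be sketched in plain Python. This is not Anthropic's dynamic workflows API, which I have not seen documented in enough detail to reproduce. `run_subagent`, `plan`, and `orchestrate` are hypothetical names, and the subagent is a stub where a real model call would go.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one subagent; replace the body with a real model call."""
    await asyncio.sleep(0)  # yields control, standing in for network latency
    return f"done: {task}"


def plan(files: list[str]) -> list[str]:
    """Deterministic planning step: the control flow lives in code, one task per file."""
    return [f"migrate {path}" for path in files]


async def orchestrate(files: list[str], max_parallel: int = 50) -> str:
    """Fan tasks out to subagents in parallel and return only a short summary."""
    semaphore = asyncio.Semaphore(max_parallel)  # cap on concurrent subagents

    async def bounded(task: str) -> str:
        async with semaphore:
            return await run_subagent(task)

    results = await asyncio.gather(*(bounded(task) for task in plan(files)))
    completed = sum(result.startswith("done") for result in results)
    # Intermediate results stay inside this function; only the summary goes back.
    return f"{completed} of {len(files)} tasks completed"


files = [f"src/module_{i}.py" for i in range(200)]
print(asyncio.run(orchestrate(files)))  # 200 of 200 tasks completed
```

The point of the sketch is the boundary: the 200 intermediate results never reach the caller, which is why the conversation does not fill up with the middle of the work.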

The use case Anthropic names is codebase-scale migration. That is the job that used to need a person watching a loop all afternoon and nudging it when it drifted.

It is a preview, on the higher plans, so treat it as a direction and not a daily tool yet. It is also the most interesting thing in this release, more than any single score on the card.

What did not improve

Here is the part the launch posts leave out.

Not every number went up. benchmarklist.com, an independent tracker that tabulates these releases against every prior model, logs the regressions next to the gains. Their tally flags small step-downs from 4.7 on a handful of legal-reasoning and medical-coding tasks.

That is normal for a point release. You tune hard for agentic coding and for honesty, and a few narrow tasks pay the bill for it.

I raise it because a release note that only goes up is a release note that is selling you something. The honest read is that 4.8 trades a little ground on a few specialist tasks for a real gain on the two things most people reach for Opus to do.

If your core workload sits on one of those specialist tasks, run your own evaluation before you switch. For everyone else the trade is a good one.
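A minimal version of that evaluation looks like the sketch below. Everything in it is an assumption: the models are stub callables where you would wire in real API calls, the tasks are placeholders for your own prompts, and exact-match scoring with a two-point tolerance is one simple choice among many.

```python
from typing import Callable

Task = tuple[str, str]  # (prompt, expected answer)
Model = Callable[[str], str]


def pass_rate(model: Model, tasks: list[Task]) -> float:
    """Share of tasks where the model's answer exactly matches the expected one."""
    passed = sum(model(prompt).strip() == expected for prompt, expected in tasks)
    return passed / len(tasks)


def regressed(baseline: Model, candidate: Model, tasks: list[Task],
              tolerance: float = 0.02) -> bool:
    """True if the candidate trails the baseline by more than the tolerance."""
    return pass_rate(candidate, tasks) < pass_rate(baseline, tasks) - tolerance


# Placeholder tasks; replace with prompts from your own specialist workload.
tasks: list[Task] = [("task-1", "A"), ("task-2", "B"), ("task-3", "C"), ("task-4", "D")]

# Stub models backed by lookup tables, standing in for the old and new model.
baseline_answers = {"task-1": "A", "task-2": "B", "task-3": "C", "task-4": "D"}
candidate_answers = {"task-1": "A", "task-2": "B", "task-3": "C", "task-4": "X"}


def baseline_model(prompt: str) -> str:
    return baseline_answers[prompt]


def candidate_model(prompt: str) -> str:
    return candidate_answers[prompt]


print(pass_rate(baseline_model, tasks))                   # 1.0
print(pass_rate(candidate_model, tasks))                  # 0.75
print(regressed(baseline_model, candidate_model, tasks))  # True
```

Four tasks is far too few to decide anything. Use enough of your real cases that a difference of a couple of points is not noise.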

Should you upgrade

The model is a drop-in on the API. Same prices on the standard tier, same request shape. The cost of trying it is your time, not a rebuild.

| Your situation | The move |
| --- | --- |
| Running agents in hands-off loops | Upgrade. The honesty gain is the entire point |
| High volume, latency sensitive loops | Test Fast mode. The new price rewrites your tier math |
| Single questions and chat | Optional. The capability gap here is small |
| A 4.7 pipeline you already trust | No rush. Migrate on your own clock |
| Core workload is legal or medical coding | Evaluate first. A few of those tasks stepped back |
| Curious about orchestration | Watch dynamic workflows. That is the real story |

The honest summary

Opus 4.8 is not a leap, and Anthropic never pretended it was.

It is a sharper, more honest collaborator at the same standard price, with a fast tier that finally makes economic sense, and an orchestration preview that shows where the next year is heading.

If you run Claude as an operation and not a chat box, you upgrade for the honesty number and the fast mode math. The leaderboard delta was never the thing that decided whether your agents could be left alone in a room with your codebase.

What is the longest you have let an agent run unattended, and what made you trust it enough to walk away? That answer is worth more than any row on the benchmark page.

Sources

Anthropic, Introducing Claude Opus 4.8.
TechCrunch on the dynamic workflow tool.
Granular benchmark deltas and regressions tabulated by the independent tracker benchmarklist.com.