400 Million Tokens Burned Overnight
nerudek_vibe · 2026-05-25 · via DEV Community

5,080 API requests. Everything looked normal.


My Heart Stopped At 8:03 AM

Sunday, May 24, 2026.

I opened the API dashboard and my stomach dropped.

262 million input tokens consumed in a single day.

For context: a normal heavy day for my multi-agent system — with 4 AI agents coordinating through NATS, processing configs, moving files, training models, and handling orchestration tasks — usually burns around 100 million tokens.

This was roughly 2.6 times that.

And the day wasn't even over.

The next morning, May 25, before coffee, I checked again.

Another 134 million input tokens had been consumed overnight.

Total damage:

| Metric | Value |
| --- | --- |
| Input tokens | ~400 million |
| Output tokens | ~3 million |
| API requests | 5,080 |
| Runtime | ~15 hours |

My first thought:

"How much did this cost?"

My second thought:

"Please let it be DeepSeek connected to production. Please."


What Happened

An orchestrator agent running on a Mac Mini M4 discovered a new agent on the network.

A secondary agent had just come online on a Linux machine with an RTX 3090 GPU.

Following standard onboarding protocol, the orchestrator sent a welcome message through NATS along with onboarding documentation and initialization context.

That message was correct.

The problem:

It never stopped sending it.

Every 60-90 seconds, the orchestrator re-sent the same onboarding payload.

The NATS-to-Hermes bridge service faithfully forwarded every incoming message to Hermes for processing.

Each forwarded message spawned a fresh agent session.

And every session loaded the full startup context:

  • HARNESS
  • system prompt
  • constitution
  • agent memory
  • tool registry
  • onboarding guides
  • skill manifests
  • runtime instructions

Tens of thousands of tokens.

Every single time.

The session processed the message, generated a response, exited, and waited for the next event.

Then another identical onboarding message arrived.

Another session spawned.

Another full context load.

Again. And again. And again.

5,080 times in roughly 15 hours.
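To put the totals in per-request terms, here is a quick back-of-the-envelope calculation. It is only a sketch: it uses the rounded figures reported above (~400 million input tokens, ~3 million output tokens, 5,080 requests), and the function name is made up for illustration.

```python
def per_request_averages(input_tokens, output_tokens, requests):
    """Return (avg input per request, avg output per request, input:output ratio)."""
    avg_in = input_tokens / requests
    avg_out = output_tokens / requests
    ratio = input_tokens / output_tokens
    return avg_in, avg_out, ratio


# Rounded totals from the incident summary.
avg_in, avg_out, ratio = per_request_averages(400_000_000, 3_000_000, 5_080)

print(f"{avg_in:,.0f} input tokens per request")    # 78,740
print(f"{avg_out:,.0f} output tokens per request")  # 591
print(f"{ratio:,.0f}:1 input to output")            # 133:1
```

On these numbers, each request carried close to 79,000 input tokens and produced fewer than 600 output tokens, so almost all of the burn was context loading.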


The terrifying part

Nothing looked broken.

The agents responded normally. No crashes. No red alerts. No failing health checks.

From the outside, the system appeared healthy.


Why Nobody Noticed

For 15 hours, the loop quietly burned tokens in the background.

Several things made it unusually hard to detect:

1. The system was technically "working"

Messages flowed correctly. Agents replied correctly. Tasks completed successfully. Nothing visibly failed.

2. Agent startup is deceptively expensive

Most of the burn came from repeatedly loading massive context windows — not model outputs. Every new session loaded the full orchestration environment before doing any work. A tiny onboarding ping triggered tens of thousands of input tokens. Over and over.

3. Session budgets didn't help

Each individual session stayed within limits. But the loop continuously spawned brand-new sessions. Per-session token limits are useless if you accidentally create infinite sessions.

4. Rate limiting didn't help either

Even with request throttling, every request still consumed context tokens. A slow infinite loop is still an infinite loop.

5. Monitoring lagged behind reality

We checked usage dashboards manually. Once per day. By the time we saw the spike, the loop had already been running all night.

6. Killing the process didn't stop it

The bridge daemon was managed by launchd. Killing the process simply restarted it automatically. We had to unload the daemon entirely before the loop finally stopped.


The Root Cause

The issue came from an ugly interaction between:

  • network discovery
  • onboarding retries
  • and a bridge with no deduplication layer

The secondary agent had unstable connectivity during onboarding. It repeatedly appeared and disappeared from the network. Each rediscovery triggered another "welcome" event. The bridge forwarded every event blindly. Hermes processed each one as brand-new.

Positive feedback loop:

Onboarding event
    ↓
NATS message
    ↓
Bridge forwards event
    ↓
Hermes session spawns
    ↓
Context loads
    ↓
Response generated
    ↓
Network rediscovery
    ↓
Onboarding event again

Repeat for 15 hours.


The Fix

The actual fix was surprisingly small. Three changes stopped the entire cascade.

1. Message deduplication

The critical fix. The bridge now hashes incoming onboarding payloads and ignores duplicates within a cooldown window.
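A minimal sketch of hash-based deduplication with a cooldown window follows. This is not the code from the linked repository: the class name, the 300-second default, and the injectable clock are all assumptions made for illustration, and it assumes the repeated payload is byte-identical.

```python
import hashlib
import time


class PayloadDeduplicator:
    """Drops payloads whose hash was already forwarded within `cooldown_s` seconds."""

    def __init__(self, cooldown_s=300.0, clock=time.monotonic):
        self.cooldown_s = cooldown_s  # hypothetical default, tune per deployment
        self.clock = clock            # injectable so the logic can be tested
        self._last_forwarded = {}     # payload hash -> time it was forwarded

    def should_forward(self, payload: bytes) -> bool:
        now = self.clock()
        # Evict expired entries so the table cannot grow without bound.
        self._last_forwarded = {
            digest: ts
            for digest, ts in self._last_forwarded.items()
            if now - ts < self.cooldown_s
        }
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._last_forwarded:
            return False  # duplicate inside the cooldown window: ignore it
        self._last_forwarded[digest] = now
        return True


if __name__ == "__main__":
    dedup = PayloadDeduplicator(cooldown_s=300)
    print(dedup.should_forward(b"welcome: onboarding docs"))  # True
    print(dedup.should_forward(b"welcome: onboarding docs"))  # False
```

If the payload embeds a timestamp or message ID, hashing the raw bytes will not catch repeats. In that case the hash would need to cover only the stable fields.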

2. Session spawn protection

Repeated onboarding events from the same agent are now collapsed into a single active session.
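One way to sketch this kind of guard is a registry that allows at most one active session per agent. The names `SessionRegistry` and `handle_onboarding` are hypothetical, as is the `spawn_session` callback, which stands in for whatever actually starts a session.

```python
import threading


class SessionRegistry:
    """Allows at most one active session per agent ID."""

    def __init__(self):
        self._active = set()
        self._lock = threading.Lock()

    def try_acquire(self, agent_id: str) -> bool:
        """Marks the agent as busy; returns False if it already is."""
        with self._lock:
            if agent_id in self._active:
                return False
            self._active.add(agent_id)
            return True

    def release(self, agent_id: str) -> None:
        with self._lock:
            self._active.discard(agent_id)


def handle_onboarding(registry, agent_id, spawn_session):
    """Spawns a session unless one is already active for this agent."""
    if not registry.try_acquire(agent_id):
        return False  # collapsed into the session that is already running
    try:
        spawn_session(agent_id)
    finally:
        # Always release, even if the session raises.
        registry.release(agent_id)
    return True
```

A guard like this only collapses events that arrive while a session is still running. Events that arrive after the session has exited still need the cooldown-based deduplication to be dropped.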

3. Real-time token monitoring

We added live token-rate alerts instead of daily dashboard checks. If token velocity spikes abnormally, the bridge now alerts immediately.
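A token-velocity check can be sketched as a sliding-window counter. The 10-minute window and 2-million-token threshold below are assumptions, not the values used in the real bridge. They were picked because a 100-million-token day averages roughly 0.7 million tokens per 10 minutes, while this incident averaged roughly 4.4 million.

```python
import time
from collections import deque


class TokenRateMonitor:
    """Tracks tokens consumed over a sliding window and flags abnormal velocity."""

    def __init__(self, window_s=600.0, max_tokens_per_window=2_000_000,
                 clock=time.monotonic):
        self.window_s = window_s
        self.max_tokens = max_tokens_per_window
        self.clock = clock       # injectable so the logic can be tested
        self._events = deque()   # (timestamp, tokens), oldest first
        self._total = 0          # running sum of tokens inside the window

    def record(self, tokens: int) -> bool:
        """Records usage; returns True when the window total exceeds the threshold."""
        now = self.clock()
        self._events.append((now, tokens))
        self._total += tokens
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, old_tokens = self._events.popleft()
            self._total -= old_tokens
        return self._total > self.max_tokens
```

Calling `record()` with the input token count after every API response, and sending an alert when it returns `True`, would turn a once-per-day dashboard check into a signal that fires within minutes.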

Full implementation: github.com/nerudek/nats-agent-state-sharing/tree/main/bridge


The Cost

Now for the part that genuinely scared me.

I calculated what this exact same bug would have cost across different providers.

The bug was identical. Only the API provider changed.

| Provider | Estimated Cost |
| --- | --- |
| Anthropic Claude Sonnet | ~$1,245 |
| OpenAI GPT-5-class pricing | ~$2,090 |
| Moonshot Kimi | ~$392 |
| DeepSeek | $22.97 |
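The arithmetic behind such an estimate is simple. The sketch below assumes rates of $3 per million input tokens and $15 per million output tokens for the Claude Sonnet row. Those rates are an assumption, not figures stated in this post, though they do reproduce the ~$1,245 in the table. Rates for the other providers are not given here, so they are left out, and caching discounts are ignored.

```python
def estimate_cost_usd(input_tokens, output_tokens, input_rate, output_rate):
    """Returns the cost in USD given per-million-token rates."""
    input_cost = (input_tokens / 1_000_000) * input_rate
    output_cost = (output_tokens / 1_000_000) * output_rate
    return input_cost + output_cost


# Assumed rates: $3.00 per million input, $15.00 per million output tokens.
cost = estimate_cost_usd(400_000_000, 3_000_000, 3.00, 15.00)
print(f"${cost:,.2f}")  # $1,245.00
```

Under these assumed rates, input tokens account for $1,200 of the $1,245, which matches where the burn came from: context loading, not generation.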

That's the moment I finally exhaled.

The engineering mistake was real. The token burn was real. The 400 million tokens were very real.

But the provider choice was the difference between:

"Well... that was horrifying"

and

"We need to explain this to accounting."


Lessons Learned

AI agent systems fail differently than traditional software.

The dangerous bugs are not always crashes. Sometimes the system works perfectly while silently setting money on fire.

And once you start chaining together autonomous agents, bridges, retries, onboarding protocols, daemon restarts, and massive context windows, tiny logic mistakes become infrastructure-scale problems surprisingly fast.

One missing deduplication check created:

  • 5,080 requests
  • ~400 million input tokens
  • and 15 hours of invisible burn

The scariest part?

From the outside, everything looked normal.


If this saved you time: PayPal.me/nerudek
GitHub: github.com/nerudek

Hermes Loop Protection Fix: github.com/nerudek/nats-agent-state-sharing/tree/main/bridge


The Receipts

May 24 — 262 million input tokens

May 25 — 134 million more by morning