I Ran an AI Agent for 30 Days Straight: Here's the Boring Engineering That Made It Work
Tijo Gaucher · 2026-05-25 · via DEV Community

Most "AI agent" demos die at the same place: a tweet, a screenshot, a five-minute video. Then the founder closes the laptop and the agent quietly stops existing.

I wanted to know what it actually takes to keep an agent running for a month — not "working in a Jupyter notebook for an afternoon," but on for 30 consecutive days, processing real inputs, surviving real failures, without me babysitting it.

The headline answer: the model isn't the hard part. The hard part is the unglamorous engineering decisions you make before the agent ever generates a token.

Here's what shipped, what broke, and what fixed it.

The setup

One agent. Scheduled job, runs every 6 hours. Job: pull a queue of unread customer support emails, classify them, draft a reply for the human to review, and tag the thread. Stack: OpenClaw runtime, Sonnet as the model, Postgres for state, Docker container with a restart policy. Boring on purpose.

Why this specific job? It's the kind of workload that actually has a budget attached. Nobody pays $99/mo for a chatbot that does nothing. They pay for a thing that processes 200 emails a night while they sleep.

What broke (and what fixed it)

1. The agent forgot it was a process, not a script

First crash: day 4. The container was OOM-killed while loading a 300-email batch into memory at once. The agent had been written like a Python script: read everything, process everything, write everything.

Fix: queue plus worker, with explicit checkpointing after every N items. If the worker dies while processing item 47, the next run resumes from the last checkpoint (item 47 itself when you commit after every item) instead of starting over from the top. This is so obvious in retrospect that it's embarrassing, but every first-pass agent I've reviewed makes this mistake.

The pattern that works:

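# queue, agent, and db stand in for your own queue client, agent, and storage layer.
for batch in queue.pull(limit=10):  # small batches keep memory bounded
    for item in batch:
        result = agent.process(item)
        db.write_result(result)
        db.mark_done(item.id)  # commit point: a restart skips everything marked done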

The commit point is the whole game. Without one, a restarted agent either redoes work it already finished or skips it and loses it. There is no third option.

2. The retry loop became a money fire

Day 9: I woke up to a $40 inference bill from the previous night. The agent had hit a model timeout on one weird input, retried infinitely, and burned through tokens.

Fix: exponential backoff with a hard ceiling. Three retries, then dead-letter the item with the full input attached so I can debug. The dead-letter queue is the unsung hero of agent reliability — it turns "agent failed silently" into "agent failed loudly, in a place I can see."
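A minimal sketch of that retry policy, assuming a plain Python list as the dead-letter queue and a caller-supplied call_model function. Both are hypothetical stand-ins, as are the delay values; in production the dead-letter queue would be a table or a real queue.

```python
import time

def process_with_retry(item, call_model, dead_letters,
                       max_retries=3, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call call_model(item); after a failure, retry at most max_retries times.

    Returns the model result, or None once the item has been dead-lettered.
    """
    last_error = None
    for attempt in range(max_retries + 1):  # one initial try plus the retries
        try:
            return call_model(item)
        except Exception as exc:  # narrow this to timeout errors in real code
            last_error = exc
            if attempt < max_retries:
                # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
                sleep(min(base_delay * 2 ** attempt, max_delay))
    # Hard ceiling reached: park the item with its full input for debugging.
    dead_letters.append({
        "input": item,
        "error": repr(last_error),
        "attempts": max_retries + 1,
    })
    return None
```

The loop is bounded by construction, so one weird input costs at most four model calls instead of an open-ended bill. Passing sleep in as a parameter is only there so the backoff can be tested without waiting.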

3. State drift across restarts

Day 14: the agent started replying with stale facts. Turned out it was caching the customer's previous order details in memory, and a container restart wiped the cache mid-conversation. Replies were referencing orders the customer had already received and forgotten about.

Fix: treat in-memory state as a lie. Anything the agent needs to remember across runs goes in Postgres before the next inference call, not after. If the agent cannot survive a kill -9 between any two lines of code, I have not built a long-running agent. I have built a long-running prototype.
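Here is a rough sketch of the persist-before-inference rule. It assumes SQLite as a stand-in for Postgres, and the thread_state table and the functions init_state, save_context, load_context, and draft_reply are names invented for this example, not the agent's real schema.

```python
import json
import sqlite3

def init_state(conn):
    """Create the (hypothetical) per-thread state table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thread_state "
        "(thread_id TEXT PRIMARY KEY, context TEXT NOT NULL)"
    )
    conn.commit()

def save_context(conn, thread_id, context):
    """Durably store the context the agent is about to rely on."""
    conn.execute(
        "INSERT OR REPLACE INTO thread_state (thread_id, context) VALUES (?, ?)",
        (thread_id, json.dumps(context)),
    )
    conn.commit()

def load_context(conn, thread_id):
    """Read the context back from the database, never from process memory."""
    row = conn.execute(
        "SELECT context FROM thread_state WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}

def draft_reply(conn, thread_id, fresh_facts, call_model):
    """Merge new facts into stored context, persist, then call the model."""
    context = load_context(conn, thread_id)
    context.update(fresh_facts)
    save_context(conn, thread_id, context)  # persist BEFORE the inference call
    return call_model(context)
```

The ordering inside draft_reply is the point: if the process dies during call_model, the next run loads exactly the context the dead run was working from, rather than whatever happened to be cached.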

4. The "it works on my laptop" infrastructure tax

Day 19: I tried to hand the agent off to a non-technical operator to run on their own. They could not. They did not know what docker compose up meant, did not have a Postgres instance, did not want to learn what an environment variable was.

This was the moment I stopped pretending the model was the product. The product is the operating environment — the thing that makes the agent run for someone who never opens a terminal. That is the actual moat, and it is where most of the "AI agent" market is going to consolidate over the next 18 months.

If you're building this layer yourself, you have signed up to operate infrastructure for the rest of your life. If you don't want to do that, managed OpenClaw hosting exists for the same reason managed Postgres exists. You don't run your own database server in 2026 unless you have a very good reason.

The boring uptime math

After the four fixes above, the agent ran the remaining 11 days without intervention. Final tally:

  • Total scheduled runs: 120
  • Successful completions: 117
  • Dead-lettered items: 14 out of roughly 6,000 (0.23%)
  • Human interventions required: 3 (all dead-letter triage, took ~10 minutes total)

That ratio — three interventions in 30 days — is the only number that matters to the buyer. They don't care which model you used. They care whether the thing keeps running when they go on vacation.
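For anyone who wants to check the tally, the arithmetic is short enough to write down. The function name and its parameters are just for illustration, and the 6,000 item count is the rough figure from the list above, so the derived rates are approximate.

```python
def uptime_summary(days=30, interval_hours=6, successful_runs=117,
                   dead_lettered=14, total_items=6000):
    """Recompute the headline numbers from the raw counts."""
    scheduled = days * (24 // interval_hours)  # 4 runs a day for 30 days
    return {
        "scheduled_runs": scheduled,
        "run_success_pct": round(100 * successful_runs / scheduled, 2),
        "dead_letter_pct": round(100 * dead_lettered / total_items, 2),
        "items_per_run": total_items / scheduled,
    }

print(uptime_summary())
# {'scheduled_runs': 120, 'run_success_pct': 97.5, 'dead_letter_pct': 0.23, 'items_per_run': 50.0}
```

A run every 6 hours gives 120 scheduled runs in 30 days, 117 of 120 is a 97.5% run success rate, and 14 of roughly 6,000 items is about 0.23% dead-lettered, at around 50 items per run.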

What this means if you're shipping an agent

The next wave of agent winners aren't going to be the ones with the cleverest prompts. They're going to be the ones who treat the agent as a long-running process — checkpointing, dead-lettering, observability, snapshot/rollback — and ship that whole stack as the product.

If you want to skip the four lessons I learned the dumb way, the Builder Sandbox tier gives you a MicroVM with sudo and live port-forwarding, and the Dev Agent tier adds observability and snapshot/rollback. That's the layer you actually want when there's a paying customer's workload on the other end.

Either way, the lesson is the same: agents don't fail because the model is bad. They fail because nobody wired up the boring stuff. Wire up the boring stuff. The model is the easy part.