5 ways AI agents quietly die inside n8n production
Mirza Iqbal · 2026-05-27 · via DEV Community
[ERROR] node "GPT Decision" execution_id=9f...3a status=failure
  cause: structured_output_schema_violation
  retries: 6  total_runtime_ms: 184302
  workflow: invoice-router-v3 owner: ap-team

That node ran 47 times today.

Each run burned three retries before n8n gave up.

Cost on the OpenAI side was real money.

Cost on the human side was a finance ops lead manually re-routing 47 invoices because the agent never told anyone it was looping.

The article most teams read this week is "agents hallucinate". That problem is solved by structured output. The real failure modes in n8n agent production are different. Here are the five that actually fire on weeknights.

1. The silent retry storm

With Retry On Fail enabled on a node, n8n retries at a fixed interval. An LLM node that returns a 429 under load is retried with the same prompt, same model, same payload. Each retry costs money and produces the same failure.

The fix is to gate retries on the error class.

// In a Code node before the LLM call
const lastError = $input.first().json.error;
if (lastError?.code === 'rate_limit_exceeded') {
  // exponential backoff, not n8n's flat retry
  await new Promise(r => setTimeout(r, 1000 * Math.pow(2, $runIndex)));
  return $input.all();
}
if (lastError?.code === 'invalid_request') {
  // schema problem. retrying will not help.
  throw new Error('halt invalid_request, escalate to human queue');
}
return $input.all();

The agent now distinguishes "transient" from "terminal". The retry storm dies.
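
The same gate can be written as a pure function, which makes the transient/terminal split easy to test outside n8n. This is a minimal sketch, not n8n's own behavior: `decideRetry`, the attempt cap, the delay cap, and every error code other than `rate_limit_exceeded` and `invalid_request` are assumptions for illustration. `structured_output_schema_violation` is taken from the log at the top and treated as terminal here.

```javascript
// Hypothetical error classes; adjust to the codes your provider actually returns.
const TRANSIENT = new Set(['rate_limit_exceeded', 'timeout', 'server_error']);
const TERMINAL = new Set(['invalid_request', 'structured_output_schema_violation']);

// attempt is zero-based, like $runIndex in the Code node above.
function decideRetry(errorCode, attempt, { maxAttempts = 4, baseMs = 1000, capMs = 30000 } = {}) {
  if (TERMINAL.has(errorCode)) {
    // Retrying the same payload cannot fix a schema or request problem.
    return { action: 'escalate', reason: `terminal error: ${errorCode}` };
  }
  if (!TRANSIENT.has(errorCode)) {
    // Unknown errors go to a human instead of looping.
    return { action: 'escalate', reason: `unclassified error: ${errorCode}` };
  }
  if (attempt >= maxAttempts) {
    return { action: 'escalate', reason: `gave up after ${attempt} attempts` };
  }
  // Exponential backoff with an upper bound on the wait.
  const delayMs = Math.min(capMs, baseMs * 2 ** attempt);
  return { action: 'retry', delayMs };
}
```

Anything that is not explicitly transient escalates, so a new error code fails loudly instead of joining the retry loop.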

2. Tool-call drift across long workflows

A multi-step agent flow calls tool A, then tool B, then tool C. Each tool returns slightly different JSON shapes. By step C the agent is reasoning over a structure that no longer matches its system prompt.

I have seen this in 6-step Clay-to-n8n-to-Salesforce flows. The Salesforce step fails because the contact object got mutated three steps back and nobody normalized it.

The fix is a normalization node between every tool call.

Tool A => Set node (rename + strip) => Tool B => Set node (rename + strip) => Tool C

It looks redundant. It is not. The Set node enforces the schema your agent's system prompt promised. If schema and reality diverge, the mismatch surfaces at that step, not as a wrong invoice routed at midnight.
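
Where a Set node is not strict enough, the same rename-and-strip step can live in a Code node that throws when a required field is missing. This is a sketch under assumptions: `CONTACT_SCHEMA`, `normalizeContact`, and all field names are hypothetical and stand in for whatever contract your system prompt actually promises.

```javascript
// Hypothetical contract: target field -> accepted source keys, in priority order.
const CONTACT_SCHEMA = {
  email: { from: ['email', 'Email', 'work_email'], required: true },
  fullName: { from: ['fullName', 'full_name', 'Name'], required: true },
  company: { from: ['company', 'Company', 'organization'], required: false },
};

function normalizeContact(raw) {
  const input = raw ?? {};
  const out = {};
  const missing = [];
  for (const [target, rule] of Object.entries(CONTACT_SCHEMA)) {
    // Rename: take the first source key that holds a non-empty string.
    const sourceKey = rule.from.find(
      (key) => typeof input[key] === 'string' && input[key].trim() !== ''
    );
    if (sourceKey !== undefined) {
      out[target] = input[sourceKey].trim();
    } else if (rule.required) {
      missing.push(target);
    } else {
      out[target] = null;
    }
  }
  if (missing.length > 0) {
    // Fail here, between tools, instead of three steps later.
    throw new TypeError(`contact missing required field(s): ${missing.join(', ')}`);
  }
  // Strip: keys not named in the schema never reach the next tool.
  return out;
}
```

Running this after every tool call means each downstream step sees one shape, whichever tool produced the data.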

3. Silent payload truncation inside n8n's HTTP wrapper

n8n's OpenAI node and the HTTP Request node both have request body size limits that are NOT documented anywhere obvious. When the prompt plus tool history plus retrieval results cross about 950 KB, the HTTP body gets truncated by the n8n proxy, the LLM sees a malformed request, and the agent returns a vague refusal.

This one bit me twice on two different client projects. The agent worked fine on test inputs and failed mysteriously on real production payloads that had longer chat history.

The fix is to trim the payload BEFORE the LLM node, not to rely on the LLM provider to handle it.

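// In a Code node sized for n8n's 1MB practical limit
const MAX_BYTES = 800 * 1024;
const item = $input.first().json;
const history = item.history || [];
let total = 0;
const trimmed = [];
// Walk from the newest message backwards so the oldest are dropped first.
for (let i = history.length - 1; i >= 0; i--) {
  // Measure UTF-8 bytes; String.length counts UTF-16 code units and undercounts non-ASCII text.
  total += Buffer.byteLength(JSON.stringify(history[i]), 'utf8');
  if (total > MAX_BYTES) break;
  trimmed.unshift(history[i]);
}
// Keep the item's other fields alongside the trimmed history.
return [{ json: { ...item, history: trimmed } }];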

Walk the history from the tail so the newest messages survive. The oldest get summarized in a separate node and re-injected as a single system message.
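
For the summarization step to have something to work on, the trim needs to hand back what it dropped. A sketch of that split, assuming Node's `Buffer` global is available in the Code node; `trimHistory` and its return shape are hypothetical names for illustration.

```javascript
// Split history into the newest messages that fit in maxBytes and the older rest.
function trimHistory(history, maxBytes) {
  let total = 0;
  let cut = history.length; // index of the oldest message that is kept
  for (let i = history.length - 1; i >= 0; i--) {
    // Count UTF-8 bytes, since the limit applies to the request body on the wire.
    const size = Buffer.byteLength(JSON.stringify(history[i]), 'utf8');
    if (total + size > maxBytes) break;
    total += size;
    cut = i;
  }
  return {
    kept: history.slice(cut),       // newest messages, original order
    dropped: history.slice(0, cut), // oldest messages, to be summarized
    bytes: total,
  };
}
```

`dropped` goes to the summarizer node, `kept` goes to the LLM node. Counting bytes instead of characters matters for non-ASCII chat history, where one character can take several bytes.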

4. The credentials-rotation blackout

Enterprises typically rotate API keys quarterly. n8n credentials are encrypted at rest and decrypted by the credential service on every workflow run. If a key rotates and the credential update has not propagated, every active workflow that uses it fails with a 401, and n8n's default error path swallows the auth failure as a "node error" with no alert.

You find out when revenue dashboards stop refreshing.

The fix is a credentials health check workflow that runs every hour and pings every active integration.

Cron (every 1h) =>
  HTTP GET /credentials/all (n8n REST API) =>
  Loop over each credential =>
    Trigger a dry-run call against the provider =>
      If 401, send to ntfy.sh on the on-call channel

That single workflow saved one of my clients 11 hours of debugging in March when their Slack OAuth key rotated and the marketing team's lead-routing flow went dark.
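
The probe-and-alert part of that workflow can be sketched in plain JavaScript. Everything named here is an assumption for illustration: `runProbes`, `findAuthFailures`, `healthCheck`, and the ntfy topic are hypothetical, and each `check` is whatever cheap read-only call your provider offers. The sketch relies on the global `fetch` in Node 18+ and on ntfy accepting a plain-text POST to a topic URL.

```javascript
// Statuses treated as "the credential is dead". The article only names 401;
// add 403 here if a provider reports revoked keys that way.
const AUTH_FAILURE_STATUSES = new Set([401]);

// probes: [{ name, check }], where check() resolves to an HTTP status code.
async function runProbes(probes) {
  const results = [];
  for (const probe of probes) {
    try {
      results.push({ name: probe.name, status: await probe.check() });
    } catch (err) {
      // Network errors are recorded, not treated as auth failures.
      results.push({ name: probe.name, status: null, error: String(err) });
    }
  }
  return results;
}

function findAuthFailures(results) {
  return results.filter((r) => AUTH_FAILURE_STATUSES.has(r.status)).map((r) => r.name);
}

function formatAlert(failedNames) {
  return `Credential check failed (401) for: ${failedNames.join(', ')}`;
}

// The topic name is hypothetical; use your on-call channel's topic.
async function notifyOnCall(message, topic = 'oncall-n8n-credentials') {
  await fetch(`https://ntfy.sh/${topic}`, { method: 'POST', body: message });
}

async function healthCheck(probes) {
  const failed = findAuthFailures(await runProbes(probes));
  if (failed.length > 0) await notifyOnCall(formatAlert(failed));
  return failed;
}
```

Keeping network errors separate from 401s avoids paging on-call for a provider outage that has nothing to do with rotation.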

5. Memory poisoning across runs

If you store conversation memory in a Postgres- or Redis-backed memory node and reuse it across runs, one bad agent output can poison every subsequent run.

I saw this happen with a customer service flow. A user typed a prompt-injection payload. The agent's "memory" node wrote the payload into Redis. Every subsequent customer for that ticket inherited the injection. Three hours later, the agent was telling people their refund was approved when it was not.

The fix is to validate memory on read, not only on write.

// In a Code node before the agent's memory-recall step
const recalled = $input.first().json.memory;
const SUSPICIOUS = /(ignore.*previous|you are now|system\W|admin\W)/i;
if (SUSPICIOUS.test(recalled)) {
  // memory is contaminated. drop it and start fresh.
  return [{ json: { memory: '', alert: 'memory_quarantined' } }];
}
return $input.all();

The memory pattern works fine right up until the day someone feeds a poisoned input. The validate-on-read step is a few lines and prevents a class of failure that costs trust to recover from.
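
A variation that quarantines individual messages instead of wiping the whole memory, assuming memory is recalled as an array of `{ role, content }` objects. `screenMemory` and the patterns are hypothetical and deliberately narrower than the regex above, which would also flag ordinary phrases such as "operating system ". A denylist like this is a heuristic, not a guarantee.

```javascript
// Hypothetical denylist. No `g` flag, so test() keeps no state between calls.
const INJECTION_PATTERNS = [
  /ignore\s+(all\s+)?(the\s+)?previous\s+instructions/i,
  /you\s+are\s+now\b/i,
  /^\s*system\s*:/im, // a line posing as a system message
];

function screenMemory(messages) {
  const clean = [];
  const quarantined = [];
  for (const msg of messages) {
    // Non-string content is serialized so it still gets scanned.
    const text =
      typeof msg.content === 'string' ? msg.content : JSON.stringify(msg.content ?? '');
    const hit = INJECTION_PATTERNS.some((re) => re.test(text));
    (hit ? quarantined : clean).push(msg);
  }
  return { clean, quarantined };
}
```

Pass `clean` to the agent and route `quarantined` to an alert, so one poisoned entry does not cost the customer the rest of their history.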

What dies in production is not what you tested

Hallucination shows up in dev. These five patterns show up in production. The split matters because the dev-time fixes (structured output, retries, evals) do not catch any of the five above.

If you are running agents in n8n right now, the cheapest thing you can do this week is add a normalization node between every tool call and a credentials health-check workflow. Those two changes alone caught roughly 70 percent of the silent failures in the last enterprise rollout I audited.

What is the failure mode that bit you that you do not see written about anywhere? Drop a snippet in the comments. The pattern library only grows when more people share the n8n flows that actually broke.