

The 5-Layer Architecture Every Production Multi-Agent System Needs (And Why Most Skip Layers 4 and 5)
shakti mishr · 2026-05-26 · via DEV Community

Your Multi-Agent AI System Is Just a Dinner Party With No One in Charge

Picture this. Dinner guests arriving in an hour. Four people, each capable, each assigned a job.

One handles the grill. One sets the table. One makes the salad. One runs the music.

Thirty minutes in: the grill isn't heating because nobody opened the propane valve. The salad person is waiting on ingredients that were never passed over. The appetizers are cold because the reheating was supposed to happen ten minutes ago. The DJ paired to the wrong speaker and is now blasting techno into the baby's room.

No one was incompetent. Everyone knew their job. The whole thing fell apart because there was no system for coordination.

That is a near-perfect description of most multi-agent AI systems running in production today.

Each agent is capable — a coder, a researcher, a planner, a writer. But without shared memory, deliberate orchestration, and proper state management, capable agents produce incoherent results. The failure isn't in the intelligence of the individual agents. It's in the architecture that's supposed to make them a team.

This post breaks down the five-layer architecture that separates production multi-agent systems from expensive demos — and names the specific failure modes you will hit if you skip any of them.


The Three Ways Multi-Agent Systems Fail Before You Ship

Before the architecture, the failure taxonomy. These three problems appear, in some combination, in nearly every multi-agent system that didn't make it to production:

The Chaos Problem. No orchestration means agents act in parallel without coordination. One agent fetches data while another modifies it. One writes a response while another has already decided the query requires escalation. The outputs contradict each other, or worse — they corrupt shared state.

The Amnesia Problem. Agents can't access context from previous steps in the workflow. Each call starts fresh. An agent that just retrieved customer history has no way to pass that context to the agent writing the response — unless you explicitly build the memory layer. Most teams don't, until it's too late.

The Black Box Problem. Something goes wrong. You have no trace of which agent made which decision, what state the system was in, or what inputs triggered the failure. You can't reproduce it. You can't fix it. You can only watch it happen again.

If any of these sound familiar from your own experiments, keep reading — the architecture below is designed to close all three gaps.


The Five-Layer Architecture

Here's the framework: five layers that must all be functional before a multi-agent system can deliver consistent value in production. Think of them as load-bearing walls. You can skip one in a prototype. You cannot skip one in production.

┌─────────────────────────────────────────────────────┐
│          Layer 1: Orchestration                     │
│   Orchestrator · Classifier · Agent Registry       │
├─────────────────────────────────────────────────────┤
│          Layer 2: Knowledge                         │
│      Source Bases (RAG) · Vector DBs               │
├─────────────────────────────────────────────────────┤
│          Layer 3: Agents                            │
│   Specialized Agents · MCP Client · Local/Remote   │
├─────────────────────────────────────────────────────┤
│          Layer 4: Storage                           │
│  Conversation History · Agent State · Registry DB  │
├─────────────────────────────────────────────────────┤
│          Layer 5: Integration & Observability       │
│    MCP Server · External Tools · Trace · Evals     │
└─────────────────────────────────────────────────────┘



Layer 1 — The Orchestration Layer: Your AI Conductor

This is the component that kills the dinner party chaos problem. Without it, you have a group chat where everyone shouts simultaneously. With it, you have a conductor who decides who plays, when, and with what information.

The orchestrator is responsible for:

  • Routing tasks to the correct agent(s)
  • Managing execution order and sequencing
  • Preventing duplicate or conflicting work
  • Synthesizing outputs from multiple agents into a coherent result

Embedded within the orchestration layer is a Classifier: a component using NLU or LLM-based intent detection to understand what kind of request just arrived. "This needs the research agent." "This needs both the research and writing agents, in sequence." "This is ambiguous and requires a clarification step before routing."

The Agent Registry is the orchestrator's phonebook. It knows what agents exist, what capabilities each one exposes, and whether each agent is currently available. At small scale (2–3 agents), this is trivial. At production scale with dozens of specialized agents, a governed registry is the only way to keep routing reliable without hard-coding every path.

Microsoft's Agent Framework (MAF) — a fusion of Semantic Kernel and AutoGen — implements this pattern. But the concepts apply regardless of framework. LangGraph's node-based routing, CrewAI's role-based delegation, and custom orchestrators all need to solve the same problem: deterministic routing with dynamic capability discovery.
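To make the registry-plus-classifier idea concrete, here is a minimal, framework-free sketch. Everything in it is an illustrative assumption rather than an API from MAF, LangGraph, or CrewAI: the names `AgentEntry`, `AgentRegistry`, `classify`, and `orchestrate` are hypothetical, and the keyword rules only stand in for real NLU or LLM-based intent detection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class AgentEntry:
    """One row in the registry: who the agent is and what it can do."""
    name: str
    capabilities: Set[str]
    handler: Callable[[str], str]
    available: bool = True


class AgentRegistry:
    """Hypothetical in-memory registry; a real one would be backed by storage."""

    def __init__(self) -> None:
        self._agents: Dict[str, AgentEntry] = {}

    def register(self, entry: AgentEntry) -> None:
        self._agents[entry.name] = entry

    def find(self, capability: str) -> List[AgentEntry]:
        # Only return agents that are up and advertise the capability.
        return [a for a in self._agents.values()
                if a.available and capability in a.capabilities]


def classify(request: str) -> List[str]:
    """Toy classifier: keyword rules stand in for NLU/LLM intent detection."""
    text = request.lower()
    plan: List[str] = []
    if any(word in text for word in ("find", "research", "look up")):
        plan.append("research")
    if any(word in text for word in ("write", "draft", "summarize")):
        plan.append("writing")
    return plan  # an empty plan means the request is ambiguous


def orchestrate(request: str, registry: AgentRegistry) -> str:
    """Run the planned capabilities in sequence, passing context along."""
    plan = classify(request)
    if not plan:
        return "CLARIFY: could not determine intent"
    context = request
    for capability in plan:
        candidates = registry.find(capability)
        if not candidates:
            return f"ERROR: no available agent for '{capability}'"
        context = candidates[0].handler(context)
    return context
```

The routing logic never names a specific agent. Adding a new specialist means registering another `AgentEntry`, which is the property the registry is meant to provide.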


Layer 2 — The Knowledge Layer: Institutional Memory

Agents need two kinds of knowledge access: domain-specific content and semantic search over unstructured data.

Source Bases are where you store the specialized content that transforms general-purpose AI responses into expert answers. Policy documents. Product FAQs. Regulatory guidelines. Internal runbooks. The implementation varies — knowledge graphs, document repositories, fine-tuned models — but the goal is consistent: give agents the specific information they need to be right about your domain, not just right in general.

Vector databases enable semantic search over that content. When a support agent searches "issues with login after password reset," vector search understands the semantic relationship between authentication state and credential management. Keyword matching doesn't.

The critical retrieval decision that most teams get wrong: RAG vs. MCP is not a style preference. It is a functional distinction.

Use RAG when:
  - Content is static or semi-static (policy docs, FAQs, guides)
  - Search relevance is the primary quality lever
  - You need to synthesize across multiple documents

Use MCP when:
  - The agent needs real-time system state
  - The operation writes or modifies data
  - You need live API access (inventory, CRM, ERP)


"How many units of SKU-123 are in stock right now?" is not a search question. It's an API call to your ERP. Routing it through RAG produces a stale answer. Routing it through an MCP tool call produces the live value.

The mistake teams make: reaching for RAG everywhere because it's simpler to set up, then spending three months debugging why the agent keeps giving wrong inventory data. The fix isn't better embeddings. The problem is the wrong retrieval pattern.
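The RAG-versus-MCP rule can be written down as an explicit routing decision instead of leaving it implicit in prompts. The sketch below is one possible encoding of the criteria listed above. `Route`, `RetrievalNeed`, and `choose_route` are hypothetical names, and how you determine the two flags for a given request (classifier output, tool metadata) is left as an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    RAG = "rag"   # search over static or semi-static content
    MCP = "mcp"   # tool call against a live system


@dataclass(frozen=True)
class RetrievalNeed:
    needs_live_state: bool  # the answer depends on current system state
    writes_data: bool       # the operation creates or modifies records


def choose_route(need: RetrievalNeed) -> Route:
    """Live reads and any writes go to MCP; everything else can use RAG."""
    if need.needs_live_state or need.writes_data:
        return Route.MCP
    return Route.RAG
```

Under this rule, "How many units of SKU-123 are in stock right now?" maps to `RetrievalNeed(needs_live_state=True, writes_data=False)` and is routed to an MCP tool call, while a question about the returns policy maps to RAG.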


Layer 3 — The Agent Layer: Specialized Workers

Each agent in the system is a specialist. A finance agent. A coding agent. A research agent. A customer-facing support agent. Each is fine-tuned or prompted for its domain, with access to the relevant subset of the knowledge layer.

Agents communicate with external tools via an MCP client, a standardized interface that handles authentication, manages connections, and formats requests consistently regardless of the target tool. This abstraction is what lets you swap out the underlying tool (say, switching from one search provider to another) without rewriting every agent that uses it.

Local vs. Remote Agents: The Security Distinction That Matters

This is the architectural decision most teams don't think about until something goes wrong.

Local agents run in the same execution environment as the orchestrator. They communicate in-memory. They inherit the orchestrator's trust context. Fast, low-latency, straightforward to reason about.

Remote agents operate across a network boundary. They might live in a different security zone, be owned by a different team, or be an external service. This creates five security requirements that don't apply to local agents:

1. Authentication:  Verify the remote agent's identity before accepting its outputs
2. Authorization:   Enforce what data and tools the remote agent can access
3. Trust boundary:  Never assume a remote agent has the same permissions as the orchestrator
4. Data in transit: Encrypt everything crossing the network boundary
5. Audit:           Log every cross-boundary call with identity and payload


Think of local agents as colleagues sharing an office — implicit trust, fast coordination. Remote agents are external contractors calling in. They need a badge, credentials, and an access review before you hand them anything sensitive.

The Agent-to-Agent (A2A) protocol standardizes communication with remote agents. Microsoft Entra Agent ID provides the identity infrastructure on Azure. But the discipline is organizational, not just technical: you need policy decisions about which agents can call which other agents before you write a single line of orchestration code.
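As a rough illustration of the authentication, authorization, and audit requirements for remote agents, here is a sketch using only the Python standard library. It is not the A2A protocol or an Entra API. The shared-secret HMAC scheme, the `AGENT_KEYS` and `ALLOWED_TOOLS` tables, and the function names are all assumptions made for the example. Encryption in transit (TLS) is assumed to be handled by the transport and is not shown.

```python
import hashlib
import hmac
import json
import time

# Hypothetical per-agent secrets; in practice these come from a secret store.
AGENT_KEYS = {"pricing-agent": b"demo-secret"}
# Hypothetical policy table: which remote agent may invoke which tools.
ALLOWED_TOOLS = {"pricing-agent": {"get_price"}}
# In-memory stand-in for a durable audit log.
AUDIT_LOG = []


def sign(agent_id: str, tool: str, payload: dict, key: bytes) -> str:
    """Sign the caller identity, tool name, and payload together."""
    body = json.dumps(
        {"agent": agent_id, "tool": tool, "payload": payload},
        sort_keys=True,
    ).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def accept_remote_call(agent_id: str, tool: str, payload: dict,
                       signature: str) -> bool:
    """Authenticate, authorize, and audit one cross-boundary call."""
    key = AGENT_KEYS.get(agent_id)
    # Authentication: unknown agents and bad signatures are rejected.
    authenticated = key is not None and hmac.compare_digest(
        sign(agent_id, tool, payload, key).encode(), signature.encode())
    # Authorization: the agent's own allow-list, never the orchestrator's.
    authorized = authenticated and tool in ALLOWED_TOOLS.get(agent_id, set())
    # Audit: every attempt is logged, including the rejected ones.
    AUDIT_LOG.append({
        "ts": time.time(), "agent": agent_id, "tool": tool,
        "payload": payload, "authenticated": authenticated,
        "authorized": authorized,
    })
    return authorized
```

The allow-list is keyed by the remote agent's identity, so a remote agent never inherits the orchestrator's permissions. A valid signature only proves who is calling, not what they are allowed to do.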


Layer 4 — The Storage Layer: Where Most Systems Fail Quietly

This is the layer that kills the amnesia problem. It is also, consistently, the layer teams underbuild first.

A production multi-agent system requires three distinct types of persistent storage:

Conversation History — Every interaction, decision, and intermediate output across the workflow. This is what lets an agent in step 7 know what the agent in step 2 found. Without it, each agent starts from zero. With it, context accumulates across the full workflow.

Agent State — The operational status and working configuration of each agent instance. If an agent crashes mid-task, agent state is what lets it recover — or lets a replacement agent pick up exactly where it left off. Without this, a transient failure means restarting the entire workflow.

Registry Storage — Persistent metadata about what agents exist, what capabilities they expose, what their current health status is, and what their recent performance looks like. This is what backs the Agent Registry in Layer 1.

The typical failure pattern: teams build agent state in memory. Works fine in development. Works fine in testing. Falls apart the first time an agent crashes in production, because the state was ephemeral and the workflow can't resume.

Build persistent storage from day one. Retrofitting it into a production system that was designed around ephemeral state is significantly harder than building it correctly at the start.
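A first version of persistent agent state does not need to be elaborate. The sketch below checkpoints each agent's progress to SQLite so a restarted process can resume instead of starting over. The `StateStore` class, the table layout, and the default file name are hypothetical choices for illustration; a production system would likely use a shared database rather than a local file.

```python
import json
import sqlite3
from typing import Tuple


class StateStore:
    """Hypothetical checkpoint store: one row per (workflow, agent)."""

    def __init__(self, path: str = "agent_state.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS agent_state ("
            "workflow_id TEXT, agent TEXT, step INTEGER, state TEXT, "
            "PRIMARY KEY (workflow_id, agent))"
        )

    def checkpoint(self, workflow_id: str, agent: str,
                   step: int, state: dict) -> None:
        # "with self.conn" commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO agent_state VALUES (?, ?, ?, ?)",
                (workflow_id, agent, step, json.dumps(state)),
            )

    def resume(self, workflow_id: str, agent: str) -> Tuple[int, dict]:
        """Return the last checkpoint, or step 0 with empty state."""
        row = self.conn.execute(
            "SELECT step, state FROM agent_state "
            "WHERE workflow_id = ? AND agent = ?",
            (workflow_id, agent),
        ).fetchone()
        return (row[0], json.loads(row[1])) if row else (0, {})
```

An agent would call `checkpoint` after each completed step and `resume` on startup. Because the state lives on disk, a crash between steps costs at most one step of work rather than the whole workflow.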


Layer 5 — Integration and Observability: How You Stop Flying Blind

This is the layer that kills the black box problem. It is also, almost universally, treated as an afterthought — and then desperately retrofitted after the first production incident that nobody could debug.

MCP Server — the standardized interface that external tools expose to your agents. Databases, APIs, web search, calculators, code execution environments. The MCP Server pattern means agents interact with external tools through a consistent interface, with authentication and audit controls baked in, rather than through a proliferation of custom integrations that each have their own auth model and failure mode.

Observability — real-time visibility into every agent action in the system:

  • Which agents are currently active?
  • What tasks are in progress?
  • Where are the latency bottlenecks?
  • What is the per-agent token consumption and cost?
  • Where do failures cluster?

Without this, you're flying blind. You will know something is wrong when a customer complains. You will not know which agent caused it, what state the system was in, or how to reproduce it.
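One lightweight way to start answering those questions is to wrap every agent call in a span that records the run, the agent, the outcome, latency, and token usage. The sketch below uses an in-memory list as a stand-in for a real tracing backend. `TRACE`, `traced`, and `tokens_by_agent` are hypothetical names, and the token count is assumed to be filled in by the caller from the model response.

```python
import time
from contextlib import contextmanager

# In-memory stand-in for a tracing backend.
TRACE = []


@contextmanager
def traced(run_id: str, agent: str, task: str):
    """Record one agent action, whether it succeeds or fails."""
    span = {"run_id": run_id, "agent": agent, "task": task,
            "status": "ok", "tokens": 0}
    start = time.perf_counter()
    try:
        yield span  # the caller can set span["tokens"] from the model response
    except Exception as exc:
        span["status"] = "error"
        span["error"] = repr(exc)
        raise  # tracing must not swallow the failure
    finally:
        span["latency_s"] = time.perf_counter() - start
        TRACE.append(span)


def tokens_by_agent(trace: list) -> dict:
    """Aggregate token consumption per agent across all recorded spans."""
    totals = {}
    for span in trace:
        totals[span["agent"]] = totals.get(span["agent"], 0) + span["tokens"]
    return totals
```

Usage is `with traced(run_id, "researcher", "lookup") as span:` around each call. Threading the same `run_id` through every span of a workflow is what makes a failed run reconstructable after the fact.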

Evals (Evaluation Layer) — the feedback loop that makes your system better over time. How accurately are agents completing their assigned tasks? Where are they making errors? What types of inputs cause failures? This data feeds back into the orchestration layer and the knowledge layer, enabling continuous improvement.

Without evals: you know your system is broken when users tell you
With evals:    you know your system is degrading before users notice


The evaluation layer is how you close the loop between production behavior and system improvement. Without it, you're not iterating on a system — you're waiting for complaints.
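A minimal eval loop can be a fixed set of cases, a pass rate, and a comparison against a baseline. The sketch below assumes an agent is a plain function from prompt to output and that each case carries its own programmatic check. `pass_rate`, `is_degrading`, and the 0.05 tolerance are illustrative choices, not a standard.

```python
from typing import Callable, List, Tuple

# One eval case: (input prompt, check applied to the agent's output).
EvalCase = Tuple[str, Callable[[str], bool]]


def pass_rate(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = 0
    for prompt, check in cases:
        try:
            if check(agent(prompt)):
                passed += 1
        except Exception:
            pass  # a crash counts as a failed case
    return passed / len(cases)


def is_degrading(baseline: float, current: float,
                 tolerance: float = 0.05) -> bool:
    """Flag a drop larger than the tolerance relative to the baseline."""
    return current < baseline - tolerance
```

Running this on a schedule and alerting when `is_degrading` returns true is the "before users notice" half of the comparison above. Production failures become new entries in `cases`.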


Why This Architecture Is Engineering Pragmatism, Not Research

What makes this five-layer model compelling isn't novelty. It's that it solves the concrete problems that show up in real production deployments:

Scalability — New agents can be added without rewriting orchestration logic. The registry discovers new capabilities automatically. The routing classifier routes to them without hardcoded rules.

Debuggability — Proper observability and persistent state mean that when something fails, you can trace exactly what happened. Every agent action is logged. Every state transition is recorded. Failure is reproducible.

Reliability — Persistent agent state means individual failures don't cascade. A crashed agent can be restarted and resume where it left off. The supervisor pattern in the orchestration layer catches local failures before they propagate.

Flexibility — Local and remote agent separation means different parts of the system can scale independently based on load and security requirements. The knowledge layer can be updated without touching the agent layer.


Key Takeaways

  • The failure is almost never in the model. The three failure modes — chaos (no orchestration), amnesia (no memory), black box (no observability) — are architectural failures. Upgrading to a better model doesn't fix them.
  • Build the storage layer first. Agent state persistence is the most commonly underbuilt layer and the hardest to retrofit. Design for it from day one, even if your first version is simple.
  • RAG and MCP are not interchangeable. One retrieves from static content via search. The other calls live systems via APIs. Using the wrong one for the job produces wrong answers that look right.
  • Remote agents need explicit security architecture. Authentication, authorization, trust boundary enforcement, and audit logging are not optional once you cross a network boundary. Plan the policy before you write the routing code.
  • Observability is not an afterthought. You cannot improve a system you cannot see. Trace every agent action. Measure token cost and latency per agent. Feed production failures back into your eval set within the week.

Closing Thought

The dinner party didn't fail because the guests were bad at cooking. It failed because there was no system — no shared plan, no handoff protocol, no one tracking dependencies.

Multi-agent AI systems fail the same way, for the same reason. Not because the models are weak. Because the architecture isn't there.

The question worth sitting with: which of these five layers is the weakest link in the system you're currently building?

If you've shipped a multi-agent system and hit one of these failure modes — or found a pattern that works better than what's described here — I want to hear about it. Drop it in the comments.