Messaging in the Age of AI
Pravin Khand · 2026-05-26 · via DEV Community

Messaging infrastructure has been boring for a decade. Queues, topics, exchanges — the primitives settled. Then AI agents arrived, and suddenly the assumptions that made messaging boring stopped holding. Messages are no longer just data. They are context. An agent will read your message, reason over it, call tools because of it, and generate responses whose token count you cannot predict at enqueue time. The transport layer that worked fine for deterministic services needs to be rethought — not replaced, but adapted.

This article is not about which message broker to pick. It is about what changes when the producer and consumer are both potentially non-deterministic reasoning systems, and what patterns actually hold up in production. The examples use Spring Boot and Apache Kafka because that is a stack I have seen work at scale, but the patterns apply across stacks.

1. Why AI Changes Messaging

Traditional messaging carries structured, bounded payloads. An order-placed event has a known shape: order ID, customer ID, line items, total. A payment-confirmed event carries a transaction reference. These messages are small (hundreds of bytes), predictable in volume, and idempotent by design — reprocess the same order event, get the same result.

AI-originated messages break all three assumptions. A single agent-to-agent message can carry a 100K-token context window — effectively a small novel's worth of reasoning state. Volume is bursty in ways that do not correlate with user activity: a multi-agent consensus round can generate 50 internal messages for a single user request. And idempotency is no longer free, because the same logical input can produce different reasoning paths on each retry.

The key consideration here is that messaging for AI systems shifts from "deliver this payload reliably" to "manage reasoning context at scale." Reliability still matters — it matters more — but it is joined by concerns that traditional messaging never had to address: token budgets, model latency variance, and reasoning trace integrity.

In a traditional service topology, each hop between services carries a bounded, schema-validated message. In an agent topology, the hop from a planner agent to an executor agent can carry an entire reasoning state, and that hop has a dollar cost measured in tokens. The messaging layer needs to know that.

2. New Workloads Created by Agents

Agents generate traffic patterns that look nothing like what your messaging infrastructure was designed for. It is worth cataloguing the new workloads explicitly, because each one stresses a different part of the system.

Planning outputs. Before an agent acts, it thinks — and the thinking produces structured output. A planner agent emits a plan object (goal, sub-goals, constraints, assigned agents) that downstream agents consume. These messages are medium-sized (2-8K tokens) and are the highest-leverage messages in the system — get the plan wrong, and everything downstream wastes tokens.

Tool-call results. When an agent invokes a tool — a database query, an API call, a code execution — the result enters the messaging fabric as a first-class message. These are unpredictable in size (a SQL query can return one row or a million) and must be chunked, summarized, or rejected before they blow out a context window.
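Here is a minimal sketch of that triage step. It assumes a per-message token limit and a hard cap. Results under the limit go out as one message, larger ones are chunked, and anything over the cap is rejected before it reaches an agent. `ToolResultGuard`, both thresholds, and the 4-characters-per-token estimate are illustrative assumptions, not part of the companion project. A real guard should count with the target model's tokenizer. Summarization is left out because it needs a model call.

```java
import java.util.Objects;

/**
 * Hypothetical guard applied to a tool result before it is published.
 * Token counts use a rough 4-characters-per-token heuristic (an assumption);
 * a real system should use the target model's tokenizer.
 */
final class ToolResultGuard {

    enum Verdict { ACCEPT, CHUNK, REJECT }

    private ToolResultGuard() { }

    /** Rough token estimate: ceil(length / 4). */
    static int estimateTokens(String text) {
        Objects.requireNonNull(text, "text");
        return (text.length() + 3) / 4;
    }

    /**
     * ACCEPT results that fit in one message, CHUNK results that exceed the
     * per-message limit but stay under the hard cap, REJECT anything larger.
     */
    static Verdict check(String toolOutput, int maxMessageTokens, int hardCapTokens) {
        if (maxMessageTokens <= 0 || hardCapTokens < maxMessageTokens) {
            throw new IllegalArgumentException("need 0 < maxMessageTokens <= hardCapTokens");
        }
        int tokens = estimateTokens(toolOutput);
        if (tokens <= maxMessageTokens) return Verdict.ACCEPT;
        if (tokens <= hardCapTokens) return Verdict.CHUNK;
        return Verdict.REJECT;
    }
}
```

With a 10-token message limit and a 100-token cap, a 40-character result is accepted, a 41-character result is chunked, and a 401-character result is rejected.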

Chain-of-thought traces. Some architectures persist the agent's reasoning trace as it streams — not just for debugging, but as context shared with other agents. A reasoning trace is verbose by design. Storing and forwarding it as a message requires treating it as a structured artifact, not a log line.

Multi-agent broadcast and consensus. Agents often need to reach agreement — which plan to execute, whether a tool call result is valid, whether a response meets policy. These consensus rounds generate fan-out message bursts: one agent publishes a proposal, N agents respond with votes or critiques. The messaging layer sees N+1 messages where a traditional system would see one.

In practice, your messaging system needs to handle message sizes that span four or more orders of magnitude, from hundreds of bytes to megabytes. It must absorb traffic bursts that follow no daily or weekly pattern. It must also serve consumers that may take seconds or minutes to process a single message, and that may retry it aggressively when they are unsure of the result.

3. Messaging Architecture Patterns That Actually Work

Watching agent systems in production across several teams, I have seen a set of patterns crystallize. These are not speculative. They are what teams end up building after the first production incident.

Pattern 1: The Message Envelope

Every message in an AI system must carry metadata beyond a correlation ID. The envelope should include the token count of the payload, the model that generated it, the trace ID, the sender type (human, agent, tool), and an idempotency key if the sender is an agent. The consumer uses this metadata to make routing, quota, and deduplication decisions without parsing the payload body.

The companion project implements this as a Java record — see code/src/main/java/com/messaging/relay/model/MessageEnvelope.java:

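import java.time.Instant;
import java.util.Map;

// SenderType is the project's enum of sender categories (human, agent, tool)
public record MessageEnvelope<T>(
    String messageId,
    String traceId,
    String parentMessageId,  // lets consumers rebuild the message tree
    SenderType senderType,
    T payload,
    int tokenCount,          // pre-enqueue estimate
    String modelId,
    Instant timestamp,
    String idempotencyKey,   // required for agent traffic
    Map<String, String> metadata
) { }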
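The record comment says the idempotency key is required for agent traffic, but a plain record does not enforce that. A compact constructor can make the envelope reject invalid messages when it is built. The sketch below shows this on a hypothetical `CheckedEnvelope` with the same fields. The `SenderType` constants are assumed from the human/agent/tool categories above.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Objects;

// Hypothetical sender categories; the companion project's SenderType may differ.
enum SenderType { HUMAN, AGENT, TOOL }

/** Hypothetical envelope variant that enforces its invariants at construction. */
record CheckedEnvelope<T>(
        String messageId,
        String traceId,
        String parentMessageId,   // null for the root message of a trace
        SenderType senderType,
        T payload,
        int tokenCount,
        String modelId,
        Instant timestamp,
        String idempotencyKey,
        Map<String, String> metadata) {

    CheckedEnvelope {
        Objects.requireNonNull(messageId, "messageId");
        Objects.requireNonNull(traceId, "traceId");
        Objects.requireNonNull(senderType, "senderType");
        Objects.requireNonNull(timestamp, "timestamp");
        if (tokenCount < 0) {
            throw new IllegalArgumentException("tokenCount must be >= 0");
        }
        // Agents retry, so their messages must carry a dedup key.
        if (senderType == SenderType.AGENT
                && (idempotencyKey == null || idempotencyKey.isBlank())) {
            throw new IllegalArgumentException("idempotencyKey is required for AGENT messages");
        }
        // Immutable copy so consumers cannot mutate routing metadata.
        metadata = (metadata == null) ? Map.<String, String>of() : Map.copyOf(metadata);
    }
}
```

A human message may omit the key. An agent message without one fails at construction, not at some later point in a consumer.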


Pattern 2: Separate Traffic Lanes

Human-to-agent, agent-to-agent, and agent-to-tool traffic have different latency tolerances, token profiles, and failure modes. Placing them on separate Kafka topics lets you apply different retention policies, compaction strategies, and consumer group scaling independently. An observability agent can consume from all three topics through its own consumer group without competing with operational consumers.
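The sketch below picks a lane from a message's sender and receiver, which is an assumption about how lanes are decided. Only `agent.messages` is named in this article. `human.messages` and `tool.calls` are placeholder topic names, and `LaneRouter` is hypothetical.

```java
/** Hypothetical lane router: one Kafka topic per traffic class. */
final class LaneRouter {

    enum Party { HUMAN, AGENT, TOOL }

    // Placeholder topic names; only "agent.messages" appears in the article.
    static final String HUMAN_TOPIC = "human.messages";
    static final String AGENT_TOPIC = "agent.messages";
    static final String TOOL_TOPIC = "tool.calls";

    private LaneRouter() { }

    /** Picks the topic for a message from its sender and receiver. */
    static String topicFor(Party sender, Party receiver) {
        if ((sender == Party.HUMAN && receiver == Party.TOOL)
                || (sender == Party.TOOL && receiver == Party.HUMAN)) {
            throw new IllegalArgumentException("humans and tools talk through an agent");
        }
        // Any exchange with a human is latency-sensitive: human lane.
        if (sender == Party.HUMAN || receiver == Party.HUMAN) return HUMAN_TOPIC;
        // Tool invocations and their results share the tool lane.
        if (sender == Party.TOOL || receiver == Party.TOOL) return TOOL_TOPIC;
        // What remains is agent-to-agent traffic.
        return AGENT_TOPIC;
    }
}
```

Each Kafka consumer group receives its own copy of a topic's records. An observability consumer subscribed to all three topics under a separate group id therefore never takes partitions away from the operational consumers.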

Pattern 3: Idempotency Keys for Agent Traffic

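Agents retry. It is inherent to their design: when a reasoning step produces low confidence, the agent re-executes. Without idempotency keys at the messaging layer, every retry becomes a new transaction, duplicating work and inflating costs. The pattern is straightforward. The producer sets a key derived from the logical operation (e.g., plan-{conversationId}-{stepNumber}), and the consumer deduplicates within a configurable window.

Kafka's built-in features only go part of the way. The idempotent producer suppresses duplicates caused by the producer's own network retries. Log compaction eventually keeps only the latest record per key, but it runs asynchronously. Neither prevents a consumer from seeing a message that an agent deliberately sent twice. Application-layer dedup is therefore more reliable for agent workloads, because agent retries are new sends rather than the transport-level retries that Kafka's exactly-once semantics cover.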
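Below is a minimal in-memory version of the consumer-side dedup window. It assumes the key format above and an injectable millisecond clock. `IdempotencyDeduplicator` is a hypothetical name, and the class is deliberately single-threaded.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.LongSupplier;

/**
 * Hypothetical consumer-side deduplicator. It remembers each idempotency key
 * for a fixed window and reports whether a delivery is the first one seen.
 * Not thread-safe: use one instance per consumer thread or add locking.
 */
final class IdempotencyDeduplicator {

    private final long windowMillis;
    private final LongSupplier clock;  // epoch millis, injectable for tests
    // Keys are inserted once and never re-inserted, so iteration order is
    // first-seen order and the oldest entries sit at the head of the map.
    private final LinkedHashMap<String, Long> firstSeen = new LinkedHashMap<>();

    IdempotencyDeduplicator(long windowMillis, LongSupplier clock) {
        if (windowMillis <= 0) throw new IllegalArgumentException("windowMillis must be > 0");
        this.windowMillis = windowMillis;
        this.clock = Objects.requireNonNull(clock, "clock");
    }

    /** Builds a key in the plan-{conversationId}-{stepNumber} format from the text. */
    static String planKey(String conversationId, int stepNumber) {
        return "plan-" + conversationId + "-" + stepNumber;
    }

    /** True on the first delivery of a key within the window, false for duplicates. */
    boolean firstDelivery(String idempotencyKey) {
        long now = clock.getAsLong();
        evictExpired(now);
        if (firstSeen.containsKey(idempotencyKey)) return false;
        firstSeen.put(idempotencyKey, now);
        return true;
    }

    // Drops keys whose window has closed, stopping at the first live key
    // (this early exit assumes a non-decreasing clock).
    private void evictExpired(long now) {
        Iterator<Map.Entry<String, Long>> it = firstSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() < windowMillis) break;
            it.remove();
        }
    }
}
```

Two design points matter more than the data structure. First, the window must outlast the agent's longest retry interval, or a late retry is processed twice. Second, with several consumer instances, either the seen-key set lives in shared storage or the idempotency key doubles as the Kafka record key. In the second case, every retry of one logical operation hashes to the same partition and, barring a rebalance, the same consumer.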

Pattern 4: Chunked Context Delivery

Do not send a 100K-token context window as a single Kafka message. Break it into chunks — summary, relevant history, tool outputs, reasoning state — each with its own envelope metadata. The consumer can then decide which chunks to load into the model's context window based on relevance, recency, and token budget. This turns context assembly from a producer-side guess into a consumer-side decision.
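The sketch below shows that consumer-side decision. It assumes each chunk envelope carries a token count and a relevance score computed upstream; the scoring method is not specified here. It greedily takes the most relevant chunks that still fit the budget. `ContextAssembler` is hypothetical, and greedy selection is a simple heuristic, not an optimal packing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical consumer-side context assembly under a token budget. */
final class ContextAssembler {

    /** One chunk envelope: kind (summary, history, ...), size, and a relevance score. */
    record Chunk(String kind, int tokenCount, double relevance) { }

    private ContextAssembler() { }

    /** Greedily keeps the most relevant chunks whose total fits tokenBudget. */
    static List<Chunk> select(List<Chunk> chunks, int tokenBudget) {
        List<Chunk> byRelevance = new ArrayList<>(chunks);
        byRelevance.sort(Comparator.comparingDouble(Chunk::relevance).reversed());
        List<Chunk> chosen = new ArrayList<>();
        int used = 0;
        for (Chunk c : byRelevance) {
            if (used + c.tokenCount() <= tokenBudget) {
                chosen.add(c);
                used += c.tokenCount();
            }
            // A chunk that does not fit is skipped; a smaller, less relevant one may still fit.
        }
        return chosen;
    }
}
```

Take a 4,000-token budget and four chunks: summary scored 0.9 (500 tokens), tool output 0.7 (1,500), reasoning state 0.6 (2,000), and history 0.4 (3,000). The assembler picks the first three and drops the history.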

The companion project's ContextChunker (see code/src/main/java/com/messaging/relay/chunking/ContextChunker.java) splits content by a configurable maxChunkTokens threshold. The KafkaConfig (code/src/main/java/com/messaging/relay/config/KafkaConfig.java) defines the four-topic topology with per-lane retention policies — 7 days for human traffic, 30 days for agent traffic (audit trail), 3 days with compaction for tool calls, and 90 days for the dead letter topic.
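The retention policies above can be expressed as standard Kafka topic configs. This sketch builds them as plain maps. `retention.ms` and `cleanup.policy` are real Kafka topic settings. The topic names other than `agent.messages`, including the dead-letter topic name, are placeholders, and `LaneTopicConfigs` is hypothetical. The sketch also assumes that "3 days with compaction" means compaction plus time-based deletion.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-lane topic settings mirroring the retention policies above. */
final class LaneTopicConfigs {

    private LaneTopicConfigs() { }

    private static Map<String, String> policy(Duration retention, String cleanupPolicy) {
        return Map.of(
                "retention.ms", Long.toString(retention.toMillis()),
                "cleanup.policy", cleanupPolicy);
    }

    /** Topic name -> topic-level config overrides. */
    static Map<String, Map<String, String>> build() {
        Map<String, Map<String, String>> topics = new LinkedHashMap<>();
        topics.put("human.messages", policy(Duration.ofDays(7), "delete"));
        // Longer retention doubles as an audit trail for agent traffic.
        topics.put("agent.messages", policy(Duration.ofDays(30), "delete"));
        // Keep the latest record per key AND age everything out after 3 days.
        topics.put("tool.calls", policy(Duration.ofDays(3), "compact,delete"));
        topics.put("messages.dlq", policy(Duration.ofDays(90), "delete"));
        return topics;
    }
}
```

Each inner map can be attached to a topic definition, for example through `NewTopic.configs(...)` in Kafka's admin client. The `compact,delete` choice matters, because under `compact` alone `retention.ms` does not age out old records.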

4. Token Limits, Rate Limits, and Quota Management

Rate limiting by request count made sense when every request cost roughly the same. An AI system can receive two messages that are both "one request" — one costs $0.002 and the other costs $0.30. The remedy is token-aware rate limiting.

The mechanism is simple. Before enqueuing a message to Kafka, count its tokens using the same tokenizer the model will use, and apply rate limits in tokens per minute rather than requests per minute. Split the quota by traffic lane. Reserve most of it (say, 70%) for human-originated traffic, which must stay responsive, and give the remainder to agent and tool traffic, which can be delayed or degraded. When a lane's quota is exhausted, apply backpressure: signal to the producer that it should slow down, batch, or degrade to a cheaper model.

The companion project implements this in TokenAwareRateLimiter (see code/src/main/java/com/messaging/relay/ratelimit/TokenAwareRateLimiter.java):

public RateLimitDecision check(String serializedPayload, MessageEnvelope<?> envelope) {
    // Measure the payload in tokens, not bytes or requests
    int tokenCount = countTokens(serializedPayload);
    // Quotas are tracked per lane, keyed by sender type
    SenderType senderType = envelope.senderType();
    boolean allowed = quotaManager.tryConsume(senderType, tokenCount);
    if (allowed) return RateLimitDecision.allowed(tokenCount);
    // Denied: return a reason the producer can act on
    return RateLimitDecision.denied(
        senderType.name() + " quota exhausted. " + backpressureHint(senderType), tokenCount);
}


The QuotaManager keeps a per-lane token window that resets every minute. Its limits are configurable and default to 600K tokens/min for human traffic, 200K for agents, and 100K for tool calls.
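Here is a minimal fixed-window sketch of such a quota manager. It assumes windows aligned to the wall-clock minute and uses the default limits quoted above. It mirrors the `tryConsume(lane, tokens)` call shape used by the rate limiter, but `LaneQuota` is a hypothetical stand-in, not the companion project's `QuotaManager`.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.LongSupplier;

/** Hypothetical fixed-window, per-lane tokens-per-minute quota. */
final class LaneQuota {

    enum Lane { HUMAN, AGENT, TOOL }

    private static final long WINDOW_MILLIS = 60_000L;

    private final Map<Lane, Long> limits = new EnumMap<>(Lane.class);
    private final Map<Lane, Long> used = new EnumMap<>(Lane.class);
    private final LongSupplier clock;  // epoch millis, injectable for tests
    private long windowStart;

    LaneQuota(Map<Lane, Long> limitsPerMinute, LongSupplier clock) {
        this.limits.putAll(limitsPerMinute);
        this.clock = Objects.requireNonNull(clock, "clock");
        this.windowStart = windowStartFor(clock.getAsLong());
    }

    /** Defaults from the text: 600K human, 200K agent, 100K tool tokens per minute. */
    static LaneQuota withDefaults(LongSupplier clock) {
        return new LaneQuota(Map.of(
                Lane.HUMAN, 600_000L,
                Lane.AGENT, 200_000L,
                Lane.TOOL, 100_000L), clock);
    }

    /** Start of the wall-clock minute containing nowMillis. */
    private static long windowStartFor(long nowMillis) {
        return nowMillis - Math.floorMod(nowMillis, WINDOW_MILLIS);
    }

    /** All-or-nothing: debits the lane only if the whole message fits this minute. */
    synchronized boolean tryConsume(Lane lane, int tokens) {
        if (tokens < 0) throw new IllegalArgumentException("tokens must be >= 0");
        long start = windowStartFor(clock.getAsLong());
        if (start != windowStart) {  // a new minute began: every lane starts empty
            windowStart = start;
            used.clear();
        }
        long current = used.getOrDefault(lane, 0L);
        if (current + tokens > limits.getOrDefault(lane, 0L)) return false;
        used.put(lane, current + tokens);
        return true;
    }
}
```

A fixed window is the simplest option, but it allows a burst of up to twice a lane's limit across a minute boundary. A sliding window or token bucket smooths that out at the cost of more state.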

The key consideration here is that rate limiting in AI systems is not just about protecting infrastructure. It is about cost control. A runaway agent loop that retries 50 times before converging should not generate a surprise $15 charge. The messaging layer is the correct place to enforce this, because it sits between the agent's impulse to retry and the model provider's metering endpoint.

5. Observability, Auditing, and Operational Safety

Observability for AI messaging is not an extension of APM. APM tells you whether a topic is backed up. AI messaging observability tells you whether the messages flowing through it are producing correct, safe, and cost-effective outcomes. Those are different questions that require different instrumentation.

What to Log per Message

Every message passing through the system should carry a structured log entry — not as an afterthought, but as a first-class part of the messaging pipeline. The minimum fields: traceId, senderType, tokenCount, modelId, latencyMs, retryCount, idempotencyKey, and blockedCheck (whether a safety guardrail intercepted the message). These fields let you reconstruct any interaction from raw logs — what was sent, by whom, at what cost, with what result.

The companion project's ObservabilityFilter (see code/src/main/java/com/messaging/relay/observability/ObservabilityFilter.java) logs a structured JSON event per consumed message:

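public void logConsumption(MessageEnvelope<?> envelope, String topic, long offset) {
    // LinkedHashMap keeps field order stable in the emitted JSON
    Map<String, Object> event = new LinkedHashMap<>();
    event.put("trace_id", envelope.traceId());
    event.put("sender_type", envelope.senderType().name());
    event.put("token_count", envelope.tokenCount());
    event.put("model_id", envelope.modelId());
    event.put("idempotency_key", envelope.idempotencyKey());
    event.put("topic", topic);
    event.put("offset", offset);
    try {
        obsLog.info(objectMapper.writeValueAsString(event));
    } catch (JsonProcessingException e) {
        // Jackson's writeValueAsString throws a checked exception;
        // a serialization failure must not stop message consumption
        obsLog.warn("Could not serialize observability event", e);
    }
}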


A separate passesSafetyCheck method runs before consumer processing, blocking messages flagged in metadata. In production, extend this with PII detection and content policy evaluation.

Message Lineage

A single user request can spawn a tree of agent messages: planner to executor, executor to tool, tool result back to executor, executor to critic, critic back to planner. If you cannot trace that tree, you cannot debug it. The trace ID is the spine of lineage — but it is not enough. Each agent should also record parentMessageId so you can reconstruct the tree topology. In practice, this means the message envelope (Pattern 1) carries a parentMessageId field, and the observability consumer builds the tree from the event stream.
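Below is a sketch of the tree reconstruction an observability consumer could perform. It assumes the consumer has collected `(messageId, parentMessageId)` pairs for one trace. Events with no parent, or whose parent has not been seen, become roots. `LineageTree` and its text rendering are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lineage builder for the messages of a single trace. */
final class LineageTree {

    record Event(String messageId, String parentMessageId, String sender) { }

    private final Map<String, Event> byId = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();
    private final List<String> roots = new ArrayList<>();

    LineageTree(List<Event> events) {
        for (Event e : events) byId.put(e.messageId(), e);
        for (Event e : events) {
            String parent = e.parentMessageId();
            // No parent (the user request) or an unseen parent: treat as a root.
            if (parent == null || !byId.containsKey(parent)) {
                roots.add(e.messageId());
            } else {
                children.computeIfAbsent(parent, k -> new ArrayList<>()).add(e.messageId());
            }
        }
    }

    /** Renders the tree as indented lines, two spaces per level. */
    List<String> render() {
        List<String> out = new ArrayList<>();
        for (String root : roots) render(root, 0, out);
        return out;
    }

    private void render(String id, int depth, List<String> out) {
        out.add("  ".repeat(depth) + id + " (" + byId.get(id).sender() + ")");
        for (String child : children.getOrDefault(id, List.of())) {
            render(child, depth + 1, out);
        }
    }
}
```

Feed it a request, a plan, an execution step, and a tool result and critique that both answer that step. It yields an indented five-line tree, with the tool result and the critique as siblings under the execution step.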

Safety Guardrails at the Messaging Layer

Content policy enforcement, PII scrubbing, and tool-call authorization should not live solely in the agent logic. They should be applied at the messaging boundary — before a message reaches a consumer. A lightweight filter consuming from each topic can validate, block, or redact messages based on policy. The filter is not a model; it is a deterministic rules engine plus (optionally) a small classifier for ambiguous cases. When a message is blocked, the producer receives a structured rejection reason, not silence.
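The sketch below covers the deterministic part of such a filter. It assumes an upstream check sets a `policy.blocked` metadata flag and that e-mail addresses are the only PII rule. Both the flag name and the single regex are illustrative assumptions; real PII detection needs far broader rules. Every outcome, including a block, comes back as a structured `Decision` with a reason, so the producer never gets silence.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical deterministic guardrail run at the messaging boundary. */
final class MessageGuardrail {

    enum Action { PASS, REDACT, BLOCK }

    /** Structured outcome returned to the producer. */
    record Decision(Action action, String content, String reason) { }

    // Illustrative single PII rule: e-mail addresses.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    private MessageGuardrail() { }

    static Decision evaluate(String content, Map<String, String> metadata) {
        // Block first: a flagged message is never forwarded, redacted or not.
        if (metadata != null && "true".equals(metadata.get("policy.blocked"))) {
            return new Decision(Action.BLOCK, null, "flagged by upstream policy check");
        }
        Matcher m = EMAIL.matcher(content);
        if (m.find()) {
            // replaceAll resets the matcher, so every address is replaced.
            return new Decision(Action.REDACT, m.replaceAll("[REDACTED_EMAIL]"),
                    "e-mail address redacted");
        }
        return new Decision(Action.PASS, content, null);
    }
}
```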

6. Real-World Use Cases and Anti-Patterns

Use Case: Customer Support Triage

A customer sends a message. A triage agent classifies it — billing, technical, account — and routes it to the correct specialist agent. The triage agent publishes to agent.messages with senderType=agent and a classification envelope. The specialist agent consumes, drafts a response, and routes it to a human for approval. The human sees the draft, the classification confidence, and the reasoning trace. The messaging layer carries all three.

Use Case: Code Review Pipeline

A PR is opened. A review agent comments on the diff. The comment is published to agent.messages. A human reviewer sees the agent's comment alongside the diff. The human can accept, reject, or modify the comment. The final review is a merge of agent suggestions and human judgment, with every message in the chain auditable. The messaging layer provides the timeline.

Anti-Pattern: The "Autonomous Everything" Trap

The most common failure mode I have seen is giving agents unbounded autonomy over messaging. The agent decides whom to message, what to say, and how often — with no human-in-the-loop validation. Inevitably, the agent finds an edge case, enters a reasoning loop, and floods the messaging layer with repetitive, costly messages. The fix is straightforward: cap agent-originated messages per conversation, require human approval above a cost or sensitivity threshold, and alert when an agent exceeds its lane quota.
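Here is a sketch of the first two fixes. It assumes the messaging layer can attribute each agent message to a conversation and attach an estimated cost. Costs are tracked in integer micro-dollars to avoid floating-point drift. `AgentConversationGuard`, the cap, and the threshold are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-conversation circuit breaker for agent-originated messages. */
final class AgentConversationGuard {

    enum Verdict { ALLOW, NEEDS_HUMAN_APPROVAL, CAP_EXCEEDED }

    private final int maxMessages;
    private final long approvalThresholdMicros;  // 1 USD = 1_000_000 micros
    private final Map<String, Integer> counts = new HashMap<>();
    private final Map<String, Long> spendMicros = new HashMap<>();

    AgentConversationGuard(int maxMessages, long approvalThresholdMicros) {
        if (maxMessages <= 0) throw new IllegalArgumentException("maxMessages must be > 0");
        this.maxMessages = maxMessages;
        this.approvalThresholdMicros = approvalThresholdMicros;
    }

    /** Records one agent message with its estimated cost and returns the verdict. */
    synchronized Verdict onAgentMessage(String conversationId, long estimatedCostMicros) {
        int n = counts.getOrDefault(conversationId, 0) + 1;
        // Over the cap: refuse without recording, so the conversation stays capped.
        if (n > maxMessages) return Verdict.CAP_EXCEEDED;
        counts.put(conversationId, n);
        long spent = spendMicros.merge(conversationId, estimatedCostMicros, Long::sum);
        return spent >= approvalThresholdMicros ? Verdict.NEEDS_HUMAN_APPROVAL : Verdict.ALLOW;
    }
}
```

At the illustrative $0.30 per call used earlier, a 50-retry loop would cost 50 × $0.30 = $15. With a $1 approval threshold and a five-message cap, the guard escalates to a human on the fourth call and refuses the sixth.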

Anti-Pattern: Prompt Chains as Messaging Protocol

The 2026 equivalent of connecting microservices with SSH tunnels. Teams string together LLM calls with raw prompt templates, passing unstructured text between agents. There is no schema, no versioning, no retry contract, no observability hook. When it breaks — and it always breaks — debugging means reading raw prompt logs and guessing which template produced which output. Use a proper message envelope and a proper transport. Kafka adds maybe 50ms of latency and saves hours of debugging.

| Do: Structured Messaging | Don't: Prompt-Chain Spaghetti |
|---|---|
| Schema-validated envelopes | Raw prompt strings as message format |
| Versioned message types | No versioning: template changes break downstream silently |
| Idempotency keys on every agent message | No retry contract: agents retry, prompts drift |
| Trace context propagated end-to-end | No observability: debugging = grep + guesswork |
| Token count in every envelope | Token consumption unknown until the bill arrives |

7. What to Avoid: Hype, Autonomy Theater, and Brittle Prompt Chains

The AI industry has a hype problem, and messaging architecture is not immune. Three flavors of nonsense are particularly common, and it is worth naming them so you can recognize them in a meeting.

Autonomy theater. Dashboards that show agents "autonomously" handling customer interactions while three human operators shadow-monitor every message. The messaging layer is configured to route everything to agents, but the agents' confidence is low on 80% of requests, so humans silently handle those via a side channel. The dashboard reports 95% autonomous resolution. The messaging logs tell a different story. Build the dashboard from the message logs, not from the demo script.

Prompt-chain spaghetti. Mentioned above, but worth calling out as its own category. The problem is not that prompt chains exist — they will always exist as a prototyping tool. The problem is promoting a prototype to production without replacing the prompt-chain transport with a proper messaging layer. It is the architectural equivalent of deploying a bash script as a production service and being surprised when it breaks at 3 AM.

The AGI bait-and-switch. "Our messaging architecture is designed for AGI-scale agent collaboration." No, it is not. AGI does not exist, and designing for it today means optimizing for constraints nobody has measured. Design for the workloads you actually have: LLMs with context windows, token budgets, and human-in-the-loop validation. When the technology changes, the messaging layer will adapt — because it is built on Kafka, not on a proprietary agent framework.

The key consideration here is that the best messaging architecture for AI systems today is boring. Kafka topics with clear schemas. Structured envelopes with metadata. Token-aware rate limiting. Trace-level observability. These are not exotic technologies. They are the same patterns that made microservices manageable, applied with slight adaptation to a new kind of producer and consumer. The teams that succeed will be the ones that resist the urge to build an "AI-native messaging platform" and instead build a solid messaging platform that happens to carry AI traffic.


Companion project: A runnable Spring Boot + Kafka messaging relay implementing the patterns described here — message envelopes, lane-separated topics, token-aware rate limiting, idempotency keys, and structured observability logging. Available in the code/ directory alongside this article.

Sources:

  • Confluent, "The Future of AI Agents Is Event-Driven"
  • Kai Waehner, "MCP vs. REST/HTTP API vs. Kafka"
  • Temporal.io, "What Agentic AI Borrowed from Microservices"
  • RisingWave, "Event-Driven Architecture in 2026"
  • Technode, "Beware the Distributed Monolith"
  • CNCF, "Cloud Native Agentic Standards" (2026)