
Agentic AI Architecture: 5 Patterns Explained
Redis · 2026-05-07 · via Redis

Most teams building with LLMs hit the same wall. A single prompt and response works fine for simple use cases, but real production work needs systems that can plan, take actions, check their progress, and keep going until the job is done. That's the jump from a chatbot to an agent, and the architecture behind that loop shapes everything about how your system performs in production.

This guide covers what agentic AI architecture actually is, the five major patterns you'll encounter when building these systems, and why the data layer underneath them matters more than most teams expect.

An agentic system plans, reasons, remembers, and executes actions on behalf of a user. An agent typically operates with autonomy (making decisions independently based on environmental input) and goal-directedness (working toward an objective across multiple steps rather than responding to a single input).

Here's a practitioner test: if there's no loop, it's not an agent. A single-turn question-and-answer interaction doesn't qualify. A real agent accepts an intent, breaks it down, takes action toward it, checks whether the intent was actually followed, and then loops until the job is done.

One common framing includes four core components:

  • Planning module: Decomposes high-level goals into executable subtasks.
  • Memory systems: Maintains continuity across interactions. This is what separates agents from stateless prompt-response systems.
  • Tool-calling interface: Connects the agent to the outside world through APIs, databases, code execution, and search.
  • Reflection & reasoning loops: The agent evaluates outcomes and refines its approach before continuing.

Those pieces can be implemented in different ways, but you'll usually find some version of all four. The important thing to understand is the difference between a static LLM pipeline and an agentic one. In a static pipeline, you know all the steps ahead of time. In an agentic system, the LLM decides what happens next at runtime.
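
The contrast can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the tool names and `fake_llm_decide` are hypothetical stand-ins, and a real agent would replace the latter with a model call.

```python
from typing import Callable

# Hypothetical tools; in a real system these would call APIs or databases.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda q: f"record for {q}",
    "summarize": lambda text: text.upper(),
}

def static_pipeline(query: str) -> str:
    """Every step is fixed in code before the request arrives."""
    return TOOLS["summarize"](TOOLS["lookup"](query))

def fake_llm_decide(goal: str, history: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call: returns (tool_name, argument) or ("done", answer)."""
    if not history:
        return "lookup", goal
    if len(history) == 1:
        return "summarize", history[-1]
    return "done", history[-1]

def agentic_loop(goal: str, max_steps: int = 5) -> str:
    """The model picks the next step at runtime; the loop only enforces a budget."""
    history: list[str] = []
    for _ in range(max_steps):
        action, arg = fake_llm_decide(goal, history)
        if action == "done":
            return arg
        history.append(TOOLS[action](arg))
    raise RuntimeError("step budget exhausted before the goal was met")
```

Both functions produce the same answer here. The difference is where the control flow lives: in the code for the pipeline, in the model's decisions for the loop.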

And a word of caution: not everything needs to be agentic. A direct API call that returns account status in under 10ms doesn't need an extra LLM step. Adding one costs hundreds of milliseconds, consumes tokens on every invocation, and introduces unnecessary parsing of data that is already structured.

The architecture types you'll encounter in production

With that definition in place, the next step is choosing a runtime pattern. A handful of recurring options show up in practice. Each solves a different class of problem, and the right choice depends on your task complexity, coordination needs, and latency budget.

Single-agent architecture

The simplest pattern is one LLM acting as the central reasoning engine, connected to tools and memory, looping until a task is complete. The most common formalization is Reasoning and Acting (ReAct): the agent thinks about what to do next, takes an action with a tool, observes the result, and repeats until it hits a termination condition. Coding assistants like Claude Code and Replit's agent run on this pattern, pairing tool loops with memory compaction and tool-result clearing to keep the agent on track over long sessions.

The main production challenge is context overflow. Long-running agents accumulate tool outputs that can exceed context windows, and a flawed retrieval in step two can shape every reasoning step that follows. Small errors compound across the loop.
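
One way to picture tool-result clearing is a pass over the transcript that stubs out older tool outputs once a token budget is exceeded. The sketch below is an assumption-laden illustration, not how any particular coding assistant does it: `Message`, the whitespace token estimate, and the `keep_last` policy are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str      # "user", "assistant", or "tool"
    content: str

def approx_tokens(text: str) -> int:
    """Crude whitespace estimate; a real system would use the model's tokenizer."""
    return len(text.split())

def clear_old_tool_results(messages: list[Message], keep_last: int = 2) -> list[Message]:
    """Replace all but the most recent `keep_last` tool outputs with a short stub."""
    tool_indexes = [i for i, m in enumerate(messages) if m.role == "tool"]
    stale = set(tool_indexes[:-keep_last]) if keep_last > 0 else set(tool_indexes)
    return [
        Message(m.role, "[tool result cleared]") if i in stale else m
        for i, m in enumerate(messages)
    ]

def fit_to_budget(messages: list[Message], budget: int) -> list[Message]:
    """Clear tool results only when the transcript exceeds the token budget."""
    total = sum(approx_tokens(m.content) for m in messages)
    return messages if total <= budget else clear_old_tool_results(messages)
```

User and assistant turns are left untouched, so the agent keeps its reasoning trail while the bulkiest content is dropped first.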

A single agent is usually enough when the task is self-contained, context stays manageable, and there's a clear termination condition. When in doubt, start with one agent and only add complexity once you've hit a real limit.

Plan & execute architecture

When single agents start making short-sighted decisions on long-horizon tasks, plan-and-execute addresses the problem by splitting the work into two distinct phases. A planner generates the steps upfront, and executors carry out each step without deciding what comes next. Separating planning from execution helps the planner focus on long-horizon coherence rather than per-step decisions.

One common implementation breaks complex queries into a directed acyclic graph (DAG) with explicit dependency ordering. Sub-questions without dependencies run in parallel, and an LLM-based verification layer checks result completeness before output.
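
A rough sketch of that execution model, using Python's standard `graphlib.TopologicalSorter` to release steps as their dependencies finish. The plan contents and the `answer` executor are hypothetical, and the completeness check is a plain set comparison standing in for the LLM-based verifier.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical plan: each sub-question maps to the sub-questions it depends on.
PLAN = {
    "revenue_2024": set(),
    "revenue_2025": set(),
    "growth": {"revenue_2024", "revenue_2025"},
}

def answer(step: str, results: dict[str, float]) -> float:
    """Stand-in executor; a real one would call a tool or an LLM."""
    if step == "revenue_2024":
        return 100.0
    if step == "revenue_2025":
        return 125.0
    return (results["revenue_2025"] - results["revenue_2024"]) / results["revenue_2024"]

def run_plan(plan: dict[str, set[str]]) -> dict[str, float]:
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises CycleError if the plan is not a DAG
    results: dict[str, float] = {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # every step whose dependencies are done
            # Steps in one batch do not depend on each other, so they run in parallel.
            values = pool.map(lambda s: answer(s, results), ready)
            for step, value in zip(ready, values):
                results[step] = value
                sorter.done(step)
    return results

def verify(plan: dict[str, set[str]], results: dict[str, float]) -> bool:
    """Completeness check; stands in for an LLM-based verification layer."""
    return set(results) == set(plan)
```

Here the two revenue lookups run in the same batch, and `growth` is released only after both have reported back.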

Re-planning is what keeps this pattern adaptive rather than brittle. Once execution finishes, the planner is called again to decide whether the task is done or whether a follow-up plan is needed. Scoped re-planning, which regenerates only the affected part of the plan, has been reported to cut token use by 82% compared with regenerating full plans from scratch.

The trade-off is upfront latency, and generating accurate plans is hard since LLMs aren't trained specifically for it. But for multi-step workflows where context window degradation is the main problem, this separation of concerns often pays for itself.

Orchestrator-worker architecture

When one agent isn't enough, the next step is splitting work across many. An orchestrator agent receives a goal, breaks it into pieces, delegates each piece to specialized workers, and aggregates their outputs. Decomposition, routing, and aggregation all sit with the orchestrator, and worker count and assignment can be decided at runtime instead of pre-wired.

One implementation pattern uses a top-level orchestrator that delegates to subagents for deep research workflows. Detailed search context stays isolated inside the subagents while the lead agent focuses on synthesis.

How is this different from plan-and-execute? Plan-and-execute decides decomposition and scheduling upfront. An orchestrator makes routing and delegation calls dynamically, based on what it sees coming back from workers and what fails along the way.
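
The dynamic part can be illustrated with a small sketch in which the orchestrator reroutes a subtask after a worker fails. The worker functions and registry are hypothetical, and in a real system the subtask list would come from the orchestrator's own LLM-driven decomposition rather than being passed in.

```python
from typing import Callable

Worker = Callable[[str], str]

def search_worker(task: str) -> str:
    if "paywalled" in task:
        raise RuntimeError("source unavailable")
    return f"search notes on {task}"

def archive_worker(task: str) -> str:
    return f"archive notes on {task}"

# Hypothetical registry; insertion order encodes fallback preference.
WORKERS: dict[str, Worker] = {"search": search_worker, "archive": archive_worker}

def orchestrate(goal: str, subtasks: list[str]) -> str:
    """Delegate each subtask, reroute on failure, then aggregate."""
    findings: list[str] = []
    for task in subtasks:
        for worker in WORKERS.values():
            try:
                findings.append(worker(task))
                break  # first worker that succeeds wins
            except RuntimeError:
                continue  # routing decided at runtime, based on the failure
        else:
            findings.append(f"no worker could handle {task}")
    # Workers return summaries only, so their raw context stays isolated.
    return f"{goal}: " + "; ".join(findings)
```

Calling `orchestrate("report", ["pricing", "paywalled study"])` sends the second subtask to the archive worker only after the search worker raises, which is a decision no upfront plan could have encoded.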

Hierarchical multi-agent architecture

When coordination overhead becomes too much for a single orchestrator, teams add another layer. Hierarchical architectures organize agents into a tree-structured chain of command: a strategic layer that decomposes by domain, a coordination layer of domain-specific supervisors, and an execution layer of leaf agents that take concrete actions. Each level adds oversight and refines requirements before handing them to the level below, and results are actively validated on their way back up rather than passively relayed. Financial-services workflows like loan processing fit this shape, with domain supervisors routing to specialized agents for credit scoring, income verification, and documentation review.

As the tree grows, hierarchical systems need per-layer checkpoints, distributed tracing, and strict tool scoping to stay manageable. Without those, coordination overhead can outweigh the gains from adding another layer.
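
As a small illustration of strict tool scoping in a tree of agents, the sketch below models the loan-processing shape described above. The `Agent` class and the tool names are hypothetical, and the indented trace is only a stand-in for real distributed tracing spans.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    allowed_tools: frozenset[str] = frozenset()
    children: list["Agent"] = field(default_factory=list)

    def use_tool(self, tool: str) -> str:
        """Strict tool scoping: an agent can only call tools granted to it."""
        if tool not in self.allowed_tools:
            raise PermissionError(f"{self.name} may not call {tool}")
        return f"{self.name} ran {tool}"

    def run(self, trace: list[str], depth: int = 0) -> None:
        """Depth-first walk that records one indented trace line per agent."""
        trace.append("  " * depth + self.name)
        for child in self.children:
            child.run(trace, depth + 1)

# Tool names below are assumptions made for the example.
credit = Agent("credit_scoring", frozenset({"credit_bureau_api"}))
income = Agent("income_verification", frozenset({"payroll_api"}))
docs = Agent("documentation_review", frozenset({"ocr"}))
underwriting = Agent("underwriting_supervisor", children=[credit, income, docs])
root = Agent("loan_strategy", children=[underwriting])
```

Supervisors hold no tools of their own in this sketch, so only leaf agents can touch external systems, and `credit.use_tool("payroll_api")` raises `PermissionError`.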

Reflection architecture

The previous patterns describe who does the work. Reflection changes how that work gets evaluated, and it can layer on top of any of the architectures above. The agent reviews its own outputs, generates a critique, and uses that critique to revise its response—all without updating model weights.

Three approaches show up repeatedly:

  • Reflexion: Stores verbal self-critique in an episodic memory buffer that persists across attempts, giving the agent a signal it can use to improve on later tries.
  • Self-Refine: Runs a generate-critique-revise loop in a single session, with one LLM playing all three roles.
  • CRITIC: Grounds the critique in external tools (search engines, code interpreters, calculators) rather than the model's own judgment. This matters because a model critiquing its own hallucinations with more hallucinations is a known failure mode.

Reflection is most useful when the task has a checkable signal, like tests, retrieval relevance, or tool feedback. Without that signal, extra critique loops can add cost without adding much value.
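
A minimal sketch of a reflection loop driven by a checkable signal, here a tiny test suite. The drafts are hard-coded stand-ins for successive LLM outputs (an assumption for the sake of a runnable example); a real loop would feed each critique into the next generation prompt.

```python
from typing import Callable, Optional

def check(candidate: Callable[[int], int]) -> Optional[str]:
    """External signal: run tests and return a critique, or None if they pass."""
    for x, expected in [(0, 0), (3, 9), (-2, 4)]:
        got = candidate(x)
        if got != expected:
            return f"square({x}) returned {got}, expected {expected}"
    return None

# Stand-ins for successive LLM drafts of a `square` function.
DRAFTS: list[Callable[[int], int]] = [
    lambda x: x * 2,   # wrong: doubles instead of squaring
    lambda x: x * x,   # revised after the critique
]

def reflect(max_rounds: int = 3) -> tuple[Callable[[int], int], list[str]]:
    critiques: list[str] = []
    for round_no in range(max_rounds):
        draft = DRAFTS[min(round_no, len(DRAFTS) - 1)]
        critique = check(draft)
        if critique is None:
            return draft, critiques   # stop as soon as the signal is clean
        critiques.append(critique)    # kept across attempts, like an episodic buffer
    raise RuntimeError("no draft passed within the round budget")
```

Because the critique comes from running tests rather than from the model's own judgment, the loop has a hard stopping condition and a bounded cost.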

What agentic architectures need from the data layer

Once you've picked a runtime pattern, the next constraint is the data layer underneath it. Across all five patterns, the backend requirements end up looking similar: memory, vector search, caching, and coordination are usually easier to manage as shared infrastructure than duplicated per agent.

Memory that spans sessions & scopes

Agents benefit from tiered memory. Short-term working memory holds the current session's messages and tool outputs, where fast access keeps the loop tight. Long-term memory persists facts, experiences, and learned workflows across sessions through semantic retrieval.

The hard part is lifecycle management: ranking relevance, expiring stale facts, and keeping things consistent as user context evolves. Outdated information left in an agent's working context is a common failure mode.
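
The lifecycle concerns can be sketched with a two-tier store in plain Python. Everything here is illustrative: the `AgentMemory` class and its methods are hypothetical, and the lazy expiry only mimics the kind of behavior a key TTL gives you in a data store such as Redis.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Fact:
    text: str
    expires_at: float  # epoch seconds

@dataclass
class AgentMemory:
    working: list[str] = field(default_factory=list)          # current session only
    long_term: dict[str, Fact] = field(default_factory=dict)  # persists across sessions

    def remember(self, key: str, text: str, ttl_seconds: float,
                 now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        # Writing under the same key overwrites the old fact, so stale values do not linger.
        self.long_term[key] = Fact(text, now + ttl_seconds)

    def recall(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        fact = self.long_term.get(key)
        if fact is None:
            return None
        if fact.expires_at <= now:
            del self.long_term[key]  # lazy expiry on read
            return None
        return fact.text

    def end_session(self) -> None:
        self.working.clear()  # long-term facts survive the session
```

The `now` parameter exists only to make expiry easy to exercise in tests. Relevance ranking is left out on purpose, since it depends on the retrieval method in use.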

Fast vector search for context retrieval

As external data sources grow, retrieval latency starts to undermine real-time apps, especially in time-sensitive settings like financial analytics or live customer support.

Redis is a real-time data platform that supports vector search and semantic caching alongside fast in-memory access. Because the same platform handles vectors and operational data, Redis can support semantic retrieval for agent memory without a separate vector database in some architectures.
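
To make the retrieval step concrete, here is a minimal sketch of exact top-k search by cosine similarity. It is not Redis's API: the three-dimensional vectors are made-up stand-ins for real embeddings, and production systems typically swap the full scan for an approximate index.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], index: dict[str, list[float]],
          k: int = 2) -> list[tuple[str, float]]:
    """Exact nearest-neighbor search: score every vector, keep the k best."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical embeddings; real ones come from an embedding model.
INDEX = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.0],
    "api_limits": [0.0, 0.1, 0.9],
}
```

A query vector such as `[1.0, 0.2, 0.0]` ranks `refund_policy` first. The scan is linear in the number of stored vectors, which is why latency grows with the corpus unless an index is used.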

Semantic caching to control LLM costs

Agentic workloads burn through tokens fast. In one benchmark for solving a single GitHub issue, the average trajectory contained 48.4K tokens across 40 steps, with tool messages alone accounting for 30.4K. Semantic caching, which retrieves cached responses based on meaning rather than exact text match, can reduce both latency and cost.

Redis LangCache is a fully managed semantic caching service that recognizes when queries mean the same thing despite different wording. Redis reports up to 15x faster responses for cache hits and up to 73% lower LLM inference costs without code changes.
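
The general mechanism behind semantic caching can be sketched as follows. This is not LangCache's API. The `SemanticCache` class is hypothetical, and `toy_embed` is a bag-of-words stand-in that only tolerates differences in case, punctuation, and word order; matching true paraphrases requires a real embedding model.

```python
import math
from collections import Counter
from typing import Callable, Optional

def toy_embed(text: str) -> Counter:
    """Bag-of-words stand-in for an embedding model (an assumption, not a real encoder)."""
    return Counter(text.lower().replace("?", "").split())

def similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def lookup(self, query: str) -> Optional[str]:
        q = toy_embed(query)
        best = max(self.entries, key=lambda e: similarity(q, e[0]), default=None)
        if best is not None and similarity(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def store(self, query: str, response: str) -> None:
        self.entries.append((toy_embed(query), response))

def answer(query: str, cache: SemanticCache, llm: Callable[[str], str]) -> str:
    cached = cache.lookup(query)
    if cached is not None:
        return cached          # hit: no model call, no tokens spent
    response = llm(query)
    cache.store(query, response)
    return response
```

The threshold is the main tuning knob: set it too low and unrelated questions share answers, too high and near-duplicates miss the cache.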

Real-time coordination for multi-agent systems

Multi-agent systems need a way to share state changes, task completions, and errors without tight coupling. Synchronous communication creates structural bottlenecks where a single delayed message can halt an entire workflow. Event-driven coordination helps by allowing decoupled, parallel execution. Pub/sub and streams are common building blocks here.
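
A toy in-process event bus shows the decoupling. It is a sketch under stated assumptions, not a Redis client: `EventBus` and the topic names are hypothetical, and a real deployment would put a pub/sub channel or a stream between separate agent processes.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]

class EventBus:
    """In-process stand-in for a pub/sub channel or stream."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.queue: list[tuple[str, dict]] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Publishers enqueue and return immediately; they never wait on a consumer.
        self.queue.append((topic, event))

    def drain(self) -> None:
        """Deliver queued events in order; handlers may publish follow-up events."""
        while self.queue:
            topic, event = self.queue.pop(0)
            for handler in self.subscribers[topic]:
                handler(event)

bus = EventBus()
log: list[str] = []
bus.subscribe("task.completed", lambda e: log.append(f"reviewer saw {e['task']}"))
bus.subscribe("task.completed", lambda e: bus.publish("audit", {"task": e["task"]}))
bus.subscribe("audit", lambda e: log.append(f"audit logged {e['task']}"))
bus.publish("task.completed", {"task": "draft-report"})
bus.drain()
```

The publishing agent knows nothing about the reviewer or the audit trail, so either can be added, removed, or slowed down without changing the publisher.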

State that survives interruptions

Production agents need more than basic state saving. They need state versioning, searchability, and rollback. Human-in-the-loop workflows require durable state that survives long pauses while awaiting sign-off. Policy-aware storage also has to handle retention rules, personally identifiable information (PII) handling, and permission trimming on stored state.
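
Those requirements can be sketched with an append-only snapshot history. The `VersionedState` class and `redact_pii` helper are hypothetical, the store is in memory, and the PII policy is simplified to dropping a fixed set of top-level keys.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class VersionedState:
    """Append-only history of agent state snapshots."""
    versions: list[dict] = field(default_factory=list)

    def save(self, state: dict) -> int:
        # Deep-copy so later mutation of `state` cannot alter stored history.
        self.versions.append(copy.deepcopy(state))
        return len(self.versions) - 1  # version number

    def load(self, version: int = -1) -> dict:
        return copy.deepcopy(self.versions[version])

    def rollback(self, version: int) -> dict:
        """Restore an earlier snapshot by saving it as the newest version."""
        restored = self.load(version)
        self.save(restored)
        return restored

    def search(self, key: str, value: object) -> list[int]:
        """Return the version numbers whose snapshot has `key` equal to `value`."""
        return [i for i, s in enumerate(self.versions) if s.get(key) == value]

def redact_pii(state: dict,
               pii_keys: frozenset[str] = frozenset({"ssn", "email"})) -> dict:
    """Policy hook: drop PII fields before state is persisted (top-level keys only)."""
    return {k: v for k, v in state.items() if k not in pii_keys}
```

Rolling back appends a new version instead of truncating history, so the record of what the agent did before the rollback stays available for audit.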

How Redis unifies the agent stack

The architecture you pick shapes how your agents think. The data layer shapes whether they can do that fast enough to be useful. Most agentic patterns end up leaning on the same set of capabilities: memory, vector retrieval, semantic caching, coordination, and durable state.

Redis brings those capabilities together in one place: vector search, semantic caching, pub/sub, and durable state, with sub-millisecond latency for many core operations and integrations with agent frameworks including LangChain, LangGraph, and Microsoft Agent Framework. That lets teams consolidate agent memory and coordination without stitching together as many separate systems, though some will still want complementary tools for governance, compliance, or long-horizon archival storage.

Try Redis free to see how it fits your agentic workloads, or talk to the team about your data layer.