Agentic Retrieval Techniques: A Complete Guide
Redis · 2026-05-23 · via Redis

Your AI assistant just answered a complex, multi-part question by pulling data from three different sources, checking its own work, and re-querying when the first results fell short. That's agentic retrieval in action.

This guide covers what agentic retrieval is, how it differs from traditional retrieval-augmented generation (RAG), the main techniques behind it, and where Redis fits as the data layer.

What is agentic retrieval?

Agentic retrieval is an architectural pattern where an LLM-powered agent controls retrieval: deciding when to retrieve, what to query, which tools or sources to use, and whether the results are good enough. If they're not, the agent iterates, reformulates, and tries again until it has enough evidence or hits a stopping condition.
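The control flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any framework's implementation: `agentic_retrieve`, `search`, `is_sufficient`, and `reformulate` are hypothetical names, and in a real agent the last two would be LLM judgments rather than the toy functions used here.

```python
from typing import Callable


def agentic_retrieve(
    question: str,
    search: Callable[[str], list[str]],
    is_sufficient: Callable[[str, list[str]], bool],
    reformulate: Callable[[str, list[str]], str],
    max_steps: int = 3,
) -> tuple[list[str], int]:
    """Retrieve, judge the evidence, and re-query until it is good enough
    or the step budget (the stopping condition) runs out."""
    evidence: list[str] = []
    query = question
    for step in range(1, max_steps + 1):
        for doc in search(query):
            if doc not in evidence:
                evidence.append(doc)
        if is_sufficient(question, evidence):
            return evidence, step
        # Not enough evidence yet: ask for a new query and try again.
        query = reformulate(question, evidence)
    return evidence, max_steps


# Toy stand-ins for the search backend and the LLM judgments (hypothetical).
CORPUS = {
    "redis ttl": ["EXPIRE sets a key's time to live."],
    "redis ttl persistence": ["PERSIST removes a key's time to live."],
}


def toy_search(query: str) -> list[str]:
    return CORPUS.get(query, [])


def toy_is_sufficient(question: str, evidence: list[str]) -> bool:
    return len(evidence) >= 2


def toy_reformulate(question: str, evidence: list[str]) -> str:
    return question + " persistence"


docs, steps = agentic_retrieve(
    "redis ttl", toy_search, toy_is_sufficient, toy_reformulate
)
```

Here the first pass returns one document, the sufficiency check fails, and the reformulated query supplies the second. The `max_steps` budget is what keeps the loop from running forever when no query ever satisfies the check.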

Traditional RAG, by contrast, uses a fixed pipeline: it fetches documents in a single pass, and its static control flow can't adapt mid-process. Agentic retrieval treats retrieval as a dynamic operation rather than a one-off preprocessing step.

In practice, this pattern shows up in assistants, copilots, and enterprise search where the system has to gather evidence across multiple steps instead of relying on one retrieval pass.

How agentic retrieval fits into modern AI systems

Agentic retrieval is one action inside a larger agent loop, sitting alongside tool use, memory reads and writes, planning, and response generation. The agent invokes it, skips it, or repeats it based on what the current step needs.

In a typical agent workflow, retrieved data sometimes grounds an answer, sometimes informs the next tool call, sometimes updates long-term memory, and sometimes resolves a routing decision. In a Reasoning and Acting (ReAct) loop, the agent reasons about what it needs, issues a retrieval call, observes results, then decides whether to refine the query, switch sources, call a tool, write to memory, or generate a response.

That placement also means failures compound across the loop: a flawed retrieval in step two shapes reasoning in step five, which determines the tool call in step eight. Without tracing the full reasoning chain, the bad output is visible but the originating decision is not.

From static RAG to agentic retrieval: why we needed an upgrade

Retrieval evolved in four stages, with each generation fixing a limitation of the last.

Keyword search

Keyword search matched surface-level tokens and missed anything phrased differently. It worked for exact-match lookups but struggled on questions that required connecting information across documents, a limitation visible in Best Match 25 (BM25) results on multi-hop benchmarks.

Vector & naive RAG

Dense vector retrieval added semantic similarity and let systems match meaning instead of just words. It was a clear improvement, but static pipelines still retrieved on every query whether or not it helped, and they had no way to recover when the first pass missed.

Modular & advanced RAG

Modular RAG added query rewriting, reranking, and hybrid search to improve retrieval quality at each step. It made the pipeline smarter, but the pipeline itself stayed largely pre-planned and linear, with limited ability to adapt based on intermediate results.

Agentic retrieval

Agentic retrieval hands control of the search process to the agent itself, making retrieval iterative and conditional on what it has already learned. Once the agent can decide when to keep searching, failure modes shift from "we didn't find the document" to "we found it but reasoned about it poorly," a more tractable problem.

The context engine: the missing layer under most RAG stacks

Once retrieval becomes iterative, the next question is what infrastructure keeps the loop supplied with fresh, usable context.

A context engine is the layer beneath your RAG and agent frameworks responsible for ingesting, indexing, retrieving, governing, and caching the context your LLMs and agents need. Without one, teams cobble together a vector database, a document store, a cache layer, and possibly a time-series database. The result is multiple systems to operate, with integration seams where data goes stale.

Redis Iris is a context engine that feeds agents the right context, in the right form, at the right time, built on Redis' in-memory architecture and designed for low-latency AI workloads. The application or agent framework still orchestrates retrieval strategy; Iris makes sure the context it reaches for is navigable, fast, fresh, and backed by memory that builds over time.

Matching techniques in agentic retrieval

Matching is how agentic retrieval finds the right information for a given question, and no single method does it well on its own.

Hybrid search

Hybrid search combines dense vector search with sparse keyword search like BM25 so each method covers the other's blind spots. Vector search handles paraphrased queries; BM25 catches exact identifiers like SKUs and error codes. Their results are then fused with a hybrid ranking method, and combining the two has been shown to improve recall over single-method pipelines. Fusion can backfire, though, when one path is substantially weaker than the other, a weakest-link effect seen in hybrid search research.
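One widely used way to fuse two ranked lists is reciprocal rank fusion (RRF). The article does not say which fusion method it has in mind, so treat the sketch below as one reasonable choice rather than the method. The function name and document IDs are illustrative, and `k = 60` is the conventional default constant for RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each document earns 1 / (k + rank)
    from every list it appears in, and the totals decide the final order."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from dense vector search
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. from BM25

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# fused == ["doc_a", "doc_c", "doc_b", "doc_d"]
```

Documents that both retrievers rank highly rise to the top, while a document found by only one path still makes the list. Because RRF uses ranks rather than raw scores, it needs no score normalization between the two retrievers.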

Multi-level retrieval

Multi-level retrieval indexes the same content at different granularities (full documents, sections, paragraphs, sentences) and matches at the level best suited to the query. A broad question hits document-level summaries; a specific one drops straight to sentences. Hierarchical approaches, in which the LLM navigates a semantic tree of the corpus, adapt the search path to the query.
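As a toy illustration of matching at different granularities, the sketch below indexes one document at two levels and returns whichever unit scores best. It assumes a simple Jaccard token overlap in place of real embeddings, and it scores every level rather than having an LLM navigate a tree; `INDEX`, `jaccard`, and `multi_level_search` are hypothetical names.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap, used here as a stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# The same content indexed at two granularities (illustrative data).
INDEX = {
    "document": {"doc1": "redis caching and persistence overview"},
    "sentence": {
        "doc1#s1": "EXPIRE sets a time to live on a key",
        "doc1#s2": "AOF logs every write for persistence",
    },
}


def multi_level_search(query: str) -> tuple[str, str]:
    """Return (level, unit_id) of the best match across all granularities."""
    best = max(
        (
            (level, unit_id, jaccard(query, text))
            for level, units in INDEX.items()
            for unit_id, text in units.items()
        ),
        key=lambda item: item[2],
    )
    return best[0], best[1]


broad = multi_level_search("redis overview")
specific = multi_level_search("how does expire set a time to live")
```

The broad query lands on the document-level summary and the specific one on a single sentence, which is the behavior the paragraph above describes.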

Reranking

Reranking improves quality after the first retrieval pass by reordering a broad candidate set into a tighter shortlist. A cross-encoder reranker scores each candidate by attending jointly to the query and document, then selects the top few for the LLM. Adding a cross-encoder reranker on top of hybrid retrieval has been shown to raise long-document QA scores.
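The retrieve-then-rerank shape looks roughly like this. A real cross-encoder is a trained model that reads the query and document together; to keep the sketch self-contained, `toy_score_pair` is a hypothetical stand-in that scores a pair by token overlap, and `rerank` is an illustrative name.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 2,
) -> list[str]:
    """Score every (query, document) pair jointly and keep the best top_k."""
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def toy_score_pair(query: str, doc: str) -> float:
    # Stand-in for a cross-encoder: fraction of query tokens found in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


# A broad candidate set, as a first retrieval pass might return it.
candidates = [
    "Redis streams support consumer groups",
    "Redis hashes store field value pairs",
    "Consumer groups track pending stream entries",
]

shortlist = rerank("consumer groups pending entries", candidates, toy_score_pair)
```

The design point is the cost split: the first pass is cheap and wide, and the expensive pairwise scorer only runs over the candidates it returned.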

Metadata filtering

Metadata filtering narrows the search space by structured attributes such as department, date, document type, and access level before vector or keyword search runs. A RAG survey shows chunks enriched with metadata can be filtered by recency, source, or category, with timestamp weighting to keep knowledge fresh. The biggest gains come when filtering happens on the same layer as retrieval, so hybrid search and metadata filters share one query path.
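A minimal sketch of filter-then-search, assuming tiny hand-written vectors in place of real embeddings. `Chunk`, `filtered_search`, and the `department` and `year` fields are illustrative, not a Redis schema.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    vector: list[float]
    department: str
    year: int


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(
    query_vector: list[float],
    chunks: list[Chunk],
    department: str,
    min_year: int,
    top_k: int = 1,
) -> list[Chunk]:
    # Step 1: structured filter shrinks the candidate set.
    allowed = [c for c in chunks if c.department == department and c.year >= min_year]
    # Step 2: similarity search runs only over what survived the filter.
    allowed.sort(key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return allowed[:top_k]


chunks = [
    Chunk("2023 travel policy", [1.0, 0.0], "hr", 2023),
    Chunk("2025 travel policy", [0.9, 0.1], "hr", 2025),
    Chunk("2025 pricing sheet", [1.0, 0.0], "sales", 2025),
]

hits = filtered_search([1.0, 0.0], chunks, department="hr", min_year=2024)
```

Two chunks match the query vector exactly, but one is stale and the other belongs to a different department, so the filter leaves only the current HR policy.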

Routing techniques in agentic retrieval

Routing is how an agent decides where a query should go: which knowledge base, tool, or modality to hit, and in what order.

LLM-based routing

LLM-based routing has the model classify intent and output a structured enum that maps to a specific data source. It's the most flexible option because the model can reason about nuance, but it adds an LLM call to every query. It works best when target sources are few and classification benefits from query context.
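A sketch of the structured-enum idea, with the model call injected as a plain function so the example runs without an API key. The `Source` members, the prompt wording, the JSON reply shape, and the fallback choice are all assumptions made for illustration.

```python
import json
from enum import Enum
from typing import Callable


class Source(str, Enum):
    PRODUCT_DOCS = "product_docs"
    ORDERS_DB = "orders_db"
    WEB_SEARCH = "web_search"


def route(query: str, call_llm: Callable[[str], str]) -> Source:
    """Ask the model for a source name and validate it against the enum."""
    options = ", ".join(s.value for s in Source)
    prompt = (
        "Classify the query and reply with JSON of the form "
        '{"source": "<name>"} where <name> is one of: '
        f"{options}.\nQuery: {query}"
    )
    raw = call_llm(prompt)
    try:
        return Source(json.loads(raw)["source"])
    except (ValueError, KeyError, TypeError):
        # Malformed or unknown output: fall back instead of crashing the agent.
        return Source.WEB_SEARCH


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    query = prompt.rsplit("Query: ", 1)[-1].lower()
    if "order" in query:
        return '{"source": "orders_db"}'
    return '{"source": "product_docs"}'


destination = route("Where is my order 123?", fake_llm)
```

Validating the reply against the enum matters because the model's output is untrusted text: anything that doesn't parse into a known member takes the fallback path.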

Semantic routing

Semantic routing skips the LLM call by matching queries against pre-defined example utterances using embedding similarity, offering speed benefits over full LLM inference. The tradeoff is rigidity: it only handles the categories you've defined upfront. That makes it well suited to high-throughput systems with stable routing categories.
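The mechanism can be shown with a toy embedding. The sketch below assumes a bag-of-words vector in place of a real embedding model, and the route names, example utterances, and `0.3` threshold are made up for illustration.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    # Toy embedding: token counts. A real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Pre-defined example utterances per route (illustrative).
ROUTES = {
    "billing": ["refund my payment", "update my credit card"],
    "support": ["my app keeps crashing", "reset my password"],
}


def semantic_route(query: str, threshold: float = 0.3) -> Optional[str]:
    """Return the route whose closest example utterance clears the threshold."""
    q = embed(query)
    best_route, best_score = None, 0.0
    for name, utterances in ROUTES.items():
        for utterance in utterances:
            score = cosine(q, embed(utterance))
            if score > best_score:
                best_route, best_score = name, score
    return best_route if best_score >= threshold else None


matched = semantic_route("please refund my last payment")
unmatched = semantic_route("what is the weather today")
```

The second query returns `None`, which is the rigidity tradeoff in practice: anything outside the categories defined upfront needs a fallback, such as an LLM router.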

Parallel federation

Parallel federation fans queries out to multiple specialized agents simultaneously using per-destination query reformulation, then synthesizes results into one response. It's the right choice when you don't know which source has the answer, or when the answer spans multiple sources. The classifier node can generate not just a routing decision but a targeted sub-question for each source's domain.
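The fan-out step maps naturally onto `asyncio.gather`. In this sketch the sources, the per-source sub-questions, and the simulated delay are hypothetical; in a real system `plan_sub_questions` would be the classifier node and `query_source` a call to a specialized agent or index.

```python
import asyncio


def plan_sub_questions(question: str) -> dict[str, str]:
    # Hypothetical classifier output: one targeted sub-question per source.
    return {
        "inventory": f"stock levels relevant to: {question}",
        "pricing": f"current prices relevant to: {question}",
    }


async def query_source(name: str, sub_question: str) -> str:
    await asyncio.sleep(0.01)  # simulated network or database latency
    return f"[{name}] answer to: {sub_question}"


async def federate(question: str) -> list[str]:
    plan = plan_sub_questions(question)
    # All sources are queried concurrently; gather preserves the input order.
    tasks = [query_source(name, sub_q) for name, sub_q in plan.items()]
    return await asyncio.gather(*tasks)


results = asyncio.run(federate("Can we fulfil 500 units of SKU-42?"))
```

Total latency is roughly that of the slowest source rather than the sum of all of them. The synthesis step, typically another LLM call over `results`, is left out here.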

Routing configurations, source metadata, and access policies are themselves data agents need at query time. Keeping them on the same layer as retrieval indexes avoids extra hops to separate stores.

Query formulation techniques in agentic retrieval

Query formulation is how an agent decides what to actually ask for once it's picked a source, and it can matter as much as the retrieval algorithm itself.

Query planning

Query planning decomposes complex, multi-part questions into atomic sub-queries before retrieval starts. The LevelRAG architecture, for example, uses a high-level searcher that breaks complex queries into independent atomic queries, decoupled from retriever-specific optimizations.

Query rewriting

Query rewriting rephrases queries to better align with how documents are indexed. It matters most in multi-turn conversations, where ambiguous references and colloquial omissions force the rewriter to use full dialogue history to produce a query the retriever can act on.

Query expansion

Query expansion enriches a query with additional terms or generated content to widen the recall net. Hypothetical Document Embeddings (HyDE), for example, generates a pseudo-answer using the LLM and then uses that hypothetical answer for similarity search rather than the original query.
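The HyDE flow is easiest to see side by side with direct search. This sketch assumes a bag-of-words embedding and a canned `fake_generate` in place of a real embedding model and LLM; the documents and query are invented for the example.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    # Toy embedding: token counts. A real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hyde_search(
    query: str, generate: Callable[[str], str], index: list[str], top_k: int = 1
) -> list[str]:
    """Embed a generated pseudo-answer instead of the query itself."""
    hypothetical = generate(query)
    vector = embed(hypothetical)
    ranked = sorted(index, key=lambda doc: cosine(vector, embed(doc)), reverse=True)
    return ranked[:top_k]


DOCS = [
    "Large keys and slow commands increase latency",
    "Cache your favorite recipes at home",
]


def fake_generate(query: str) -> str:
    # Hypothetical LLM output: a plausible answer, not necessarily a correct one.
    return "high latency is often caused by large keys and slow commands"


query = "why is my cache laggy"
direct = hyde_search(query, lambda q: q, DOCS)  # identity: plain query search
hyde = hyde_search(query, fake_generate, DOCS)
```

The raw query only shares the word "cache" with the corpus and matches the irrelevant document, while the pseudo-answer uses the vocabulary of the relevant one. The pseudo-answer only has to resemble a good answer; its factual accuracy is not what drives the match.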

Multi-turn refinement

Multi-turn refinement ties planning, rewriting, and expansion together across reasoning cycles, with each retrieval step's results informing the next query. Interleaving Retrieval with Chain-of-Thought (IRCoT), for instance, alternates reasoning steps with retrieval calls, so each reasoning step guides the next retrieval and each retrieval informs the next reasoning step. The tradeoff is error propagation: a mistake early in the chain carries through every later step.

Caching & memory techniques in agentic retrieval

Caching and memory are how an agent reuses prior work, both to cut cost and latency, and to keep continuity across steps and sessions.

Semantic caching

Semantic caching matches incoming queries against previously answered ones by meaning rather than exact text, cutting retrieval and generation work before the rest of the pipeline runs. As a first layer before LLM invocation, it can deliver millisecond-level responses for recurrent queries, and agent-aware variants like the Agent RAG Caching (ARC) algorithm push that further by reusing retrieval work across agent steps, not just final answers.
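A minimal in-process sketch of the idea follows. It is not how Redis LangCache or ARC is implemented: the bag-of-words embedding, the `0.8` threshold, and the `SemanticCache` and `answer` names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    # Toy embedding: token counts. A real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        """Return the answer stored for the most similar past query, if close enough."""
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vector, stored in self.entries:
            score = cosine(q, vector)
            if score > best_score:
                best_score, best_answer = score, stored
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, answer_text: str) -> None:
        self.entries.append((embed(query), answer_text))


def answer(query: str, cache: SemanticCache, call_llm: Callable[[str], str]) -> str:
    cached = cache.get(query)
    if cached is not None:
        return cached  # cache hit: the LLM is never called
    result = call_llm(query)
    cache.put(query, result)
    return result


calls: list[str] = []


def fake_llm(query: str) -> str:
    calls.append(query)  # record each real model invocation
    return "Use the reset link on the sign-in page."


cache = SemanticCache()
first = answer("how do i reset my password", cache, fake_llm)
second = answer("how do i reset my password please", cache, fake_llm)
```

The second query is worded differently but close enough in meaning to reuse the first answer, so `fake_llm` runs only once. The threshold is the main tuning knob: set it too low and unrelated questions get the wrong cached answer, too high and near-duplicates miss.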

Session and long-term memory

Session and long-term memory give agents continuity across a single interaction and across sessions. Session memory holds conversation state within the LLM's active context window. Long-term memory persists distilled facts, user preferences, and behavioral patterns across sessions using vector search for conceptual retrieval, so the agent can recall both immediate context and historical knowledge when making retrieval decisions.
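The two tiers can be sketched as a bounded buffer plus a searchable fact store. This is an illustration under stated assumptions, not the Redis Agent Memory API: `AgentMemory` and its methods are hypothetical, and token overlap stands in for the vector search the paragraph describes.

```python
from collections import deque


class AgentMemory:
    def __init__(self, session_turns: int = 3) -> None:
        # Session tier: only the most recent turns fit the context window.
        self.session = deque(maxlen=session_turns)
        # Long-term tier: distilled facts that survive across sessions.
        self.long_term: list[str] = []

    def add_turn(self, role: str, text: str) -> None:
        self.session.append((role, text))

    def remember(self, fact: str) -> None:
        if fact not in self.long_term:
            self.long_term.append(fact)

    def recall(self, query: str, top_k: int = 1) -> list[str]:
        # Stand-in for vector search: rank facts by shared tokens with the query.
        q = set(query.lower().split())
        ranked = sorted(
            self.long_term,
            key=lambda fact: len(q & set(fact.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]

    def build_context(self, query: str) -> str:
        facts = [f"fact: {fact}" for fact in self.recall(query)]
        turns = [f"{role}: {text}" for role, text in self.session]
        return "\n".join(facts + turns)


memory = AgentMemory(session_turns=2)
memory.remember("user prefers metric units")
memory.remember("user is based in Berlin")
for i in range(1, 4):
    memory.add_turn("user", f"message {i}")

context = memory.build_context("which units does the user prefer")
```

The oldest turn falls out of the session buffer once it is full, while the relevant long-term fact is recalled by content and placed ahead of the recent turns.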

How Redis Iris ties matching, routing, and memory together

Agentic retrieval techniques benefit from sharing a single data layer rather than getting stitched across separate systems. Redis Iris is built for that role: a context engine that holds retrieval indexes, agent memory, and cached responses in one in-memory platform.

Underneath:

  • Redis Context Retriever turns business data into governed, agent-accessible tools via Model Context Protocol (MCP), so agents navigate entities and relationships instead of writing raw queries against the database.
  • Redis Agent Memory persists two-tier memory (session and long-term) across tasks and sessions, with semantic retrieval for recalling distilled facts and preferences. Available as a REST API and Python SDK.
  • Redis LangCache runs semantic caching in front of the LLM on the same layer the agent reads from for retrieval; in Redis benchmarks it cut LLM inference costs by up to 73% without code changes.
  • Redis Data Integration keeps Redis in sync with systems of record using change data capture, so agents work against current operational state.

Under all four, Redis Search handles vector, structured, unstructured, and real-time retrieval in a single query path.

Why agentic retrieval needs a fast context layer

Agentic retrieval shifts retrieval from a one-shot preprocessing step into an iterative loop the agent controls. Matching, routing, query formulation, and caching all become decisions the agent revisits as it gathers evidence. That works only if the infrastructure beneath the loop can serve fresh context at low latency, hold agent memory, and let one query path span vectors, metadata, and full text.

Redis Iris is the real-time context engine for AI: an in-memory platform that unifies retrieval, caching, and memory while leaving orchestration to the application or agent framework.

Try Redis Iris to start building, or book a meeting to talk through your agent architecture with our team.