What is a context engine?
Redis · 2026-05-20 · via Redis

Count the systems behind your AI agent. A vector database for embeddings. A separate cache for LLM responses. A memory service for conversation state. A pipeline syncing data from Postgres. Probably a queue. Maybe a feature store. Now count the things that go wrong when one of them is out of sync with the others.

Most production agent failures aren't model failures. They're context failures: the data the agent needed was somewhere in the stack, just not where the agent could see it, not fresh, or not connected to what came before. So teams swap models, tweak prompts, and tune the temperature. The failures keep coming back. The fix isn't a smarter prompt. It's a context engine: the platform layer that handles retrieval, memory, caching, and freshness as one coordinated stack instead of five disconnected ones.

This guide covers what a context engine is, how it fits into agent architecture, where traditional retrieval-augmented generation (RAG) runs into limits, and how Redis powers the context layer in production.

A context engine is the layer that gets the right information in front of an AI agent at the right time. It sits between your data and your agent, doing all the work of finding, ranking, and assembling context before the agent calls the model.

Without one, every agent in your stack rolls its own retrieval logic, hits a different mix of databases and APIs, and ends up with inconsistent context. With one, all of that lives behind a single coordinated pipeline. The agent asks for what it needs and gets a curated payload back, every time.

The category exists because LLMs don't know your data. They have to be told, on every call, what's relevant, and what's relevant depends on the user, the session, the workflow, and what just happened two turns ago. Context engineering, the practice of curating what enters the LLM's context window at each step, has become a core skill for engineers building agents. A context engine is the infrastructure that makes context engineering possible at production scale.

How a context engine sits between your data & your agents

The context engine talks to both sides so your agent doesn't have to. On the data side, it connects to everything your agent might need: relational databases, document stores, vector indexes, internal APIs, event streams. It handles authentication, query translation, freshness, and access control once, instead of every agent re-implementing them.

On the agent side, it speaks the language agents already understand: tools, schemas, and natural-language queries. The agent asks for "everything we know about customer X" or "the last five steps in this workflow," and gets back a clean, ranked payload instead of a pile of raw rows from six different systems that it then has to stitch together.
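A minimal sketch of that two-sided shape, assuming a hypothetical `ContextEngine` facade. The class, the `ContextItem` type, and the two stub sources are illustrative names, not part of any Redis API: each data source is registered once, and the agent makes a single call that returns one ranked payload.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ContextItem:
    source: str   # which backing system produced this item
    text: str     # content to hand to the model
    score: float  # relevance score; higher is better


# A source is any callable that maps a query to scored items.
Source = Callable[[str], list[ContextItem]]


class ContextEngine:
    """Hypothetical facade: one entry point in front of many data sources."""

    def __init__(self) -> None:
        self._sources: dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def get_context(self, query: str, limit: int = 5) -> list[ContextItem]:
        items: list[ContextItem] = []
        for source in self._sources.values():
            items.extend(source(query))
        # Rank across all sources so the agent receives one ordered payload.
        items.sort(key=lambda item: item.score, reverse=True)
        return items[:limit]


# Stub sources standing in for a CRM database and a ticketing API.
def crm_source(query: str) -> list[ContextItem]:
    return [ContextItem("crm", "Customer X is on the Pro plan", 0.9)]


def tickets_source(query: str) -> list[ContextItem]:
    return [
        ContextItem("tickets", "Open ticket: login failure", 0.7),
        ContextItem("tickets", "Closed ticket: billing question", 0.4),
    ]


engine = ContextEngine()
engine.register("crm", crm_source)
engine.register("tickets", tickets_source)
payload = engine.get_context("everything we know about customer X", limit=2)
```

Here `payload` holds the two highest-scoring items regardless of which system they came from. A real engine would also handle authentication, access control, and freshness behind `register`, which this sketch leaves out.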

The three layers every production context engine should cover

A working context engine has to do three things: get the right data (retrieval), remember what's happened (memory), and keep context both fast and current (caching & freshness). The next three sections walk through each.

Layer 1: Retrieval & ranking

Retrieval is the first layer to get right. Your agent needs to find the right documents, records, and facts in your data the moment it needs them, and rank them by what actually matters for the query.

Most production systems use hybrid retrieval to do this: keyword search for exact terms like product codes, vector search for conceptually similar content, and a merged ranked list at the end. Retrieval also usually runs in two passes. Cast a wide net first with something fast and approximate, then re-rank to fit only the most relevant results into the token budget.
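One common way to build the merged ranked list is reciprocal rank fusion (RRF), shown here as an illustration rather than as the method any particular product uses. The sketch below fuses a keyword ranking and a vector ranking, then trims the result to a token budget. The document IDs, token counts, and the `fit_to_budget` helper are made up for the example, and the model-based re-ranking pass is omitted.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def fit_to_budget(ranked_ids: list[str], token_counts: dict[str, int], budget: int) -> list[str]:
    """Keep documents in rank order, skipping any that would exceed the budget."""
    kept: list[str] = []
    used = 0
    for doc_id in ranked_ids:
        cost = token_counts[doc_id]
        if used + cost > budget:
            continue  # this one does not fit; a smaller one later might
        kept.append(doc_id)
        used += cost
    return kept


keyword_hits = ["sku-123", "doc-a", "doc-b"]   # exact-term matches
vector_hits = ["doc-a", "doc-c", "sku-123"]    # conceptually similar matches
merged = reciprocal_rank_fusion([keyword_hits, vector_hits])

token_counts = {"sku-123": 300, "doc-a": 500, "doc-b": 400, "doc-c": 200}
context_ids = fit_to_budget(merged, token_counts, budget=1000)
```

Documents that appear in both lists rise to the top of `merged`, which is the behavior hybrid retrieval is after: an item that matches both the exact term and the concept outranks one that matches only one of them.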

But retrieval isn't just about documents. Agents also need structured context: who the user is, what plan they're on, which tickets they've opened, what their last order looked like. That data lives in your operational systems, not your vector index, and pulling it from a different place every time leads to agents that contradict themselves between turns.

That's why context engines run retrieval and operational data through the same layer. Redis Search is built for exactly this: vector, full-text, and hybrid retrieval against the same data your app already uses, so your agent sees one consistent view instead of stitching together five.

Layer 2: Memory & session state

Retrieval tells agents what's in your data. Memory tells agents what happened before. That distinction matters because LLMs themselves don't remember anything. Every call starts fresh: no user, no last turn, no decision from five steps ago. If your agent feels like it has memory, that's because something around the model is holding state and feeding it back in on every call. That something is the memory layer.

Memory layers usually run on two timescales. Short-term working memory captures conversation context within a session, so the agent can refer back to what was said two turns ago. Long-term memory persists user preferences, past decisions, and high-signal facts across sessions, so the agent on Tuesday knows what the same user asked for on Friday.

A memory layer also can't be passive. Dumping every message into a log won't cut it. Production memory systems actively distill conversations: extracting user profile information, summarizing chat history, and consolidating duplicate facts so the agent gets something useful back, not raw turns.
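As a rough illustration of the two timescales and the distillation step, here is a toy in-process memory. It is not the Redis Agent Memory API: the `AgentMemory` class and the rule-based `extract_preferences` function are assumptions for the example, and a production system would typically use a model for extraction and a persistent store for both tiers.

```python
from collections import deque
from typing import Callable


class AgentMemory:
    """Toy two-timescale memory: bounded working memory plus deduplicated facts."""

    def __init__(self, working_size: int = 4) -> None:
        # Short-term: only the most recent turns of the current session.
        self.working: deque = deque(maxlen=working_size)
        # Long-term: durable facts keyed by a normalized form to drop duplicates.
        self.long_term: dict[str, str] = {}

    def add_turn(self, role: str, text: str) -> None:
        self.working.append((role, text))

    def remember_fact(self, fact: str) -> bool:
        """Store a fact unless an equivalent one exists. Returns True if new."""
        key = " ".join(fact.lower().split()).rstrip(".")
        if key in self.long_term:
            return False
        self.long_term[key] = fact
        return True

    def distill(self, extract_facts: Callable[[str], list[str]]) -> int:
        """Promote facts found in user turns to long-term storage."""
        promoted = 0
        for role, text in self.working:
            if role != "user":
                continue
            for fact in extract_facts(text):
                if self.remember_fact(fact):
                    promoted += 1
        return promoted


def extract_preferences(text: str) -> list[str]:
    """Stand-in for a model-based extractor: keep sentences stating a preference."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s for s in sentences if s.lower().startswith("i prefer")]


memory = AgentMemory(working_size=4)
memory.add_turn("user", "I prefer email updates. Where is my order?")
memory.add_turn("assistant", "It ships tomorrow.")
memory.add_turn("user", "Thanks. i prefer  email updates.")
promoted = memory.distill(extract_preferences)
```

The second statement of the preference normalizes to the same key as the first, so only one fact is promoted. That is the consolidation step in miniature: the agent gets one durable fact back, not two raw turns.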

Session state is a third, shorter-lived timeline. It tracks what the agent is currently doing across a multi-step workflow: which tools it's called, what those tools returned, and how far along the plan it is. Without this layer, agents start every turn from scratch, which compounds errors and erodes user trust.
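A sketch of what such session state might hold, assuming a hypothetical `SessionState` record that the agent loop reloads at the start of each turn. The field names, plan steps, and tool names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolCall:
    tool: str
    arguments: dict[str, Any]
    result: Any


@dataclass
class SessionState:
    """Hypothetical per-workflow state: the plan, progress, and tool history."""

    plan: list[str]
    step_index: int = 0
    tool_calls: list[ToolCall] = field(default_factory=list)

    def record(self, tool: str, arguments: dict[str, Any], result: Any) -> None:
        """Log a tool call and advance to the next plan step."""
        self.tool_calls.append(ToolCall(tool, arguments, result))
        self.step_index = min(self.step_index + 1, len(self.plan))

    @property
    def current_step(self) -> Optional[str]:
        """The step the agent should work on next, or None when the plan is done."""
        if self.step_index < len(self.plan):
            return self.plan[self.step_index]
        return None


state = SessionState(plan=["look up order", "check refund policy", "issue refund"])
state.record("get_order", {"order_id": "A1"}, {"status": "delivered"})
```

After one recorded call, `state.current_step` points at the second plan step and `state.tool_calls` carries the first tool's result forward, so the next turn resumes the workflow instead of re-deriving it.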

Redis Agent Memory handles all three timelines in one component: short-term working memory, long-term memory with vector retrieval, and session state. It also runs the distillation step automatically, summarizing sessions, promoting durable facts to long-term storage, and keeping it all retrievable by semantic similarity.

Layer 3: Caching & data freshness

After retrieval and memory are in place, the remaining challenge is making context both fast and current. Semantic caching differs from traditional key-value caching. Instead of matching exact query strings, it uses vector similarity to identify semantically equivalent queries and serve cached responses. "What's the weather?" and "Tell me today's temperature" use different words but ask for much the same thing. At agent scale, parallel sub-agents or repeated sessions issue rephrased versions of the same question constantly.

Without semantic caching in place, every duplicate triggers a full retrieval-and-generation cycle. Cache hit rate becomes a production metric engineering teams should monitor alongside retrieval quality.
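The lookup logic can be sketched in a few lines. This is a toy in-memory version, not the LangCache API: the `SemanticCache` class, the 0.9 threshold, and the hand-written three-dimensional vectors are all assumptions. In practice the vectors would come from an embedding model and the scan would be a vector index query.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy semantic cache: serve a stored response when a query is similar enough."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []
        self.hits = 0
        self.misses = 0

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        best_score, best_response = 0.0, None
        for cached, response in self.entries:
            score = cosine_similarity(embedding, cached)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            self.hits += 1
            return best_response
        self.misses += 1
        return None

    def put(self, embedding: Sequence[float], response: str) -> None:
        self.entries.append((list(embedding), response))

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hand-made vectors standing in for embedding-model output.
weather_query = [0.9, 0.1, 0.0]       # "What's the weather?"
temperature_query = [0.85, 0.2, 0.0]  # "Tell me today's temperature"
refund_query = [0.0, 0.1, 0.95]       # an unrelated question

cache = SemanticCache(threshold=0.9)
first = cache.get(weather_query)       # cold cache: miss
cache.put(weather_query, "Sunny, 21 C")
second = cache.get(temperature_query)  # rephrased duplicate: hit
third = cache.get(refund_query)        # different topic: miss
```

The threshold is the main design choice. Set it too low and unrelated questions receive each other's answers; set it too high and rephrasings miss. That trade-off is one reason to track `hit_rate` alongside answer quality rather than on its own.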

On the freshness side, agents need data that reflects current state, not yesterday's batch export. When real-time data access is missing, retrieved context goes stale as indexed data diverges from the actual system state, and agents act on information that no longer reflects reality.

Redis LangCache is a fully managed semantic caching service that catches rephrased duplicates and returns cached answers in milliseconds rather than running the full pipeline again. Redis Data Integration keeps your context fresh by syncing changes from operational databases like Postgres, MySQL, Oracle, and MongoDB into Redis in near real time, so what the agent retrieves matches what's actually in your systems right now.
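To make the change-syncing idea concrete, here is a generic sketch of applying change events to a local store. It does not reflect how Redis Data Integration is implemented or configured: the event shape (`key`, `version`, `op`, `value`) and the `apply_change` function are hypothetical. The point it illustrates is that a sync layer has to tolerate late or duplicate events without overwriting newer state.

```python
from typing import Any


def apply_change(store: dict[str, dict[str, Any]], event: dict[str, Any]) -> bool:
    """Apply one change event; ignore events older than what is already stored.

    Deletes are kept as tombstones so a late upsert cannot resurrect a removed row.
    Returns True when the event changed the store.
    """
    key, version = event["key"], event["version"]
    current = store.get(key)
    if current is not None and current["version"] >= version:
        return False  # stale or duplicate event
    deleted = event["op"] == "delete"
    store[key] = {
        "version": version,
        "value": None if deleted else event["value"],
        "deleted": deleted,
    }
    return True


store: dict[str, dict[str, Any]] = {}
events = [
    {"key": "order:1", "version": 2, "op": "upsert", "value": {"status": "shipped"}},
    {"key": "order:1", "version": 1, "op": "upsert", "value": {"status": "pending"}},  # arrives late
    {"key": "order:2", "version": 1, "op": "upsert", "value": {"status": "pending"}},
    {"key": "order:2", "version": 2, "op": "delete"},
]
applied = [apply_change(store, event) for event in events]
```

The late version-1 event for `order:1` is dropped, so an agent reading the store still sees the order as shipped.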

Why traditional RAG runs into limits at agent scale

Most RAG pipelines were designed around a simpler pattern: one question, one retrieval pass, one generation step, no memory between calls, no caching between calls. That works for chat over a fixed document set. It doesn't hold up when agents need to reason across multiple steps, share state, and reuse work.

The three layers above name what's missing: iterative retrieval the model can guide, a memory substrate so agents don't start every turn from scratch, and semantic caching so duplicate queries don't trigger a full pipeline run each time.

Stuffing more documents into the window doesn't fix this. When too many irrelevant or poorly structured documents land in context, the LLM gets worse at finding the right answer. One paper estimates that a 32k context length supports only about 10 to 15 effective interaction turns as tool responses accumulate. Researchers studying this have named four degradation modes that show up at scale:

  • Context poisoning: errors or hallucinations that enter the context and get repeatedly referenced
  • Context distraction: the model over-focusing on accumulated history at the expense of what it learned in training
  • Context confusion: irrelevant content polluting responses
  • Context clash: conflicting information inside the window

Single-pass RAG has no built-in defenses against any of them.
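The turn estimate quoted earlier in this section can be sanity-checked with simple arithmetic. The sketch below assumes every turn appends a fixed number of tokens and nothing is ever pruned. The system-prompt size and per-turn cost are illustrative guesses, not figures from the paper.

```python
def turns_until_full(window_tokens: int, system_tokens: int, tokens_per_turn: int) -> int:
    """Number of whole turns that fit before the context window is exhausted."""
    available = window_tokens - system_tokens
    if available <= 0 or tokens_per_turn <= 0:
        return 0
    return available // tokens_per_turn


# Assumed numbers: a 32,000-token window, a 2,000-token system prompt,
# and 2,500 tokens per turn once tool responses are included.
turns = turns_until_full(window_tokens=32_000, system_tokens=2_000, tokens_per_turn=2_500)

# Halving the per-turn cost (for example by summarizing tool output) doubles the horizon.
turns_after_pruning = turns_until_full(32_000, 2_000, 1_250)
```

Under these assumptions the window fills after 12 turns, which lands inside the 10-to-15 range cited above. It also shows why trimming what each turn adds buys more headroom than a modest increase in window size.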

What Redis adds to the context engine stack

Redis is the real-time context engine for AI: sub-millisecond latency on many core in-memory operations, with vector search and semantic caching built into the same platform. It sits between raw data sources and the agent reasoning loops where these problems show up.

Redis Iris packages retrieval, freshness, memory, and caching into one runtime built on top of Redis. Underneath it all is Redis Search, the fast layer that pulls structured, unstructured, and vector data. Iris is composed of five tools:

Context Retriever targets a specific gap in the retrieval layer: getting agents to structured business data without relying on text-to-SQL. Devs define a semantic model of entities, fields, and relationships, and Context Retriever auto-generates Model Context Protocol (MCP) compatible tools that agents call to navigate that schema. Agents navigate business entities via defined interfaces instead of guessing at SQL. Currently in public preview.
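To illustrate the general idea of deriving tools from a semantic model, the sketch below turns a small entity model into MCP-style tool descriptors with `name`, `description`, and `inputSchema` fields. It is not Context Retriever's actual model format or output: the `semantic_model` layout, the naming scheme, and the `tools_for_entity` helper are assumptions for the example.

```python
from typing import Any

# Hypothetical semantic model: entities, their fields, and their relationships.
semantic_model: dict[str, dict[str, Any]] = {
    "Customer": {
        "fields": {"customer_id": "string", "plan": "string"},
        "relationships": {"orders": "Order"},
    },
    "Order": {
        "fields": {"order_id": "string", "total": "number"},
        "relationships": {},
    },
}


def tools_for_entity(name: str, entity: dict[str, Any]) -> list[dict[str, Any]]:
    """Build one lookup tool per entity and one traversal tool per relationship."""
    key_field = next(iter(entity["fields"]))  # assumption: the first field is the key
    key_schema = {
        "type": "object",
        "properties": {key_field: {"type": entity["fields"][key_field]}},
        "required": [key_field],
    }
    tools = [{
        "name": f"get_{name.lower()}",
        "description": f"Fetch one {name} by {key_field}.",
        "inputSchema": key_schema,
    }]
    for relation, target in entity["relationships"].items():
        tools.append({
            "name": f"list_{name.lower()}_{relation}",
            "description": f"List {target} records related to one {name}.",
            "inputSchema": key_schema,
        })
    return tools


tools = [
    tool
    for name, entity in semantic_model.items()
    for tool in tools_for_entity(name, entity)
]
tool_names = [tool["name"] for tool in tools]
```

Because each tool accepts only the declared key, the agent can move from a customer to that customer's orders through named, typed interfaces and never has to write a query against the underlying tables.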

Underpinning all of these tools is the Redis Query Engine, which supports Hierarchical Navigable Small World (HNSW) and FLAT indexes for vector storage, k-nearest neighbor (kNN) and hybrid retrieval, and full-text search. In one published Redis benchmark on a billion-vector dataset, with 50 concurrent queries retrieving the top 100 neighbors under a specific HNSW configuration, Redis reported 90% precision at ~200ms median latency including round-trip time.
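For readers unfamiliar with the terms, the sketch below shows what a FLAT index does conceptually (an exact scan over every vector) and how a precision figure for an approximate index such as HNSW can be computed against that exact result. It assumes precision means the fraction of returned neighbors that are true top-k neighbors, which may not match the benchmark's exact definition. The vectors and the "approximate" result are made up.

```python
import math
from typing import Sequence


def flat_knn(query: Sequence[float], vectors: dict[str, Sequence[float]], k: int) -> list[str]:
    """Exact (FLAT-style) search: measure every vector, keep the k closest."""
    distances = [(math.dist(query, vec), vec_id) for vec_id, vec in vectors.items()]
    return [vec_id for _, vec_id in sorted(distances)[:k]]


def precision_at_k(returned_ids: list[str], exact_ids: list[str]) -> float:
    """Fraction of returned IDs that are among the true nearest neighbors."""
    if not returned_ids:
        return 0.0
    exact = set(exact_ids)
    return sum(1 for vec_id in returned_ids if vec_id in exact) / len(returned_ids)


vectors = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 2.0), "d": (5.0, 5.0)}
exact_top3 = flat_knn((0.0, 0.0), vectors, k=3)

# Pretend an approximate index returned this instead of the exact answer.
approximate_top3 = ["a", "c", "d"]
precision = precision_at_k(approximate_top3, exact_top3)
```

The exact scan costs time proportional to the number of vectors, which is why approximate indexes trade a little precision for much lower latency on large datasets.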

Context is a platform problem, not a prompt problem

Production agent performance depends as much on context infrastructure as on the model itself. Context handling is a platform concern, not a prompt-design problem, and it runs across retrieval, memory, caching, and freshness: each one affects whether an agent can act with relevance and reliability in production.

Redis Iris is built to be that platform layer. It brings semantic caching, vector search, memory, and live operational data into one runtime on top of Redis, so teams stop stitching together a vector database, a memory service, a cache, and custom ETL glue. For teams trying to reduce fragmentation across the agent stack, a unified context layer simplifies how context is searched, assembled, and served.

If you're already running Redis, try Redis Iris free to start adding Iris context capabilities to your agent stack, or book a meeting to talk through your architecture with the team.