惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Microsoft Security Blog
Microsoft Security Blog
Google DeepMind News
Google DeepMind News
P
Privacy International News Feed
www.infosecurity-magazine.com
www.infosecurity-magazine.com
T
Threatpost
GbyAI
GbyAI
V
Visual Studio Blog
H
Help Net Security
Vercel News
Vercel News
P
Palo Alto Networks Blog
Project Zero
Project Zero
AWS News Blog
AWS News Blog
Latest news
Latest news
Cyberwarzone
Cyberwarzone
C
Cybersecurity and Infrastructure Security Agency CISA
The Register - Security
The Register - Security
博客园_首页
WordPress大学
WordPress大学
G
GRAHAM CLULEY
T
Tor Project blog
有赞技术团队
有赞技术团队
Know Your Adversary
Know Your Adversary
AI
AI
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
O
OpenAI News
博客园 - 聂微东
月光博客
月光博客
S
Security Affairs
Webroot Blog
Webroot Blog
L
LangChain Blog
Apple Machine Learning Research
Apple Machine Learning Research
NISL@THU
NISL@THU
N
News and Events Feed by Topic
Blog — PlanetScale
Blog — PlanetScale
S
Securelist
V
Vulnerabilities – Threatpost
aimingoo的专栏
aimingoo的专栏
阮一峰的网络日志
阮一峰的网络日志
Stack Overflow Blog
Stack Overflow Blog
Application and Cybersecurity Blog
Application and Cybersecurity Blog
D
DataBreaches.Net
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Y
Y Combinator Blog
Cisco Talos Blog
Cisco Talos Blog
The Cloudflare Blog
IT之家
IT之家
博客园 - 三生石上(FineUI控件)
雷峰网
雷峰网
L
Lohrmann on Cybersecurity
T
The Blog of Author Tim Ferriss

Redis

Real-Time Fraud Detection: Latency, Features & Scale Context window in AI: why every token is a budget decision Connecting to Redis Cloud with AWS PrivateLink vs. VPC peering | Redis Redis Data Integration in Redis Cloud is now GA in AWS | Redis Why AI Misses Business Context & How Teams Fix It AI Reasoning Explained: Why Context Matters Semantic Layer vs Context Layer: Key Differences Redis array data type: How it works and when to use it Context Graphs vs. Vector Search: When RAG Falls Short What’s new in two – May 2026 edition Redis 8.8 performance improvements: Faster string, hash, streams, SCAN & more Redis 8.8: New array data structure & open source features How Conflict-free Replicated Data Types power active-active database replication Context Orchestration: What It Is & How It Works Prompt Bloat: Causes, Costs & Fixes for LLM Apps Agentic Retrieval Techniques: A Complete Guide Single-shot reliable consumers with XREADGROUP CLAIM in Redis 8.4 | Redis Long-Horizon AI Agents: Memory & State Infrastructure What is a context engine? What Is a Context Layer? AI Agent Infrastructure Context Retrieval for AI Agents: What It Is & Why It Matters Context Poisoning: How Bad Data Breaks Agent Reasoning Context is all you need: Introducing Redis Iris | Redis Context Engineering for AI: What It Is & How to Build It Dynamic endpoints: Migrate databases without changing your endpoint | Redis AI Shopping Assistants: How They Work & What to Build Endless Aisle Retail: Infrastructure & Real-Time Data LLM Speed Benchmarks: Metrics & Infrastructure Guide Context Pruning: Cut LLM Tokens Without Losing Quality What’s new in two – April 2026 edition Agentic AI Architecture: 5 Patterns Explained AI Agent vs Chatbot: Key Differences Explained Advantages of Building a Vector Search Solution API Latency in LLM Apps: Causes & How to Fix It Security advisory: [CVE‑2026‑23479] [CVE‑2026‑25243] [CVE-2026-25588] [CVE‑2026‑25589] [CVE-2026-23631] | Redis Edge Computing Latency: Causes & How to Reduce It AI Agents vs Workflows: When to Use Each Streaming LLM Responses: Make Your AI App Feel Fast Active-Active vs Active-Passive Database Architecture Prefill vs Decode: LLM Inference Phases Explained Long-Term Memory Architectures for AI Agents Time to First Byte Test: Tools, Causes & Fixes Speculative decoding: how it works & when to use it P95 Latency: What It Is & Why It Matters Why Multi-Agent LLM Systems Fail & How to Fix Them AI Human in the Loop: Production Oversight Patterns Native OpenTelemetry metrics for Redis client libraries | Redis Client-side geographic failover for Redis Active-Active | Redis Use Redis with SQL | Redis Introducing Redis Feature Form Build Google ADK Agents with persistent, real-time memory on Redis | Redis Startup Spotlight: Neuron Systems API Throttling: Algorithms, Patterns & Mistakes Agentic AI Examples Across 6 Industries Best Chunking Strategies for RAG Pipelines Agentic AI Guardrails: Controls That Work Redis joins AWS at GDC to support the next generation of gaming | Redis Designing a semantic routing system: From static rules to dynamic intelligence with Redis and Java | Redis Real-Time Dispatch System: A Complete Guide P99 Latency: What It Means & How to Fix It Tokenization in LLMs: What AI App Devs Need to Know TTFT Meaning: What is Time to First Token? Atomic slot migration with Redis 8.4 Hybrid search benefits: Why your RAG system needs both keyword & vector search What’s new in two: March 2026 edition Vector embedding generators: How they work & how to use them Throughput-optimizing Redis for L2 KV Cache Reuse What is a data pipeline? Building AI agent pipelines that don't forget, fail, or fall apart Redis achieves Google Cloud Ready, Distributed Cloud status ahead of Google Cloud Next ‘26 | Redis Real-time network monitoring: what your data platform needs to keep up AI agent API: How agents connect to the real world What is multicloud infrastructure? A guide for 2026 What is a transaction monitoring system & how does it work? Why your AI agent fails in production & how tracing helps AI agent benchmarks: Where they fall short & why your infrastructure matters What is a JSON database (and when should you use one)? Introducing the Redis Partner Network: A new foundation for real-time innovation How real-time customer segmentation works in retail Payment orchestration & vault architecture in retail Agentic systems vs. GenAI: when generation isn't enough What is fuzzy matching? Semantic caching & routing: two powerful patterns for vector classification Redis alternatives: Why there are no exact substitutes Connect to Azure Managed Redis with Redis Insight 3.2.0 How to tame the thundering herd problem Redis to Manage Storage Replication | Redis How hierarchical navigable small world (HNSW) algorithms can improve search | Redis How leading financial institutions use Redis to drive growth | Redis What’s new in two: May 2025 | Redis Introducing Model Context Protocol (MCP) for Redis | Redis Redis vs. Elasticsearch: What’s faster for GenAI & vector search? | Redis Build fast, production-worthy AI apps with Spring AI and Redis | Redis Azure Managed Redis is GA today | Redis Redis then & now: Adapting with developers through every era | Redis Supercharge Your AI with OpenShift AI and Redis: Unleash speed and scalability | Redis What’s new in two: April 2025 | Redis Redis 8 is now GA, loaded with new features and more than 30 performance improvements | Redis What is a data strategy? 6 key components explained Data replication explained: types, examples & use cases
Context Compaction for AI Agents: A Complete Guide
Redis · 2026-05-25 · via Redis

Your agent just spent 40 turns debugging a gnarly authentication issue. It found the root cause, mapped out a fix, and started implementing. Then, somewhere around turn 45, it forgot everything it learned and started investigating from scratch.

This is what happens when context windows fill up without a plan. As agents take on longer tasks, it's happening more often. This guide covers what context compaction is, why it matters, how it differs from truncation and retrieval-augmented generation (RAG), and where Redis fits in a broader context architecture.

Why context compaction matters now

Context compaction matters because long agent sessions get expensive, slow, and forgetful fast. Without optimizations like prompt caching, every token in the context window typically gets re-processed and billed on every API call, so as conversation history piles up, each new inference call works through everything that came before. A session that keeps growing costs more per turn the longer it runs.

Cost is only part of the problem. As context length grows, latency also rises and recall can degrade. Bigger windows give agents more room to work, but they don't guarantee the model will keep using earlier information effectively. Context management has become a practical design problem, not just a model-capability problem.

That's why compaction is showing up alongside RAG and bigger context windows instead of competing with them. Each one solves a different problem, and compaction fills the gap the other two leave behind.

What context compaction actually is

Context compaction takes a conversation approaching the context window limit, condenses its contents into a structured, high-fidelity representation, and reinitiates a new context window with that condensed form in place of the raw history. The goal is to let the agent continue with minimal performance degradation.

Think of it as a skilled engineering handoff note. After a two-week sprint, a senior engineer doesn't hand the next engineer a full Slack export. They write a structured document covering decisions made, the reasoning behind them, open issues, and current system state. The raw back-and-forth is gone, but everything needed to continue without losing ground is preserved.

This is different from naive truncation, which mechanically removes tokens at a boundary to stay within the limit. Truncation doesn't care whether it's cutting a critical architectural decision made early in a session or throwaway chatter. It's also different from basic summarization, which rewrites history as prose and may drop specific numbers or exact phrasing whose importance only becomes apparent later.

Good compaction preserves active constraints the agent is still bound by, open decisions not yet acted on, and completed-task state. It discards the exploratory process that led to decisions, redundant tool outputs already reflected in agent state, and verbose intermediate steps.

Redis Iris

Redis Iris serves agent context in milliseconds

Redis Iris connects memory, live data, and retrieval in one place.

How context compaction fits into your context strategy

With the definition in place, the next question is where compaction sits in a broader context strategy. It's one piece of a bigger picture, not a standalone fix. It's the "compress what's already in the window" step: keep the tokens the model still needs, drop the rest, and keep the session moving.

A simple priority order helps here. Start with raw context. Move to reversible compaction (where dropped content still exists elsewhere and can be fetched back) when the window gets tight. Only fall back to lossy summarization (where dropped content is permanently destroyed) when nothing cheaper works. Compaction lives in the middle of that stack: useful when the raw history no longer fits, but before you start throwing signal away.

Reach for compaction when long tasks need continuity across many tool calls, when tool results are blowing up the context, or when agents have to keep working across context resets. Skip lossy compaction when exact wording matters, like legal text or precise API responses, or when the task still fits comfortably in the window.

Common patterns for context compaction

Once you've decided compaction belongs in your stack, the next decision is how to implement it. There's no single "right" way to compact context. Most teams pick from a handful of common patterns based on how long their sessions run, how much tool output they generate, and how much they can afford to lose. Here are the main ones.

Sliding windows

Sliding windows keep the last N conversation turns and drop everything older. Context size stays predictable and bounded, but anything from dropped turns is gone for good. It works best for stateless or short-horizon tasks where early turns genuinely don't matter later.

Token-count thresholds with lossy summarization

This pattern watches token usage and triggers summarization once the context hits a set fraction of the window. It's easy to wire up, but LLM-based summarization gives you limited control over what survives, and the output can vary from run to run.

Tool output offloading (reversible)

Tool output offloading writes large tool results to an external store and leaves a reference pointer plus a short preview in the context. Nothing is destroyed, and the agent can pull the full content back when it needs it. This is a good fit when tool calls return long payloads the agent only occasionally needs in full.

Staged compaction under pressure

Staged compaction graduates through progressively more aggressive strategies instead of jumping straight to lossy summarization. Lighter moves like masking unused fields or pruning stale turns often free up enough space on their own, so lossy summarization becomes a last resort instead of a default.

Reversible vs. lossy: the key distinction

The biggest split across these patterns is whether the compaction is reversible or lossy. Reversible compaction removes information that still exists somewhere else, so the agent can fetch it again with a tool call. Lossy summarization permanently destroys whatever doesn't make it into the summary. Which one you choose shapes what the agent can still do later, so it's worth evaluating against the tasks the agent has to finish.

Context compaction vs. bigger windows & RAG

A common pushback on compaction is that bigger context windows or RAG should make it unnecessary. They don't. Bigger context windows don't eliminate the need for compaction. They increase capacity, but they also raise the risk of higher cost, higher latency, and weaker recall as context grows.

RAG doesn't eliminate the need for compaction either. RAG handles retrieval of external knowledge but does not manage the agent's own evolving working memory across a long-running task. And compaction alone isn't sufficient. In practice, all three strategies often work as complementary layers: RAG retrieves external knowledge, larger windows provide ceiling capacity, and compaction manages accumulated session state within that ceiling.

Redis AI Agent Memory

Build agents that remember, not agents that guess

Redis Iris gives every agent fresh context and long-term memory.

Where context engines come in

Compaction doesn't run in isolation. It usually sits inside a broader system that decides what context goes into the model on every turn. That system is the context engine: the architectural layer responsible for determining what information enters the LLM's context window at each reasoning step, managing selection, compression, retrieval, and routing of information from multiple sources into a coherent, token-efficient input.

At the architectural level, compaction can be combined with external memory so the full history is preserved elsewhere before the window is overwritten. The compaction mechanism itself can be implemented without external memory, but recoverable production compaction often relies on external memory so state can be restored after compression. Context compression manages what the agent sees this session, while external memory manages what the agent stores across sessions.

That external memory layer only helps if the data inside it stays current. Batch-oriented pipelines often can't serve the freshness requirements of agentic systems, and if your compacted summaries and embeddings are built on stale data, the agent acts on outdated context regardless of how well the compaction was performed.

How Redis Iris supports context compaction

All of this is what Redis Iris is built for. Iris is Redis' real-time context engine for AI: a single layer that sits between the agent and the data it needs, feeding the right context, in the right form, at the right time. It bundles five tools (Redis Context Retriever, Redis Agent Memory, Redis Data Integration, Redis LangCache, and Redis Search) into one runtime, so memory, retrieval, freshness, and caching aren't separate vendors glued together. These are managed Redis capabilities, not features you flip on in a self-hosted Redis Open Source build.

For compaction workflows specifically, three pieces of Iris do most of the work.

Redis Agent Memory, currently in preview, uses a two-tier memory model that maps cleanly to compaction. Working memory holds session-scoped events bounded by a configurable time to live (TTL). Long-term memory persists cross-session knowledge as vector embeddings retrieved through semantic search. Active context stays bounded in the session while high-signal information lives separately and gets pulled in when the agent needs it.

Redis LangCache, a fully managed semantic caching service, reduces repeated context injection by recognizing when queries are semantically similar despite different wording. In Redis benchmarks, LangCache reported up to 15x faster responses for cache hits and up to 73% lower LLM inference costs without code changes. For compaction workflows, that means fewer redundant LLM calls when agents hit variations of questions they've already answered.

Redis Data Integration keeps operational state fresh so agents act on current business context instead of stale exports. That matters for compaction because a perfectly compressed summary built on yesterday's data still makes the agent act on yesterday's reality.

The shared idea across Iris: one runtime for vectors, caching, memory, retrieval, and operational data, instead of stitching together a separate system for each layer of the context stack.

Redis Context Retriever

Fresh context, every call

Redis Iris keeps agent data current so answers stay accurate.

Build context compaction into the stack

Context compaction helps agents keep working when sessions get long, tool output grows, and raw history stops being practical. It works best as a deliberate architectural choice, not an emergency move when the window is already full: raw context first, reversible compaction when the window gets tight, lossy summarization only as a last resort.

Bigger windows and RAG help, but they don't replace deliberate context management. Iris gives teams already running Redis for caching or session management a way to layer agent context onto infrastructure they already trust, instead of standing up a separate stack of vector, memory, and caching vendors.

Try Redis Iris for free to start building context-aware agents on a real-time data layer, or book a meeting to talk through how Iris fits into your AI stack.