Context Management in Claude Code vs OpenClaw
finisky · 2026-04-07 · via Finisky Garden

After OpenClaw crossed 350K stars, a narrative started forming in the community: since both run on Opus 4.6 under the hood, the open-source option should be on par with Claude Code. Anyone who has actually used both probably shares the same observation — in long sessions, OpenClaw starts losing context, forgetting files it already read, redoing work it already did. Claude Code does too, but noticeably later, and it recovers much better.

Same model, different experience. Why?

After reading through both codebases, I found that the gap isn’t in model capability. It’s in how each agent framework manages the 200K context window. Three core differences:

  1. Claude Code has a four-layer compression cascade where the first three layers are free; OpenClaw has one layer that always calls the LLM
  2. Claude Code continuously maintains session notes in the background and uses them as free summaries during compression; OpenClaw only archives sessions on exit
  3. Claude Code’s sub-agents are role-specialized so search results don’t pollute the main thread’s context; OpenClaw’s sub-agents are a generic framework with no scenario-specific isolation

It boils down to fundamentally different engineering choices on “how to compress,” “whether compression costs money,” and “how to isolate context consumption.”

Compression Layers Define the Experience Ceiling

Claude Code’s context compression is a four-layer cascade. It starts with the cheapest approach and only escalates when the previous layer can’t handle it:

Cost
 ^
 |           * Full Compact ($$$, LLM summary)
 |
 |       * Session Memory ($0, background notes)
 |
 |    * Cached Microcompact (~$0, cache edit API)
 |
 |  * Time-Based MC ($0, content clearing)
 |
 +------------------------------------------>
                           Compression Depth

OpenClaw has one layer: call the LLM directly. It chunks messages, summarizes each chunk with the model, and if there are multiple chunks, runs another LLM call to merge the partial summaries. Every compression costs at least one LLM call, usually two or three.
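To make the call count concrete, here is a minimal sketch of chunk-then-merge summarization. It is not OpenClaw's code: the character-based chunking, the `max_chars` parameter, and the `llm` callable are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def chunk_messages(messages: List[str], max_chars: int) -> List[List[str]]:
    """Greedily pack messages into chunks of at most max_chars characters."""
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for msg in messages:
        # Start a new chunk when adding this message would overflow the budget.
        if current and size + len(msg) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(msg)
        size += len(msg)
    if current:
        chunks.append(current)
    return chunks


def summarize_in_chunks(
    messages: List[str], max_chars: int, llm: Callable[[str], str]
) -> Tuple[str, int]:
    """Return (summary, number_of_llm_calls). `llm` is a hypothetical stand-in."""
    if not messages:
        return "", 0
    chunks = chunk_messages(messages, max_chars)
    partials = [llm("\n".join(chunk)) for chunk in chunks]  # one call per chunk
    calls = len(partials)
    if calls == 1:
        return partials[0], calls
    merged = llm("\n".join(partials))  # one extra call to merge the partials
    return merged, calls + 1
```

With two chunks this makes three LLM calls, which matches the "usually two or three" figure above.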

Side-by-side comparison of what happens after compression triggers:

Claude Code                          OpenClaw
-----------                          --------
tokens > 167K?                       tokens > threshold?
  |                                    |
  v                                    v
Try Session Memory ($0)              chunk messages
  |  fail                              |
  v                                    v
Run microcompact ($0)                call LLM per chunk ($$$)
  |                                    |
  v                                    v
Full Compact via fork ($$$)          merge summaries ($$$)
  |                                    |
  v                                    v
3 failures? stop retrying            done

Claude Code’s first two layers don’t call the LLM at all. Layer one is “time-based clearing”: when the user has been away for more than 60 minutes, it replaces old tool results with a placeholder. The logic is straightforward: the server-side cache TTL that applies here is one hour (Anthropic’s prompt cache defaults to five minutes, with one hour available as an extended option). If you’ve been gone that long, the cache is already cold and the entire prefix has to be rewritten anyway, so old content might as well be cleaned out at the same time.
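A minimal sketch of time-based clearing under stated assumptions: the `Message` type, the placeholder text, and the `keep_recent` parameter are hypothetical, and only the 60-minute threshold comes from the description above.

```python
from dataclasses import dataclass, replace
from typing import List

IDLE_THRESHOLD_SECONDS = 60 * 60            # the 60-minute gap described above
PLACEHOLDER = "[old tool result cleared]"   # hypothetical placeholder text


@dataclass(frozen=True)
class Message:
    role: str      # "user", "assistant", or "tool"
    content: str


def clear_stale_tool_results(
    messages: List[Message], last_activity: float, now: float, keep_recent: int = 3
) -> List[Message]:
    """Blank out old tool results once the user has been idle past the threshold.

    The newest `keep_recent` tool results are left alone (an assumption here).
    """
    if now - last_activity <= IDLE_THRESHOLD_SECONDS:
        return list(messages)  # cache is presumably still warm: change nothing
    tool_positions = [i for i, m in enumerate(messages) if m.role == "tool"]
    protected = set(tool_positions[-keep_recent:]) if keep_recent > 0 else set()
    return [
        replace(m, content=PLACEHOLDER)
        if m.role == "tool" and i not in protected
        else m
        for i, m in enumerate(messages)
    ]
```

No model call is involved, so the only cost is the prefix rewrite that a cold cache would have forced anyway.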

Layer two is more elegant: it uses Anthropic’s cache editing API to delete old tool results directly from the server-side cache without modifying local messages at all. You save tokens, but the cache prefix stays intact — no cache miss penalty. This is an optimization only possible with deep Anthropic API integration.

OpenClaw supports over 20 providers, so this kind of vendor-specific optimization isn’t feasible. That’s the cost of an architectural choice: the trade-off between generality and depth.
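The escalation logic of this section (try the cheapest layer first, move on only when it cannot help, stop after three failures) can be sketched as a small dispatcher. The class name, the layer signature, and the convention that a layer returns `None` when it cannot compress are assumptions, not Claude Code's actual structure.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A layer takes the message list and returns a shorter one, or None on failure.
Layer = Tuple[str, Callable[[List[str]], Optional[List[str]]]]


class CompactionCascade:
    """Try layers cheapest-first; give up after repeated total failures."""

    def __init__(self, layers: Sequence[Layer], max_failures: int = 3):
        self.layers = list(layers)
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def compact(self, messages: List[str]) -> Tuple[Optional[str], List[str]]:
        """Return (name_of_layer_used, messages); name is None if nothing worked."""
        if self.consecutive_failures >= self.max_failures:
            return None, messages  # circuit breaker: stop retrying
        for name, layer in self.layers:
            result = layer(messages)
            if result is not None:
                self.consecutive_failures = 0
                return name, result
        self.consecutive_failures += 1
        return None, messages
```

The circuit breaker matters because the last layer is the expensive one: without it, a session that cannot be compacted would pay for a failed LLM summary on every turn.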

Session Memory: The Foundation for Free Summaries

Claude Code’s most interesting design is Session Memory. It continuously maintains a structured notes file in the background, periodically extracting key information from the current session into markdown. The notes cover the current task, important files, workflows, errors encountered, key conclusions, and more.

When compression is needed, it uses these notes directly as the summary — no LLM call required. It’s like taking notes during a meeting: when the meeting ends, you don’t need to recall everything from memory. Just read your notes.

Timeline     Claude Code                    OpenClaw
--------     ----------                     --------
Turn 1       [work normally]                [work normally]
Turn 5       bg: extract notes  <--+
Turn 10      bg: update notes      |        (nothing)
Turn 15      bg: update notes      |
  ...           ...                |
Turn 30      context full!         |
             compress:             |
               summary = notes  ---+  $0    compress:
               done!                          call LLM  $$$
                                              done
Turn 31      [keep working]                 [keep working]

# Claude Code: session memory compaction (simplified)
# read_session_memory_file, is_empty_template, calculate_keep_index and
# CompactionResult stand in for the real helpers and are not defined here.
def session_memory_compact(messages, last_summarized_id):
    notes = read_session_memory_file()
    if is_empty_template(notes):
        return None  # notes not ready, fall back to LLM

    # Keep only the tail of the conversation that the notes do not cover yet.
    keep_from = calculate_keep_index(messages, last_summarized_id)
    return CompactionResult(
        summary=notes,              # free: the notes are the summary, no LLM call
        kept=messages[keep_from:],
    )

OpenClaw also has a mechanism called session-memory, but it does something entirely different: it only triggers when the user runs /new or /reset, saving the entire session to a memory file in one shot. This is session archival, not real-time note maintenance. During an active session, it does no background extraction whatsoever.

The result: Claude Code can complete most compressions without any LLM calls, while OpenClaw pays for every single one. The cheaper compression is, the more aggressively you can compress, and the less likely context will balloon to the breaking point before you act. It’s a positive feedback loop.

The Recovery Gap After Compression

Compression isn’t just about deleting old messages. After compression, the model’s context contains only the summary and a few retained messages. File states, loaded tool instructions, plan contents — all gone. Without recovery, the model’s first action will almost certainly be re-reading files it was just looking at.

After compaction:

Claude Code                          OpenClaw
-----------                          --------
[summary]                            [summary]
[kept messages]                      [kept messages]
  + re-inject recent files (top 5)     + repair message pairing
  + reload CLAUDE.md & config          (done)
  + restore skill content
  + clear stale caches
  + reset prompt cache baseline

Claude Code performs a detailed state recovery after compression: re-injecting content from the 5 most recently accessed files, reloading config files and skill content, and clearing various internal caches to force reinitialization. The code comments specifically note that skill content is intentionally not cleared, because it needs to survive across multiple compressions.

OpenClaw’s post-compression processing mainly repairs message pairing relationships to keep API calls from erroring. This is necessary, but it only solves the “correct format” problem, not the “model recovers its working context” problem.
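The file re-injection step is the easiest part of recovery to illustrate. The sketch below is hypothetical: the access log, the `read_file` callable, and the message layout are assumptions, and only the "5 most recently accessed files" rule comes from the description above.

```python
from typing import Callable, List

MAX_REINJECTED_FILES = 5  # "top 5" in the comparison above


def recent_files(access_log: List[str], limit: int = MAX_REINJECTED_FILES) -> List[str]:
    """Return the most recently accessed paths, newest first, without duplicates."""
    if limit <= 0:
        return []
    seen: List[str] = []
    for path in reversed(access_log):
        if path not in seen:
            seen.append(path)
        if len(seen) == limit:
            break
    return seen


def rebuild_context(
    summary: str,
    kept: List[str],
    access_log: List[str],
    read_file: Callable[[str], str],
) -> List[str]:
    """Assemble the post-compaction context: summary, kept messages, fresh files."""
    attachments = [f"[file {p}]\n{read_file(p)}" for p in recent_files(access_log)]
    return [f"[summary]\n{summary}", *kept, *attachments]
```

Reading the files again at rebuild time, rather than copying old tool results, means the model sees their current contents instead of a stale snapshot.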

Sub-Agent Role Specialization

Claude Code’s sub-agents have clear division of labor:

+------+  search task  +-------------------+
| Main |-------------->| Explore Agent     |
| Loop |               | - Haiku (fast)    |
|      |               | - read-only       |
|      |<--------------| - returns summary |
|      |  summary only +-------------------+
|      |
|      |  bg extract   +-------------------+
|      |-------------->| Session Memory    |
|      |               | - only Edit notes |
|      |               | - shares cache    |
|      |               +-------------------+
|      |
|      |  compact      +-------------------+
|      |-------------->| Compact Agent     |
|      |               | - NO tools        |
|      |               | - shares cache    |
+------+               +-------------------+

The Explore Agent is deliberately constrained: it can’t write files, can’t spawn more sub-agents, and uses Haiku (the fastest and cheapest model) for external users. All search results stay in the sub-agent’s own context, with only a distilled summary returned to the main thread. This solves a core problem: search processes consume massive amounts of context, and if everything stays in the main thread, a few rounds of searching will fill the window.
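The isolation idea fits in a few lines. In this sketch the function name, the tool callables, and the `summarize` callable are all hypothetical; it shows only the shape of the pattern, in which raw results live and die in a private list.

```python
from typing import Callable, List


def run_explore_agent(
    query: str,
    search_tools: List[Callable[[str], str]],
    summarize: Callable[[List[str]], str],
) -> str:
    """Run searches in a private context and hand back only a short summary."""
    private_context: List[str] = [f"task: {query}"]
    for tool in search_tools:
        # Raw search output is appended here and never reaches the caller.
        private_context.append(tool(query))
    return summarize(private_context)


def delegate_search(
    main_context: List[str],
    query: str,
    search_tools: List[Callable[[str], str]],
    summarize: Callable[[List[str]], str],
) -> None:
    """The main loop grows by one summary message, however large the search was."""
    main_context.append(run_explore_agent(query, search_tools, summarize))
```

Whatever the sub-agent spends on context is thrown away when it returns, so the main thread's growth is bounded by the summary length rather than by the size of the search.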

OpenClaw has a more feature-complete sub-agent framework: two execution modes, sandbox inheritance, cross-process communication, recursion depth limits, and orphan recovery. But it’s a generic framework without specialization for scenarios like “search without polluting the main context.”

The problem with generic frameworks: when every sub-agent is general-purpose, no sub-agent is specifically optimized. Claude Code built a few surgical instruments for the coding scenario; OpenClaw provides a universal toolkit.

Prompt Cache: The Overlooked Battlefield

Comparing both codebases, what strikes me most is Claude Code’s obsession with prompt cache. The comments on almost every design decision ask the same question: will this break the cache?

API request with prompt cache:

Request tokens: [system + tools + history + new turn]
                 |_______________|
                    cached prefix    --> cache hit:  $0.50/MTok
                                     --> cache miss: $5/MTok (10x)

Claude Code: everything is designed to keep the prefix stable
OpenClaw:    no prompt cache management (multi-provider)

For example, the forked agent used for compression inherits the main session’s complete tool set — not because compression needs tools, but because the tool list is part of the cache key, and any mismatch causes a cache miss. Cache editing only runs on the main thread because sub-agent tool IDs would pollute the global state. After compression completes, the cache monitoring module is notified to reset its baseline, preventing compression-induced cache hit rate drops from being flagged as anomalies.
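To see why the fork has to carry the full tool list, it helps to model the cache as a lookup keyed on the exact request prefix. The hashing below is purely illustrative: the real cache is server-side and its keying is not described in the source, so `prefix_key` and its inputs are assumptions.

```python
import hashlib
import json
from typing import List


def prefix_key(system: str, tools: List[str], history: List[str]) -> str:
    """Illustrative cache key: a hash over everything that precedes the new turn."""
    payload = json.dumps(
        {"system": system, "tools": tools, "history": history}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def would_hit_cache(main_key: str, system: str, tools: List[str], history: List[str]) -> bool:
    """A forked request reuses the cached prefix only if its key matches exactly."""
    return prefix_key(system, tools, history) == main_key
```

Under this model, a compaction fork that dropped its tools "because it does not need them" would produce a different key and pay the full uncached price for the whole history.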

OpenClaw’s context management is about discovering and caching model context window sizes — an entirely different concern from prompt cache management. With 20+ providers to support, deep optimization on any single provider’s caching mechanism isn’t practical.

In a world where you pay per token, prompt cache hit rates directly impact cost. Claude Code’s effective per-call cost may be a fraction of OpenClaw’s, because most input tokens are read from cache. The savings feed back into more frequent background extraction and earlier compression.
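A quick calculation shows how large "a fraction" can be. It uses the two prices from the diagram above and deliberately ignores any surcharge for writing to the cache, so treat it as a rough estimate; the function name and the 90% hit rate are illustrative assumptions.

```python
CACHE_HIT_PRICE = 0.50   # dollars per million input tokens, from the diagram above
CACHE_MISS_PRICE = 5.00  # dollars per million input tokens, from the diagram above


def effective_input_price(hit_rate: float) -> float:
    """Blended input price per million tokens for a given cache hit rate.

    Simplification: cache-write surcharges are not modeled.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * CACHE_HIT_PRICE + (1.0 - hit_rate) * CACHE_MISS_PRICE


if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.9):
        print(f"hit rate {rate:.0%}: ${effective_input_price(rate):.2f}/MTok")
```

At a 90% hit rate the blended price is about $0.95 per million input tokens, roughly a fifth of the uncached $5.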

Context Engine: Architecture Without Implementation

To be fair, OpenClaw’s context engine framework is well-designed. The interface defines seven lifecycle methods, supports safe transcript rewriting, and the registry distinguishes between core and third-party engines — third parties can’t override the core engine. It’s a complete architecture for pluggable context management.

OpenClaw Context Engine interface vs reality:

Interface (well designed)       Actual LegacyContextEngine
-------------------------       --------------------------
bootstrap()                     (not implemented)
ingest(message)            -->  return {ingested: false}
assemble(messages, budget) -->  return {messages: messages}
compact(params)            -->  delegate to old path
afterTurn(messages)        -->  no-op
maintain()                      (not implemented)
prepareSubagentSpawn()          (not implemented)

The problem is that the engine currently running is almost entirely hollow: message ingestion is a no-op, context assembly is a pass-through, post-turn processing does nothing, compression delegates to the legacy path. The architectural capability is there, but the implementation is still single-layer LLM summarization. The frame is built, but living in it feels like bare concrete.
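Rendered in Python for consistency with the earlier snippet, a pass-through engine with the behavior in the table looks roughly like this. It is a hypothetical rendering rather than OpenClaw's source: the method names follow the table (with `afterTurn` written as `after_turn`), and `legacy_compact` is an assumed stand-in for the old summarization path.

```python
from typing import Any, Callable, Dict, List


class LegacyContextEngine:
    """Pass-through engine: it satisfies the interface but adds no behavior."""

    def __init__(self, legacy_compact: Callable[[Dict[str, Any]], Dict[str, Any]]):
        # Hypothetical hook standing in for the pre-existing summarization path.
        self._legacy_compact = legacy_compact

    def ingest(self, message: str) -> Dict[str, bool]:
        return {"ingested": False}  # nothing is indexed or stored

    def assemble(self, messages: List[str], budget: int) -> Dict[str, List[str]]:
        return {"messages": messages}  # the budget is ignored entirely

    def compact(self, params: Dict[str, Any]) -> Dict[str, Any]:
        return self._legacy_compact(params)  # single-layer LLM summary

    def after_turn(self, messages: List[str]) -> None:
        return None  # no background note-taking happens here
```

The `ingest` and `after_turn` hooks are where something like continuous note extraction would plug in, which is why the empty bodies matter more than they look.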

The Root Difference

Back to the three differences from the opening: four-layer cascade vs single-layer summarization, real-time notes vs end-of-session archival, specialized sub-agents vs generic framework. These aren’t differences in engineering taste — they’re driven by product positioning.

Claude Code is optimized for “one developer working on one codebase,” free to pursue Anthropic API-specific optimizations, forked agent cache sharing, and fine-grained cleanup strategies by tool type. OpenClaw covers “multiple users, multiple channels, multiple models” — it handles Telegram, Discord, Slack, WhatsApp, voice synthesis, cross-process communication, multi-account rotation, and model degradation. The latter is more complex overall, but the former goes deeper on the specific scenario it targets.

The good news is that OpenClaw’s context engine plugin architecture is already in place — complete interface, working registry. If it could absorb Claude Code’s layered compression approach, even just implementing Session Memory and time-based clearing, long session experience would improve dramatically. The framework is ready. What’s missing is the fill.