Context

2026-03-01 10:35

This is my very personal, very meta take on AI tooling — not a how-to, just a lens: treat an LLM like an improv comedian. The model is the performer; the context is whatever the performer has to work with — the audience suggestions, the rules of the game, and everything said so far. Change that input, and you change the performance. It’s a simple framing, but it matters, because most of the “new” tooling we talk about later is really just different ways of shaping context.

If we go back a bit — when ChatGPT first went mainstream — prompt engineering became a buzzword. The idea was that the right wording could coax noticeably better output. That hype has cooled, especially as a standalone job title, but not because the practice vanished. My take is that, back then, the comedian simply wasn’t that strong yet, so you had to feed it very specific suggestions to get a good bit. Today, the comedian is better, and the craft has shifted: less about finding magic phrases, more about shaping context and building workflows. In that sense, the kaleidoscopic tools we have today are still extending prompt engineering — they’re just turning it into scaffolding, templates, and context choreography.

Fast forward to today: we live in an age of abundant AI models. There are a lot of capable performers, and within the top tier the differences are often subtle and highly task-dependent. So the leverage moves to the room you put them in: the better you can shape the context, the better the outcomes you tend to get. Give a decent comedian great suggestions and a clear setup, and they can outperform a better comedian stuck in a confused, noisy room. Context really matters.

How do we control context? Tools can help, but most of them are generic — they don’t really know what you consider signal versus noise, so they can’t safely curate the room for you. For example, Claude Code can compact the conversation to make more room for what comes next. But if you toss in an unrelated question halfway through, or dump a giant chunk of raw logs into the chat, the performer is now doing improv in a noisy room: it becomes harder to track what matters, what’s incidental, and what should be ignored.

At this point, you might think we’re powerless. We’re not. A whole set of inventions exists for one purpose: keep the context window aligned with the problem we’re actually trying to solve.

Let’s start with MCP. To me, it’s a great invention not because it’s yet another integration, but because it changes how much integration knowledge you have to carry in the context window. MCP gives the model a small, discoverable interface to external systems — tools, resources, and even reusable prompts — via clear schemas and structured inputs/outputs. In practice, an MCP server often fronts one or more APIs, but it doesn’t have to expose the entire surface area. It can present a curated menu of capabilities with sane defaults. The LLM sees a menu of dishes, not a book full of recipes: instead of pasting curl/auth/swagger docs into the chat, you give it a handful of typed tool calls. That keeps the context lean and the model focused on the task.
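To make the menu-versus-recipes idea concrete, here is a minimal Python sketch. It does not use any real MCP SDK; the `Tool` class, the ticket functions, and `CURATED_TOOLS` are all hypothetical names I made up for illustration. The point is only that the model needs to see the short text returned by `menu`, while URLs, auth, and the rest of the API surface stay inside the handlers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    params: dict[str, str]        # parameter name -> type name shown to the model
    handler: Callable[..., str]   # real work happens here, outside the context


def get_ticket(ticket_id: str) -> str:
    # Hypothetical stand-in for an API call; URLs, auth, and retries would live here.
    return f"ticket {ticket_id}: status=open"


def search_tickets(query: str, limit: int = 5) -> str:
    # Hypothetical stand-in for a search endpoint with a sane default limit.
    return f"{limit} results for {query!r}"


# The curated menu: two dishes, however many endpoints the backing API has.
CURATED_TOOLS = [
    Tool("get_ticket", "Fetch one ticket by id.", {"ticket_id": "string"}, get_ticket),
    Tool("search_tickets", "Search tickets by text.",
         {"query": "string", "limit": "integer"}, search_tickets),
]


def menu(tools: list[Tool]) -> str:
    """The only integration text that has to sit in the context window."""
    lines = []
    for tool in tools:
        signature = ", ".join(f"{name}: {kind}" for name, kind in tool.params.items())
        lines.append(f"{tool.name}({signature}) - {tool.description}")
    return "\n".join(lines)


def call(tools: list[Tool], name: str, **arguments) -> str:
    """Dispatch a structured tool call, rejecting arguments the schema does not list."""
    tool = {t.name: t for t in tools}[name]
    unknown = set(arguments) - set(tool.params)
    if unknown:
        raise ValueError(f"unknown arguments for {name}: {sorted(unknown)}")
    return tool.handler(**arguments)


print(menu(CURATED_TOOLS))
print(call(CURATED_TOOLS, "get_ticket", ticket_id="42"))
```

The menu here is two lines long no matter how large the underlying API is, which is the context saving I care about.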

Next, I’d like to talk about subagents. In my opinion, this is another great invention: instead of stuffing everything into one ever-growing chat, you delegate a narrowly scoped task to a separate agent running in its own context window (often with its own prompt and tool permissions), then bring the result back to the main thread. The goal isn’t “zero context” so much as “clean context”: keep the orchestrator’s context focused on the plan and the decisions, not the churn. Debugging is a classic example. Logs are high-volume and usually low-signal, and they can quickly drown out what you actually care about. A subagent can wade through the noise in isolation and be instructed to return a short, decision-ready summary, so the rest of the workflow doesn’t get contaminated.
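The log-triage case can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real agent runtime implements subagents: `log_triage_subagent` is a hypothetical function, and plain string matching stands in for the model's reading of the logs. What it shows is the shape of the exchange: hundreds of raw lines go into a private context, and one short line comes back to the orchestrator.

```python
from collections import Counter


def log_triage_subagent(task: str, logs: list[str], max_findings: int = 3) -> str:
    """Work through raw logs in a private context and return only a short summary."""
    private_context = [task, *logs]          # lives and dies inside this call
    errors = [line for line in private_context[1:] if "ERROR" in line]
    counts = Counter(line.split("ERROR", 1)[1].strip() for line in errors)
    top = "; ".join(f"{msg} (x{n})" for msg, n in counts.most_common(max_findings))
    return f"{len(errors)} errors in {len(logs)} lines. Top: {top or 'none'}"


# The orchestrator's context holds the plan and decisions, never the raw logs.
orchestrator_context = ["Plan: find out why checkout fails."]
logs = ["INFO heartbeat"] * 500 + ["ERROR db timeout"] * 3 + ["ERROR card declined"]

orchestrator_context.append(log_triage_subagent("Summarize the errors.", logs))
print(orchestrator_context)
```

After the call, the orchestrator's context has grown by exactly one entry, even though the subagent read 504 log lines.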

I don’t see Skills as a breakthrough. To me, they’re prompt engineering with better ergonomics: modular, shareable, and loaded on demand. That’s a real UX win. But context-wise, they’re an incremental improvement, not a new paradigm. You’re not eliminating instruction tokens; you’re moving them out of the chat transcript and pulling them in when relevant. Once loaded, those tokens still compete with conversation history and everything else. If anything, a large catalog of Skills can create a new kind of noise: routing and discoverability overhead.
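Here is a rough Python sketch of that trade-off. The `SKILLS` catalog, its two entries, and `build_context` are hypothetical, and `rough_tokens` counts whitespace-separated words as a crude proxy for tokens. It assumes a design where one-line descriptions are always present for routing and a skill's full body is added only when it is active.

```python
SKILLS = {
    "release-notes": {
        "description": "Write release notes from a changelog.",
        "body": "Group changes by type. Lead with breaking changes. Keep each entry to one line.",
    },
    "sql-review": {
        "description": "Review a SQL migration for safety.",
        "body": "Check for table locks. Require a rollback plan. Flag missing indexes.",
    },
}


def rough_tokens(text: str) -> int:
    return len(text.split())   # crude proxy: one token per whitespace-separated word


def build_context(skills: dict, history: list[str], active: list[str]) -> str:
    # The index is always present: this is the routing overhead of a catalog.
    index = [f"{name}: {skill['description']}" for name, skill in skills.items()]
    # Bodies are pulled in on demand, but once loaded they share the same window.
    loaded = [skills[name]["body"] for name in active]
    return "\n".join(index + loaded + history)


history = ["User: please review this migration."]
idle = build_context(SKILLS, history, active=[])
working = build_context(SKILLS, history, active=["sql-review"])
print(rough_tokens(idle), rough_tokens(working))
```

Two things show up even in this toy: the index grows with every skill added to the catalog, and an active skill's body costs the same tokens it would have cost pasted into the chat.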

While I know Geoffrey Huntley personally, the Ralph Loop still doesn’t feel like a new kind of context management to me. It’s a solid reliability pattern: keep the long-term state outside the conversation, restart with a clean context each round, do one small unit of work, validate it, then loop. That can be very effective precisely because it avoids context rot. But it’s not the “art” I’m seeking. It’s less about shaping the room and more about clearing the room over and over, paying iteration cost for robustness.
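The pattern as I describe it can be sketched like this in Python. It is my own simplification, not Geoffrey's actual setup: the JSON state file, `do_one_unit`, and `validate` are hypothetical stand-ins for a real task list, a real agent run, and real tests. Each round reads only what is on disk, does one task, checks it, and writes the state back.

```python
import json
import tempfile
from pathlib import Path


def do_one_unit(task: str) -> str:
    return task.upper()            # stand-in for whatever the agent would produce


def validate(result: str) -> bool:
    return result.isupper()        # stand-in for tests, linters, or a build


def run_round(state_path: Path) -> bool:
    """One iteration: load state, do one task, validate, persist. True if work remains."""
    state = json.loads(state_path.read_text())   # fresh context: only what is on disk
    if not state["todo"]:
        return False
    result = do_one_unit(state["todo"][0])
    if validate(result):                         # failed work stays on the todo list
        state["done"].append(result)
        state["todo"].pop(0)
    state_path.write_text(json.dumps(state))
    return bool(state["todo"])


def ralph_loop(state_path: Path, max_rounds: int = 10) -> dict:
    rounds = 0
    # The round cap keeps a task that never validates from looping forever.
    while rounds < max_rounds and run_round(state_path):
        rounds += 1
    return json.loads(state_path.read_text())


with tempfile.TemporaryDirectory() as workdir:
    path = Path(workdir) / "state.json"
    path.write_text(json.dumps({"todo": ["parse input", "add tests"], "done": []}))
    final_state = ralph_loop(path)

print(final_state)
```

Nothing carries over between rounds except the file, which is where both the robustness and the iteration cost come from.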

Gas Town adds real value in coordination and persistence (it stores work state outside the chat in git-backed hooks/ledgers), but at the end of the day it still inherits the strengths and limits of the underlying agent runtime (e.g., Claude Code or Codex). So if you only care about token-level context shaping, it may feel like “nothing new”; if you care about multi-agent reliability, it’s a different story.

So if we zoom out, a lot of AI tooling innovation is really context management innovation — not making the comedian magically smarter, but managing what the comedian gets to hear, when, and at what bandwidth. Different tools optimize different bottlenecks — token budget, noise isolation, state persistence, and coordination. The “art”, for me, is knowing which bottleneck you’re hitting and picking the smallest intervention that keeps the room coherent.
