Prompt caching vs the long LLM conversation: where your input bill actually hides

I kept watching my Claude Code bill climb through long sessions, and most of it was not new work. It was the same conversation getting re-sent every turn. A multi-turn call is stateless, so your client ships the whole history each time: file reads, tool output, old diffs, all of it, and you pay input tokens on that pile again and again.

So I built PromptCrunch. It is a drop-in proxy that optimizes the conversation before it reaches the model. Under the hood: it deduplicates superseded code, compacts stale tool output, summarizes old turns and reuses those summaries across requests, and preserves your recent turns and structured data verbatim, rewriting a request only when it nets out cheaper to send. Setup is two lines: swap your base_url, add one header. Your provider key still goes straight to your provider.

It is not just a Claude Code thing. The same re-sent history piles up in any long multi-turn app: agents, customer chatbots, conversational products. Claude Code is just where I run into it most.

Isn't this prompt caching?

Caching discounts the repeated prefix, and inside a hot window it does most of the work. But it expires after about 5 minutes and only covers the prefix. Real sessions are bursty. You work, you read, you step away, you come back. The cache goes cold in the gaps, and as the session grows the history runs past what caching holds.

That cold-cache window is where PromptCrunch pays for itself. On my own Claude Code runs, input tokens dropped about 75% when caching was not covering the request, and 7 to 10% when it was. The two stack: caching takes the hot window, PromptCrunch takes the long session.

The Impact

On a long session you stop re-paying for history the model already processed, so the bill starts tracking the work instead of the turn count. You only pay on requests that came out smaller, so trying it costs you nothing. Your keys are never stored, and a zero-retention mode means we hold nothing at all.

It earns the most on long, multi-turn work, and not much on short prompts or one-shot calls, so point it where the sessions run long.

Try it

Point it at one real session and watch the per-request savings on your dashboard. You start with $5 of free credit, no card needed. Try it here.

推荐订阅源

DEV Community

Isn't this prompt caching?

The Impact

Try it