GitHub - RimantasZ/contextspy: Context profiler for LLMs and AI agents - used to introspect context contents and reduce token costs
iezhy · 2026-06-16 · via Show HN


ContextSpy is a context window profiler for large language models and common agentic AI coding tools. It is used to intercept requests to an LLM API, analyze and visualize prompt composition, and track context changes between multiple requests in the same session. Modern AI coding agents (GitHub Copilot, Claude Code, opencode, etc.) pack a lot into each LLM request: system prompts, tool definitions and results, file contents, conversation history. It's often unclear why a session is slow, expensive, or hitting the context limit. ContextSpy makes the invisible visible - you see a live breakdown of every token category for every request, across sessions, over time.

[Screenshot: example dashboard view]

Think of your favorite CPU or memory profiler, just applied to the contents of the context of an AI agent. While you can optimize performance just by reviewing code, having a profiler to capture and visualise snapshot data helps a lot. Same with LLM context optimisation.

Quick start

Quick setup for macOS (Apple Silicon) — see install guide for Linux, Windows, and PyPI options:

# install latest binary release with Homebrew
brew tap RimantasZ/contextspy
brew install contextspy

# Install CA certificate into system trust store (one-time, cloud mode only)
sudo contextspy install-cert

# Start the proxy (keep this terminal open)
contextspy start

# In a new terminal: launch your coding agent through the proxy
# contextspy run sets required environment variables, so LLM requests are routed through the proxy
contextspy run claude <path to your project>
# contextspy run opencode <path to your project>
# contextspy run code <path to your project>

Open http://127.0.0.1:5173 in your browser for the ContextSpy dashboard.

If something doesn't work, see the troubleshooting section in the install guide.

Alternatively, see "configure your agent" for how to route LLM traffic through the proxy at http://127.0.0.1:8888 yourself.
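For scripts or agents started without contextspy run, manual routing usually comes down to pointing the client at the proxy address. Below is a minimal sketch of that idea. It assumes the client honors the conventional HTTP_PROXY / HTTPS_PROXY environment variables; the variables that contextspy run actually sets are not listed here, and proxied_env and launch_through_proxy are hypothetical helper names.

```python
import os
import subprocess

# Proxy address from the quick start above.
PROXY_URL = "http://127.0.0.1:8888"


def proxied_env(proxy_url=PROXY_URL, base_env=None):
    """Return a copy of the environment with conventional proxy variables set.

    Assumption: the launched client honors HTTP_PROXY / HTTPS_PROXY. The
    variables that `contextspy run` really sets are not documented here.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[name] = proxy_url
    return env


def launch_through_proxy(command, proxy_url=PROXY_URL):
    """Run `command` (a list of arguments) with the proxy variables applied."""
    return subprocess.run(command, env=proxied_env(proxy_url), check=False)
```

For cloud APIs the client also has to trust the ContextSpy CA certificate, which is what the install-cert step in the quick start takes care of.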

Context profiling? Why should I care?

Token costs are rising. As AI agents take on more complex workflows and use cases, token consumption and the resulting cloud API bills keep growing. This also applies to AI coding agents and tools, where providers are gradually moving away from subsidized subscriptions and are either reducing token limits or switching to usage-based billing (e.g. GitHub Copilot).

Input tokens are the major part. When discussing AI model pricing, most people bring up the cost of generated (output) tokens, because that's where the numbers look most dramatic ($25 per million tokens for Opus 4.8 output vs $0.40 for gpt-5-nano). But in agentic workloads, input tokens outnumber output tokens by 20-50x, or even more. So most of your API bill is driven by the input context, not by the output the model generates.

AI coding agents = lots of input. The expensive part is how quickly context accumulates: with every turn it fills up with additional tokens (system prompt, skills, tool definitions, tool results, file contents, conversation history). You start with 5,000 to 10,000 tokens in a fresh session, but by turn 25 it might be 30 to 50 thousand. Spend some more time and the session may hit the context window limit and start compacting. Every API call to the model sends the full context as part of the prompt, and this is where token consumption and costs skyrocket.
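The effect of resending the full context on every call can be made concrete with a little arithmetic. The sketch below assumes the context grows by a fixed number of tokens per turn, which is a simplification; the function name and the figures are illustrative, chosen to fall inside the ranges mentioned above.

```python
def cumulative_input_tokens(start_tokens, growth_per_turn, turns):
    """Total input tokens sent over a session when every call resends the full context.

    Assumes linear growth: the context gains `growth_per_turn` tokens each turn.
    """
    total = 0
    context = start_tokens
    for _ in range(turns):
        total += context  # each API call sends the whole context so far
        context += growth_per_turn
    return total


# Illustrative numbers: start at 10,000 tokens and add 1,500 tokens per turn.
start, growth, turns = 10_000, 1_500, 25
final_context = start + growth * (turns - 1)
total = cumulative_input_tokens(start, growth, turns)
print(f"context at turn {turns}: {final_context:,} tokens")
print(f"input tokens sent over the session: {total:,}")
```

With these numbers the context at turn 25 is 46,000 tokens, but the session as a whole has sent 700,000 input tokens, because the total grows roughly quadratically with the number of turns.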

Why large context is bad

We have all been told that the more information we give the model, the more capable it will be. And there are models with context windows of 1M tokens or even more.

There are three ways you pay for extra (and sometimes unnecessary) information in your context:

  1. API costs - even with near-perfect cache hits, input token costs outweigh output costs, often by an order of magnitude or more.
  2. Compute and latency - larger contexts take considerably longer to process, especially with locally hosted models.
  3. Context rot - with larger contexts, LLMs start to lose precision, with rapid degradation setting in at roughly 100k tokens. So you are paying for a more expensive model but getting the performance of a cheaper one, or even worse.

ContextSpy makes these costs visible so you can act on them.

How does it work

ContextSpy starts an HTTPS proxy (or a reverse proxy for locally hosted models) that intercepts every request to the LLM, analyzes it, and stores it in a local SQLite database. A web server is also started on localhost and serves the dashboard that visualises all captured data.
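The store-and-visualise half of that pipeline can be sketched with the standard library alone. This is only an illustration: the table name, columns, and function names are hypothetical and are not ContextSpy's actual schema, and the token count is assumed to have been computed by an earlier analysis step.

```python
import json
import sqlite3
import time


def open_db(path=":memory:"):
    """Create the hypothetical `requests` table (not ContextSpy's real schema)."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS requests (
               id INTEGER PRIMARY KEY,
               session TEXT NOT NULL,
               captured_at REAL NOT NULL,
               input_tokens INTEGER NOT NULL,
               body TEXT
           )"""
    )
    return db


def store_request(db, session, body, input_tokens, captured_at=None):
    """Persist one intercepted request and return its row id."""
    ts = time.time() if captured_at is None else captured_at
    cur = db.execute(
        "INSERT INTO requests (session, captured_at, input_tokens, body) "
        "VALUES (?, ?, ?, ?)",
        (session, ts, input_tokens, json.dumps(body)),
    )
    db.commit()
    return cur.lastrowid


def session_totals(db):
    """Aggregate per-session statistics, the kind of data a dashboard could chart."""
    rows = db.execute(
        "SELECT session, COUNT(*), SUM(input_tokens) "
        "FROM requests GROUP BY session ORDER BY session"
    )
    return {s: {"requests": n, "input_tokens": total} for s, n, total in rows}
```

A proxy hook would call store_request once per intercepted request, and the dashboard backend would read aggregates such as session_totals from the same database file.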

Some screenshots

[Screenshot: request view]

[Screenshot: context breakdown]

[Screenshot: session view]

Is it safe to use? Does it send my data to the cloud?

No, it does not send any data to the cloud. All data is stored locally on your machine.

Users should be aware, however, that ContextSpy runs a proxy that captures all traffic from the agent to the LLM provider and stores it locally so that it can be displayed and analysed in the UI. The proxy and dashboard server are bound to localhost and are not exposed to external access, but they can still be accessed locally.

The intended use case is to run ContextSpy as a profiler tool on dedicated profiling and optimisation sessions, rather than keeping it permanently as a monitoring tool.

The contents of requests are purged from the database after 7 days, and only statistics are retained.
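A retention rule like this one can be expressed as a single SQL update that blanks the stored request body while leaving the statistics columns in place. The sketch below uses a hypothetical table layout and function name; how ContextSpy implements its purge internally is not described here.

```python
import sqlite3
import time

RETENTION_SECONDS = 7 * 24 * 60 * 60  # the 7-day window described above


def purge_old_bodies(db, now=None):
    """Blank out request bodies older than the retention window.

    Statistics columns (here: input_tokens) are left untouched.
    """
    cutoff = (time.time() if now is None else now) - RETENTION_SECONDS
    cur = db.execute(
        "UPDATE requests SET body = NULL "
        "WHERE captured_at < ? AND body IS NOT NULL",
        (cutoff,),
    )
    db.commit()
    return cur.rowcount


# Demo with a hypothetical table layout and a fixed "now" for reproducibility.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE requests ("
    "id INTEGER PRIMARY KEY, captured_at REAL, input_tokens INTEGER, body TEXT)"
)
now = 1_000_000_000.0
db.execute("INSERT INTO requests VALUES (1, ?, 1200, 'old prompt')", (now - 8 * 86400,))
db.execute("INSERT INTO requests VALUES (2, ?, 900, 'recent prompt')", (now - 1 * 86400,))
purged = purge_old_bodies(db, now=now)
rows = db.execute("SELECT id, input_tokens, body FROM requests ORDER BY id").fetchall()
print(purged, rows)
```

After the purge, the eight-day-old row keeps its token count but no longer holds the prompt text, while the one-day-old row is unchanged.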

The database contents can also be cleared manually by running contextspy reset-db. It is a good idea to do this from time to time.

Upgrades and migration

A new version can be installed with Homebrew:

# optional: brew sometimes "forgets" the custom tap; add it again if a plain update fails
brew tap RimantasZ/contextspy
# update Homebrew and upgrade contextspy
brew update
brew upgrade contextspy

At this stage the database schema is still subject to change, so it is advisable to clear the database (contextspy reset-db) before upgrading.

Tech stack

| Layer | Technology |
|---|---|
| Backend | Python 3.11+, FastAPI + uvicorn, WebSocket for live push |
| Frontend | React + Vite, TanStack Query, Recharts, Tailwind CSS |
| CLI | Typer |
| Proxy | mitmproxy: TLS-terminating forward proxy (cloud) and reverse proxy (local) |
| Storage | SQLite via SQLAlchemy, all data local in ~/.contextspy/ |
| Tokenizer | tiktoken (cl100k_base) for token estimation |
| Packaging | uv, Homebrew tap, .deb, standalone binary |

Features

  • Two proxy modes — forward proxy for cloud APIs (OpenAI, Anthropic, Copilot), reverse proxy for local LLM servers (Ollama, llama.cpp, vLLM)
  • Context breakdown — input tokens split into 8 categories: system prompt, tool definitions, tool results, file contents, conversation history, current user message, assistant prefill, uncategorised
  • Live dashboard — real-time charts and per-request detail with a visual block map of the context window
  • Session tracking — name and group requests by task to compare usage across runs
  • SQLite storage — all data stored locally in ~/.contextspy/; no data leaves your machine
  • Agent detection — Copilot, Claude Desktop/Code, opencode, Cursor, and generic clients
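To give a feel for what a context breakdown involves, here is a rough sketch that sorts the parts of an OpenAI-style chat request into some of the categories listed above. It is not ContextSpy's classifier: the rules are simple role-based heuristics, file contents are not detected at all, and estimate_tokens is a crude 4-characters-per-token stand-in for a real tokenizer such as tiktoken.

```python
import json
from collections import Counter


def estimate_tokens(text):
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4) if text else 0


def breakdown(body):
    """Split an OpenAI-style chat request body into per-category token estimates."""
    counts = Counter()
    if body.get("tools"):
        counts["tool_definitions"] += estimate_tokens(json.dumps(body["tools"]))

    messages = body.get("messages", [])
    # Index of the last user message, which is treated as the current prompt.
    last_user = max(
        (i for i, m in enumerate(messages) if m.get("role") == "user"), default=-1
    )

    for i, msg in enumerate(messages):
        role = msg.get("role")
        content = msg.get("content")
        text = content if isinstance(content, str) else json.dumps(content)
        if role == "system":
            category = "system_prompt"
        elif role == "tool":
            category = "tool_results"
        elif role == "user" and i == last_user:
            category = "current_user_message"
        elif role == "assistant" and i == len(messages) - 1:
            # A trailing assistant message is treated as a prefill.
            category = "assistant_prefill"
        elif role in ("user", "assistant"):
            category = "conversation_history"
        else:
            category = "uncategorised"
        counts[category] += estimate_tokens(text)
    return dict(counts)
```

Comparing the result of breakdown for consecutive requests in a session shows which category is growing, which is usually the first thing to look at when a session gets expensive.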

License

Apache 2.0 — see LICENSE and NOTICE.