AI coding tools are getting very useful, but I kept running into one problem:
Expensive frontier models often end up doing everything, even small, file-scoped implementation patches.
That feels wasteful.
For many coding tasks, I want the strong model to stay in charge of planning and judgment, but I do not necessarily need it to write every narrow diff.
So I built TokenPatch.
What it does
TokenPatch lets you keep using your current AI coding tool, such as Codex, Claude Code, Cursor, or other MCP-capable coding agents.
The strong model still decides what should change.
TokenPatch then routes bounded implementation work to a cheaper executor model, checks the resulting patch locally, and reports what the accepted change actually cost.
The core metric is:
cost per applied patch
Not just cost per request.
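To make the metric concrete, here is a minimal sketch of how it could be computed. It assumes each executor attempt is logged with its cost and whether its patch was applied; the `Attempt` record and `cost_per_applied_patch` function are hypothetical names, not TokenPatch's API. Failed attempts add to the spend but not to the count of applied patches, which is what separates this metric from plain request cost.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One executor attempt at a task (hypothetical record shape)."""
    cost_usd: float  # what this attempt was billed
    applied: bool    # True if the patch applied and passed local checks


def cost_per_applied_patch(attempts: list[Attempt]) -> float:
    """Total spend divided by the number of patches that were actually applied.

    Failed attempts still count toward the spend but not toward the patches.
    """
    applied = sum(1 for a in attempts if a.applied)
    if applied == 0:
        return float("inf")  # money spent, nothing usable produced
    return sum(a.cost_usd for a in attempts) / applied


runs = [Attempt(0.02, applied=False), Attempt(0.04, applied=True), Attempt(0.03, applied=True)]
print(round(cost_per_applied_patch(runs), 4))  # 0.045
```

If no patch applies, the sketch returns infinity rather than zero, so a run that burns tokens without producing a usable change never looks cheap.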
Example
A task might look like this:
tp: change the page title. Only modify index.html.
The report for that task might show:
Task: change page title, only modify index.html
All-strong estimate: $0.42
TokenPatch actual: $0.08
Saved: 81%
Patch applied: yes
Tests: passed
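The "Saved" figure follows from the two cost lines: (0.42 - 0.08) / 0.42 ≈ 0.81. A minimal sketch of that calculation, using a hypothetical `savings_pct` helper; keep in mind the all-strong number is an estimate of what the strong model would have charged, not a measured cost.

```python
def savings_pct(all_strong_usd: float, actual_usd: float) -> float:
    """Percent saved versus an estimate where the strong model does all the work."""
    if all_strong_usd <= 0:
        raise ValueError("all-strong estimate must be positive")
    return 100.0 * (all_strong_usd - actual_usd) / all_strong_usd


# Figures from the example report above.
print(f"Saved: {savings_pct(0.42, 0.08):.0f}%")  # Saved: 81%
```

A negative result would mean the routed run cost more than the all-strong estimate, for example after several failed cheap attempts.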
Why I built it
Most LLM cost tools track spending per API request.
But when coding with agents, I care more about task-level economics:
Did the patch actually apply?
Did it stay inside allowed files?
Did it pass validation?
How much did the accepted change cost?
Would this have been more expensive if everything used the strong model?
That is the layer I wanted to explore.
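One way to answer "Did it stay inside allowed files?" is to read the file headers of the proposed diff before applying it. This sketch assumes the executor returns a git-style unified diff; `touched_files` and `out_of_bounds` are hypothetical names, and the post does not say how TokenPatch itself performs this check.

```python
import re

# Assumes git-style diffs, where each file section starts with
# "diff --git a/<old path> b/<new path>". Paths with spaces are not handled.
_FILE_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$")


def touched_files(diff_text: str) -> set[str]:
    """Return every path named in a 'diff --git' header (old and new side)."""
    paths: set[str] = set()
    for line in diff_text.splitlines():
        m = _FILE_HEADER.match(line)
        if m:
            paths.update(m.groups())  # the two sides differ only for renames
    return paths


def out_of_bounds(diff_text: str, allowed: set[str]) -> set[str]:
    """Return the touched paths that are NOT in the allowed set (empty = OK)."""
    return touched_files(diff_text) - allowed


patch = """\
diff --git a/index.html b/index.html
--- a/index.html
+++ b/index.html
@@ -1,3 +1,3 @@
 <head>
-<title>Old title</title>
+<title>New title</title>
 </head>
"""

print(out_of_bounds(patch, {"index.html"}))  # set()
print(out_of_bounds(patch, {"style.css"}))   # {'index.html'}
```

Only the `diff --git` header lines are inspected. Hunk content lines always start with a space, `+`, `-`, or `\`, so patch content cannot masquerade as a file header, which is not true of the `---`/`+++` lines.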
Current status
TokenPatch is open source and BYOK-first.
You bring your own API key for the executor model (currently a DeepSeek-compatible API), and TokenPatch runs locally.
Install from GitHub:
pip install git+https://github.com/Leoyen1/tokenpatch.git
tokenpatch bootstrap
Then use it from your coding app with:
tp: implement a small change. Only modify <file>.
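How the `tp:` line is turned into a file boundary is not spelled out here. As a hypothetical illustration, assuming the `tp: <task>. Only modify <file>[, <file>...].` phrasing from the examples, a parser could split the request into the task and an allowed-file set like this (`parse_tp` and `TpRequest` are invented names, not TokenPatch's code):

```python
import re
from typing import NamedTuple, Optional


class TpRequest(NamedTuple):
    task: str
    allowed_files: frozenset[str]


# Hypothetical grammar: "tp: <task>. Only modify <file>[, <file>...]."
_SCOPE = re.compile(r"only modify\s+(.+?)\.?\s*$", re.IGNORECASE)


def parse_tp(prompt: str) -> Optional[TpRequest]:
    """Return the task and its allowed files, or None if this is not a tp: request."""
    text = prompt.strip()
    if not text.lower().startswith("tp:"):
        return None
    body = text[3:].strip()
    m = _SCOPE.search(body)
    if m is None:
        # No explicit file boundary: refuse to guess rather than allow everything.
        raise ValueError("tp: request must say which files may be modified")
    files = frozenset(f.strip() for f in m.group(1).split(",") if f.strip())
    return TpRequest(task=body[: m.start()].strip(), allowed_files=files)


req = parse_tp("tp: change the page title. Only modify index.html.")
print(req.task)           # change the page title.
print(req.allowed_files)  # frozenset({'index.html'})
```

Raising an error when no "Only modify" clause is present is a deliberate choice in this sketch: a missing boundary is treated as a mistake rather than as permission to edit any file.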
What I am looking for
This is still early.
I am looking for feedback from developers who use AI coding tools regularly:
Is “cost per applied patch” a useful metric?
Is the setup too hard?
Would you trust a cheaper executor if file boundaries are enforced?
What coding-agent workflows should this support next?
If you try it, I would really appreciate feedback or issues on GitHub.