codemcp — Meta-MCP "Code-Mode" Gateway
A single MCP server that connects to many upstream MCP servers and exposes
one tool: execute_python. Agents write Python that calls every upstream
tool as a typed function and transform/combine results in-process — instead of
issuing many sequential MCP tool calls.
agent harness ──MCP──► codemcp gateway ──MCP clients──► upstream servers
(stdio/HTTP) (one tool: execute_python) (github, sentry, …)
│ (stdio / streamable-http)
├─ generates a typed Python SDK (1 fn per upstream tool)
├─ runs Python in a worker process
└─ SDK fns call back into the gateway → route to upstream
The agent's LLM sees only execute_python. Its description contains a short
intro plus two lines per upstream tool: the full typed Python signature and a
one-line summary. The SDK itself is preloaded into the Python runtime once — it is
never concatenated into the per-call code string.
Why
A typical agent task ("find the open issue mentioning X, fetch its linked PR, and summarize the diff") becomes three or more round-trips through the model, each re-sending tool schemas and intermediate JSON. With codemcp the agent writes one Python snippet that calls the three tools and returns just the summary:
issue = github_search_issues(query="X", state="open")[0] pr = github_get_pull_request(number=issue["linked_pr"]) result = {"title": pr["title"], "files_changed": len(pr["files"])}
One model turn, one tool call, only the final result returned.
Bench: token usage vs direct MCP
A repeatable experiment in bench/ measures whether the design above
actually saves model tokens. A standalone LangGraph agent answers three
read-only questions over the GitHub user skymoore's data, under two arms that
expose the identical GitHub MCP toolset (~41 tools) but differ only in how
it reaches the model:
| arm | what the model sees |
|---|---|
direct |
all ~41 GitHub tools bound directly (large per-turn tool schema) |
codemcp |
one execute_python tool whose description lists ~41 two-line signatures |
Both arms run the same model (claude-sonnet-4-6 via the OpenCode Zen
Anthropic endpoint, temperature=0), the same upstream GitHub MCP server, and a
fresh MCP client + (for codemcp) fresh gateway subprocess per run. Token counts
are the provider's own usage figures (Anthropic usage block), summed per run.
Each (task, arm) is repeated 3× for variance. Correctness is reviewed manually
against ground_truth.json, computed by calling the GitHub tools directly
(no LLM).
Results (18 runs, all answered correctly)
| task | direct input | codemcp input | Δinput | direct turns | codemcp turns |
|---|---|---|---|---|---|
| A — repo count (1 tool call) | 25,233 | 22,816 | −2,417 (−10%) | 2.0 | 2.0 |
| B — most-starred repo's latest commit (2 calls) | 35,779 | 15,981 | −19,798 (−55%) | 3.0 | 3.0 |
| C — most-issues repo + README check (2 calls) | 399,456 | 41,702 | −357,755 (−90%) | 3.0 | 3.7 |
Headline: on the multi-tool tasks codemcp cut input tokens 55–90%; on the
1-tool baseline it's roughly flat. cache_read was 0 across all runs (Zen
returned no prompt-cache hits on these short sessions, so this is the
no-caching case — a separate, larger-context run would be needed to measure
caching behavior). Full per-run answers, per-turn usage, and the manual-review
section are in bench/results/summary.md after a run.
One toolset (GitHub) and three tasks — a real data point, not an exhaustive benchmark. Re-run it and vary the tasks/toolset yourself.
Run it yourself
From the repo root:
cd bench uv sync # install pinned deps into a local venv uv run python ground_truth.py # compute ground_truth.json (no LLM) uv run python runner.py --smoke # 6 runs: connectivity check uv run python runner.py --reset --repeats 3 # full bench: 18 runs uv run python analyze.py # -> results/summary.{md,csv} cat results/summary.md
Prerequisites:
codemcpon yourPATH(the codemcp arm launches a fresh gateway per run).- A working Docker daemon (the GitHub MCP server runs in a container).
- An OpenCode Zen API key: sign in at https://opencode.ai/auth and run
opencode /connectonce so it's stored in~/.local/share/opencode/auth.json. - A GitHub personal access token in
bench/.envasGITHUB_TOKEN(git-ignored;bench/mcp.github.jsonuses{env:GITHUB_TOKEN}). One is already there if you cloned this setup; otherwise add your own withrepo+read:orgscopes.
See bench/README.md for the full methodology, fairness
notes, and file layout.
Status
Working vertical slice over stdio and Streamable HTTP:
- Connects to upstream MCP servers (stdio + Streamable HTTP), discovers tools.
- Generates a typed Python SDK from each tool's JSON Schema.
- Exposes a single
execute_pythonMCP tool whose description carries the SDK. - Runs user Python in a persistent host CPython worker; SDK calls round-trip back to the gateway over an authenticated WebSocket control channel and are routed to the right upstream server.
Runs untrusted agents safely with CODEMCP_ISOLATION=DOCKER (the worker runs in
a container; see Docker isolation). The Monty in-process
sandbox and optional LLM tool summaries are still planned — see
TODO.
Install
One-line install (prebuilt binary)
curl -fsSL https://raw.githubusercontent.com/skymoore/codemcp/main/install.sh | shThis downloads a prebuilt binary for your OS/arch from
GitHub Releases, verifies its
SHA-256 checksum, and installs it to ~/.local/bin (or /usr/local/bin).
Supported platforms: macOS (arm64, x86_64) and Linux (arm64, x86_64).
Useful overrides:
# pin a version and/or choose the install dir curl -fsSL https://raw.githubusercontent.com/skymoore/codemcp/main/install.sh \ | CODEMCP_VERSION=v0.1.0 CODEMCP_BIN_DIR="$HOME/bin" sh
| Variable | Purpose |
|---|---|
CODEMCP_VERSION |
Release tag to install (default: latest) |
CODEMCP_BIN_DIR |
Install directory |
CODEMCP_REPO |
owner/repo to download from (default skymoore/codemcp) |
opencode launches
codemcpby bare name, so the install dir must be on yourPATH. The installer prints the exact line to add if it isn't.
Build from source
Requires a Rust toolchain.
make install # release build, install onto PATH (/usr/local/bin) make install PREFIX=~/.local # install somewhere else make uninstall # remove it make help # list all targets
Or with cargo directly: cargo install --path ..
Quick start
Set up from an existing harness (opencode)
If you already have MCP servers configured in opencode, let codemcp adopt them:
This backs up ~/.config/opencode/opencode.json, moves its mcp section
verbatim into codemcp's mcp.json, and rewrites opencode to launch a single
codemcp server instead of all the individual ones. Restart opencode afterward.
(codemcp must be on your PATH, since opencode launches it by bare name.) Only
opencode is supported today; more harnesses can be added later.
Or configure manually
-
Write a config at
~/.config/codemcp/mcp.json(XDG; override withCODEMCP_CONFIG). The format is a subset of opencode'smcpobject:{ "mcp": { "everything": { "type": "local", "command": ["npx", "-y", "@modelcontextprotocol/server-everything"] }, "sentry": { "type": "remote", "url": "https://mcp.sentry.dev/mcp", "headers": { "Authorization": "Bearer {env:SENTRY_TOKEN}" } } } }type: "local"→ stdio server launched viacommand(with optionalenvironment,cwd).type: "remote"→ Streamable HTTP server aturl(with optionalheaders).- Any string value supports
{env:VAR}interpolation. "enabled": falseskips an entry."timeout": <seconds>caps how long to wait for that upstream to spawn and finish the MCP handshake (default 30s). An upstream that exceeds it is logged and skipped rather than blocking startup.
-
Run the gateway:
# stdio (default) — for an agent harness that launches codemcp as a subprocess codemcp # Streamable HTTP CODEMCP_TRANSPORT=http CODEMCP_HTTP_BIND=127.0.0.1:3388 codemcp
-
Point your MCP client at it. Inspect the generated SDK and tool description without serving:
Enabling/disabling upstreams at runtime
mcp.json is the boot-time desired state. While the gateway is running you
can connect or disconnect upstreams without restarting it using the admin
subcommands, which talk to the running gateway over its Unix admin socket:
codemcp list # show every configured server + live status codemcp enable github # connect 'github' now (runtime only) codemcp disable github # disconnect 'github' now (runtime only)
NAME TYPE DEFAULT CONNECTED TOOLS
github local yes yes 45
brave local no no 0
DEFAULT= theenabledflag inmcp.json(what connects at boot).CONNECTED= whether it is connected in the running process right now.
By default admin commands change only the live process and do not touch
mcp.json. To also persist the change as the new boot default, pass
--make-default:
codemcp enable brave --make-default # connect now AND set enabled:true in mcp.json codemcp disable github --make-default # disconnect now AND set enabled:false in mcp.json
When an upstream is enabled/disabled, codemcp regenerates the Python SDK,
hot-reloads it into the running worker (no worker restart, no lost state), and
sends a notifications/tools/list_changed to connected MCP clients so they
re-read the updated execute_python description.
Note:
--make-defaultrewritesmcp.json(preserving all values) and may reorder keys alphabetically.
Most flags have short forms: -d (--make-default), -i (--instance),
-p (--port), -H (--host).
Running multiple gateways (one per harness)
You can point several harnesses at codemcp at once — e.g. opencode and LM Studio. Two ways:
A. One gateway per harness (default, stdio). Each harness launches its own
codemcp process. These are fully independent: each gets its own upstream
connections, Python worker, and a per-instance admin socket
(~/.config/codemcp/admin-<config-hash>-<pid>.sock), so they never collide.
Each gateway records which application launched it — from
CODEMCP_INSTANCE_LABEL if set (the setup command writes opencode), else the
auto-detected parent process name (e.g. lmstudio). List them and target one:
codemcp instances # show every running gateway # LAUNCHER PID CONFIG # opencode 40012 /Users/you/.config/codemcp/mcp.json # lmstudio 40988 /Users/you/.config/codemcp/mcp.json codemcp list -i opencode # admin commands target a specific gateway codemcp enable github -i lmstudio # by launcher name, config substring, or PID
When only one gateway is running, -i is unnecessary. When several are running,
admin commands require -i and otherwise print the list to disambiguate.
B. One shared gateway over HTTP. Run a single long-lived gateway on a fixed
port (default 3388) and point both harnesses at the same URL:
codemcp start # listens on 127.0.0.1:3388 codemcp start --port 3388 # explicit; or -p 3388 -H 0.0.0.0 to expose it
start runs the Streamable HTTP transport and fails fast if the port is
already in use. Configure each harness with a remote MCP entry pointing at
http://127.0.0.1:3388/mcp.
Note: a single shared gateway means one Python worker and one SDK behind that
port. Concurrent execute_python calls from different harnesses are correlated
correctly (no crosstalk, isolated stdout/result, independent timeouts) and their
tool round-trips overlap, but they share one interpreter (no CPU parallelism)
and one global SDK state (an admin enable/disable affects all clients).
How it works
- Connect & discover. On startup codemcp connects to every enabled upstream server and lists its tools.
- Generate the SDK. Each tool's JSON Schema becomes a typed Python function
(
server_tool_name(arg: type, ...)). Tool names are sanitized to valid Python identifiers. The generatedsdk.pyis validated as parseable Python. - Expose one tool. The gateway serves a single
execute_pythontool. Its description is the intro + two lines per upstream tool (signature + summary). - Execute. A persistent Python worker process imports
sdk.pyonce. Eachexecute_pythoncall sends the user's code to the worker, which runs it and returns{ result, stdout, stderr }. Assign toresult(or leave a final expression) to return a value. - Route SDK calls. When user code calls an SDK function, the worker sends a
call_toolrequest back to the gateway over the WebSocket control channel; the gateway forwards it to the right upstream MCP server and returns the result.
Control channel
The gateway runs a WebSocket server (loopback by default). The worker connects as
a client and, as its first message, sends a shared auth token
(CODEMCP_CONTROL_TOKEN, auto-generated per run if unset). JSON-RPC 2.0 messages
then flow both ways on the one connection:
- gateway → worker:
run { code } - worker → gateway:
call_tool { server, tool, args }
One protocol covers host loopback, future Docker workers (Linux + macOS), and future remote workers, and is natively bidirectional with built-in message framing.
Self-provisioning worker
bootstrap.py provisions its own websockets dependency (into a cache dir via
pip install --target) if it is missing, so the worker runs on any stock Python
host or container without a custom image. Controlled by CODEMCP_WS_*.
Configuration
All settings are read once at startup from CODEMCP_* environment variables.
Core
| Variable | Default | Description |
|---|---|---|
CODEMCP_CONFIG |
~/.config/codemcp/mcp.json |
Path to the upstream mcp.json. |
CODEMCP_ISOLATION |
HOST_SYSTEM |
Execution isolation: HOST_SYSTEM (default) or DOCKER (containerized). MONTY is not implemented yet. |
CODEMCP_TRANSPORT |
stdio |
Downstream MCP transport: stdio or http. |
CODEMCP_ADMIN_SOCKET |
(per-instance) | Override the admin socket path. By default each gateway uses ~/.config/codemcp/admin-<config-hash>-<pid>.sock so multiple instances don't collide; set this to pin an explicit path. |
CODEMCP_INSTANCE_LABEL |
(auto) | Friendly name for this gateway in codemcp instances/list (e.g. opencode). Falls back to the auto-detected parent process name. |
CODEMCP_LOG |
info |
Tracing filter (e.g. info, debug, codemcp=debug). |
CODEMCP_PYTHON |
(auto) | Path to the Python interpreter (defaults to python3/python on PATH). |
Streamable HTTP transport
The HTTP server binds a fixed, reliable port (it does not fall back to a
random port). If the port is already taken — e.g. another codemcp instance — the
gateway fails to start with a clear error instead of silently moving. Use
codemcp start -p <port> for the common case, or set CODEMCP_TRANSPORT=http
plus the variables below.
| Variable | Default | Description |
|---|---|---|
CODEMCP_HTTP_BIND |
127.0.0.1:3388 |
Address to bind the HTTP server. (codemcp start --port/--host overrides this.) |
CODEMCP_HTTP_PATH |
/mcp |
URL path the MCP endpoint is mounted at. |
CODEMCP_HTTP_JSON_RESPONSE |
false |
true = stateless plain application/json replies; false = stateful SSE with session IDs. |
Control channel
| Variable | Default | Description |
|---|---|---|
CODEMCP_CONTROL_BIND |
127.0.0.1:0 |
Address for the WebSocket control server (:0 = ephemeral port). |
CODEMCP_CONTROL_HOST_FOR_WORKER |
(auto) | Host the worker uses to reach the control server. Auto-derived per isolation mode (loopback for HOST; bridge gateway or host.docker.internal for DOCKER); set to override for unusual topologies. |
CODEMCP_CONTROL_TOKEN |
(random per run) | Shared secret the worker must send as its first WS frame. |
Worker dependency provisioning
| Variable | Default | Description |
|---|---|---|
CODEMCP_WS_AUTO_INSTALL |
true |
Self-install websockets into a cache dir if missing. |
CODEMCP_WS_VERSION |
(unset) | Pin the websockets version. |
CODEMCP_WS_PIP_ARGS |
(empty) | Extra args passed to pip install (whitespace-split). |
Execution limits
| Variable | Default | Description |
|---|---|---|
CODEMCP_EXEC_TIMEOUT_MS |
30000 |
Per-run execution timeout in milliseconds. |
CODEMCP_MAX_OUTPUT_BYTES |
1048576 |
Max captured stdout/stderr bytes. |
Docker isolation
Set CODEMCP_ISOLATION=DOCKER to run the Python worker inside an isolated
container instead of the host process. The worker uses the same bootstrap.py
and the same WebSocket control protocol; only the spawn mechanism and the
control-channel networking differ. Requires a running Docker (or Podman) daemon
and a binary built with the docker feature (on by default).
Cold start installs websockets fresh in the container and (on first use) pulls
the image, so the first execute_python call is slower than on the host.
| Variable | Default | Description |
|---|---|---|
CODEMCP_DOCKER_IMAGE |
python:3.14-slim |
Image the worker runs in. Any stock python image with pip works; pulled automatically if missing. |
CODEMCP_DOCKER_NETWORK |
codemcp-net |
User-defined bridge network the worker attaches to. The control channel binds to this network's gateway only (see security notes). |
CODEMCP_DOCKER_MEMORY |
0 |
Hard memory limit in bytes (0 = unlimited). |
CODEMCP_DOCKER_CPUS |
0 |
CPU limit in cores, e.g. 1.5 (0 = unlimited). |
CODEMCP_DOCKER_PIDS_LIMIT |
0 |
Max processes in the container (0 = unlimited). |
CODEMCP_DOCKER_READONLY |
false |
Mount the container root filesystem read-only. |
The container is always created with hardening defaults: --rm (auto-remove),
all Linux capabilities dropped, no-new-privileges, attached only to the
dedicated bridge network, and the worker files bind-mounted read-only.
The previous
CODEMCP_DOCKER_EXTRA_ARGSknob is gone: codemcp talks to the Docker API directly (nodocker runCLI), so limits are set with the typed variables above.
On macOS / Windows (Docker Desktop): the worker workdir is materialized under
~/.cache/codemcp/work/<pid> (inside $HOME, which Docker Desktop shares by
default) rather than $TMPDIR. If you point CODEMCP_DOCKER_IMAGE at a custom
setup that needs other host paths, add them to Docker Desktop's file sharing.
Monty isolation (planned — see TODO)
| Variable | Default | Description |
|---|---|---|
CODEMCP_MONTY_MEM_LIMIT |
268435456 |
Memory limit (bytes) for the Monty sandbox. |
LLM tool summaries (planned — see TODO)
| Variable | Default | Description |
|---|---|---|
CODEMCP_ENABLE_LLM_SUMMARIES |
false |
Condense upstream tool descriptions via one cached LLM call per tool. |
CODEMCP_SUMMARY_MODEL |
(unset) | Model to use for summaries. |
CODEMCP_SUMMARY_API_BASE |
(unset) | API base URL for the summary model. |
CODEMCP_SUMMARY_API_KEY |
(unset) | API key for the summary model. |
CODEMCP_SUMMARY_CACHE |
~/.cache/codemcp/summaries.json |
Summary cache file. |
Isolation modes & security boundaries
The execute_python tool runs arbitrary code and can call any connected
upstream MCP server. Choose isolation based on how much you trust the agent.
| Mode | Status | Isolation | Use when |
|---|---|---|---|
HOST_SYSTEM |
implemented | None — full host access with the gateway's privileges, full stdlib + installed packages. | Development / trusted agents only. |
DOCKER |
implemented | OS-level container; only the authenticated WebSocket control channel bridges in, and that channel is never exposed to the LAN. | Untrusted agents (recommended). |
MONTY |
planned | Strict in-process sandbox: no filesystem/network/env except the SDK callbacks the gateway grants. Limited Python subset (no classes, no third-party libs, partial stdlib). | Maximum safety; simple transform code. |
HOST_SYSTEMhas no sandbox. It executes with the gateway's privileges. Run it only with agents and tools you trust.- Control channel auth. Because the control channel both executes arbitrary
code and routes to authenticated upstreams, it is gated by a per-run shared
token (
CODEMCP_CONTROL_TOKEN, sent as the first WS frame). It binds loopback by default. Never expose the control port publicly without TLS and a strong token. - DOCKER channel is not LAN-exposed. The control channel is effectively
god-mode over every connected upstream, so codemcp never binds it to
0.0.0.0. On native Linux it binds the dedicated bridge network's gateway IP (a host-internal interface that is not routed to your physical network). On Docker Desktop (macOS/Windows), where the host can't bind the bridge IP, it binds loopback and lets the container reach it viahost.docker.internal(which Docker Desktop forwards to host loopback). Either way, only the worker container — not other machines on the same Wi‑Fi — can reach the port. - HTTP transport. The Streamable HTTP server validates the
Hostheader against a loopback allow-list by default to prevent DNS-rebinding attacks. Set appropriate hosts/origins before any non-loopback deployment, and front it with TLS + authentication.
TODO / planned work
These phases are designed but not yet implemented. Configuration knobs already exist (see tables above) but are inert until the backends land.
Phase 8 — Monty isolation (exec/monty.rs, feature-gated)
An in-process, safe-by-construction sandbox using
Monty (pinned to =0.0.18). SDK calls are
exposed as Monty external_functions rather than over the WebSocket. Opt-in via
CODEMCP_ISOLATION=MONTY and the monty cargo feature (off by default; the crate
is pulled from git). Monty is a limited Python subset (no classes, no third-party
libraries, partial stdlib), so it suits simple transform code, not arbitrary
scripts. Memory bounded by CODEMCP_MONTY_MEM_LIMIT.
Phase 9 — LLM tool summaries + cache
Optionally condense verbose upstream tool descriptions into a single tight summary
line per tool via one cached LLM call (CODEMCP_ENABLE_LLM_SUMMARIES,
CODEMCP_SUMMARY_*). Default behavior stays fully offline, using each tool's own
description.
Development
cargo build cargo test # Inspect generated SDK + tool description for a given config CODEMCP_CONFIG=/path/to/mcp.json CODEMCP_DUMP=1 cargo run # One-shot smoke test: run a Python snippet against the worker and exit CODEMCP_CONFIG=/path/to/mcp.json \ CODEMCP_SMOKE='print(everything_get_sum(a=2, b=40))' cargo run
Requires a Python 3 interpreter on PATH (3.14 tested) and, for stdio upstreams,
whatever launcher their command needs (e.g. npx, uvx).
License
MIT
























