If you've built an MCP server or any API that costs you money to run (an LLM call, a paid data source, compute), you've probably hit the same wall I did:
How do you get paid per call — when the caller is an AI agent, not a human with a credit card form?
A human can sign up, enter a card, get an API key. An autonomous agent can't fill out a Stripe checkout form mid-task. And you don't want to hand an agent a raw API key with no spending limit — one runaway loop and your bill explodes.
This post walks through the design I landed on. It's not the only way, but the pieces are reusable even if you build your own.
The core idea: HTTP 402
402 Payment Required has been in the HTTP spec for decades, officially "reserved for future use" and basically unused. It turns out to be exactly the primitive we need.
The flow:
1. An agent calls your endpoint with no payment.
2. You respond 402 with a small JSON body describing how to pay (price, where to top up, what token format you accept).
3. The agent (or its owner) tops up once, getting a budget-capped token.
4. The agent retries with the token in the Authorization header. Now it works, and keeps working until the budget runs out; then it gets 402 again.
HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "accepts": [
    {
      "scheme": "lemoncake-pay-token",
      "price": "0.01",
      "currency": "USD",
      "mintUrl": "https://.../buy/",
      "gatewayUrl": "https://.../g/"
    }
  ]
}
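This follows the response shape the x402 spec standardizes (an accepts list of payment options), though the scheme name and the fields inside each entry here are specific to my setup. You don't strictly need the spec to do it, but following it means agent frameworks that already understand 402 can pay you without custom glue.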
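The flow and the response above fit in one small, runnable sketch. Everything in it is hypothetical: payment_required and pick_offer show how a server might build that body and how an agent might choose an offer it knows how to pay, FakeGateway fakes the gateway in-process so nothing touches the network, and the example.com URLs stand in for the elided ones.

```python
import json
from decimal import Decimal

SCHEME = "lemoncake-pay-token"


def payment_required(price, mint_url, gateway_url):
    """Server side: the 402 status, headers and JSON body for an unpaid call."""
    body = {"accepts": [{
        "scheme": SCHEME,
        "price": price,  # a string, so no float rounding on the wire
        "currency": "USD",
        "mintUrl": mint_url,
        "gatewayUrl": gateway_url,
    }]}
    return 402, {"Content-Type": "application/json"}, json.dumps(body)


def pick_offer(raw_body, supported_schemes):
    """Agent side: the first offer this agent knows how to pay, or None."""
    for offer in json.loads(raw_body).get("accepts", []):
        if offer.get("scheme") in supported_schemes:
            return offer
    return None


class FakeGateway:
    """Hypothetical in-process gateway so the loop runs without a network."""

    def __init__(self, price="0.01"):
        self.price = Decimal(price)
        self.budgets = {}  # token -> remaining budget

    def call(self, token=None):
        if token is None or self.budgets.get(token, Decimal(0)) < self.price:
            return payment_required(str(self.price), "https://example.com/buy/",
                                    "https://example.com/g/")
        self.budgets[token] -= self.price  # draw down the prepaid budget
        return 200, {"Content-Type": "application/json"}, json.dumps({"result": "ok"})

    def top_up(self, budget):
        """Stand-in for the mint flow: someone pays, a budget-capped token comes back."""
        token = f"tok_{len(self.budgets) + 1}"
        self.budgets[token] = Decimal(budget)
        return token


def call_with_payment(gateway, token=None, budget="0.05"):
    """Call; on 402, pick a payable offer, top up once, retry with the token."""
    status, _, raw = gateway.call(token)
    if status == 402:
        if pick_offer(raw, {SCHEME}) is None:
            raise RuntimeError("no payment scheme this agent supports")
        token = gateway.top_up(budget)
        status, _, raw = gateway.call(token)
    return status, json.loads(raw), token
```

With a $0.05 budget at $0.01 per call, the first call_with_payment succeeds after one top-up, four more calls succeed on the same token, and the sixth gets 402 again. Keeping price as a string and parsing it with Decimal avoids float rounding on money values.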
The budget cap is the important part
The naive version — "give the agent an API key" — is dangerous because there's no ceiling. The whole point of an agent paying autonomously is that you stop watching it. So the token has to carry its own limits:
{
  "budget": 5.00,
  "spent": 0.06,
  "max_calls": 50,
  "calls_used": 6,
  "expires_at": "2026-07-01T00:00:00Z"
}
The gateway checks these on every call before forwarding upstream. Budget exhausted → 402. Rate limit hit → 429. Expired → 402. The agent can never spend more than the token allows, even if it goes haywire.
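The order of those checks is easy to get subtly wrong, so here is one version spelled out. It's a sketch: TokenState mirrors the JSON fields above, check is a hypothetical helper, and two choices are assumptions rather than anything stated here: a call is refused if it would push spent past budget (not only once spent already equals it), and max_calls is the limit that produces the 429.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass
class TokenState:
    budget: Decimal
    spent: Decimal
    max_calls: int
    calls_used: int
    expires_at: datetime


def check(state: TokenState, price: Decimal, now: datetime) -> int:
    """Return the HTTP status the gateway should answer with before forwarding."""
    if now >= state.expires_at:
        return 402  # expired: the agent has to buy a new token
    if state.spent + price > state.budget:
        return 402  # this call would exceed the prepaid budget
    if state.calls_used >= state.max_calls:
        return 429  # call cap reached
    return 200      # safe to forward upstream


state = TokenState(Decimal("5.00"), Decimal("0.06"), 50, 6,
                   datetime(2026, 7, 1, tzinfo=timezone.utc))
print(check(state, Decimal("0.01"), datetime(2026, 1, 1, tzinfo=timezone.utc)))
```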
I encode the token as a signed JWT (HS256), so the gateway can reject forged or tampered tokens without a DB round-trip on the hot path; only a token that passes that check costs a lookup of the live spend counter in Postgres. The JWT carries the token id, endpoint id, and owner id; the mutable budget lives in the DB.
The gateway pattern
The key architectural move: a proxy in front of the real endpoint. The agent never calls your upstream directly. It calls a gateway URL like /g/. The gateway:
1. Verifies the pay token.
2. Checks budget / rate limit / expiry.
3. Forwards the request to your real upstream (with your upstream auth attached server-side, so the agent never sees your real keys).
4. Records the call + cost in a ledger.
5. Returns the upstream response.
agent ──► /g/ (gateway)
          ├─ verify token
          ├─ check budget
          ├─ forward ──► your real API (with your secret key)
          ├─ record usage + cost
          └─ return response
This decouples two things that are usually tangled: who can call (the agent's pay token) and how you authenticate upstream (your secret, never exposed). It also means you can put a per-endpoint price on any existing API without touching its code.
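The five gateway steps can be sketched as a single handler. Everything here is a simplified assumption: an in-memory dict stands in for both JWT verification and the Postgres spend counter, upstream is any callable, UPSTREAM_SECRET is a placeholder, and billing only successful (2xx) upstream responses is a choice made for the sketch, not something the design above specifies.

```python
from decimal import Decimal

# Hypothetical placeholder: the real upstream key lives only in gateway config.
UPSTREAM_SECRET = "sk_upstream_placeholder"


def handle(request_headers, tokens, ledger, upstream, price):
    """Gateway pipeline: verify, check budget, forward, record usage, return."""
    # 1. Verify. A dict lookup stands in for JWT verification here.
    token_id = request_headers.get("Authorization", "").removeprefix("Bearer ").strip()
    state = tokens.get(token_id)
    if state is None:
        return 402, "payment required"
    # 2. Check budget before anything is forwarded.
    if state["spent"] + price > state["budget"]:
        return 402, "budget exhausted"
    # 3. Forward with headers built from scratch: the agent's pay token is not
    #    passed through, and the upstream key is attached server-side.
    status, body = upstream({"Authorization": f"Bearer {UPSTREAM_SECRET}"})
    # 4. Record usage. Assumption: only 2xx upstream responses are billed.
    cost = price if 200 <= status < 300 else Decimal("0")
    state["spent"] += cost
    ledger.append({"token": token_id, "status": status, "cost": cost})
    # 5. Return the upstream response to the agent.
    return status, body
```

One thing the in-memory version hides: with Postgres, two concurrent calls can both pass the budget check before either records its cost. Reserving the cost before forwarding with one conditional UPDATE (increment spent only where spent + price <= budget; zero rows affected means 402) closes that gap.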
Settling the money
Minting a budget-capped token means someone paid up front. I use Stripe Checkout as a Direct Charge on the provider's connected account (Stripe Connect), so the money lands in the provider's balance and the platform takes a small application fee — once, at payment time, not per call. The per-call cost is just a ledger figure that draws down the prepaid budget.
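This matters because charging the card on every tiny call doesn't work: Stripe won't create a USD charge below $0.50, and the fixed per-transaction fee on card payments would dwarf a one-cent call anyway. A prepaid bundle plus a ledger drawdown sidesteps that entirely.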
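For the top-up itself, here is a sketch of the Checkout Session parameters for one prepaid bundle. The parameter names (mode, line_items, price_data, unit_amount, payment_intent_data.application_fee_amount, success_url with the {CHECKOUT_SESSION_ID} placeholder, metadata) are Stripe's; the checkout_params helper, the basis-point fee and the URL layout are hypothetical. The dict is built without calling Stripe, so it runs offline.

```python
STRIPE_USD_MINIMUM_CENTS = 50  # Stripe won't create a USD charge below $0.50


def checkout_params(budget_cents, fee_bps, endpoint_id, base_url):
    """Checkout Session parameters for one prepaid bundle (hypothetical helper)."""
    if budget_cents < STRIPE_USD_MINIMUM_CENTS:
        raise ValueError("bundle is below Stripe's minimum charge")
    fee_cents = budget_cents * fee_bps // 10_000  # platform fee, taken once per top-up
    return {
        "mode": "payment",
        "line_items": [{
            "price_data": {
                "currency": "usd",
                "product_data": {"name": f"Prepaid API budget ({endpoint_id})"},
                "unit_amount": budget_cents,
            },
            "quantity": 1,
        }],
        # On a Direct Charge this fee goes to the platform; the remainder
        # (minus Stripe's own fees) stays on the connected (provider) account.
        "payment_intent_data": {"application_fee_amount": fee_cents},
        # Stripe fills {CHECKOUT_SESSION_ID} in when redirecting the buyer.
        "success_url": f"{base_url}/success?session_id={{CHECKOUT_SESSION_ID}}",
        "metadata": {"endpoint_id": endpoint_id},
    }
```

Passing the result as stripe.checkout.Session.create(**params, stripe_account=connected_account_id) is what makes it a Direct Charge on the provider's account, with the application fee collected once. A $5.00 bundle at $0.01 per call then covers 500 calls, none of which touch Stripe.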
What I'd tell you before you build your own
A few things that bit me:
Don't put the fee on each call. Stripe's minimum charge makes per-call card billing at cent-level prices impossible. Prepay a budget, draw it down in your own ledger.
The token must be re-displayable. Agents lose context. The buyer needs a way to recover the same token (I key the success page off the Stripe session id, which is single-use and unguessable).
Scope the token to one endpoint. A token minted for endpoint A should be rejected at endpoint B. Otherwise a leaked token is a blank check.
Forward upstream auth server-side only. The agent should never be able to read your real upstream key. The gateway attaches it after auth.
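The middle two rules (recoverable tokens and endpoint scoping) can be sketched together. This is an in-memory stand-in with a hypothetical schema: TokenStore, mint_for_session and authorize are illustrative names, and secrets.token_urlsafe stands in for whatever id the real system mints. In a database, the same idempotency would come from a unique constraint on the session id column.

```python
import secrets


class TokenStore:
    """In-memory stand-in for the tokens table (hypothetical schema)."""

    def __init__(self):
        self.by_session = {}  # Stripe Checkout session id -> token id
        self.tokens = {}      # token id -> endpoint id it was minted for

    def mint_for_session(self, session_id: str, endpoint_id: str) -> str:
        # Idempotent: reloading the success page returns the same token
        # instead of minting a second one.
        if session_id not in self.by_session:
            token_id = secrets.token_urlsafe(16)
            self.by_session[session_id] = token_id
            self.tokens[token_id] = endpoint_id
        return self.by_session[session_id]

    def authorize(self, token_id: str, endpoint_id: str) -> bool:
        # Scoped: a token is only valid at the endpoint it was minted for.
        return self.tokens.get(token_id) == endpoint_id
```

Before minting, the success page should also confirm the session was actually paid, for example by retrieving the Checkout Session and checking that its payment_status is "paid".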
The result
I packaged this up as a project called LemonCake — you wrap any API/MCP endpoint, set a price per call, and get a gateway URL an agent can pay through autonomously. There's a live demo (no signup) if you want to see the 402 → top-up → call loop run end to end: https://www.lemoncake.xyz/demo
But honestly, even if you never touch it, the pattern stands on its own:
402 to advertise the price → budget-capped token → gateway that verifies, forwards, and meters.
I'm still genuinely unsure whether autonomous per-call payment is something agent builders need today or whether I'm a year or two early. If you've hit the "how do I charge an agent" problem from the other side, I'd love to hear how you solved it.