Metering a Paid API Without Overselling: the Credit Ledger Behind a Solo SaaS

I charge per invoice extracted. One upload of a five-invoice PDF should cost five credits, a failed extraction should cost nothing, and a user with one credit left must never get two. That last rule is where it got interesting.

This is the billing design behind GSTExtract — a tool that reads Indian GST invoice PDFs into Excel — and the concurrency bug a second-pass audit found in it. Writing it up because "meter a paid API correctly" is one of those problems that looks trivial until you hold it up to the light.

The shape of the problem

Each extraction calls a vision model, which costs me real money. So the user spends a credit per invoice. The catch: I don't know how many invoices are in a PDF until after the model reads it. A single page can hold three invoices; a "10-page" file might be one. I can't charge up front because I don't know the count, and I can't charge after without a window where a user with a zero balance has already burned my API budget.

The answer most billing systems land on is reserve → settle → refund:

Reserve one credit before doing the expensive work. This is the gate — if it fails, the user is out of credits and nothing runs.
Settle after success: charge the actual count minus the one already reserved.
Refund the reserve if the extraction failed, so failures are never billed.

Money and credits are integers. Every change is also written to an append-only ledger with an idempotency key, so a retry can never double-charge. Balance is derived: the sum of remaining credits across non-expired "lots" (a lot is one grant — a signup bonus, a purchase). Consumption is FIFO by soonest expiry, so credits get used before they lapse.

That part worked. The bug was in the word "atomic."

The bug: two uploads, one credit

The original charge function did the obvious thing:

# read the balance
bal = current_balance(db, user_id)
if bal < n:
    return False, 0
# ... then walk the lots and decrement in Python
for lot in lots:
    take = min(lot.remaining, remaining)
    lot.remaining -= take          # <-- writes an ABSOLUTE value

Read the balance, decide, write the new value. Single-threaded, this is fine. Now picture a user with exactly one credit firing two uploads at the same moment (a double-click, or two tabs):

Request A reads remaining = 1. Decides it's allowed.
Request B reads remaining = 1. Also decides it's allowed.
A writes remaining = 0, inserts a -1 ledger row.
B writes remaining = 0, inserts a -1 ledger row.

Both succeed. The user spent one credit and got two extractions. The idempotency key didn't save me — the two requests had different keys, because they're two genuinely different uploads. The read-then-write-absolute is the classic lost-update race.

The fix: let the database do the deciding

The cure is to stop computing the new value in Python and let the write itself be the gate — a conditional UPDATE that only fires if there's still enough left, checked under the row lock:

res = db.execute(
    update(CreditLot)
    .where(CreditLot.id == lot_id, CreditLot.remaining >= take)
    .values(remaining=CreditLot.remaining - take)
)
if res.rowcount == 1:
    taken += take      # we actually got it
# rowcount 0 → someone else took it first; move on or roll back

remaining = remaining - take instead of remaining = <number I computed>. The WHERE remaining >= take re-checks the balance at write time, and SQLite serializes writers, so the second request blocks, re-evaluates against the now-committed remaining = 0, gets rowcount = 0, and is correctly told it can't reserve. Exactly one upload wins. Balance can never go negative.

I proved it with a test that fires N concurrent reservers at a one-credit account on a real file-backed SQLite DB (WAL mode, busy timeout — same as production, because an in-memory shared-connection DB won't reproduce writer contention). Exactly one succeeds, every time.

The second bug: a retry that costs me money

There was a sneakier one, found in the same audit. The workspace retries an upload once on a dropped connection, reusing the same request_id so it doesn't double-charge. And it didn't double-charge — the ledger keys saw to that. But the idempotency cache was written after the model call, not before. So a retry that arrived while the first call was still in flight sailed past the cache check and fired a second vision-model call. No double charge, but I paid for two API calls and logged a duplicate usage event.

Fix: mark the request in-flight before the expensive call, while everything above it is still synchronous (so two requests can't both pass the check), and have a retry return 409 instead of re-running:

cached = _REQUESTS.get(rid)
if cached:
    if cached.get("in_flight"):
        return JSONResponse({"code": "in_progress"}, status_code=409)
    return JSONResponse(cached["response"])
# ... reserve the credit ...
_REQUESTS[rid] = {"in_flight": True}   # before the model call

The lesson that keeps repeating: idempotency has to cover the side effect, not just the database row. The charge was idempotent. The expensive API call wasn't.

Why an append-only ledger earns its keep

The thing that made all of this debuggable: nothing is ever updated or deleted in the transaction log. Every grant, charge, and refund is an immutable row with a signed delta. That gives you a reconciliation invariant you can assert on a schedule:

the sum of all ledger deltas for a user == the sum of remaining across all their lots

If those two numbers ever disagree, a write went wrong, and an alert fires. (I also learned to not fold a cache-freshness check into that alert — a balance cache that goes stale when a credit quietly expires is not corruption, and it had me chasing phantom "drift" emails for a day.)

Takeaways

Don't read-modify-write a balance. Make the write conditional (WHERE remaining >= n) and trust rowcount.
Reserve → settle → refund cleanly separates "can they pay" from "what did it actually cost."
Idempotency must guard the expensive side effect, not only the ledger insert.
An append-only ledger gives you a cheap, always-true invariant to reconcile against.
Write the concurrency test against a real on-disk DB, or it won't reproduce the race you're trying to kill.

The hosted tool has a free daily tier and twenty free credits on signup, with prepaid packs for heavier use — so this ledger runs on every extraction, and I'd rather it be boringly correct. If you want to see it in action, it's at gstextract.com; the extraction engine is open source, though the billing layer above is part of the hosted product.

If you've metered a paid API differently — especially how you handle the count-unknown-until-after problem — I'd genuinely like to hear it.

推荐订阅源

DEV Community