The race condition a stress test found in my double-entry ledger — and how I fixed it
Do Pham Dinh · 2026-05-24 · via DEV Community

I'm building ledger-service, a double-entry e-wallet ledger in Java 21 / Spring Boot 3.5 / PostgreSQL. It's live on Render. Early on I wrote a stress test that fires 50 transfers at the same account at once and asserts the books are never corrupted. It went red — and the way it went red is the most useful thing I've learned building this.

This post walks the whole chain: why a money ledger keeps a balance cache at all, the read-modify-write race that cache invites, how I detected it, the fix (optimistic locking + bounded retry), the benchmark that justified choosing optimistic over pessimistic locking, and how idempotency has to compose with retry so a network hiccup never double-spends.

The setup: a ledger, and why it has a cache

The source of truth is a double-entry, append-only table (ADR-0005). Every money operation writes at least one balanced DEBIT/CREDIT pair where Σ DEBIT == Σ CREDIT, and ledger_entries is insert-only — no UPDATE, no DELETE. A mistake is fixed by posting a correcting entry, never by editing history. This is the model Stripe, Modern Treasury, and Formance all use, and it's what gives you an audit trail you can trust.
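As a minimal sketch of that invariant (not the project's actual code), a posting can be checked for balance before it is ever inserted. The names `BalancedPostingSketch`, `Entry`, `Side`, and `amountMinor` are hypothetical, and amounts are assumed to be stored as positive integers in minor units.

```java
import java.util.List;

// Hypothetical names; the real schema is described in the project's ADRs.
final class BalancedPostingSketch {
    enum Side { DEBIT, CREDIT }

    // One immutable ledger line. amountMinor is assumed to be a positive
    // amount in minor units (for example cents).
    record Entry(String accountId, Side side, long amountMinor) {}

    // A posting is accepted only if it is non-empty, every amount is
    // positive, and total debits equal total credits.
    static boolean isBalanced(List<Entry> entries) {
        if (entries.isEmpty()) return false;
        long debits = 0;
        long credits = 0;
        for (Entry e : entries) {
            if (e.amountMinor() <= 0) return false;
            if (e.side() == Side.DEBIT) debits += e.amountMinor();
            else credits += e.amountMinor();
        }
        return debits == credits;
    }
}
```

Rejecting an unbalanced posting up front means a bad posting never reaches the insert-only table, which matters because nothing there can be edited afterwards.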

But "what is account X's balance?" should not be a SUM over every entry that account has ever had. So I keep a cache: accounts.balance is a materialized Σ of that account's entries, updated in the same transaction as the entries themselves (ADR-0006). The entries are the truth; the balance is a derived read cache that stays O(1).

That cache is exactly where concurrency bites.

The race

Two requests debit the same account at the same time:

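R1: read balance $500 (enough)      R2: read balance $500 (enough)
R1: write $500 − $300 → $200        R2: write $500 − $400 → $100   ← lost update: R1's debit is gone from the cache
ledger entries: $500 − $300 − $400 = −$200                         ← overdraft the cache never shows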

Both read $500, both decide they have enough, both write back their own idea of the new balance. One write silently clobbers the other: a lost update, and a balance that no longer matches the ledger entries underneath it.

The trap is assuming the database stops this for you. It does not. PostgreSQL's default isolation level, READ COMMITTED, only guarantees you don't read uncommitted data — it does nothing about two transactions that each read-then-write the same row concurrently. A read-modify-write race sails right through it.
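The interleaving can be reproduced without a database. The sketch below is an in-memory stand-in, not the service's code: it replays two read-modify-write debits in the problematic order and compares the result with what the entries would sum to. `LostUpdateSketch` and its method names are hypothetical.

```java
// In-memory illustration only: a plain variable plays the role of the
// accounts.balance column, with no version check.
final class LostUpdateSketch {

    // Cached balance after two debits that both read before either writes.
    static long interleavedDebits(long start, long debit1, long debit2) {
        long row = start;
        long r1Read = row;        // R1 reads the balance
        long r2Read = row;        // R2 reads the same balance
        row = r1Read - debit1;    // R1 writes its result
        row = r2Read - debit2;    // R2 overwrites it from a stale read
        return row;
    }

    // What the append-only entries say the balance is.
    static long ledgerSum(long start, long debit1, long debit2) {
        return start - debit1 - debit2;
    }

    public static void main(String[] args) {
        System.out.println("cache  = " + interleavedDebits(500, 300, 400));
        System.out.println("ledger = " + ledgerSum(500, 300, 400));
    }
}
```

With the amounts from the example, the cache ends at 100 while the entries sum to -200: the first debit has vanished from the cache, and the overdraft is invisible to anyone reading it.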

Detecting it: the stress test

Here's the test that surfaced the bug. Fund one account, then fire N = 50 transfers out of it concurrently and check the books afterward:

import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import static org.assertj.core.api.Assertions.assertThat;

AtomicInteger successes = new AtomicInteger();
CountDownLatch start = new CountDownLatch(1);
ExecutorService pool = Executors.newFixedThreadPool(16);
for (int i = 0; i < N; i++) {
    pool.submit(() -> {
        start.await();                       // line them all up...
        int code = post("/transfers", from, to, AMOUNT);   // ...then fire at once
        if (code == 201) successes.incrementAndGet();
        return null;                         // makes this a Callable, so await() may throw
    });
}
start.countDown();
pool.shutdown();
assertThat(pool.awaitTermination(60, TimeUnit.SECONDS)).isTrue();   // wait until all complete
int ok = successes.get();
assertThat(balanceCache(from)).isEqualTo(ledgerBalance(from));   // cache == Σ entries
assertThat(balanceCache(from)).isGreaterThanOrEqualTo(0);        // no overdraft
assertThat(balanceCache(to)).isEqualTo((long) ok * AMOUNT);      // exact accounting

The assertions are deliberately timing-independent — they hold for any split of successes and failures, because they compare the cache against the ledger truth rather than against a fixed expected count. That's what makes the test a stable regression guard instead of a flaky one.

I confirmed the bug by experiment: with the @Version column removed, ~85% of the cache updates were lost and these assertions went red — the cached balance drifted far from the sum of the entries. The cache and the truth disagreed, which in a money system is the whole ballgame.

The fix, part 1: optimistic locking

accounts already had a version BIGINT column, because the Account is the aggregate / locking boundary (ADR-0010). Mapping it as a JPA @Version turns every balance write into a compare-and-set:

UPDATE accounts SET balance = ?, version = version + 1
 WHERE id = ? AND version = ?

Two concurrent writers both load version 7. The first to commit sets it to 8. The second's UPDATE ... WHERE version = 7 now matches zero rows, so Hibernate reports a stale-state failure when it flushes the write (by default, at commit), and Spring surfaces it as an OptimisticLockingFailureException. The lost update is now impossible: instead of silently clobbering, the loser is told it lost.
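The same compare-and-set can be modelled in memory to show the "zero rows matched" outcome. This is an illustrative stand-in for the versioned row, not what Hibernate does internally; `VersionedRowSketch` and `compareAndSet` are hypothetical names.

```java
// In-memory model of a row with a version column. The synchronized
// method stands in for the atomicity of a single UPDATE statement.
final class VersionedRowSketch {
    private long balance;
    private long version;

    VersionedRowSketch(long balance, long version) {
        this.balance = balance;
        this.version = version;
    }

    // Mirrors UPDATE ... SET balance = ?, version = version + 1
    //         WHERE id = ? AND version = ?
    // Returns the number of rows affected: 1 on success, 0 if stale.
    synchronized int compareAndSet(long expectedVersion, long newBalance) {
        if (version != expectedVersion) return 0;
        balance = newBalance;
        version++;
        return 1;
    }

    synchronized long balance() { return balance; }

    synchronized long version() { return version; }
}
```

Starting from balance 500 at version 7, the first `compareAndSet(7, 200)` affects one row and bumps the version to 8; a second `compareAndSet(7, 100)` affects none, so the stale write never lands.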

The key property: this is detection, not blocking. No reader ever waits for a lock. For a ledger — where balance/history reads vastly outnumber writes — that matters a lot.

The fix, part 2: bounded retry

Detection alone isn't enough. With @Version in place but no retry, the stress test stopped corrupting data but a big chunk of transfers now failed with a conflict — correct, but a lousy experience. So the loser needs to retry.

The retry helper sits outside @Transactional, and that placement is the whole point:

public <T> T execute(Supplier<T> operation) {
    for (int attempt = 1; ; attempt++) {
        try {
            return operation.get();           // a FRESH transaction each attempt
        } catch (OptimisticLockingFailureException e) {
            if (attempt >= maxAttempts) throw new ConcurrencyConflictException();
            sleep(backoffWithFullJitter(attempt));   // 25–200 ms, capped
        }
    }
}

Each attempt is a brand-new transaction that reloads the row at its current version — retrying inside the failed transaction would just re-fail against the stale version. Defaults: 5 attempts, exponential backoff with full jitter (so a thundering herd doesn't resynchronize into another collision), and on exhaustion a clean 409 Conflict — never a 500.
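The post does not show `backoffWithFullJitter`, so the following is only one plausible shape for it. It assumes a 25 ms base and a 200 ms cap (reading the "25–200 ms" comment as the range of the window's ceiling) and uses the common full-jitter rule: sleep a uniformly random time between zero and the capped exponential ceiling. `BackoffSketch` and `ceilingMs` are hypothetical names.

```java
import java.util.concurrent.ThreadLocalRandom;

final class BackoffSketch {
    static final long BASE_MS = 25;   // assumed base delay
    static final long CAP_MS = 200;   // assumed cap

    // Upper bound of the sleep window for a 1-based attempt number:
    // 25, 50, 100, 200, 200, ...
    static long ceilingMs(int attempt) {
        int shift = Math.min(Math.max(attempt - 1, 0), 20); // keep the shift small
        return Math.min(CAP_MS, BASE_MS << shift);
    }

    // Full jitter: pick uniformly from [0, ceiling] so that writers that
    // collided together do not all wake up together.
    static long backoffWithFullJitter(int attempt) {
        return ThreadLocalRandom.current().nextLong(ceilingMs(attempt) + 1);
    }
}
```

Drawing from the whole window rather than adding a small random offset to a fixed delay is what spreads the retries out; the trade-off is that an individual retry may sleep for almost no time at all.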

There's a small piece of reasoning that makes this provably terminating under moderate load: the k-th committer can only lose to a distinct earlier committer, so it needs at most k attempts. With 4 concurrent writers and a 5-attempt budget, all 4 succeed deterministically — no flaky test. A genuine hot account (more concurrent writers than the attempt budget) surfaces as 409, which is honest backpressure rather than a hidden corruption.
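That bound can be checked with a small simulation of the worst case, where every pending writer collides in every round and exactly one of them commits. This is a lockstep model chosen for illustration, not a description of how real requests interleave; `RetryBoundSketch` and its methods are hypothetical.

```java
final class RetryBoundSketch {

    // attempts[k] is how many tries the (k+1)-th committer needed when
    // every round has all pending writers collide and exactly one commit.
    static int[] attemptsPerWriter(int writers) {
        int[] attempts = new int[writers];
        int committed = 0;
        while (committed < writers) {
            for (int w = committed; w < writers; w++) {
                attempts[w]++;        // everyone still pending tries again
            }
            committed++;              // exactly one of them wins the round
        }
        return attempts;
    }

    // True if every writer finishes within the attempt budget.
    static boolean allFitBudget(int writers, int maxAttempts) {
        for (int a : attemptsPerWriter(writers)) {
            if (a > maxAttempts) return false;
        }
        return true;
    }
}
```

For 4 writers the attempt counts come out as 1, 2, 3, 4, which fits a budget of 5. With 6 writers the last one needs 6 attempts in this worst case, which is where a 409 would surface.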

Optimistic vs pessimistic: the measured choice

The obvious alternative is pessimistic locking — SELECT ... FOR UPDATE to lock the row before touching it, so writer #2 simply waits. No retries, easy to reason about. So why optimistic?

I didn't want to argue this from vibes, so I wrote a benchmark (TransferConcurrencyBenchmark) that runs the identical transfer logic under both strategies, 50 concurrent writers, against one real PostgreSQL:

| Scenario | Optimistic + retry | Pessimistic FOR UPDATE |
| --- | --- | --- |
| Low-contention (50 disjoint account pairs) | 34 ms · 50/50 ok · 0 retry waste | 31 ms · 50/50 ok |
| High-contention (50 transfers → 1 hot row) | 731 ms · 50/50 ok · 185 retry waste | 358 ms · 50/50 ok |

Reading the numbers:

  • Low contention is the common case, and it's a tie (34 vs 31 ms) — but optimistic wastes zero retries and, crucially, never blocks reads. That's the deciding factor for a read-heavy ledger.
  • On a single hot row, pessimistic is ~2× faster (358 vs 731 ms) and wastes nothing, while optimistic burns 185 extra attempts (235 attempts for 50 transfers, ≈4.7× the work) on collisions and backoff. But pessimistic "wins" here precisely by serializing: every writer, and every read that takes a row lock, waits behind the current holder (plain SELECTs are not blocked under PostgreSQL's MVCC). That queueing is the thing I'm trying to avoid, and it doesn't actually solve a hot account, it just queues it.

So the verdict is optimistic + retry, and the value of the benchmark isn't "optimistic is faster" (it isn't, under contention) — it's that those 185 wasted retries quantify the threshold at which a truly hot account (think: every top-up debiting a shared SYSTEM_FUNDING row) needs a real escalation: async queueing or sub-account sharding, not flipping the whole system to pessimistic locks.

Composing with idempotency

There's one more way to double-spend that retry actually makes worse if you're not careful. A client whose connection drops after the server committed will retry the whole HTTP request — and now you risk posting the transfer twice. Retry-on-conflict and retry-on-network-blip are different problems, and the fix for one must not break the other.

So both money endpoints require an Idempotency-Key header (ADR-0012, the Stripe pattern). The mechanism that makes it concurrency-safe is claim-first:

INSERT INTO idempotency_keys (key, status) VALUES (?, 'PENDING')
ON CONFLICT (key) DO NOTHING;     -- committed immediately, before business logic

That atomic insert is the serialization point. Whoever wins the claim runs the operation; a concurrent request with the same key sees the committed PENDING row and gets 409 (in-flight) instead of running a second time. A completed key replays the stored response; a key reused with a different body gets 422 (a client contract violation, deliberately distinct from the 409 conflict code).

The reason this composes cleanly with the retry from earlier: the optimistic-lock retry sits after the key is claimed. All those internal attempts happen under one already-claimed idempotency key, so they're completely invisible to the client and can never produce a second posting. Conflict-retry and request-idempotency stack instead of fighting.
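A rough in-memory sketch of the claim-first flow, with `ConcurrentHashMap.putIfAbsent` standing in for the atomic `INSERT ... ON CONFLICT DO NOTHING`. Everything here is an assumption for illustration: the class and record names, storing a body hash next to the key, and checking for a body mismatch before the in-flight check. It also leaves out what happens to a PENDING claim when the operation throws.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class IdempotencySketch {
    // status == null means the claim is still PENDING.
    record Claim(String bodyHash, Integer status) {}

    private final ConcurrentHashMap<String, Claim> keys = new ConcurrentHashMap<>();

    // Returns the HTTP status the caller should see for this request.
    int handle(String key, String bodyHash, Supplier<Integer> operation) {
        // Atomic claim: exactly one caller per key gets null back.
        Claim existing = keys.putIfAbsent(key, new Claim(bodyHash, null));
        if (existing != null) {
            if (!existing.bodyHash().equals(bodyHash)) return 422; // key reused with another body
            if (existing.status() == null) return 409;             // same request still in flight
            return existing.status();                              // replay the stored result
        }
        // Only the claim winner gets here. Optimistic-lock retries would
        // run inside this call, under the already-claimed key.
        int status = operation.get();
        keys.put(key, new Claim(bodyHash, status));
        return status;
    }
}
```

Because the retry loop lives inside `operation`, however many internal attempts it takes, the client sees one status and the business logic runs at most once per key.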

What I'd reach for next

The cache is fast but it can drift (a bug, a partial failure). So a scheduled reconciliation job re-derives every balance from the immutable entries and alerts on any mismatch — it never auto-corrects; an operator posts a correcting entry. The append-only ledger means the truth is always recoverable.
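The core of such a reconciliation pass might look like the sketch below: re-derive each balance from the entries, compare it with the cache, and report mismatches without writing anything. The names are hypothetical, and the sign convention (a credit raises the wallet balance, a debit lowers it) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ReconciliationSketch {
    // Assumed convention: credit raises the balance, debit lowers it.
    record LedgerEntry(String accountId, boolean credit, long amountMinor) {}

    // Returns the ids of accounts whose cached balance differs from the
    // sum of their entries, in sorted order. Read-only: nothing is fixed.
    static List<String> mismatches(Map<String, Long> cachedBalances,
                                   List<LedgerEntry> entries) {
        Map<String, Long> derived = new HashMap<>();
        for (LedgerEntry e : entries) {
            long signed = e.credit() ? e.amountMinor() : -e.amountMinor();
            derived.merge(e.accountId(), signed, Long::sum);
        }
        List<String> drifted = new ArrayList<>();
        for (Map.Entry<String, Long> c : new TreeMap<>(cachedBalances).entrySet()) {
            long truth = derived.getOrDefault(c.getKey(), 0L);
            if (truth != c.getValue()) drifted.add(c.getKey());
        }
        return drifted;
    }
}
```

Returning only the list of drifted accounts keeps the job in line with the rule above: it alerts, and a human decides which correcting entry to post.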

And the hot-account ceiling those 185 retries exposed is the next real scaling problem: when one row is genuinely contended, the answer is async posting or sharding that account, with the retry rate as the signal that tells you when you've crossed the line.


The throughline: in a money system, the cache disagreeing with the ledger is the failure that matters, and a default-isolation database won't stop you from creating it. A @Version compare-and-set makes the lost update impossible, bounded retry with jitter makes it invisible under normal load, a benchmark tells you the price you're paying and where the ceiling is, and idempotency makes sure the retries — at every layer — never turn into double-spends.

Code: github.com/xidoke/ledger-service — the concurrency model doc and ADRs 0005, 0006, 0011, 0012 go deeper. Live demo: ledger-service-bjzr.onrender.com (free instance — first request cold-starts ~50 s).