Request Coalescing with the Singleflight Pattern: Stop Drowning Your Database on Every Cache Miss

The Practical Developer


The Practica · 2026-05-15 · via The Practical Developer

Your cache TTL expires. The next request fetches the user profile from Postgres, takes 200ms, and writes it back to Redis. The problem: that profile endpoint just handled a traffic spike, so there were not ten requests in that 200ms window. There were four hundred. Every single one of them saw an empty cache slot, ran the same SELECT, competed for the same pool connections, and queued behind the same disk read. The database, which was comfortably under 20% CPU a moment ago, is now at 100%, latency is spiking, and the cache refill that should have been a quiet background event became the incident of the afternoon.

This is not a cache stampede (that is many different keys expiring at once, which this blog has already covered). This is a single-key miss under concurrency, and it is ruthlessly efficient at turning one expensive query into hundreds. The fix is request coalescing, also called the singleflight pattern. One caller runs the query. Every other concurrent caller for the same key waits for that result and receives it when it is ready. The database sees one query, not four hundred.

This post builds a production-grade singleflight implementation in TypeScript, handles the failure modes most tutorials skip (timeouts, errors, memory leaks), and shows how to wire it into a cache layer without turning your data fetcher into a mess.

The shape of the problem

Here is a minimal cache-aside fetcher that looks reasonable and falls over the moment a hot key expires:

async function getUserProfile(userId: string) {
  const cacheKey = `user:${userId}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
  await redis.setex(cacheKey, 60, JSON.stringify(user));
  return user;
}

Under load, the race looks like this:

Request A: cache miss, starts DB query.
Requests B through Z: cache miss (A has not written back yet), each starts its own DB query.
The database runs the same query dozens or hundreds of times in parallel.
Every result gets written back to Redis, so you also pay the Redis write amplification.

The fix is not “use a read-through cache instead of cache-aside.” A read-through store can still issue multiple backend fetches if it does not coalesce internally. The fix is coalescing at the application layer.
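To make the race concrete, here is a small simulation of it. This is only a sketch: fakeCache, slowQuery, queryCount, and simulateColdKey are illustrative stand-ins for Redis and Postgres, not part of the fetcher above.

```typescript
// Hypothetical in-memory stand-ins for Redis and Postgres, for illustration only.
const fakeCache = new Map<string, string>();
let queryCount = 0;

// Simulates a slow database read and counts how often it is executed.
async function slowQuery(userId: string): Promise<{ id: string }> {
  queryCount += 1;
  await new Promise((resolve) => setTimeout(resolve, 50));
  return { id: userId };
}

// Same cache-aside shape as the fetcher above, with no coalescing.
async function getUserProfileUncoalesced(userId: string): Promise<{ id: string }> {
  const cacheKey = `user:${userId}`;
  const cached = fakeCache.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const user = await slowQuery(userId);
  fakeCache.set(cacheKey, JSON.stringify(user));
  return user;
}

// Fires `concurrency` simultaneous requests at a cold key and reports
// how many backend queries that produced.
async function simulateColdKey(concurrency: number): Promise<number> {
  fakeCache.clear();
  queryCount = 0;
  await Promise.all(
    Array.from({ length: concurrency }, () => getUserProfileUncoalesced("42")),
  );
  return queryCount;
}
```

In this sketch every caller checks the cache before any of them has written back, so simulateColdKey issues one query per concurrent caller.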

A naive coalescer and why it leaks

The first instinct is a Map of in-flight promises:

const inFlight = new Map<string, Promise<unknown>>();

async function getUserProfileNaive(userId: string) {
  const cacheKey = `user:${userId}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  if (inFlight.has(cacheKey)) {
    return inFlight.get(cacheKey)!;
  }

  const promise = db.query('SELECT * FROM users WHERE id = $1', [userId])
    .then(async (user) => {
      await redis.setex(cacheKey, 60, JSON.stringify(user));
      return user;
    });

  inFlight.set(cacheKey, promise);
  return promise;
}

The leak is obvious once you look for it: the promise never leaves the Map. Every key that is ever fetched stays in inFlight forever. After a day of production traffic, that Map contains millions of stale entries and the process is bloating. Worse, on the second cache miss for the same key, inFlight.has(cacheKey) is still true from six hours ago, so the next caller receives a resolved promise that was for the old data, not the fresh query.

The missing piece is cleanup. When the promise settles, remove the key. But even that is not enough, because a slow query that never returns (or hangs until the caller times out) keeps the key in the map indefinitely. You need a TTL on the in-flight entry itself.
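The first half of that fix, removing the key once the promise settles, could be sketched as follows. The names coalesce and pending are hypothetical, and the sketch deliberately leaves out the in-flight TTL and the size bound, so a hung query would still pin its key.

```typescript
// Hypothetical helper: coalesces concurrent calls per key and forgets the
// key as soon as the shared promise settles. No TTL or size bound yet.
const pending = new Map<string, Promise<unknown>>();

function coalesce<T>(key: string, work: () => Promise<T>): Promise<T> {
  const existing = pending.get(key);
  if (existing) return existing as Promise<T>;

  const promise = work().finally(() => {
    // Runs on resolve and on reject, so failures are not remembered either.
    pending.delete(key);
  });
  pending.set(key, promise);
  return promise;
}
```

Because the entry is deleted on settlement, a later cache miss for the same key starts a fresh query instead of receiving an old resolved promise.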

The production version

Here is a bounded, self-cleaning singleflight implementation. It uses a Map of AbortController-backed entries, with automatic eviction on settlement, error, or timeout.

interface InFlightEntry<T> {
  promise: Promise<T>;
  controller: AbortController;
  startedAt: number;
}

class Singleflight<T> {
  private inFlight = new Map<string, InFlightEntry<T>>();
  private readonly maxEntries: number;
  private readonly maxAgeMs: number;

  constructor(options: { maxEntries?: number; maxAgeMs?: number } = {}) {
    this.maxEntries = options.maxEntries ?? 10_000;
    this.maxAgeMs = options.maxAgeMs ?? 30_000;
  }

  async do(key: string, fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing) {
      if (Date.now() - existing.startedAt > this.maxAgeMs) {
        existing.controller.abort();
        this.inFlight.delete(key);
      } else {
        return existing.promise;
      }
    }

    const controller = new AbortController();
    const startedAt = Date.now();

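    // Annotated explicitly because the cleanup callback refers to `promise` itself.
    const promise: Promise<T> = fn(controller.signal).finally(() => {
      // Runs on both resolve and reject. Only remove our own entry: after a
      // max-age or capacity eviction, a newer flight may own this key.
      if (this.inFlight.get(key)?.promise === promise) {
        this.inFlight.delete(key);
      }
    });

    if (this.inFlight.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest flight.
      const firstKey = this.inFlight.keys().next().value;
      if (firstKey !== undefined) {
        this.inFlight.get(firstKey)?.controller.abort();
        this.inFlight.delete(firstKey);
      }
    }

    this.inFlight.set(key, { promise, controller, startedAt });
    return promise;
  }
}

The important details:

Cleanup on settlement and error. The .finally removes the key whether the promise resolves or rejects, so failures do not poison the map. It first checks that the entry still belongs to this flight; otherwise a stale flight that was aborted and replaced would delete its successor when it finally settles.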
Max age. If an entry sits unresolved longer than maxAgeMs, the next caller for the same key aborts the stale flight and starts a fresh one. This prevents a hung query from blocking all future cache misses for that key.
Bounded size. If the map hits the limit, the oldest entry is evicted and aborted. In a real system under a cache stampede, the number of distinct in-flight keys can explode; this cap prevents unbounded memory growth.
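Abort signal propagation. The work function receives an AbortSignal, so the actual work (the database query, the HTTP fetch, the CPU-intensive computation) can observe cancellation and release resources early. Callers that are already awaiting an aborted flight still hold its promise and settle with whatever that flight eventually returns or throws.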
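For the signal to be useful, the work itself has to listen to it. As a rough sketch of what "observing cancellation" means, here is a hypothetical abortableDelay that stands in for a slow query or fetch and gives up as soon as the signal aborts. The name and the error message are assumptions for illustration.

```typescript
// Hypothetical abortable work: resolves after `ms`, or rejects early
// (and clears its timer) if the signal aborts first.
function abortableDelay(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted before start"));
      return;
    }

    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);

    const onAbort = () => {
      // Release the resource (here, just a timer) as soon as we are cancelled.
      clearTimeout(timer);
      reject(new Error("aborted"));
    };
    signal.addEventListener("abort", onAbort, { once: true });
  });
}
```

A work function passed to the singleflight could await something shaped like this, so that an eviction frees the underlying resource instead of letting it run to completion.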

Wiring it into a cache fetcher

The integration should be invisible to the rest of your application. The fetcher checks cache, then singleflight, then falls back to the real source. The singleflight key must include everything that makes the query unique, not just the cache key.

const sf = new Singleflight<{ id: string; name: string }>({
  maxEntries: 5_000,
  maxAgeMs: 10_000,
});

async function getUserProfile(userId: string) {
  const cacheKey = `user:${userId}`;

  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  return sf.do(cacheKey, async (signal) => {
    const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);

    if (signal.aborted) {
      return user;
    }

    await redis.setex(cacheKey, 60, JSON.stringify(user));
    return user;
  });
}

Note the signal.aborted check before the Redis write. If the flight was evicted or aborted while the query was in flight, we still return the user object to the original caller (who is waiting on the promise), but we skip writing a stale result to the cache. The next cache miss will trigger a fresh query. This is a safety valve: it is better to miss the cache write than to cache data that the system has already decided is too old.

The timeout layer most people forget

Singleflight removes duplicate queries, but it does not make the single query faster. If the one query that is allowed through hangs for thirty seconds, every waiter hangs with it. You need a timeout on the coalesced work, not just on the individual HTTP requests.

A simple wrapper:

function withTimeout<T>(ms: number, fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
  return new Promise((resolve, reject) => {
    const controller = new AbortController();
    const timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Timeout after ${ms}ms`));
    }, ms);

    fn(controller.signal)
      .then(resolve, reject)
      .finally(() => clearTimeout(timer));
  });
}

And in the fetcher:

return sf.do(cacheKey, async (signal) => {
  const user = await withTimeout(5_000, async (innerSignal) => {
    const combined = AbortSignal.any([signal, innerSignal]);
    return db.query('SELECT * FROM users WHERE id = $1', [userId], { signal: combined });
  });
  // ...
});

AbortSignal.any is available in Node.js 20.3 and later (it was also backported to 18.17). It returns a signal that aborts when either the singleflight eviction signal or the local timeout signal aborts. If you are on an older runtime, combine them manually by adding listeners to both. The point is that the database driver must actually observe the signal and cancel the query. Not every driver does: the { signal } option above stands in for whatever cancellation hook your driver exposes, so check its documentation rather than assuming it exists. If your driver has none, the timeout will drop the promise but the query may still run to completion on the server. In that case, keep your database statement_timeout tight so the server cleans up for you.
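For runtimes without AbortSignal.any, the manual combination could look roughly like this. anySignal is a hypothetical helper name, and the sketch does not forward the abort reason.

```typescript
// Hypothetical fallback for runtimes without AbortSignal.any: returns a
// signal that aborts as soon as any of the input signals aborts.
function anySignal(signals: AbortSignal[]): AbortSignal {
  const controller = new AbortController();

  const cleanup = () => {
    for (const signal of signals) signal.removeEventListener("abort", onAbort);
  };
  const onAbort = () => {
    controller.abort();
    cleanup();
  };

  for (const signal of signals) {
    if (signal.aborted) {
      // Already aborted before we subscribed: propagate immediately.
      controller.abort();
      cleanup();
      break;
    }
    signal.addEventListener("abort", onAbort);
  }
  return controller.signal;
}
```

One caveat of this sketch: if none of the inputs ever aborts, the listeners stay attached, so avoid passing it long-lived signals on a hot path without adding your own cleanup.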

What about errors?

If the single query throws, every waiter receives the same rejection. This is usually correct: if the database is down, all concurrent callers for that key should see the error rather than each one retrying independently and amplifying the failure.

But you probably do not want to cache the error. If you are using a cache layer with a “cache negative results” feature, keep the TTL short (one or two seconds) so a transient failure does not block that key for minutes. The singleflight map already handles this correctly: the entry is deleted on rejection, so the next caller will attempt a fresh query immediately.

One subtle bug: if each waiter wraps the singleflight call in its own retry loop, all of those loops retry at the same moment, creating a synchronized retry storm. Move retries inside the singleflight work function, not outside it.

// withRetry and fetchOnce are illustrative names for your retry helper and
// your single-attempt fetch.

// Bad: the retry wraps sf.do, so every waiter runs its own retry loop.
return withRetry(() => sf.do(key, () => fetchOnce(key)));

// Good: the retry runs inside the flight. One fetcher retries, everyone else waits.
return sf.do(key, () => withRetry(() => fetchOnce(key)));

The distinction is where the retry loop sits relative to sf.do. If each HTTP request handler wraps getUserProfile in its own retry loop, the retries happen outside singleflight. Keep retries at the data-source level, inside the function passed to sf.do.
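A retry helper meant to live inside the work function might be sketched like this. The name withRetry, the default attempt count, and the backoff constants are assumptions for illustration, not values from a real system.

```typescript
// Hypothetical retry helper meant to run INSIDE the function passed to sf.do,
// so only the one caller that owns the flight ever retries.
async function withRetry<T>(
  attempt: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 50,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      if (i === maxAttempts - 1) break;
      // Exponential backoff with full jitter: wait a random time in [0, base * 2^i).
      const delayMs = Math.random() * baseDelayMs * 2 ** i;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Waiters on the same key never enter this loop; they only see the final result or the final error once the owning caller has exhausted its attempts.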

Cross-process coalescing: do you need it?

The implementation above coalesces within one Node.js process. If you run four containers, a cache miss can still produce four database queries (one per container). For most systems, that is fine. Four queries is not four hundred.

If you genuinely need cross-process coalescing, Redis has a pattern for it: SET lock:user:42 <token> NX EX 5 to elect a leader, LPUSH waiters:user:42 <client_id> for waiters, and BRPOP for blocking. It works, but it adds latency (network round-trips for the locking), complexity, and another failure mode (the leader dies, the waiters hang). In practice, application-level singleflight plus a short cache TTL solves the problem for 99% of teams. Do not build distributed singleflight until you have metrics proving that process-level coalescing is insufficient.

Metrics that prove it is working

Add three metrics so you can verify the behavior in production:

import client from 'prom-client';

const singleflightCoalesced = new client.Counter({
  name: 'singleflight_coalesced_total',
  help: 'Number of requests coalesced into an in-flight query',
  labelNames: ['key_prefix'],
});

const singleflightStarted = new client.Counter({
  name: 'singleflight_started_total',
  help: 'Number of backend queries actually started',
  labelNames: ['key_prefix'],
});

const singleflightDuration = new client.Histogram({
  name: 'singleflight_duration_seconds',
  help: 'Time from request to result for coalesced queries',
  labelNames: ['key_prefix', 'coalesced'],
  buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
});

Emit singleflightCoalesced when a caller joins an existing flight, and singleflightStarted when a new flight begins. The ratio between them is your savings. If you see 50,000 coalesced and 100 started, then 50,100 callers were served by 100 queries, and you just prevented 50,000 redundant ones: every coalesced caller is a query that never reached the database.
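Where those two counters get incremented can be sketched without any metrics library. In the sketch below, plain numbers stand in for the prom-client counters, and countedCoalesce, stats, and coalescedRatio are hypothetical names; in real code you would call .inc() on the counters at the same two points.

```typescript
// Plain counters standing in for the prom-client metrics above, so the sketch
// runs without dependencies.
const stats = { started: 0, coalesced: 0 };
const flights = new Map<string, Promise<unknown>>();

// Hypothetical instrumented coalescer: counts a "start" when a caller opens a
// new flight and a "coalesced" when a caller joins an existing one.
function countedCoalesce<T>(key: string, work: () => Promise<T>): Promise<T> {
  const existing = flights.get(key);
  if (existing) {
    stats.coalesced += 1;
    return existing as Promise<T>;
  }
  stats.started += 1;
  const promise = work().finally(() => flights.delete(key));
  flights.set(key, promise);
  return promise;
}

// Fraction of callers that did not trigger a backend query.
function coalescedRatio(): number {
  const total = stats.started + stats.coalesced;
  return total === 0 ? 0 : stats.coalesced / total;
}
```

A ratio near zero on a hot endpoint suggests the key is too specific or the flights are too short to overlap, which is worth knowing before you rely on the pattern.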

The practical checklist

Before you ship singleflight to production, verify these:

The key is fully deterministic. It must include every parameter that changes the result. user:${userId} is good. user:${userId} when the query also depends on a ?include=orders flag is a bug waiting to happen.
The map is bounded and evicted. Unbounded Map growth is a memory leak. Bound it, abort stale entries, and delete on settlement.
The work function handles abort signals. If the driver or client does not support cancellation, at least set a tight server-side timeout so the database does not accumulate zombie queries.
Errors are not cached by the singleflight map. Delete on rejection so the next caller tries again.
Retries live inside the singleflight work. Retries outside it turn a failure into a synchronized retry storm.
Metrics are in place. You need the ratio of coalesced to started to know whether this is actually helping.
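For the first item on the checklist, one way to keep keys deterministic is to derive them from the full parameter set rather than assembling them by hand. The helper below is a sketch; buildFlightKey and its key format are assumptions, not an established convention.

```typescript
// Hypothetical key builder: serializes every parameter that affects the result
// in sorted order, so { a, b } and { b, a } map to the same singleflight key.
type KeyParam = string | number | boolean;

function buildFlightKey(prefix: string, params: Record<string, KeyParam>): string {
  const parts = Object.keys(params)
    .sort()
    .map((name) => `${name}=${encodeURIComponent(String(params[name]))}`);
  return `${prefix}:${parts.join("&")}`;
}
```

With this sketch, a request carrying ?include=orders and one without it produce different keys, so they are never coalesced into the same flight.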

The working code

Here is the complete, copy-pasteable module. It depends on no libraries beyond Node.js built-ins.

// singleflight.ts
export interface SingleflightOptions {
  maxEntries?: number;
  maxAgeMs?: number;
}

interface Entry<T> {
  promise: Promise<T>;
  controller: AbortController;
  startedAt: number;
}

export class Singleflight<T> {
  private inFlight = new Map<string, Entry<T>>();
  private readonly maxEntries: number;
  private readonly maxAgeMs: number;

  constructor(options: SingleflightOptions = {}) {
    this.maxEntries = options.maxEntries ?? 10_000;
    this.maxAgeMs = options.maxAgeMs ?? 30_000;
  }

  async do(key: string, fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing) {
      if (Date.now() - existing.startedAt > this.maxAgeMs) {
        existing.controller.abort();
        this.inFlight.delete(key);
      } else {
        return existing.promise;
      }
    }

    const controller = new AbortController();
    const startedAt = Date.now();

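    // Annotated explicitly because the cleanup callback refers to `promise` itself.
    const promise: Promise<T> = fn(controller.signal).finally(() => {
      // Runs on both resolve and reject. Only remove our own entry: after a
      // max-age or capacity eviction, a newer flight may own this key.
      if (this.inFlight.get(key)?.promise === promise) {
        this.inFlight.delete(key);
      }
    });

    if (this.inFlight.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest flight.
      const firstKey = this.inFlight.keys().next().value;
      if (firstKey !== undefined) {
        this.inFlight.get(firstKey)?.controller.abort();
        this.inFlight.delete(firstKey);
      }
    }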

    this.inFlight.set(key, { promise, controller, startedAt });
    return promise;
  }
}

And a minimal cache fetcher that uses it:

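import { Singleflight } from './singleflight.js';

// `redis` is assumed to be a client exposing get and setex, created elsewhere in your app.
// One shared instance typed as unknown, so it can serve fetchers of any result type.
const sf = new Singleflight<unknown>({ maxEntries: 5_000, maxAgeMs: 10_000 });

export async function fetchThroughCache<T>(
  cacheKey: string,
  fetcher: (signal: AbortSignal) => Promise<T>,
  ttlSec: number,
): Promise<T> {
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached) as T;

  const result = await sf.do(cacheKey, async (signal) => {
    const data = await fetcher(signal);
    // Skip the cache write if this flight was evicted or aborted meanwhile.
    if (!signal.aborted) {
      await redis.setex(cacheKey, ttlSec, JSON.stringify(data));
    }
    return data;
  });

  // The shared instance returns unknown; narrow it back to this caller's T.
  return result as T;
}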

That is it. One bounded map, one promise per key, and a database that sees one query instead of a stampede every time the cache hiccups.

A note from Yojji

The kind of backend performance work that turns a routine cache miss into a non-event (request coalescing, bounded in-flight maps, and careful abort propagation) is exactly the kind of infrastructure detail Yojji’s teams build into the systems they ship for clients.

Yojji is an international custom software development company founded in 2016, with teams across Europe, the US, and the UK. They specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, GCP), and full-cycle product engineering, including the caching and data-layer patterns that keep backends stable when traffic patterns change.
