The Libuv Thread Pool Trap: Why Node.js Async APIs Stall Under Load

The Practical Developer

The Practica · 2026-05-28 · via The Practical Developer

The /export endpoint had been fast for months. A background worker read 200MB log files, compressed them with zlib.createGzip, and uploaded them to S3. It was entirely async: fs.createReadStream, pipeline, crypto.createHash for checksums. Nothing blocked the event loop. Then the traffic team doubled the number of export jobs, and everything went sideways.

Health checks stayed green. Event loop lag was under 2 ms. CPU sat at 15%. But p99 latency for the export endpoint jumped from 800 ms to 5.2 seconds. Worse, every other endpoint on the same service started sporadically timing out. fs.readFile calls that normally resolved in 5 ms were taking 400 ms. bcrypt.hash that should finish in 100 ms stretched past a second. There were no errors in the logs, no memory spikes, and no blocked event loop warnings. The service just felt sluggish in a way that did not make sense.

The culprit was the libuv thread pool, and here is why it is invisible to almost every monitoring stack.

What the libuv thread pool actually does

Node.js is single-threaded in JavaScript, but it is not single-threaded in C++. The event loop runs on the main thread. The libuv library manages a separate pool of threads for work that would otherwise block it. Tasks that rely on blocking system calls or heavy computation are handed to this pool: file system I/O, DNS resolution through dns.lookup, some crypto operations, and compression. When a thread finishes, it queues the callback back onto the event loop.

The default size of this pool is four threads.

That might have been generous in 2009. In 2026, four threads is a bottleneck waiting to happen. Every fs.readFile, fs.writeFile, crypto.pbkdf2, bcrypt.hash, zlib.deflate, and dns.lookup call that is in flight occupies one of those four threads. When all four are busy, the fifth request queues, and the sixth queues behind it. They wait quietly for a thread to free up, even though every single call is technically “async.”

This is not an event loop block. The event loop keeps ticking. Timers fire. HTTP requests parse. But the callbacks for any thread-pool task just sit in the worker queue until a thread becomes available. The symptoms are maddening because every metric you normally watch looks fine.

The APIs that quietly consume threads

Not every async Node.js API uses the thread pool. Network sockets, pipes, and signals are watched through kernel notification APIs (epoll on Linux, kqueue on macOS and BSD, IOCP on Windows). Timers are tracked by the event loop itself. None of them need a pool thread. But a surprising amount of common work drops into the pool:

API family	Examples
File system	`fs.readFile`, `fs.writeFile`, `fs.stat`, `fs.access`, `fs.mkdir`, `fs.copyFile`
Crypto	`crypto.pbkdf2`, `crypto.scrypt`, `bcrypt.hash` (npm `bcrypt`), `crypto.randomFill`
Compression	`zlib.deflate`, `zlib.inflate`, `zlib.gzip`, `zlib.brotliCompress`
DNS	`dns.lookup` (every version; `dns.resolve*()` uses c-ares and stays off the pool)

The dns.lookup case is worth calling out. It calls getaddrinfo(3) on the thread pool in every current Node.js release. Only the dns.resolve*() family, which uses c-ares, stays off the pool. If your service opens thousands of HTTP connections to different hosts, each connection's lookup consumes one of those four threads. We covered DNS caching in another post, but the thread pool angle explains why lookups can stall even when the DNS server is fast.
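To see the DNS side of this directly, the sketch below fills a default four-thread pool with crypto.pbkdf2 jobs and then times dns.lookup('localhost'). That lookup is answered from the hosts file and needs no network. It assumes no UV_THREADPOOL_SIZE override and a standard hosts file. The iteration count is an arbitrary value chosen so that each hash holds a thread for a noticeable time.

```javascript
import { pbkdf2 } from 'node:crypto';
import { lookup } from 'node:dns/promises';
import { performance } from 'node:perf_hooks';
import { promisify } from 'node:util';

const pbkdf2Async = promisify(pbkdf2);

// Arbitrary cost: enough that each hash holds a pool thread for a while.
const ITERATIONS = 200_000;

async function timeIt(fn) {
  const start = performance.now();
  const result = await fn();
  return { result, ms: performance.now() - start };
}

// Baseline: resolve localhost while the pool is idle.
const idle = await timeIt(() => lookup('localhost'));

// Occupy all four default threads, then ask for the same lookup.
// getaddrinfo cannot start until one of the hashes finishes.
const hashes = Array.from({ length: 4 }, (_, i) =>
  pbkdf2Async('secret', `salt-${i}`, ITERATIONS, 64, 'sha512'));
const busy = await timeIt(() => lookup('localhost'));
await Promise.all(hashes);

console.log(`dns.lookup, idle pool:      ${idle.ms.toFixed(1)}ms`);
console.log(`dns.lookup, saturated pool: ${busy.ms.toFixed(1)}ms`);
```

The DNS server never appears in this test. The second lookup is slow only because it waited for a free thread.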

fs calls are the silent killers in practice. They hit the pool because regular files have no readiness-based non-blocking interface that epoll or kqueue can watch. io_uring is changing this on Linux, but Node.js support is still emerging. Every file read is a blocking system call wrapped in C++ and farmed out to a thread.

Reproducing the stall in 30 lines

Here is a script that demonstrates the behavior without any external dependencies. It fires eight concurrent fs.readFile calls against the same file. With a four-thread pool, only four run in parallel. The rest queue.

import fs from 'node:fs';
import { performance } from 'node:perf_hooks';

const FILE = './test-file.txt';

// Create a 1MB file to ensure the reads take enough time to overlap
fs.writeFileSync(FILE, 'x'.repeat(1024 * 1024));

async function measureRead(id) {
  const start = performance.now();
  await fs.promises.readFile(FILE);
  const duration = performance.now() - start;
  console.log(`read ${id}: ${duration.toFixed(1)}ms`);
}

async function run() {
  const start = performance.now();
  await Promise.all(Array.from({ length: 8 }, (_, i) => measureRead(i)));
  console.log(`total wall time: ${(performance.now() - start).toFixed(1)}ms`);
}

run();

On a typical VM, the output looks something like this. Exact timings vary with disk, page cache, and Node.js version.

read 0: 12.3ms
read 1: 12.1ms
read 2: 12.5ms
read 3: 12.4ms
read 4: 24.1ms
read 5: 24.3ms
read 6: 24.2ms
read 7: 24.5ms
total wall time: 24.8ms

The first four finish in parallel. The next four wait for a thread. The wall time is roughly 2x a single read because there are only four workers. Now imagine those reads are 50MB log files, or the pool is also occupied by gzip and bcrypt tasks. Eight concurrent exports can turn into a 20-second pipeline even though each file operation is “non-blocking.”
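fs.promises.readFile issues several pool requests per call: open, stat, one or more chunked reads, and close. Real runs therefore tend to blur the two waves in the sample above. A CPU-bound pool task makes the effect easier to see. The variant below is a sketch that swaps in crypto.pbkdf2, which holds a single thread for the whole hash. The iteration count is an assumption, so tune it until one hash takes tens to hundreds of milliseconds on your machine.

```javascript
import crypto from 'node:crypto';
import { performance } from 'node:perf_hooks';

// Arbitrary cost; adjust so a single hash takes roughly 50-200 ms.
const ITERATIONS = 100_000;

// Resolves with the time (relative to t0) at which this hash's callback ran.
function hash(id, t0) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2('password', `salt-${id}`, ITERATIONS, 64, 'sha512', (err) => {
      if (err) return reject(err);
      const finishedAt = performance.now() - t0;
      console.log(`hash ${id} finished at ${finishedAt.toFixed(1)}ms`);
      resolve(finishedAt);
    });
  });
}

const t0 = performance.now();
// Eight jobs, four threads: completion times should cluster into two waves.
const finishTimes = await Promise.all(Array.from({ length: 8 }, (_, i) => hash(i, t0)));
console.log(`total wall time: ${(performance.now() - t0).toFixed(1)}ms`);
```

Running the same script with UV_THREADPOOL_SIZE=8 in the environment should collapse the two waves into one, as long as the machine has enough cores.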

The silent symptom checklist

Because the event loop stays unblocked, most monitoring completely misses this. Look for these signals instead:

Latency spikes that do not correlate with CPU, memory, or event loop lag. The event loop is idle, but callbacks from fs or crypto arrive late.
Timeouts on health checks or outgoing requests that use fs or crypto. The task itself is fast, but it waited in the thread pool queue.
fs or crypto operations that get slower as concurrency rises, even on SSDs. Disk I/O is not the bottleneck. Thread availability is.
No libuv metrics on any dashboard. Node.js does not expose thread pool queue depth, so unless you instrument it yourself, the one number that would explain the slowdown is missing.
A problem that disappears when you switch to in-memory caches or streams, for no obvious reason. You removed file system pressure from the pool.

How to actually measure it

Node.js does not expose thread pool queue depth directly, but you can approximate queue wait. The most direct diagnostic is to measure the time from when an operation is requested to when its callback fires. If fs.readFile takes 400 ms wall-clock but only 5 ms of actual disk time, the difference is queue wait.

Here is a minimal instrumentation that patches the fs module to emit timing:

import fs from 'node:fs';
import { syncBuiltinESMExports } from 'node:module';
import { performance } from 'node:perf_hooks';

const originals = {
  readFile: fs.readFile.bind(fs),
  writeFile: fs.writeFile.bind(fs),
};

// Wraps a callback-style fs function and logs how long each call took.
// This is wall time: queue wait in the thread pool plus the actual work.
function instrument(name, fn) {
  return function (...args) {
    const start = performance.now();
    const cb = args[args.length - 1];

    if (typeof cb !== 'function') {
      return fn(...args); // no callback: let the original validate the call (fs.promises is not covered)
    }

    args[args.length - 1] = function (err, result) {
      const duration = performance.now() - start;
      console.log(JSON.stringify({
        event: 'fs_timing',
        op: name,
        durationMs: Math.round(duration * 100) / 100,
        timestamp: new Date().toISOString(),
      }));
      cb(err, result);
    };

    return fn(...args);
  };
}

fs.readFile = instrument('readFile', originals.readFile);
fs.writeFile = instrument('writeFile', originals.writeFile);

// Propagate the patch to ESM named imports such as `import { readFile } from 'node:fs'`.
syncBuiltinESMExports();
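The patch above only covers the callback API. For fs.promises calls, a small wrapper at the call site avoids monkey-patching entirely. The sketch below assumes a hypothetical helper named timed. Like the callback version, it measures wall time (queue wait plus work), not work alone.

```javascript
import fs from 'node:fs/promises';
import { performance } from 'node:perf_hooks';

// Hypothetical helper: runs an async thunk and logs its end-to-end duration,
// including calls that reject.
async function timed(op, fn) {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    console.log(JSON.stringify({
      event: 'fs_timing',
      op,
      durationMs: Math.round((performance.now() - start) * 100) / 100,
      timestamp: new Date().toISOString(),
    }));
  }
}

// Usage: wrap each promise-based call you want to watch.
const stat = await timed('stat', () => fs.stat('.'));
```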

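For a production-grade signal, note that Node.js does not report pool work time on its own. perf_hooks has no entry type for fs operations, so there is no built-in wall-versus-work split to read. There are two practical proxies. You can compare wall time under load against the same call on an otherwise idle process. Or you can periodically submit a tiny task to the pool and track how long it takes. A task that does almost no work spends nearly all of its latency waiting for a thread, so a large gap means queue contention.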
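One way to get a continuous queue-wait signal is a canary task that does almost no work. The sketch below uses crypto.randomFill on a 1-byte buffer because the async form runs on the libuv pool. That choice of canary, and the names probeOnce and startThreadPoolProbe, are assumptions for illustration. A cheap fs.stat call would serve the same purpose.

```javascript
import crypto from 'node:crypto';
import { performance } from 'node:perf_hooks';

// Hypothetical canary: a 1-byte randomFill is queued on the libuv pool but
// does almost no work, so its latency approximates time spent waiting for a thread.
function probeOnce() {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    crypto.randomFill(Buffer.alloc(1), (err) => {
      if (err) reject(err);
      else resolve(performance.now() - start);
    });
  });
}

// Samples every `intervalMs` and hands each latency to `onSample`.
// Returns a stop function. The timer is unref'd so it never keeps the process alive.
function startThreadPoolProbe({ intervalMs = 1000, onSample }) {
  const timer = setInterval(() => {
    probeOnce().then(onSample, () => {}); // a failed probe is skipped, not fatal
  }, intervalMs);
  timer.unref();
  return () => clearInterval(timer);
}
```

In production, `onSample` would feed a histogram, and you would alert on its p99 rather than log every sample. A canary that normally returns in well under a millisecond but starts taking hundreds is the thread pool equivalent of event loop lag.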

You can also use trace_event profiling with --trace-event-categories node.async_hooks,node.perf and open the resulting node_trace.1.log file in chrome://tracing. Look for long gaps between init and before hooks on thread-pool operations.

Fix 1: raise the thread pool size (carefully)

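The blunt fix is UV_THREADPOOL_SIZE, which sets the number of threads in the libuv pool. libuv reads it only once, when the pool is first used, so set it in the environment before the process starts. The maximum is 1024, but you almost never want that.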

export UV_THREADPOOL_SIZE=16
node server.js

Or in a Dockerfile:

ENV UV_THREADPOOL_SIZE=16
CMD ["node", "server.js"]

More threads mean more concurrent file system, crypto, and DNS operations. The trade-off is memory and context-switching cost. Each thread reserves its own stack and holds whatever buffers its current task needs. For CPU-bound pool work such as hashing, threads beyond the available cores only add context switching. Going from 4 to 16 is usually safe. Going to 128 on a 512MB container is not, unless you know the workload is thread-pool bound and you have the memory.

The right size depends on your workload profile. If your service does occasional bcrypt hashing and file reads, 8-12 is usually enough. If you are running a log-processing pipeline with heavy zlib and concurrent large readFile calls, you might need 32 or more.

Do not guess. Measure queue wait (via the wall-vs-work timing above), pick a size that flattens latency without ballooning RSS, and cap it.
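To compare candidate sizes instead of guessing, run the same pool-bound workload at each size. libuv reads UV_THREADPOOL_SIZE only once, so the sketch below spawns a fresh child Node.js process per size. The 16-job crypto.pbkdf2 workload and the sizes 4, 8, and 16 are placeholder assumptions. Substitute a replay of your real mix of fs, zlib, and crypto calls.

```javascript
import { spawnSync } from 'node:child_process';

// Child script (CommonJS, run via -e): 16 concurrent pbkdf2 jobs, prints wall ms.
const CHILD = `
const crypto = require('node:crypto');
const { performance } = require('node:perf_hooks');
const t0 = performance.now();
let left = 16;
for (let i = 0; i < 16; i++) {
  crypto.pbkdf2('pw', 'salt' + i, 50000, 64, 'sha512', (err) => {
    if (err) throw err;
    if (--left === 0) console.log((performance.now() - t0).toFixed(1));
  });
}
`;

// Runs the child with the given pool size and returns its reported wall time.
function wallTimeWithPoolSize(size) {
  const { stdout, status } = spawnSync(process.execPath, ['-e', CHILD], {
    env: { ...process.env, UV_THREADPOOL_SIZE: String(size) },
    encoding: 'utf8',
  });
  if (status !== 0) throw new Error(`child exited with status ${status}`);
  return Number(stdout.trim());
}

const results = {};
for (const size of [4, 8, 16]) {
  results[size] = wallTimeWithPoolSize(size);
  console.log(`UV_THREADPOOL_SIZE=${size}: ${results[size]}ms`);
}
```

For a CPU-bound workload like this one, expect the curve to flatten once the thread count passes the cores the container actually gets. Past that point, extra threads buy nothing but memory.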

Fix 2: limit concurrency at the application layer

Another approach is to stop overloading the pool in the first place. If your service runs eight export jobs in parallel and each job hits fs and zlib, you are fighting yourself. Use a semaphore or a p-limit-style concurrency limit to keep only N jobs running at once. Size N against your thread pool and leave headroom for everything else that needs a thread.

Here is a minimal semaphore that works without dependencies:

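class Semaphore {
  constructor(max) {
    this.max = max;
    this.count = 0;
    this.queue = [];
  }

  async acquire() {
    if (this.count < this.max) {
      this.count++;
      return;
    }
    // Wait until release() hands this caller a slot directly.
    await new Promise((resolve) => this.queue.push(resolve));
  }

  release() {
    const next = this.queue.shift();
    if (next) {
      // Transfer the slot to the next waiter without decrementing, so a caller
      // arriving before the waiter resumes cannot push the count past max.
      next();
    } else {
      this.count--;
    }
  }
}

const pool = new Semaphore(4); // size against UV_THREADPOOL_SIZE, leaving headroom for other pool users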

async function safeExport(job) {
  await pool.acquire();
  try {
    await runExport(job); // uses fs, zlib, crypto
  } finally {
    pool.release();
  }
}

This does not speed up a single export. What it does is protect the rest of your service. With concurrency capped at four, the fifth export waits at the application layer instead of silently queueing in libuv where you cannot see it. Health checks, other endpoints, and unrelated fs calls are not starved.
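Waiting at the application layer only helps if you can see the wait. Below is a sketch of a variant that exposes `active` and `waiting` counts for metrics. ObservableSemaphore and withSlot are hypothetical names. As in a correct hand-off semaphore, release passes the slot straight to the next waiter, so `active` never exceeds the limit.

```javascript
// Hypothetical semaphore that exposes its state for dashboards and alerts.
class ObservableSemaphore {
  #max;
  #active = 0;
  #waiters = [];

  constructor(max) {
    this.#max = max;
  }

  get active() { return this.#active; }
  get waiting() { return this.#waiters.length; }

  // Runs fn once a slot is free and always gives the slot back afterwards.
  async withSlot(fn) {
    if (this.#active < this.#max) {
      this.#active++;
    } else {
      await new Promise((resolve) => this.#waiters.push(resolve));
    }
    try {
      return await fn();
    } finally {
      const next = this.#waiters.shift();
      if (next) next(); // hand the slot directly to the next waiter
      else this.#active--;
    }
  }
}

// Usage: five jobs, two slots; publish `waiting` as a gauge.
const exportLimiter = new ObservableSemaphore(2);
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
let peak = 0;
const jobs = Array.from({ length: 5 }, (_, i) =>
  exportLimiter.withSlot(async () => {
    peak = Math.max(peak, exportLimiter.active);
    await sleep(10);
    return i;
  }));
console.log(`active=${exportLimiter.active} waiting=${exportLimiter.waiting}`); // active=2 waiting=3
const results = await Promise.all(jobs);
```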

Fix 3: move heavy work out of the thread pool entirely

UV_THREADPOOL_SIZE helps, but it is not a panacea. Some work does not belong in the main Node.js process at all.

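Heavy CPU-bound hashing (bcrypt, Argon2, PBKDF2 with high rounds): Move to a dedicated microservice or a worker thread pool. Each worker thread runs on its own OS thread with its own V8 isolate, but the libuv pool is shared by the whole process. Inside the worker, call the synchronous variants (crypto.pbkdf2Sync, crypto.scryptSync, bcrypt.hashSync) so the hashing runs on the worker's own thread and does not compete with file I/O for pool threads.
Large file compression: Stream with zlib in small chunks rather than buffering the whole file, so each pool task stays short and other work can interleave. Alternatively, offload to a background job worker that runs on separate nodes.
Bulk file reads: If you are reading 50MB files repeatedly, consider an in-memory cache or a shared volume read once at startup. Every readFile call takes a pool thread for each chunk it reads, so a large file occupies the pool for many kernel reads in a row.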
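For the compression case, a streaming export step might look like the sketch below. It assumes a local output file in place of the S3 upload from the opening story, and the 64 KiB highWaterMark is illustrative. gzipWithChecksum and hashingPassThrough are hypothetical names. Each read and each zlib chunk becomes its own short pool task. The SHA-256 update runs synchronously on the main thread, not on the pool.

```javascript
import { createReadStream, createWriteStream, mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { createHash } from 'node:crypto';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { Transform } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import { createGzip } from 'node:zlib';

// Passes chunks through unchanged while feeding them into a hash.
function hashingPassThrough(hash) {
  return new Transform({
    transform(chunk, _encoding, callback) {
      hash.update(chunk); // synchronous, on the main thread
      callback(null, chunk);
    },
  });
}

// Hypothetical export step: gzip `src` into `dest` chunk by chunk and return
// the SHA-256 of the compressed bytes. A local file stands in for S3 here.
async function gzipWithChecksum(src, dest) {
  const hash = createHash('sha256');
  await pipeline(
    createReadStream(src, { highWaterMark: 64 * 1024 }), // illustrative chunk size
    createGzip(),
    hashingPassThrough(hash),
    createWriteStream(dest),
  );
  return hash.digest('hex');
}

// Usage with a throwaway file in the OS temp directory.
const dir = mkdtempSync(join(tmpdir(), 'export-'));
const src = join(dir, 'app.log');
const dest = join(dir, 'app.log.gz');
writeFileSync(src, 'GET /export 200\n'.repeat(10_000));
const checksum = await gzipWithChecksum(src, dest);
const gz = readFileSync(dest);
rmSync(dir, { recursive: true, force: true });
console.log(`${gz.length} compressed bytes, sha256 ${checksum}`);
```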

Worker threads and child_process are both better homes for work that would otherwise pin a libuv thread for hundreds of milliseconds. The event loop stays free, and the libuv pool stays available for short tasks.
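Here is a minimal sketch of the worker-thread option for password hashing. It assumes crypto.scryptSync as the hash, a single long-lived worker, and an inline worker script. hashInWorker and closeHashWorker are hypothetical names. The synchronous call matters: an async crypto.scrypt inside the worker would still queue on the process-wide libuv pool.

```javascript
import { Worker } from 'node:worker_threads';

// Inline worker (CommonJS). scryptSync blocks the worker's own thread,
// not the main event loop and not the shared libuv pool.
const WORKER_SOURCE = `
const { parentPort } = require('node:worker_threads');
const { scryptSync } = require('node:crypto');
parentPort.on('message', ({ id, password, salt }) => {
  const key = scryptSync(password, salt, 64).toString('hex');
  parentPort.postMessage({ id, key });
});
`;

const worker = new Worker(WORKER_SOURCE, { eval: true });
const pending = new Map(); // request id -> { resolve, reject }
let nextId = 0;

worker.on('message', ({ id, key }) => {
  pending.get(id)?.resolve(key);
  pending.delete(id);
});

// If the worker crashes, fail every outstanding request instead of hanging.
worker.on('error', (err) => {
  for (const { reject } of pending.values()) reject(err);
  pending.clear();
});

// Hypothetical API: resolves with the hex-encoded 64-byte derived key.
function hashInWorker(password, salt) {
  return new Promise((resolve, reject) => {
    const id = nextId++;
    pending.set(id, { resolve, reject });
    worker.postMessage({ id, password, salt });
  });
}

// Call during graceful shutdown so the worker does not keep the process alive.
function closeHashWorker() {
  return worker.terminate();
}
```

Usage: `const key = await hashInWorker(password, userSalt);` in the signup handler, and `await closeHashWorker()` on shutdown. Under heavy signup load, a small pool of such workers, capped like the export semaphore, keeps hashing off both the event loop and the libuv pool.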

The decision tree

Symptom	Likely cause	Fix
Event loop is idle, but `fs`/`crypto` callbacks are late	Thread pool queueing	Raise `UV_THREADPOOL_SIZE`, limit concurrency
DNS lookups spike under load	`dns.lookup` hitting the 4-thread pool	Add DNS caching, raise `UV_THREADPOOL_SIZE`
`bcrypt.hash` slows down all requests during signups	One slow task pinning a pool thread	Move to worker threads or dedicated service
Large gzip operations block smaller `fs` reads	Mixed workload overwhelming fixed pool	Separate heavy work to workers or cap concurrency
Timeouts that vanish when you add pods	Fewer jobs per process = less pool contention	Right-size pool or limit concurrency instead of scaling horizontally

The takeaway

Node.js async APIs are not magic. fs.readFile and bcrypt.hash are async at the JavaScript level, but underneath they run on a small, fixed thread pool that defaults to four workers. When that pool saturates, the queue grows silently. Your event loop is healthy. Your CPU is bored. Your service is still slow.

Start by measuring. Instrument fs and crypto wall times, compare them to actual work times, and look for queue wait. If you see it, raise UV_THREADPOOL_SIZE modestly, cap concurrency at the application layer so you do not fight yourself, and move long-running work to worker threads or external services. Do all three, and the “ghost latency” that had no explanation disappears.

Your code is not slow. It is just waiting for a thread.

A note from Yojji

Building production Node.js services means understanding every layer between the HTTP request and the kernel. Yojji engineers regularly diagnose hidden thread pool contention, DNS resolution storms, and event loop stalls in high-throughput systems. Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their team of 50+ senior engineers has completed hundreds of projects using Node.js, TypeScript, and cloud-native architecture. If your team is chasing latency ghosts that do not show up in CPU or memory graphs, Yojji can help you instrument and fix the layers no one else is looking at.
