Node.js Worker Threads: 60 Lines That Stop a CSV Upload from Timing Out Every Other Request

The Practical Developer

The Practica · 2026-05-12 · via The Practical Developer

Your /upload-csv endpoint works fine in local testing. A 2MB file parses in 80ms. Then a customer uploads a 40MB export from Salesforce on a Tuesday morning. Your p50 latency does not change. Your p95 jumps from 120ms to 4.2s. Health checks start timing out. Kubernetes restarts the pod. The CSV upload itself succeeds, eventually, but every other request that arrived during the roughly two seconds of parsing sat in the event loop queue waiting for JSON.parse or csv-parse to finish.

This is not a memory problem. It is not a downstream problem. It is an event-loop prison problem. Node.js runs your JavaScript on a single OS thread, and any CPU-heavy operation (parsing, serializing, image resizing, PDF generation) blocks every other timer, I/O callback, and incoming HTTP request until it is done.

The fix is not “add more pods.” The fix is not cluster mode. The fix is moving the CPU-bound work to a Node.js Worker Thread so the event loop stays free to do what it does best: handle I/O and respond to requests.

Here is the 60-line pool, the worker script, and the numbers that show why this matters.

Why cluster mode is the wrong answer first

cluster forks your entire process across CPU cores. That helps throughput when your workload is I/O-bound and you want multiple event loops accepting connections. It does nothing when a single request triggers a CPU-bound task; that task still blocks one event loop, and every request routed to that process waits behind it. Worse, if you run four workers and four users upload big files at once, you now have four blocked event loops instead of one.

Cluster mode scales the number of prisoners. It does not break anyone out of jail.

What Worker Threads actually do

Worker Threads give you real OS threads inside the same Node.js process. Each worker has:

Its own V8 isolate (separate heap, separate event loop)
The ability to run JavaScript in parallel with the main thread
Shared memory via SharedArrayBuffer when you need zero-copy data transfer
MessageChannel for structured cloning of data between threads

The catch: spawning a worker has a startup cost (~10–30ms), and passing data between threads copies it via structured clone unless you use transferables. You do not want to spawn a worker per request. You want a pool.

The 60-line thread pool

This pool spawns N workers, maintains a task queue, routes work to the next idle worker, and replaces dead workers automatically. It lives in the main thread.

import { Worker } from 'node:worker_threads';
import * as os from 'node:os';

type Task<R> = {
  payload: unknown;
  resolve: (v: R) => void;
  reject: (e: unknown) => void;
  timer: ReturnType<typeof setTimeout>;
};

export class WorkerPool<R> {
  private workers: Worker[] = [];
  private queue: Task<R>[] = [];
  private active = new Map<Worker, Task<R>>();

  constructor(
    private script: string,
    private size = Math.max(1, os.cpus().length - 1),
    private timeoutMs = 30_000,
  ) {
    for (let i = 0; i < size; i++) this.addWorker();
  }

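  execute(payload: unknown): Promise<R> {
    return new Promise((resolve, reject) => {
      const timer: ReturnType<typeof setTimeout> = setTimeout(() => {
        // If the task is still queued, drop it so it is never dispatched
        // after its promise has already been rejected.
        const i = this.queue.findIndex((t) => t.timer === timer);
        if (i >= 0) this.queue.splice(i, 1);
        // A task that is already running keeps its worker busy until the worker replies.
        reject(new Error('Worker task timeout'));
      }, this.timeoutMs);
      this.queue.push({ payload, resolve, reject, timer });
      this.flush();
    });
  }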

  private addWorker() {
    const w = new Worker(this.script);
    w.on('message', (res) => {
      const t = this.active.get(w)!;
      this.active.delete(w);
      clearTimeout(t.timer);
      if (res && typeof res === 'object' && 'error' in res)
        t.reject(new Error(res.error));
      else t.resolve(res);
      this.flush();
    });
    w.on('error', (err) => {
      const t = this.active.get(w);
      if (t) { this.active.delete(w); clearTimeout(t.timer); t.reject(err); }
      const i = this.workers.indexOf(w);
      if (i >= 0) { this.workers.splice(i, 1); this.addWorker(); }
      this.flush();
    });
    this.workers.push(w);
  }

  private flush() {
    for (const w of this.workers) {
      if (!this.active.has(w) && this.queue.length) {
        const t = this.queue.shift()!;
        this.active.set(w, t);
        w.postMessage(t.payload);
      }
    }
  }

  terminate() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}

That is the entire pool. No external dependencies. It handles queuing, timeouts, and worker death. The queue itself is unbounded, so you only get backpressure once you cap its length and reject work beyond the cap.

The worker script: a CPU-bound CSV parser

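Here is what runs inside the worker. It receives the file bytes, parses them, and posts the result back. A Buffer sent through postMessage arrives in the worker as a plain Uint8Array, so the worker wraps it in a Buffer again before handing it to csv-parse.

// csv-worker.js
const { parentPort } = require('node:worker_threads');
const { parse } = require('csv-parse/sync');

parentPort?.on('message', (bytes) => {
  try {
    // Re-wrap the same memory as a Buffer (no copy); it arrived as a plain Uint8Array.
    const buffer = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const rows = parse(buffer, { columns: true, skip_empty_lines: true });
    parentPort.postMessage({ count: rows.length, preview: rows.slice(0, 5) });
  } catch (err) {
    // Report failures as data so the pool can reject the matching promise.
    parentPort.postMessage({ error: err.message });
  }
});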

Wire it into your handler:

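import * as os from 'node:os';
import { readFile } from 'node:fs/promises';
import { WorkerPool } from './worker-pool';

const pool = new WorkerPool<{ count: number; preview: unknown[] }>(
  './csv-worker.js',
  Math.max(1, os.cpus().length - 1),
  10_000,
);

// `app` is your Express app; `req.file` is assumed to come from an upload middleware such as multer.
app.post('/upload-csv', async (req, res) => {
  try {
    const buf = await readFile(req.file.path);
    const result = await pool.execute(buf);
    res.json({ parsed: result.count });
  } catch (err) {
    // Timeouts and parse errors reject the promise; answer instead of leaving the request hanging.
    res.status(500).json({ error: err instanceof Error ? err.message : 'CSV parse failed' });
  }
});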

The main thread never runs csv-parse. It reads the file asynchronously, hands the buffer to the pool, and keeps processing HTTP requests while the worker grinds through the CSV.

The benchmark: before and after

Test setup: 40MB CSV (≈400k rows), Express server, autocannon running 100 concurrent connections against a health-check endpoint GET /health while a single POST /upload-csv runs in the background.

Without worker threads (parsing on the main thread):

Metric	Baseline (no upload)	During upload
`/health` p50	3ms	1,840ms
`/health` p99	8ms	4,200ms
`/health` errors	0	12% timeout
Upload duration	N/A	2,100ms

With worker thread pool (4 workers, parsing off main thread):

Metric	Baseline (no upload)	During upload
`/health` p50	3ms	4ms
`/health` p99	8ms	18ms
`/health` errors	0	0
Upload duration	N/A	2,050ms

The CSV still takes two seconds to parse; that is physics. But the health checks and every other request stay fast because the main thread event loop is free. The only cost is ~15ms of overhead to queue and transfer the buffer.

Transferables and zero-copy for large buffers

When you postMessage a Buffer, Node.js structured-clones it. For a 40MB file that means 40MB held in the main thread plus a 40MB copy in the worker. That copy is fast enough for most cases, but if you are moving hundreds of megabytes, use a SharedArrayBuffer or transfer ownership:

// Transfer ownership: the memory moves to the worker and becomes unusable in the main thread.
// new Uint8Array(buffer) copies when `buffer` is a Node.js Buffer, which keeps pooled Buffer memory out of the transfer.
const u8 = new Uint8Array(buffer);
worker.postMessage(u8, [u8.buffer]);

After the transfer, u8.buffer is detached in the main thread and the worker owns the memory, so postMessage itself copies nothing. Only use it if the main thread no longer needs the buffer, which is true for most upload handlers after they have handed it off.
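To see the detach behaviour without starting a worker, the sketch below sends a buffer through a MessageChannel, whose ports use the same postMessage(value, transferList) signature as worker.postMessage. The helper name transferDemo and the 1 KB size are illustrative assumptions, not part of the pool.

```typescript
import { MessageChannel } from 'node:worker_threads';

// Hypothetical helper: reports a view's byte length before and after its
// underlying ArrayBuffer is transferred to another port.
function transferDemo(sizeBytes: number): { before: number; after: number } {
  const { port1, port2 } = new MessageChannel();
  const ab = new ArrayBuffer(sizeBytes);
  const u8 = new Uint8Array(ab);
  const before = u8.byteLength;

  // Listing `ab` in the transfer list moves ownership instead of cloning it.
  port1.postMessage(u8, [ab]);

  // The sender's view is now detached and reports zero bytes.
  const after = u8.byteLength;

  port1.close();
  port2.close();
  return { before, after };
}

const demo = transferDemo(1024);
console.log(demo.before, demo.after);
```

Reading u8 after the transfer is the bug to watch for: the view still exists but is empty, so code that logs or re-sends the upload after handing it off will silently see zero bytes.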

What to watch in production

Worker crashes. The pool above auto-replaces a worker that throws, but if your worker script has a syntax error on startup, every replacement also dies. Add a one-shot health worker at process boot:

import { once } from 'node:events';
import { Worker } from 'node:worker_threads';

const probe = new Worker('./csv-worker.js');
await once(probe, 'online');
await probe.terminate();

If this throws, fail fast during deployment instead of discovering it at runtime.
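The Node.js docs describe 'online' as firing when the worker has started executing JavaScript, which does not by itself prove the script can handle a message. A stricter variant, sketched here under that assumption, sends one small payload and waits for a reply. probeWorker and its default timeout are hypothetical names and values, not part of the pool above.

```typescript
import { Worker } from 'node:worker_threads';

// Hypothetical helper: resolves with the worker's first reply and rejects on
// a worker error, an early exit, or a timeout. The probe worker is always terminated.
function probeWorker(
  script: string,
  payload: unknown,
  timeoutMs = 5_000,
): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const probe = new Worker(script);

    const finish = (settle: () => void): void => {
      clearTimeout(timer);
      void probe.terminate();
      settle(); // later calls are no-ops once the promise has settled
    };

    const timer: ReturnType<typeof setTimeout> = setTimeout(
      () => finish(() => reject(new Error('Worker probe timed out'))),
      timeoutMs,
    );

    probe.once('message', (reply) => finish(() => resolve(reply)));
    probe.once('error', (err) => finish(() => reject(err)));
    probe.once('exit', (code) =>
      finish(() => reject(new Error(`Worker exited early with code ${code}`))),
    );

    probe.postMessage(payload);
  });
}

// Example call at boot (inside an async function or an ES module):
//   const reply = await probeWorker('./csv-worker.js', Buffer.from('a,b\n1,2\n'));
```

With the worker script from this article, a parse failure comes back as a reply containing an error field rather than a rejection, so the caller should check the reply as well as catching the promise.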

Queue depth. If workers are saturated, tasks pile up in this.queue. Add a gauge:

// inside flush()
metrics.gauge('worker_pool.queue_depth', this.queue.length);

Alert when queue depth > poolSize * 2 for more than 60s; it means your workers are slower than your arrival rate.
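If your metrics backend cannot express "above a limit for N seconds", the rule can be evaluated in process. This is a minimal sketch; SustainedThreshold is a hypothetical helper, and the timestamp parameter exists only so the logic can be exercised without waiting.

```typescript
// Hypothetical helper: reports true only once a value has stayed above
// `limit` continuously for at least `holdMs` milliseconds.
class SustainedThreshold {
  private since: number | null = null;
  private readonly limit: number;
  private readonly holdMs: number;

  constructor(limit: number, holdMs: number) {
    this.limit = limit;
    this.holdMs = holdMs;
  }

  observe(value: number, nowMs: number = Date.now()): boolean {
    if (value <= this.limit) {
      this.since = null; // dipped back under the limit: restart the clock
      return false;
    }
    if (this.since === null) this.since = nowMs;
    return nowMs - this.since >= this.holdMs;
  }
}

// Example: pool of 4 workers, alert when depth > 8 for 60 seconds.
const depthAlert = new SustainedThreshold(4 * 2, 60_000);
console.log(depthAlert.observe(9, 0), depthAlert.observe(9, 60_000));
```

Calling observe(this.queue.length) from the same place that updates the gauge keeps the alert and the metric in step; a single dip below the limit resets the window, so short bursts do not page anyone.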

Worker memory. Each worker has its own V8 heap. A worker parsing a 100MB CSV can run out of memory independently of the main thread. V8 flags such as --max-old-space-size are not supported in a worker's execArgv; cap the heap per worker with resourceLimits instead:

const w = new Worker('./csv-worker.js', { resourceLimits: { maxOldGenerationSizeMb: 512 } });

Logging from workers. console.log inside a worker prints to stdout of the main process, but it is interleaved and timestamps are messy. If you need structured logs from workers, post log messages back to the main thread and emit them from there, or write directly to a file descriptor that is safe to share.
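One way to post logs back is a tagged message that the main thread can tell apart from task results. The sketch below is an assumption about how that could look; the message shape and the names makeLog, isLogMessage, and handleWorkerLog are hypothetical and are not defined by the pool above.

```typescript
// Hypothetical log message shape sent from a worker to the main thread.
type LogMessage = {
  kind: 'log';
  level: 'info' | 'warn' | 'error';
  msg: string;
  ts: number;
};

// Worker side would call: parentPort.postMessage(makeLog('info', 'parsed 400k rows'))
function makeLog(level: LogMessage['level'], msg: string): LogMessage {
  return { kind: 'log', level, msg, ts: Date.now() };
}

function isLogMessage(m: unknown): m is LogMessage {
  return typeof m === 'object' && m !== null && (m as { kind?: unknown }).kind === 'log';
}

// Main-thread side: emits one JSON line per log message and returns true
// when the message was a log, so the caller can skip result handling.
function handleWorkerLog(
  m: unknown,
  workerId: number,
  write: (line: string) => void = (line) => console.log(line),
): boolean {
  if (!isLogMessage(m)) return false;
  write(JSON.stringify({ ...m, workerId }));
  return true;
}

handleWorkerLog(makeLog('info', 'worker ready'), 0);
```

In the pool above, the 'message' handler would need to call handleWorkerLog first and return early when it reports true; otherwise a log message would be mistaken for a task result.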

The diagnostic: how to detect event loop lag

You do not need APM to know this is happening. A few lines of monitoring tell you:

import { performance } from 'node:perf_hooks';

let last = performance.now();
setInterval(() => {
  const lag = performance.now() - last - 1000;
  if (lag > 50) console.warn(`Event loop lag: ${lag.toFixed(1)}ms`);
  last = performance.now();
}, 1000).unref();

If this prints anything above 100ms during normal traffic, something is blocking the event loop. Profile the blocking function with clinic doctor or 0x, then decide whether it belongs in a worker.
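Node.js also ships a built-in histogram for this, monitorEventLoopDelay in node:perf_hooks, which samples on its own timer and gives you percentiles instead of a single reading. The sketch below is one way to wrap it; startLoopDelayMonitor, the 100ms threshold, and the 10 second reporting window are assumptions. Treat the absolute numbers as approximate and calibrate the threshold against an idle baseline.

```typescript
import { monitorEventLoopDelay } from 'node:perf_hooks';

// Hypothetical wrapper: warns when the p99 event loop delay over the last
// reporting window exceeds `thresholdMs`.
function startLoopDelayMonitor(
  thresholdMs = 100,
  reportEveryMs = 10_000,
  warn: (line: string) => void = (line) => console.warn(line),
) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  const timer = setInterval(() => {
    const p99Ms = histogram.percentile(99) / 1e6; // histogram values are in nanoseconds
    if (p99Ms > thresholdMs) warn(`Event loop delay p99: ${p99Ms.toFixed(1)}ms`);
    histogram.reset(); // start a fresh window
  }, reportEveryMs);
  timer.unref(); // do not keep the process alive just for monitoring

  return {
    histogram,
    stop(): void {
      clearInterval(timer);
      histogram.disable();
    },
  };
}

const monitor = startLoopDelayMonitor();
monitor.stop();
```

Percentiles matter here because a single blocked second inside a ten-second window barely moves the mean but shows up clearly in p99.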

When NOT to use worker threads

I/O-bound work. Database queries, HTTP requests, and file system reads are already non-blocking in Node.js. Moving them to a worker adds overhead for no gain.
Trivial compute. If your operation takes <5ms, worker transfer latency is a bigger cost than the blocking itself.
Stateful shared-memory algorithms. Workers do not share the same JavaScript heap (unless you use SharedArrayBuffer + Atomics). If your algorithm needs constant random object access across threads, Workers force you into a C-style memory model. Sometimes that is worth it; sometimes it is simpler to use a different runtime for that job.
Hot-path microservices with microsecond budgets. Worker startup (~10–30ms) and structured-clone overhead do not fit latency-sensitive trading engines. Use Rust or C++ for that.

Practical defaults you can copy

Pool size: Math.max(1, os.cpus().length - 1). Reserve one core for the main event loop.
Task timeout: 10_000–30_000 ms. CPU work should have a ceiling; an infinite CSV parse is a memory leak waiting to happen.
Max queue depth: poolSize * 4. Reject or return 503 beyond that. Do not buffer infinite work in memory.
Transfer buffers when the payload is >1MB and the main thread does not need it afterward.
Health probe the worker script at boot. Do not wait for the first user request to discover a syntax error.
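The max queue depth default is the one item the pool in this article does not enforce. A minimal admission check, assuming you call it before pushing a task onto the queue, could look like this; admit and QueueFullError are hypothetical names.

```typescript
// Hypothetical error type so callers can map a full queue to a 503 response.
class QueueFullError extends Error {
  constructor(depth: number) {
    super(`Worker queue full (${depth} tasks waiting)`);
    this.name = 'QueueFullError';
  }
}

// Hypothetical admission check: throws when the queue already holds
// poolSize * factor tasks, otherwise returns and lets the caller enqueue.
function admit(queueDepth: number, poolSize: number, factor = 4): void {
  const maxDepth = poolSize * factor;
  if (queueDepth >= maxDepth) throw new QueueFullError(queueDepth);
}

// Example: a pool of 4 workers accepts up to 16 queued tasks.
admit(15, 4); // passes
try {
  admit(16, 4);
} catch (err) {
  console.log((err as Error).name); // the caller would answer 503 here
}
```

Rejecting at admission time keeps memory bounded and gives the client a fast, honest failure it can retry, instead of a slow timeout after the task has already sat in the queue.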

The takeaway

CPU-bound work in Node.js is a silent denial-of-service attack you launch against your own API every time a user sends a big JSON payload, a CSV export, or an image that needs resizing. The event loop does not complain. It just queues every other request until the work is done, and your health checks fail first.

Worker Threads are not a silver bullet; they have startup cost, memory overhead, and no shared heap by default. But a small reusable pool moves the heavy lifting off the event loop and keeps your API responsive under the exact load that would otherwise kill it. Wire the pool once, set the timeout, monitor the queue depth, and stop letting a single CSV upload time out every other request on the server.

A note from Yojji

Moving CPU-bound parsing off the main thread so health checks survive a 40MB CSV upload is the kind of unglamorous backend work that separates a system that handles real traffic from one that looks fine in local demos. It is also the kind of production-hardened Node.js engineering Yojji’s teams ship regularly.

Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their ~50+ person team specializes in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, Google Cloud), and microservices architecture, building the kind of systems that stay responsive when a customer drops a massive file on them at 9 a.m. on a Tuesday.
