Your /upload-csv endpoint works fine in local testing. A 2MB file parses in 80ms. Then a customer uploads a 40MB export from Salesforce on a Tuesday morning. Your p50 latency does not change. Your p95 jumps from 120ms to 4.2s. Health checks start timing out. Kubernetes restarts the pod. The CSV upload itself succeeds, eventually, but every other request that arrived while the file was being parsed sat in the event loop queue waiting for JSON.parse or csv-parse to finish.
This is not a memory problem. It is not a downstream problem. It is an event-loop prison problem. Node.js runs your JavaScript on a single OS thread, and any CPU-heavy operation (parsing, serializing, image resizing, PDF generation) blocks every other timer, I/O callback, and incoming HTTP request until it is done.
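The blocking effect is easy to reproduce in isolation. The following is a minimal sketch, not code from the service above: `blockFor` and `measureTimerDelay` are hypothetical helpers, and the busy loop stands in for a large synchronous parse.

```typescript
// Hypothetical helper: burn CPU synchronously for roughly `ms` milliseconds,
// standing in for a large JSON.parse or CSV parse on the main thread.
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait on purpose
  }
}

// Schedule a 0ms timer, then block the thread. The timer callback cannot run
// until the synchronous work returns, so its observed delay is at least `ms`.
function measureTimerDelay(ms: number): Promise<number> {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockFor(ms);
  });
}
```

Calling `measureTimerDelay(50)` resolves with a delay of about 50ms even though the timer asked for 0ms. Incoming HTTP requests wait in the same way.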
The fix is not “add more pods.” The fix is not cluster mode. The fix is moving the CPU-bound work to a Node.js Worker Thread so the event loop stays free to do what it does best: handle I/O and respond to requests.
Here is a small pool, the worker script, and the numbers that show why this matters.
cluster forks your entire process across CPU cores. That helps throughput when your workload is I/O-bound and you want multiple event loops accepting connections. It does nothing when a single request triggers a CPU-bound task; that task still blocks one event loop, and the request still times out. Worse, if you run four workers and four users upload big files at once, you now have four blocked event loops instead of one.
Cluster mode scales the number of prisoners. It does not break anyone out of jail.
Worker Threads give you real OS threads inside the same Node.js process. Each worker has:
- Its own V8 heap and event loop, isolated from the main thread
- SharedArrayBuffer when you need zero-copy data transfer
- MessageChannel for structured cloning of data between threads

The catch: spawning a worker has a startup cost (~10–30ms), and passing data between threads copies it via structured clone unless you use transferables. You do not want to spawn a worker per request. You want a pool.
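To see the message-passing model on its own, here is a minimal sketch of one round trip to a worker. The inline worker source and the `sumInWorker` name are hypothetical, and it spawns a worker per call only to keep the example short, which is exactly what a pool avoids.

```typescript
import { Worker } from 'node:worker_threads';

// Worker body as a string so the example is a single file (eval: true).
// It runs as CommonJS in its own thread with its own event loop.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', (n) => {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += i; // CPU-bound loop, off the main thread
    parentPort.postMessage(sum);
  });
`;

// Send one number, receive one result, then shut the worker down.
function sumInWorker(n: number): Promise<number> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('message', (sum: number) => {
      resolve(sum);
      void worker.terminate();
    });
    worker.once('error', reject);
    worker.postMessage(n); // the payload is structured-cloned into the worker
  });
}
```

While the worker loops, the main thread is free to keep serving timers and requests.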
This pool spawns N workers, maintains a task queue, routes work to the next idle worker, and replaces dead workers automatically. It lives in the main thread.
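import { Worker } from 'node:worker_threads';
import * as os from 'node:os';

type Task<R> = {
  payload: unknown;
  resolve: (v: R) => void;
  reject: (e: unknown) => void;
  timer: ReturnType<typeof setTimeout>;
};

export class WorkerPool<R> {
  private workers: Worker[] = [];
  private queue: Task<R>[] = [];
  private active = new Map<Worker, Task<R>>();

  constructor(
    private script: string,
    private size = Math.max(1, os.cpus().length - 1),
    private timeoutMs = 30_000,
  ) {
    for (let i = 0; i < size; i++) this.addWorker();
  }

  execute(payload: unknown): Promise<R> {
    return new Promise((resolve, reject) => {
      const task: Task<R> = {
        payload,
        resolve,
        reject,
        // The timer covers time spent waiting in the queue plus run time.
        timer: setTimeout(() => this.expire(task), this.timeoutMs),
      };
      this.queue.push(task);
      this.flush();
    });
  }

  // Reject a timed-out task and release whatever it was holding.
  private expire(task: Task<R>) {
    task.reject(new Error('Worker task timeout'));
    const q = this.queue.indexOf(task);
    if (q >= 0) {
      this.queue.splice(q, 1); // never started: drop it so it does not run later
      return;
    }
    // Already running: synchronous work cannot be interrupted from outside,
    // so terminate that worker and start a fresh one.
    for (const [w, t] of this.active) {
      if (t === task) { this.replace(w); break; }
    }
  }

  private addWorker() {
    const w = new Worker(this.script);
    w.on('message', (res) => {
      const t = this.active.get(w);
      if (!t) return; // late reply for a task that already timed out
      this.active.delete(w);
      clearTimeout(t.timer);
      if (res && typeof res === 'object' && 'error' in res)
        t.reject(new Error(String(res.error)));
      else t.resolve(res as R);
      this.flush();
    });
    w.on('error', (err) => {
      const t = this.active.get(w);
      if (t) { clearTimeout(t.timer); t.reject(err); }
      this.replace(w);
    });
    this.workers.push(w);
  }

  // Swap a dead or stuck worker for a new one, then hand out queued work.
  private replace(w: Worker) {
    this.active.delete(w);
    const i = this.workers.indexOf(w);
    if (i < 0) return; // already replaced
    this.workers.splice(i, 1);
    void w.terminate();
    this.addWorker();
    this.flush();
  }

  // Give each idle worker the next queued task.
  private flush() {
    for (const w of this.workers) {
      if (!this.active.has(w) && this.queue.length) {
        const t = this.queue.shift()!;
        this.active.set(w, t);
        w.postMessage(t.payload);
      }
    }
  }

  terminate() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}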
That is the entire pool. No external dependencies. It handles queuing, timeouts, and worker death, and the queue length gives you the signal you need for backpressure.
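The pool's queue is a plain array, so it grows without limit. One way to turn queue length into real backpressure is a bounded queue that refuses new work once it is full. This is a sketch under assumptions: `BoundedQueue` and `QueueFullError` are hypothetical names, not part of the pool above, and the limit is yours to choose.

```typescript
// Hypothetical error type; an HTTP handler can map it to a 503 response.
class QueueFullError extends Error {
  constructor(depth: number, limit: number) {
    super(`Worker queue full (${depth}/${limit})`);
    this.name = 'QueueFullError';
  }
}

// Bounded FIFO that could stand in for the plain array of pending tasks.
class BoundedQueue<T> {
  private items: T[] = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  get depth(): number {
    return this.items.length;
  }

  // Throws instead of growing past the limit, so callers shed load early.
  push(item: T): void {
    if (this.items.length >= this.limit) {
      throw new QueueFullError(this.items.length, this.limit);
    }
    this.items.push(item);
  }

  shift(): T | undefined {
    return this.items.shift();
  }
}
```

Rejecting at the door keeps memory bounded and gives the client a clear signal to retry later, instead of a request that waits until it times out.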
Here is what runs inside the worker. It receives a Buffer, parses it, and posts the result back.
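// csv-worker.js
const { parentPort } = require('node:worker_threads');
const { parse } = require('csv-parse/sync');

parentPort?.on('message', (data) => {
  try {
    // A Buffer sent with postMessage arrives as a plain Uint8Array, so wrap
    // the same memory in a Buffer again (no copy) before parsing.
    const buffer = Buffer.from(data.buffer, data.byteOffset, data.byteLength);
    const rows = parse(buffer, { columns: true, skip_empty_lines: true });
    parentPort.postMessage({ count: rows.length, preview: rows.slice(0, 5) });
  } catch (err) {
    parentPort.postMessage({ error: err.message });
  }
});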
Wire it into your handler:
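import * as os from 'node:os';
import { readFile } from 'node:fs/promises';
import { WorkerPool } from './worker-pool';

const pool = new WorkerPool<{ count: number; preview: unknown[] }>(
  './csv-worker.js', // a relative path is resolved against the working directory
  Math.max(1, os.cpus().length - 1),
  10_000,
);

// `app` is your Express app; req.file is assumed to come from upload
// middleware that writes the file to disk.
app.post('/upload-csv', async (req, res) => {
  try {
    const buf = await readFile(req.file.path);
    const result = await pool.execute(buf);
    res.json({ parsed: result.count });
  } catch (err) {
    // Timeouts and worker failures reject the promise; answer instead of hanging.
    res.status(500).json({ error: (err as Error).message });
  }
});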
The main thread never runs csv-parse. It reads the file asynchronously, hands the buffer to the pool, and keeps processing HTTP requests while the worker grinds through the CSV.
Test setup: 40MB CSV (≈400k rows), Express server, autocannon running 100 concurrent connections against a health-check endpoint GET /health while a single POST /upload-csv runs in the background.
Without worker threads (parsing on the main thread):
| Metric | Baseline (no upload) | During upload |
|---|---|---|
| /health p50 | 3ms | 1,840ms |
| /health p99 | 8ms | 4,200ms |
| /health errors | 0 | 12% timeout |
| Upload duration | N/A | 2,100ms |
With worker thread pool (4 workers, parsing off main thread):
| Metric | Baseline (no upload) | During upload |
|---|---|---|
| /health p50 | 3ms | 4ms |
| /health p99 | 8ms | 18ms |
| /health errors | 0 | 0 |
| Upload duration | N/A | 2,050ms |
The CSV still takes two seconds to parse; that is physics. But the health checks and every other request stay fast because the main thread event loop is free. The only cost is ~15ms of overhead to queue and transfer the buffer.
When you postMessage a Buffer, Node.js structured-clones it. For a 40MB file that means the original 40MB stays in the main thread and a second 40MB copy is created for the worker. That copy is fast enough for most cases, but if you are moving hundreds of megabytes, use a SharedArrayBuffer or transfer ownership:
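// Transfer ownership: the memory moves to the worker and becomes unusable in the main thread.
// View the Buffer's existing memory; new Uint8Array(buffer) would copy it first.
// Assumes the Buffer owns its whole ArrayBuffer (typical for large file reads, not for small pooled Buffers).
const u8 = new Uint8Array(buffer.buffer, buffer.byteOffset, buffer.byteLength);
worker.postMessage({ buffer: u8 }, [u8.buffer]);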
After the transfer, u8.buffer is detached in the main thread. The worker owns the memory. This removes the copy entirely. Only use it if the main thread no longer needs the buffer, which is true for most upload handlers after they have handed it off.
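The detach behaviour can be observed without spawning a worker. The sketch below uses a `MessageChannel` and `receiveMessageOnPort` from `node:worker_threads` as a stand-in for `worker.postMessage`; `transferDemo` is a hypothetical name, and the buffer is a fresh allocation rather than a real upload.

```typescript
import { MessageChannel, receiveMessageOnPort } from 'node:worker_threads';

// Send a typed array through a port with its ArrayBuffer in the transfer
// list, then report what each side can still see.
function transferDemo(size: number) {
  const { port1, port2 } = new MessageChannel();
  const memory = new ArrayBuffer(size); // fresh allocation, safe to transfer
  const bytes = new Uint8Array(memory);
  bytes[0] = 42;

  port1.postMessage(bytes, [memory]); // moves the memory instead of copying it

  // Read the message synchronously on the receiving side.
  const received = receiveMessageOnPort(port2)?.message as Uint8Array;
  port1.close();

  return {
    senderBytes: memory.byteLength, // the sender's ArrayBuffer is now detached
    receiverBytes: received.byteLength,
    firstByte: received[0],
  };
}
```

The sender is left with a zero-length buffer and the receiver holds the data, which is why the handler must not touch the buffer after handing it off.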
Worker crashes. The pool above auto-replaces a worker that throws, but if your worker script has a syntax error on startup, every replacement also dies. Add a one-shot health worker at process boot:
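import { once } from 'node:events';
// Round trip: once() rejects if the worker emits 'error' before it answers.
const probe = new Worker('./csv-worker.js');
probe.postMessage(Buffer.from('a\n1\n'));
await once(probe, 'message');
await probe.terminate();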
If this throws, fail fast during deployment instead of discovering it at runtime.
Queue depth. If workers are saturated, tasks pile up in this.queue. Add a gauge:
// inside flush()
metrics.gauge('worker_pool.queue_depth', this.queue.length);
Alert when queue depth > poolSize * 2 for more than 60s; it means your workers are slower than your arrival rate.
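If your metrics backend cannot express "above the threshold for 60 seconds", the same rule is small enough to evaluate in process. A sketch, with `SustainedThreshold` as a hypothetical name and timestamps passed in explicitly so it can be unit tested:

```typescript
// Fires once a value has stayed above a threshold for a minimum duration.
class SustainedThreshold {
  private since: number | null = null;
  private readonly threshold: number;
  private readonly holdMs: number;

  constructor(threshold: number, holdMs: number) {
    this.threshold = threshold;
    this.holdMs = holdMs;
  }

  // Feed one sample. Returns true once the value has exceeded the threshold
  // continuously for at least holdMs; any sample at or below it resets the clock.
  observe(value: number, nowMs: number): boolean {
    if (value <= this.threshold) {
      this.since = null;
      return false;
    }
    if (this.since === null) this.since = nowMs;
    return nowMs - this.since >= this.holdMs;
  }
}
```

For a pool of four workers you would construct it as `new SustainedThreshold(4 * 2, 60_000)` and call `observe(queueDepth, Date.now())` wherever you record the gauge.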
Worker memory. Each worker has its own V8 heap. A worker parsing a 100MB CSV can run out of memory independently of the main thread. Cap each worker's heap with the resourceLimits option; V8 flags such as --max-old-space-size are not supported in a worker's execArgv:
const w = new Worker('./csv-worker.js', { resourceLimits: { maxOldGenerationSizeMb: 512 } });
Logging from workers. console.log inside a worker prints to stdout of the main process, but it is interleaved and timestamps are messy. If you need structured logs from workers, post log messages back to the main thread and emit them from there, or write directly to a file descriptor that is safe to share.
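Posting logs back to the main thread means log lines and results travel over the same channel, so they need a tag to tell them apart. The envelope below is an assumption for illustration (the worker shown earlier posts bare results), and `routeWorkerMessage` is a hypothetical helper.

```typescript
type LogLevel = 'info' | 'warn' | 'error';

// Assumed message envelope: every message from a worker carries a `type`.
type WorkerMessage =
  | { type: 'log'; level: LogLevel; msg: string }
  | { type: 'result'; value: unknown };

// Returns a structured log line for log messages, or null for anything else
// so the caller can continue with its normal result handling.
function routeWorkerMessage(
  workerId: number,
  message: WorkerMessage,
  now: Date,
): string | null {
  if (message.type !== 'log') return null;
  return JSON.stringify({
    ts: now.toISOString(), // timestamp assigned on the main thread
    level: message.level,
    workerId,
    msg: message.msg,
  });
}
```

In the pool's message handler you would write the returned line to your logger and return early, leaving result messages to resolve the task as before.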
You do not need APM to know this is happening. A few lines of monitoring tell you:
import { performance } from 'node:perf_hooks';
let last = performance.now();
setInterval(() => {
  const lag = performance.now() - last - 1000;
  if (lag > 50) console.warn(`Event loop lag: ${lag.toFixed(1)}ms`);
  last = performance.now();
}, 1000).unref();
If this prints anything above 100ms during normal traffic, something is blocking the event loop. Profile the blocking function with clinic doctor or 0x, then decide whether it belongs in a worker.
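Node.js also ships a built-in histogram for this, `monitorEventLoopDelay` in `node:perf_hooks`, which samples more often than a one-second interval and reports percentiles. A minimal sketch; the wrapper name `startLoopDelayMonitor` and the 20ms resolution are assumptions:

```typescript
import { monitorEventLoopDelay } from 'node:perf_hooks';

// Wraps the built-in event loop delay histogram and converts its
// nanosecond readings to milliseconds.
function startLoopDelayMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  const toMs = (ns: number) => ns / 1e6;

  return {
    // Current percentiles since the last reset.
    snapshot() {
      return {
        p50Ms: toMs(histogram.percentile(50)),
        p99Ms: toMs(histogram.percentile(99)),
        maxMs: toMs(histogram.max),
      };
    },
    // Clear the samples, for example after each reporting interval.
    reset() {
      histogram.reset();
    },
    stop() {
      histogram.disable();
    },
  };
}
```

Export `snapshot()` on the same schedule as your other gauges and call `reset()` afterwards so each data point covers one interval.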
- Shared state: workers share memory only through explicit primitives (SharedArrayBuffer + Atomics). If your algorithm needs constant random object access across threads, Workers force you into a C-style memory model. Sometimes that is worth it; sometimes it is simpler to use a different runtime for that job.
- Pool size: Math.max(1, os.cpus().length - 1). Reserve one core for the main event loop.
- Task timeout: 10_000–30_000 ms. CPU work should have a ceiling; an infinite CSV parse is a memory leak waiting to happen.
- Queue limit: poolSize * 4. Reject or return 503 beyond that. Do not buffer infinite work in memory.

CPU-bound work in Node.js is a silent denial-of-service attack you launch against your own API every time a user sends a big JSON payload, a CSV export, or an image that needs resizing. The event loop does not complain. It just queues every other request until the work is done, and your health checks fail first.
Worker Threads are not a silver bullet; they have startup cost, memory overhead, and no shared heap by default. But a small reusable pool moves the heavy lifting off the event loop and keeps your API responsive under the exact load that would otherwise kill it. Wire the pool once, set the timeout, monitor the queue depth, and stop letting a single CSV upload time out every other request on the server.
Moving CPU-bound parsing off the main thread so health checks survive a 40MB CSV upload is the kind of unglamorous backend work that separates a system that handles real traffic from one that looks fine in local demos. It is also the kind of production-hardened Node.js engineering Yojji’s teams ship regularly.
Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Its team of 50+ people specializes in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, Google Cloud), and microservices architecture, building the kind of systems that stay responsive when a customer drops a massive file on them at 9 a.m. on a Tuesday.