






















The team needs to process a 4 GB CSV file. The naive code reads the whole file with fs.readFile, runs out of memory, and the worker crashes. Somebody mentions Node streams. The next attempt has 60 lines of pipe chains, custom Transform streams, and three different ways of handling errors. It works but nobody understands it.
Node streams are powerful and confusing. They are the right tool for a specific set of problems: large files, network responses, anything where data arrives over time and shouldn’t all live in memory at once. For everything else, they add complexity for no benefit.
This post is the rule for when to reach for streams, the modern async-iterable pattern that replaces most classic stream code, and the four-liner that handles 80% of real-world cases.
Use streams when the data does not fit in memory or you don’t want to wait for all of it before processing starts. Use async iterables (a newer, simpler API) when you can. Use whole-data APIs (fs.readFile, fetch().json()) when the data is small and you want it all anyway.
Examples that pass:
Examples that fail:
A small config file: fs.readFile. Streams are overkill.
A call to a JSON API: await fetch().json(). The response is small.
For 80% of streaming work, this is the right shape:
import { pipeline } from 'node:stream/promises';
import { createReadStream, createWriteStream } from 'node:fs';
import { createGzip } from 'node:zlib';
await pipeline(
  createReadStream('input.txt'),
  createGzip(),
  createWriteStream('output.gz'),
);
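pipeline from node:stream/promises is the modern API. It:
Returns a promise, so a failure anywhere in the chain surfaces as one rejection you can await or catch.
Destroys every stream in the chain when one of them fails (which .pipe() did not, leading to memory leaks).
For most “read from A, transform, write to B” tasks, this four-liner is the answer. Don’t write for await loops or Transform stream classes unless you genuinely need them.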
For row-by-row processing, async iterables are simpler than Transform streams. Any readable stream is also an async iterable:
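import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';
// crlfDelay: Infinity treats \r\n as a single line break
const rl = createInterface({ input: createReadStream('large.csv'), crlfDelay: Infinity });
let isHeader = true;
let rows = 0;
for await (const line of rl) {
  if (isHeader) { isHeader = false; continue; } // skip the header without counting it
  // naive split: assumes no quoted fields that contain commas
  const [id, name, email] = line.split(',');
  await processRow({ id, name, email }); // processRow is your own async handler
  rows++;
}
console.log(`processed ${rows} rows`);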
Compare that to the equivalent classic Transform-stream code (twice as long, twice as confusing). The async-iterable version is sequential, easy to read, and easy to wrap in error handling.
The reason “just await for each row” works well: backpressure. When processRow is slow, the iteration pauses waiting for the await; the underlying read stream doesn’t keep filling memory. This is a property the stream API gives you for free.
The mistake is to bypass it:
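// WRONG: starts a new call for every row without waiting for the previous one.
for await (const line of rl) {
  processRow(line); // not awaited
}
Now you have unbounded parallelism. A 1,000-row CSV starts up to 1,000 simultaneous processRow calls, and a 4 GB file starts millions. Memory grows with the number of pending calls, and any rejected promise goes unhandled.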
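The difference is easy to measure. The sketch below is an illustration, not production code: peakConcurrency and its simulated processRow are hypothetical names, and the 10 ms delay stands in for real work. It runs the same loop twice over an in-memory stream and records how many calls were in flight at once.

```javascript
import { Readable } from 'node:stream';
import { setTimeout as sleep } from 'node:timers/promises';

// Hypothetical helper: runs the loop over `count` fake rows and returns
// the highest number of processRow calls that were in flight at once.
async function peakConcurrency(count, awaitEachRow) {
  let inFlight = 0;
  let peak = 0;
  const pending = [];

  // Stand-in for real work: takes about 10 ms per row.
  async function processRow(_line) {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await sleep(10);
    inFlight--;
  }

  const rows = Readable.from(Array.from({ length: count }, (_, i) => `row-${i}`));
  for await (const line of rows) {
    const task = processRow(line);
    if (awaitEachRow) {
      await task; // the loop pauses here, so no further rows are pulled
    } else {
      pending.push(task); // fire and forget: the loop keeps pulling rows
    }
  }
  await Promise.all(pending); // let the un-awaited calls finish before reporting
  return peak;
}

console.log('awaited peak:', await peakConcurrency(50, true));
console.log('un-awaited peak:', await peakConcurrency(50, false));
```

The awaited loop never has more than one call in flight. The un-awaited loop typically reports a peak close to the row count, because nothing stops the loop from pulling the next row.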
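If you genuinely want bounded parallelism:
import pLimit from 'p-limit';
const limit = pLimit(10);
const inFlight = new Set();
for await (const line of rl) {
  const task = limit(() => processRow(line));
  inFlight.add(task);
  task.then(() => inFlight.delete(task), () => {}); // failed tasks stay in the set so the error surfaces below
  if (inFlight.size >= 10) await Promise.race(inFlight); // pause reading until a slot frees up
}
await Promise.all(inFlight); // wait for the last calls to finish
10-way parallel. p-limit caps how many calls run at once, but its queue is unbounded, so the loop has to pause itself while all ten slots are busy. Without that wait, the loop would read the whole file into the queue and backpressure would be lost.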
For a service that proxies large responses, streaming saves memory:
// Express, AWS SDK v2
app.get('/download/:id', async (req, res) => {
  const s3Stream = s3.getObject({ Bucket: 'b', Key: req.params.id }).createReadStream();
  res.setHeader('Content-Type', 'application/octet-stream');
  try {
    await pipeline(s3Stream, res);
  } catch (err) {
    res.destroy(err); // headers may already be sent, so drop the connection
  }
});
The bytes flow from S3 through your service to the client without the whole object ever sitting in memory. A 4 GB download needs only a few chunk-sized buffers on the server.
Same pattern for the inbound side, streaming a large upload to S3:
app.post('/upload', async (req, res) => {
  // AWS SDK v2: upload() accepts a readable stream as Body and sends it in parts
  await s3.upload({ Bucket: 'b', Key: 'file.bin', Body: req }).promise();
  res.sendStatus(204);
});
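The download pattern also works on plain node:http, which makes it easy to try without Express or S3. A minimal sketch, assuming a local file as the source; createFileServer is a hypothetical helper, not part of any library:

```javascript
import { createServer } from 'node:http';
import { createReadStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';

// Hypothetical helper: returns a server that streams `filePath` on every request.
function createFileServer(filePath) {
  return createServer(async (_req, res) => {
    res.setHeader('Content-Type', 'application/octet-stream');
    try {
      // The file is read chunk by chunk, and only as fast as the client accepts data.
      await pipeline(createReadStream(filePath), res);
    } catch (err) {
      // Headers may already be on the wire, so drop the connection
      // instead of trying to send an error response.
      console.error('download failed:', err.message);
      res.destroy();
    }
  });
}

// Usage (path and port are placeholders):
// createFileServer('/var/data/big.bin').listen(3000);
```

The try/catch matters here: a missing file or a client that disconnects mid-download rejects the pipeline promise, and an unhandled rejection in a request handler can take the whole process down.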
A Transform stream takes input chunks and emits output chunks. Useful when you genuinely need a reusable, composable transformation step.
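import { Readable, Transform } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';
// object-mode Transform: one CSV line in, one JSON line out
const csvToJson = new Transform({
  objectMode: true,
  transform(line, _enc, callback) {
    const [id, name] = line.split(',');
    callback(null, JSON.stringify({ id: +id, name }) + '\n');
  },
});
const lines = createInterface({ input: createReadStream('data.csv'), crlfDelay: Infinity });
await pipeline(
  Readable.from(lines), // readline is not a stream, so wrap its line iterator in one
  csvToJson,
  process.stdout,
);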
Most teams should not write Transforms by hand. Use async iterables instead; same effect, more readable.
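For comparison, here is the same CSV-to-JSON step as an async generator. pipeline accepts generator functions as middle steps, so nothing else has to change. This is a sketch: csvToJsonLines is a hypothetical name, and the in-memory source and collecting sink stand in for a real file reader and output.

```javascript
import { Readable, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Same job as a csvToJson Transform, written as an async generator.
// Assumes every incoming item is one CSV line with "id,name" columns.
async function* csvToJsonLines(lines) {
  for await (const line of lines) {
    const [id, name] = line.split(',');
    yield JSON.stringify({ id: Number(id), name }) + '\n';
  }
}

// In-memory stand-in for a line-by-line file reader.
const source = Readable.from(['1,Ada', '2,Grace']);

// Collecting sink so the result can be inspected; a file stream works the same way.
const out = [];
const sink = new Writable({
  write(chunk, _enc, callback) {
    out.push(chunk.toString());
    callback();
  },
});

await pipeline(source, csvToJsonLines, sink);
console.log(out.join(''));
```

There is no callback to remember and no objectMode flag to set, and the generator can be unit-tested by handing it a plain array.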
Classic streams have a quiet killer: an error on one stream in a .pipe() chain doesn’t propagate to the others, and the remaining streams are never closed. Use pipeline:
// AVOID
readStream.pipe(transformStream).pipe(writeStream);
// an error on one stream is not forwarded, and the other streams stay open
// USE
await pipeline(readStream, transformStream, writeStream);
// any error rejects the promise
pipeline is the difference between “wait, why didn’t my stream finish?” and an actual exception you can handle.
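A small experiment makes the behavior concrete. In this sketch, runFailingPipeline is a hypothetical demo function: the middle stage of a three-stage chain fails on purpose, and the function reports the error it caught together with the two outer streams.

```javascript
import { Readable, Transform, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical demo: a three-stage chain whose middle stage fails on "bad".
async function runFailingPipeline() {
  const source = Readable.from(['ok', 'bad', 'never processed']);
  const failing = new Transform({
    objectMode: true,
    transform(chunk, _enc, callback) {
      if (chunk === 'bad') callback(new Error('cannot handle "bad"'));
      else callback(null, chunk);
    },
  });
  const sink = new Writable({
    objectMode: true,
    write(_chunk, _enc, callback) {
      callback();
    },
  });

  try {
    await pipeline(source, failing, sink);
    return { error: null, source, sink };
  } catch (error) {
    // One rejection, no matter which stage failed.
    return { error, source, sink };
  }
}

const { error, source, sink } = await runFailingPipeline();
console.log('caught:', error.message);
console.log('source destroyed:', source.destroyed, 'sink destroyed:', sink.destroyed);
```

The error thrown in the middle stage arrives in an ordinary catch block, and both outer streams report destroyed, so nothing is left open.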
For async iterables, normal try/catch works:
try {
for await (const line of rl) {
await processRow(line);
}
} catch (err) {
console.error('processing failed', err);
}
A common need: parse JSON without loading the whole file. For a JSON file that’s a top-level array of objects, stream-json does the work:
import { parser } from 'stream-json';
import { streamArray } from 'stream-json/streamers/StreamArray';
await pipeline(
createReadStream('big.json'),
parser(),
streamArray(),
async function* (source) {
for await (const { value } of source) {
await processItem(value);
}
},
);
For NDJSON (one JSON object per line), the readline approach is simpler: split on newlines, parse each line.
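A minimal sketch of that NDJSON approach, assuming UTF-8 input with one JSON value per line. readNdjson is a hypothetical helper name; the file name and handler in the usage comment are placeholders.

```javascript
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

// Hypothetical helper: yields one parsed value per non-empty line of the file.
async function* readNdjson(path) {
  const rl = createInterface({
    input: createReadStream(path, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let lineNumber = 0;
  for await (const line of rl) {
    lineNumber++;
    if (line.trim() === '') continue; // tolerate blank lines
    let value;
    try {
      value = JSON.parse(line);
    } catch (err) {
      // Report where the bad record is; a bare SyntaxError gives no line number.
      throw new Error(`invalid JSON on line ${lineNumber}: ${err.message}`);
    }
    yield value;
  }
}

// Usage ('events.ndjson' and handleEvent are placeholders):
// for await (const event of readNdjson('events.ndjson')) {
//   await handleEvent(event);
// }
```

Only one line is parsed at a time, so memory use depends on the longest line rather than on the file size.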
A few cases where streams add complexity for no benefit:
Small files. A 50 KB config file. Just fs.readFile. Setting up a stream costs more than it saves on a file that size.
JSON APIs returning small responses. await fetch().json(). Most APIs return < 1 MB; load it all.
Sequential async work. If you’re processing data through several async steps and don’t need backpressure, a plain await chain is simpler.
Anywhere you’re tempted to write a custom Transform stream. Try async iterables first.
The Node.js Streams API has been stable for over a decade and has accumulated cruft. Modern alternatives:
Web Streams API. Standardized, used in browsers and Deno, supported in Node 18+. Cleaner ergonomics. The future, but interop with Node Streams is still imperfect.
const response = await fetch('https://api.example.com/large');
for await (const chunk of response.body) {
// process
}
Async generators. A function that yields values asynchronously is functionally a stream. Use them directly when transforming data.
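As an illustration of async generators used as stream stages, the sketch below splits text chunks of arbitrary size into lines and then filters out empty ones. splitLines and nonEmpty are hypothetical helpers, and the array of chunks stands in for a real text stream.

```javascript
// Hypothetical helper: turns an async iterable of text chunks (any size)
// into an async iterable of lines. Assumes "\n" or "\r\n" line endings.
async function* splitLines(chunks) {
  let buffered = '';
  for await (const chunk of chunks) {
    buffered += chunk;
    const lines = buffered.split('\n');
    buffered = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) yield line.replace(/\r$/, '');
  }
  // Input that does not end with a newline leaves one final line behind.
  if (buffered !== '') yield buffered.replace(/\r$/, '');
}

// A second stage composes with the first by a plain function call.
async function* nonEmpty(lines) {
  for await (const line of lines) {
    if (line.trim() !== '') yield line;
  }
}

// In-memory stand-in for a stream; chunk boundaries fall mid-line on purpose.
const chunks = ['id,na', 'me\n1,Ada\n\n2,Gr', 'ace'];
for await (const line of nonEmpty(splitLines(chunks))) {
  console.log(line);
}
```

When feeding this from a file, open the stream with an encoding, for example createReadStream(path, { encoding: 'utf8' }), so that a multi-byte character is never split across two chunks.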
For new code, prefer async iterables (every Node readable stream already is one) or Web Streams. Reach for the classic Stream API only when interop demands it.
Use streams (or async iterables) when the data is too big to fit in memory or when processing should start before all data has arrived. For everything smaller, use the simpler whole-data APIs. The pipeline helper from node:stream/promises covers most “read, transform, write” cases in four lines. Async iterables (for await) replace most Transform-stream code.
Streams are not a religious topic. They are a tool for a specific shape of problem, and reaching for them when the problem doesn’t have that shape is over-engineering.
The kind of senior Node.js judgment that picks streams when they’re the right tool, and avoids the “write a custom Transform” rabbit hole when async iterables are simpler, is the kind of detail Yojji’s backend teams bring to client work.
Yojji is an international custom software development company founded in 2016, with teams across Europe, the US, and the UK. They specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, GCP), and full-cycle backend engineering, including the data-processing patterns that decide whether your service is fast and lean or buffering everything in memory.