Node.js Garbage Collection Tuning: Stop Letting V8 Pause Your Event Loop

The Practical Developer


The Practica · 2026-05-16 · via The Practical Developer

Your latency graph is clean for hours, then a 400 ms spike appears out of nowhere. It does not correlate with traffic, database slow queries, or deployments. It correlates with nothing you can find in application logs. Then you enable --trace-gc and realize the spikes are exactly aligned with V8's full mark-sweep-compact collections. The garbage collector is doing its job, but it is doing it at the worst possible moment, and the default heap limits mean it waits until the last second to do the expensive work.

Most Node.js services run with default V8 heap settings. That means the garbage collector grows the old generation until it either hits a computed limit based on available memory or the container OOM killer intervenes. On a 4 GB container, the old space can balloon to 1.8 GB before V8 decides a full collection is necessary. At that size, a mark-sweep-compact pause can take hundreds of milliseconds. For a service handling 10,000 RPS, that is a catastrophe.

This post is not a computer science lecture. It is the three heap parameters you set, the one monitoring snippet you add, and the deployment rule that prevents your next latency spike from being a GC pause.

How V8 decides when to collect

V8 splits the heap into two generations: young and old. Young generation collections, called scavenges, are fast and frequent. They copy live objects out of the “from” semi-space into the “to” semi-space, discard the rest, and pay only for the objects that survive. Most objects die young, so scavenges are cheap.

Old generation collections, called mark-sweep-compact, are the expensive ones. V8 walks the entire old heap, marks reachable objects, sweeps dead ones, and compacts live objects to reduce fragmentation. The cost is proportional to the size of the live set, not the allocation rate. A 2 GB heap with 1.5 GB live takes longer to collect than a 1 GB heap with 500 MB live, even if both allocate at the same rate.

The default max-old-space-size is computed at startup based on available physical memory. On a container with a 1 GB limit, it might default to roughly 1.4 GB on a 64-bit machine, which sounds generous until you remember that RSS includes C++ memory, Buffers, TLS overhead, and the heap itself. V8 will push the heap close to that limit, then trigger a full GC. If the live set is large, the pause is long.
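To see what V8 has actually decided for a given process, you can ask it directly. The following is a minimal sketch using the built-in v8 module; the helper name describeHeap is an illustration, not part of any API.

```javascript
const v8 = require('v8');

// Convert bytes to whole megabytes for readable output.
const toMb = (bytes) => Math.round(bytes / 1024 / 1024);

// describeHeap is a hypothetical helper name used for this sketch.
function describeHeap() {
  const { heap_size_limit, used_heap_size } = v8.getHeapStatistics();

  // getHeapSpaceStatistics() returns one entry per V8 space
  // (new_space, old_space, and several others).
  const spaces = {};
  for (const s of v8.getHeapSpaceStatistics()) {
    spaces[s.space_name] = {
      usedMb: toMb(s.space_used_size),
      sizeMb: toMb(s.space_size)
    };
  }

  return {
    limitMb: toMb(heap_size_limit), // the ceiling V8 will enforce
    usedMb: toMb(used_heap_size),
    newSpace: spaces.new_space,     // young generation
    oldSpace: spaces.old_space      // old generation
  };
}

console.log(describeHeap());
```

Run it once inside the container with no flags set. If limitMb is close to, or above, the container memory limit, the defaults are not protecting you.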

The three flags that matter

1. --max-old-space-size: cap the heap before the container does

The single most important flag is --max-old-space-size. It sets the hard ceiling for the old generation. You want this ceiling to be lower than your container memory limit, because Node.js uses memory outside the V8 heap.

A practical starting point: set --max-old-space-size to about 70% of your container's memory limit, and go lower if you rely on large Buffers or native modules. On a 1 GB container:

node --max-old-space-size=700 server.js

This forces V8 to run full collections earlier and more often. That sounds bad, but a 50 ms collection every minute is usually cheaper than a 400 ms collection every ten minutes. Your p99 thanks you.

2. --max-semi-space-size: tame the scavenges

The young generation uses two semi-spaces. By default, each is 16 MB on 64-bit systems. If your service allocates many large temporary objects (JSON parsing, image processing, buffer transforms), the young space fills quickly and scavenges run so often that objects still in use get promoted to old space before they have had a chance to die. This is premature promotion, and it means more expensive full collections.

You can increase the semi-space size to give large temporary objects more room to die young:

node --max-semi-space-size=64 --max-old-space-size=700 server.js

Do not set this to half your heap. Scavenges copy live objects between semi-spaces, so a 512 MB semi-space means a 1 GB young generation, and the more live data a scavenge has to copy, the longer it pauses. The sweet spot is usually 32-128 MB for typical API workloads.

3. --heapsnapshot-near-heap-limit: debug the pause, not just the crash

When a full GC does not free enough memory, V8 will try again, then again, then crash with an out-of-memory error. By then, the container is already unhealthy. The flag --heapsnapshot-near-heap-limit=1 tells V8 to write a heap snapshot to disk just before the final GC attempts:

node --max-old-space-size=700 --max-semi-space-size=64 --heapsnapshot-near-heap-limit=1 server.js

The snapshot lands in the working directory. You can load it into Chrome DevTools and see what was alive at the peak. This is invaluable because it tells you whether the pause was caused by a leak (unbounded growth) or simply a heap that is too large for the workload.

Reading --trace-gc before you add instrumentation

You do not need a PerformanceObserver to get a quick signal. The --trace-gc flag prints every collection to stdout. A typical line looks like this:

[12345:0x...]  12345 ms: Mark-sweep 234.5 (289.2) -> 189.2 (289.2) MB, 42.1 / 0.0 ms

The format is: [pid:isolate] timestamp ms: type before_heap (total_heap) -> live_heap (total_heap) MB, pause_ms / external_ms, where external_ms is the time spent in external (embedder) callbacks.

The first number after the arrow is the live set after the collection. If that number climbs steadily over time, you have a leak. If it stays flat but the before_heap grows, you simply have a large working set and need a bigger --max-old-space-size or more pods.

Add --trace-gc to your container startup for a single day, grep the logs for Mark-sweep, and plot pause duration against time. If the pauses exceed your latency budget, you have a GC tuning problem, not a code problem. Once you see the pattern, remove the flag and switch to the PerformanceObserver approach described in the monitoring section below for continuous monitoring. You do not want --trace-gc enabled permanently, because the output volume can drown your logging pipeline.
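To plot pause duration you first have to pull the numbers out of each line. Here is a small sketch of a parser for the line format shown above. The function name parseTraceGcLine is hypothetical, and the regular expression assumes exactly that layout; other V8 versions may print extra fields or a different collection name, so treat it as a starting point.

```javascript
// Matches lines shaped like:
// [pid:isolate]  <ts> ms: <type> <before> (<total>) -> <after> (<total>) MB, <pause> / <n> ms
const TRACE_GC_LINE =
  /^\[(\d+):[^\]]*\]\s+(\d+) ms: (.+?) ([\d.]+) \(([\d.]+)\) -> ([\d.]+) \(([\d.]+)\) MB, ([\d.]+) \/ ([\d.]+) ms/;

// parseTraceGcLine is a hypothetical helper name used for this sketch.
// Returns null for any line that does not look like a GC trace line.
function parseTraceGcLine(line) {
  const m = TRACE_GC_LINE.exec(line);
  if (!m) return null;
  return {
    pid: Number(m[1]),
    timestampMs: Number(m[2]), // milliseconds since process start
    type: m[3],                // e.g. "Scavenge" or "Mark-sweep"
    beforeMb: Number(m[4]),    // heap in use before the collection
    afterMb: Number(m[6]),     // live set after the collection
    totalMb: Number(m[7]),     // total heap after the collection
    pauseMs: Number(m[8])
  };
}

const sample =
  '[12345:0x...]  12345 ms: Mark-sweep 234.5 (289.2) -> 189.2 (289.2) MB, 42.1 / 0.0 ms';
console.log(parseTraceGcLine(sample));
```

Feed each log line through the parser, keep the entries whose type is a full collection, and chart afterMb and pauseMs against timestampMs. A rising afterMb is the leak signal; a flat afterMb with long pauseMs is the sizing signal.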

The production server setup

Here is the Dockerfile entrypoint and the server bootstrap that applies the tuning and exposes the monitoring endpoint.

FROM node:20-alpine
WORKDIR /app
COPY . .
ENV NODE_ENV=production
ENV UV_THREADPOOL_SIZE=128
CMD ["node", "--max-old-space-size=700", "--max-semi-space-size=64", "--heapsnapshot-near-heap-limit=1", "server.js"]

And the health check endpoint that reports heap pressure:

const v8 = require('v8');
const http = require('http');

function getHeapPressure() {
  const stats = v8.getHeapStatistics();
  const used = stats.used_heap_size;
  const limit = stats.heap_size_limit;
  return {
    usedMb: Math.round(used / 1024 / 1024),
    limitMb: Math.round(limit / 1024 / 1024),
    percentUsed: Math.round((used / limit) * 100)
  };
}

const server = http.createServer((req, res) => {
  if (req.url === '/health') {
    const pressure = getHeapPressure();
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({
      status: 'ok',
      heap: pressure,
      gcTuned: true
    }));
    return;
  }
  res.writeHead(200);
  res.end('ok');
});

server.listen(3000, () => {
  console.log('Server listening on port 3000');
  console.log('Heap limit:', getHeapPressure().limitMb, 'MB');
});

Monitoring GC events in application code

Flags are static. Runtime monitoring tells you if the tuning worked. Node.js exposes GC events through perf_hooks. The following snippet logs every old-generation collection and its duration:

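const { PerformanceObserver, constants } = require('perf_hooks');

// Build the lookup from the runtime's own constants rather than hard-coded
// numbers, so the mapping is correct for whichever Node.js version is running.
const GC_NAMES = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'scavenge',
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'markSweepCompact',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'incrementalMarking',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'weakCallbacks'
};

const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Newer Node.js versions report kind and flags under entry.detail;
    // older ones put them directly on the entry. Support both.
    const info = entry.detail || entry;
    const kind = GC_NAMES[info.kind] || `kind-${info.kind}`;
    const duration = entry.duration;

    // Only log expensive events (mark-sweep-compact or incremental phases)
    if (
      info.kind === constants.NODE_PERFORMANCE_GC_MAJOR ||
      info.kind === constants.NODE_PERFORMANCE_GC_INCREMENTAL
    ) {
      console.log(JSON.stringify({
        event: 'gc',
        kind,
        durationMs: Math.round(duration * 100) / 100,
        flags: info.flags || 0,
        timestamp: new Date().toISOString()
      }));
    }
  }
});

obs.observe({ entryTypes: ['gc'] });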

Feed this to your structured logging pipeline. Alert when durationMs exceeds 50 ms for markSweepCompact. That is your signal that the live set is too large for the heap size you picked.
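If you would rather alert from inside the process than from the log pipeline, a small rolling window is enough. The sketch below is one possible shape, not a prescribed one: the class name GcPauseTracker, the 100-sample window, and the summary fields are all assumptions made for illustration. Only the 50 ms threshold comes from the rule above.

```javascript
// GcPauseTracker is a hypothetical helper: it keeps the most recent
// pause durations and reports how many crossed the alert threshold.
class GcPauseTracker {
  constructor({ thresholdMs = 50, windowSize = 100 } = {}) {
    this.thresholdMs = thresholdMs;
    this.windowSize = windowSize;
    this.samples = [];
  }

  // Record one pause; returns true when this pause should raise an alert.
  record(durationMs) {
    this.samples.push(durationMs);
    if (this.samples.length > this.windowSize) this.samples.shift();
    return durationMs > this.thresholdMs;
  }

  // Summarize the current window using nearest-rank p99.
  summary() {
    const n = this.samples.length;
    if (n === 0) return { count: 0, overThreshold: 0, maxMs: 0, p99Ms: 0 };
    const sorted = [...this.samples].sort((a, b) => a - b);
    const p99Index = Math.min(n - 1, Math.ceil(0.99 * n) - 1);
    return {
      count: n,
      overThreshold: sorted.filter((d) => d > this.thresholdMs).length,
      maxMs: sorted[n - 1],
      p99Ms: sorted[p99Index]
    };
  }
}

const tracker = new GcPauseTracker();
tracker.record(10);
tracker.record(20);
tracker.record(60);
console.log(tracker.summary());
```

Call record() with the duration of each markSweepCompact entry from the observer callback, and expose summary() on the same endpoint as your heap pressure numbers.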

The deployment rule

Set your Kubernetes memory limit, then compute the Node.js flag from it. Never set --max-old-space-size equal to the container limit. A service with a 512 MB limit and --max-old-space-size=512 is likely to be OOM-killed before the heap ever reaches its ceiling, because the process also needs memory outside the old space: the young generation, the collector's own bookkeeping, plus the C++ memory for libuv, OpenSSL, and any native addons.

Here is the rule we use:

max_old_space = floor(container_limit_mb * 0.7) - 64
max_semi_space = min(128, floor(container_limit_mb * 0.05))

For a 2 GB (2048 MB) container: --max-old-space-size=1369 --max-semi-space-size=102 (round to 128).

For a 512 MB container: --max-old-space-size=294 --max-semi-space-size=25 (round to 32).

Add a startup log that prints the effective heap limit. When your next incident starts, the first line in the logs should tell you whether the process was tuned or running defaults.
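The rule is easy to get wrong by hand, so it can help to encode it once. The following sketch applies the two formulas above and adds the startup log. The names computeHeapFlags and logEffectiveHeapLimit are hypothetical, and the container limit is passed in as a number on the assumption that you already know it from your deployment config.

```javascript
const v8 = require('v8');

// Derive both flags from the container memory limit (in MB),
// using the 70%-minus-64 and 5%-capped-at-128 rule.
function computeHeapFlags(containerLimitMb) {
  if (!Number.isFinite(containerLimitMb) || containerLimitMb <= 0) {
    throw new RangeError('containerLimitMb must be a positive number');
  }
  const maxOldSpace = Math.floor(containerLimitMb * 0.7) - 64;
  const maxSemiSpace = Math.min(128, Math.floor(containerLimitMb * 0.05));
  if (maxOldSpace <= 0) {
    throw new RangeError('container limit is too small for this rule');
  }
  return {
    maxOldSpace,
    maxSemiSpace,
    flags: [
      `--max-old-space-size=${maxOldSpace}`,
      `--max-semi-space-size=${maxSemiSpace}`
    ]
  };
}

// Startup log: report the limit V8 is actually enforcing in this process.
function logEffectiveHeapLimit() {
  const heapLimitMb = Math.round(
    v8.getHeapStatistics().heap_size_limit / 1024 / 1024
  );
  console.log(JSON.stringify({ event: 'startup', heapLimitMb }));
  return heapLimitMb;
}

console.log(computeHeapFlags(512).flags.join(' '));
logEffectiveHeapLimit();
```

For a 512 MB limit this prints --max-old-space-size=294 --max-semi-space-size=25. The logged heapLimitMb is typically somewhat higher than the old-space flag because heap_size_limit also accounts for the young generation, so compare it against your expectation rather than expecting an exact match.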

What this does not fix

If your live set is growing because of a leak, no amount of heap tuning will save you. A smaller heap will just OOM faster. Use the flags to make GC predictable, then use heap snapshots to find the leak.

If your workload is genuinely memory-heavy (image processing, large ML models), consider worker threads for the heavy work and keep the main thread heap small. Worker threads get their own V8 isolate and their own heap limit.
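As a rough illustration of that separation, a worker can be started with its own heap ceiling through the resourceLimits option of the worker_threads Worker constructor. In this sketch the worker body is inlined as a string so the example is self-contained; the summing task, the runInWorker name, and the 256 MB and 32 MB limits are placeholders, not recommendations.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical heavy task, inlined as a string for the sake of the sketch.
// A real service would point the Worker at a separate file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const v8 = require('v8');
  // Stand-in for real memory-heavy work.
  const sum = workerData.values.reduce((a, b) => a + b, 0);
  const limit = v8.getHeapStatistics().heap_size_limit;
  parentPort.postMessage({ sum, workerLimitMb: Math.round(limit / 1024 / 1024) });
`;

// runInWorker is a hypothetical helper name used for this sketch.
function runInWorker(values) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { values },
      // Heap limits for this worker's isolate only (placeholder values).
      resourceLimits: {
        maxOldGenerationSizeMb: 256,
        maxYoungGenerationSizeMb: 32
      }
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

runInWorker([1, 2, 3]).then((result) => console.log(result));
```

If the worker exceeds its limit, it fails with an error on the worker object instead of taking the main thread's heap with it, which keeps full collections on the main thread small.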

Summary

The default V8 heap behavior is optimized for desktop Chrome, not a containerized API server. It grows lazily and collects rarely, which turns every full GC into a latency event.

Cap the old space at 70% of your container memory minus a buffer.
Increase semi-space size if you see premature promotion in heap snapshots.
Enable heap snapshots near the limit so you can inspect the peak.
Monitor markSweepCompact duration via perf_hooks and alert on it.
Log the configured heap limit at startup.

That is the tuning. The result is not zero GC cost, but predictable GC cost that fits inside your latency budget.

A note from Yojji

Tuning V8 garbage collection for predictable latency in containerized environments is exactly the kind of low-level backend refinement that separates prototypes from production systems. Yojji is an international custom software development company with offices in Europe, the US, and the UK. Their senior engineers routinely work through these kinds of Node.js runtime details to keep backend services stable under real traffic.
