Linux OOM Killer in Production: Why Your Node.js Containers Die Without a Stack Trace

The Practical Developer


The Practica · 2026-05-25 · via The Practical Developer

The alert was short and useless: CrashLoopBackOff. I pulled the application logs and found the last entry was a routine GET /health at 14:23:07. Then nothing. No stack trace. No FATAL. No uncaughtException. The process simply stopped writing logs and the container restarted eight seconds later.

Kubernetes reported the reason immediately if you knew where to look: OOMKilled. The pod had hit its memory limit and the Linux kernel had stepped in to protect the rest of the node. But the application had no idea it was dying. V8 never ran out of heap. The garbage collector was not complaining. From Node.js’s perspective, everything was fine until the kernel sent SIGKILL, which cannot be caught, blocked, or ignored.

This is the OOM (Out-Of-Memory) killer, and it is one of the most confusing production failures because your application code is usually innocent. The problem lives in the gap between what Node.js thinks it is using, what the container runtime thinks it is using, and what the kernel actually counts against the cgroup limit. This post covers how the OOM killer makes its decisions, why containers amplify the confusion, how to read the evidence after the fact, and the application and platform changes that stop it from happening in the middle of your Tuesday afternoon.

How the OOM killer decides who dies

When a Linux system runs out of available memory, the kernel cannot allocate a page for a process that requests it. At that moment it has two choices: wait (and hope something frees memory) or kill something. The OOM killer chooses the latter. It walks every process, assigns an oom_score, and sends SIGKILL to the highest one.

On modern kernels the score is driven almost entirely by memory footprint:

Resident memory: RSS (Resident Set Size) plus swap usage plus page-table pages. Bigger processes score higher.
Memory usage ratio: That footprint is scaled against the memory available in the scope being checked (the whole machine, or the cgroup limit for a cgroup OOM), so the score reflects the share of memory consumed.
oom_score_adj: A user-configurable adjustment from -1000 to +1000. -1000 means “never kill this.” +1000 means “kill this first.”

Kernels before 2.6.36 also factored in niceness and process runtime. The current heuristic does not, so a long-running or high-priority process gets no protection from either.
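To see where a process currently stands, the kernel exposes both values under /proc. The following is a minimal sketch, assuming a Linux host; the helper names readProcInt and readOomStatus are made up for illustration, and on other platforms the function simply returns nulls.

```javascript
const fs = require('node:fs');

// Read one integer from a /proc file. Returns null when the file is missing
// (for example on macOS or Windows) or does not contain a number.
function readProcInt(path) {
  try {
    const value = Number.parseInt(fs.readFileSync(path, 'utf8').trim(), 10);
    return Number.isNaN(value) ? null : value;
  } catch {
    return null;
  }
}

// Report the kernel's current view of a process as an OOM candidate.
// pid defaults to "self", which /proc resolves to the calling process.
function readOomStatus(pid = 'self') {
  return {
    oomScore: readProcInt(`/proc/${pid}/oom_score`),
    oomScoreAdj: readProcInt(`/proc/${pid}/oom_score_adj`),
  };
}

console.log(readOomStatus());
```

Logging this once at startup is usually enough to confirm which adjustment the container runtime applied to the process.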

In a containerized world, there is a critical twist. Kubernetes sets a cgroup memory limit on every container. When a container’s memory usage (as counted by the cgroup memory controller) crosses that limit, the kernel triggers the OOM killer inside that cgroup scope. It does not wait for the whole node to run out of RAM. A single container can OOM itself even if the node has 80% memory free.

This is where the confusion starts. Your container limit is 1 GiB. Node.js process.memoryUsage().heapUsed reports 600 MB. You should have 424 MB of headroom. But the kernel kills the pod anyway. Why?

Cgroup memory accounting: what counts against your limit

The cgroup memory controller tracks more than just your process RSS. In Kubernetes, the memory.limit_in_bytes (or memory.max in cgroup v2) is enforced by counting:

Anonymous memory (process RSS): The heap, stacks, and native allocations your application has actually touched.
Page cache: Files read from disk that Linux keeps in memory. Node.js itself does not directly create much page cache, but log shippers, temporary file uploads, and npm install in init containers do.
Kernel memory: Sockets, TCP buffers, inode caches, and slab allocations charged to your cgroup.
Buffer cache: In older kernels and certain runtimes, I/O buffers for files opened by the process.
tmpfs mounts: If you mount an emptyDir without medium: Memory, it is disk-backed and page-cached. If you mount it with medium: Memory, it counts directly against memory limits.
Shared memory segments: If your application (or a sidecar) uses POSIX shared memory or /dev/shm, that counts.

The most common surprise for Node.js services is that the V8 heap itself is only part of the story. V8’s default heap limit depends on the Node.js version and the memory it detects (older releases capped the old generation at roughly 1.5 GB on 64-bit systems) unless you override it with --max-old-space-size. But the process also allocates memory outside that heap for:

ArrayBuffers and WASM memory: These live in the V8 external memory space, not the JS heap.
Native addons: Any C++ addon (database drivers, image processing libraries, gRPC) allocates native memory via malloc or new.
Thread stacks: Worker threads each consume a few MB of stack space outside the heap.
Libuv and Node.js internals: Buffers for network I/O, event loop watchers, and TLS session caches.

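When you add these up, a Node.js process whose heapUsed is 600 MB can easily have an RSS of 900 MB. If your Kubernetes limit is 1 GiB and another 200 MB of page cache from log files or npm caches in /tmp is charged to the same cgroup, total demand is about 1.1 GiB. The kernel first tries to reclaim clean page cache to stay under the limit, but tmpfs data, dirty pages awaiting writeback, and anonymous memory cannot simply be dropped. Once reclaim cannot free enough, the OOM killer fires.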

This explains the most common OOM symptom: the application is not leaking, it is just paying for memory that does not show up in heapUsed.
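The arithmetic behind that gap can be sketched in a few lines. This is an illustration only: memoryBudget is a hypothetical helper, the figures are the ones used in this section (treating MB as MiB for simplicity), and it takes the worst case in which none of the page cache can be reclaimed.

```javascript
const MiB = 1024 * 1024;

// Sum the components charged to the cgroup and compare against the limit.
// All inputs are in bytes; pageCache and kernel default to zero.
function memoryBudget({ limit, rss, pageCache = 0, kernel = 0 }) {
  const charged = rss + pageCache + kernel;
  return {
    chargedMiB: Math.round(charged / MiB),
    headroomMiB: Math.round((limit - charged) / MiB),
    overLimit: charged > limit,
  };
}

// 1 GiB limit, 900 MB RSS, 200 MB page cache.
const budget = memoryBudget({
  limit: 1024 * MiB,
  rss: 900 * MiB,
  pageCache: 200 * MiB,
});

console.log(budget);
// { chargedMiB: 1100, headroomMiB: -76, overLimit: true }
```

Judged by heapUsed alone, the same process appears to have about 424 MiB to spare, which is why the kill looks inexplicable from inside the application.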

Reading the evidence after the kill

When a pod is OOMKilled, Kubernetes stores the reason in the container status:

kubectl describe pod your-pod-name | grep -A 5 "Last State"

You want to see:

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137

Exit code 137 is 128 + 9, where 9 is SIGKILL. If you are not sure whether it was OOM or someone ran kubectl delete, look at the node kernel logs:

kubectl get node $NODE_NAME -o jsonpath='{.spec.providerID}'
# Then on the node:
journalctl -k | grep -i "killed process"

The kernel logs the exact process ID, the process name, and the memory statistics that triggered the kill:

oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=...,mems_allowed=0,...
Memory cgroup out of memory: Killed process 12345 (node) total-vm:...kB, anon-rss:...kB, file-rss:...kB, shmem-rss:...kB

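The critical numbers are anon-rss (anonymous memory, mostly your heap and native allocations), file-rss (file-backed pages mapped into the process, such as the binary, shared libraries, and memory-mapped files), and shmem-rss (shared memory and tmpfs). If anon-rss alone is close to the limit, your process itself is heavy. If it is well below the limit, look at the rest of the cgroup: page cache, tmpfs, kernel memory, and other processes in the container are charged to the same limit but do not appear in this process’s figures.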
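If you collect node logs centrally, pulling those fields out of the kill record makes them easy to graph or alert on. Here is a rough sketch, assuming the line format shown above; parseOomKillLine is a hypothetical helper and the sample numbers are invented for illustration.

```javascript
// Extract the per-process figures from a "Killed process" kernel log line.
// Returns null when the line is not an OOM kill record.
function parseOomKillLine(line) {
  const head = /Killed process (\d+) \(([^)]+)\)/.exec(line);
  if (!head) return null;

  const result = { pid: Number(head[1]), name: head[2] };
  // Fields look like "anon-rss:950000kB"; collect every key:<n>kB pair.
  for (const [, key, kb] of line.matchAll(/([a-z-]+):(\d+)kB/g)) {
    result[key] = Number(kb) * 1024; // convert kB to bytes
  }
  return result;
}

// Sample line with made-up numbers, in the format shown above.
const sample =
  'Memory cgroup out of memory: Killed process 12345 (node) ' +
  'total-vm:2500000kB, anon-rss:950000kB, file-rss:40000kB, shmem-rss:0kB';

console.log(parseOomKillLine(sample));
```

The exact set of fields varies between kernel versions, so the parser collects whatever key and kB pairs are present rather than expecting a fixed list.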

Inside the container, before the kill, you can also read the cgroup memory statistics directly:

cat /sys/fs/cgroup/memory/memory.stat

On cgroup v2 systems (most modern Kubernetes clusters):

cat /sys/fs/cgroup/memory.stat

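Look for anon and file (named rss and cache on cgroup v1). The sum of these plus kernel_stack, slab, and sock approximates the memory usage that the kernel compares against your limit. The exact charged total is in memory.current (memory.usage_in_bytes on cgroup v1).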
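The same breakdown can be read from inside the process. The sketch below assumes the cgroup v2 field names (anon, file, kernel_stack, slab, sock); on cgroup v1 the file parses fine but those fields are named differently, so the summary would show zeros. The function names are illustrative.

```javascript
const fs = require('node:fs');

// Parse "key value" lines from a cgroup memory.stat file into an object.
function parseMemoryStat(text) {
  const stats = {};
  for (const line of text.split('\n')) {
    const [key, value] = line.trim().split(/\s+/);
    if (key && value !== undefined) stats[key] = Number(value);
  }
  return stats;
}

// Sum the cgroup v2 fields discussed above. Missing fields count as zero.
function summarizeMemoryStat(stats) {
  const fields = ['anon', 'file', 'kernel_stack', 'slab', 'sock'];
  const summary = {};
  let total = 0;
  for (const field of fields) {
    summary[field] = stats[field] ?? 0;
    total += summary[field];
  }
  return { ...summary, total };
}

// Try the cgroup v2 path first, then v1. Returns null outside a Linux cgroup.
function readMemoryStat() {
  const candidates = ['/sys/fs/cgroup/memory.stat', '/sys/fs/cgroup/memory/memory.stat'];
  for (const path of candidates) {
    try {
      return parseMemoryStat(fs.readFileSync(path, 'utf8'));
    } catch {
      // File not present; fall through to the next candidate path.
    }
  }
  return null;
}

const currentStats = readMemoryStat();
if (currentStats) console.log(summarizeMemoryStat(currentStats));
```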

Node.js memory visibility: what the process can see

Node.js exposes some memory metrics via process.memoryUsage():

const usage = process.memoryUsage();
console.log({
  rss: usage.rss,
  heapTotal: usage.heapTotal,
  heapUsed: usage.heapUsed,
  external: usage.external,
  arrayBuffers: usage.arrayBuffers,
});

rss: The resident set size of the whole process as the operating system sees it, including worker threads and native allocations. This is the closest to what the kernel reports as anon-rss plus file-rss.
heapTotal / heapUsed: What V8 has allocated for JavaScript objects.
external: Memory used by C++ objects bound to JavaScript objects that V8 manages, outside the JS heap. This includes Buffer backing stores and external strings.
arrayBuffers: Memory backing ArrayBuffer and SharedArrayBuffer instances, including all Node.js Buffers. It is also included in external, but it is reported separately because it is the most common source of “heap is low but RSS is high.”

If arrayBuffers or external is climbing while heapUsed stays flat, you have native memory growth. Common causes:

Streaming large payloads into Buffer objects without pipeline backpressure.
Loading large files into a single Buffer via fs.readFile.
Native database drivers queuing result sets in unmanaged memory.
WASM modules with large linear memories.
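A quick way to convince yourself of this is to allocate a large Buffer and compare the counters before and after. This is a small experiment rather than a benchmark; the 64 MiB size is arbitrary.

```javascript
const MiB = 1024 * 1024;

// Snapshot the two counters we want to compare, converted to MiB.
function snapshot() {
  const { heapUsed, arrayBuffers } = process.memoryUsage();
  return { heapUsedMiB: heapUsed / MiB, arrayBuffersMiB: arrayBuffers / MiB };
}

const before = snapshot();
// The backing store of this Buffer lives outside the JS heap.
const payload = Buffer.alloc(64 * MiB);
const after = snapshot();

const growth = {
  heapUsedMiB: after.heapUsedMiB - before.heapUsedMiB,
  arrayBuffersMiB: after.arrayBuffersMiB - before.arrayBuffersMiB,
};

console.log(growth, `payload bytes: ${payload.length}`);
```

On a typical run arrayBuffers grows by roughly the size of the allocation while heapUsed barely moves, which is exactly the pattern a heap-only dashboard hides.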

Here is a small diagnostic function you can drop into an existing /metrics endpoint to track the dangerous gap:

import os from 'node:os';
import fs from 'node:fs';

// Read an integer from a cgroup file. Returns undefined if the file is
// missing or holds a non-numeric value such as "max" (cgroup v2, no limit).
function readCgroupInt(path) {
  try {
    const value = Number.parseInt(fs.readFileSync(path, 'utf8').trim(), 10);
    return Number.isNaN(value) ? undefined : value;
  } catch {
    return undefined;
  }
}

function getMemoryMetrics() {
  const mem = process.memoryUsage();
  const systemFree = os.freemem();
  const systemTotal = os.totalmem();

  // Best-effort cgroup memory limit detection: v1 path first, then v2.
  // An unlimited v1 cgroup reports a huge sentinel value, so cap the
  // result at the machine's total memory.
  const rawLimit =
    readCgroupInt('/sys/fs/cgroup/memory/memory.limit_in_bytes') ??
    readCgroupInt('/sys/fs/cgroup/memory.max') ??
    systemTotal;
  const cgroupLimit = Math.min(rawLimit, systemTotal);

  return {
    process_heap_used_bytes: mem.heapUsed,
    process_rss_bytes: mem.rss,
    process_external_bytes: mem.external ?? 0,
    process_array_buffers_bytes: mem.arrayBuffers ?? 0,
    cgroup_memory_limit_bytes: cgroupLimit,
    // rss already includes resident external memory, so adding
    // external on top of it would double-count.
    cgroup_usage_ratio: Number((mem.rss / cgroupLimit).toFixed(4)),
    system_free_bytes: systemFree,
  };
}

Export cgroup_usage_ratio to Prometheus and alert when it exceeds 0.75. Do not alert on heapUsed alone. It will lie to you.
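If the service does not already use a metrics library, the Prometheus text exposition format is simple enough to emit by hand. The sketch below is one way to do it, under some assumptions: the nodeapp_ prefix, the port 9464, and the function names are arbitrary choices for illustration, and every value is exported as a gauge.

```javascript
const http = require('node:http');

// Render a flat object of numeric metrics in the Prometheus text exposition
// format. Non-numeric and non-finite values are skipped.
function toPrometheusText(metrics, prefix = 'nodeapp_') {
  const lines = [];
  for (const [name, value] of Object.entries(metrics)) {
    if (typeof value !== 'number' || !Number.isFinite(value)) continue;
    lines.push(`# TYPE ${prefix}${name} gauge`);
    lines.push(`${prefix}${name} ${value}`);
  }
  return lines.join('\n') + '\n';
}

// Serve the metrics with Node's built-in HTTP server. `collect` is any
// function that returns a flat object of numbers.
function startMetricsServer(collect, port = 9464) {
  const server = http.createServer((req, res) => {
    if (req.url !== '/metrics') {
      res.writeHead(404).end();
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
    res.end(toPrometheusText(collect()));
  });
  return server.listen(port);
}

console.log(toPrometheusText({ cgroup_usage_ratio: 0.8125 }));
```

Calling startMetricsServer(getMemoryMetrics) would expose the diagnostic function above at /metrics for a scraper to collect.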

Common root causes in Node.js services

1. Large file uploads into memory.

If you accept file uploads and store them in Buffer or ArrayBuffer before streaming to S3, every concurrent upload adds to rss and external. The fix is to stream the request body through instead of buffering it:

import { Upload } from '@aws-sdk/lib-storage';

// Bad: loads entire file into memory
// const buffer = await fs.readFile(uploadPath);

// Good: hand the request stream to the SDK, which uploads it in parts.
// Memory use is bounded by roughly partSize * queueSize per upload.
const upload = new Upload({
  client: s3Client,
  params: { Bucket: 'uploads', Key: filename, Body: req },
  queueSize: 4, // parts uploaded concurrently
  partSize: 5 * 1024 * 1024, // 5 MiB, the S3 minimum part size
});
await upload.done();

2. Native addons that allocate outside V8.

Sharp (image processing), libxmljs, and some database client libraries allocate native buffers that do not count against heapUsed. These allocations show up in process.memoryUsage().external only when the addon reports them to V8, so track RSS directly as the reliable signal.

3. Worker thread memory not visible in the main thread heap.

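Each worker thread has its own V8 heap. In the main thread, process.memoryUsage() reports rss for the entire process, workers included, but heapUsed, heapTotal, and external refer only to the calling thread. A heap-based dashboard therefore misses worker memory entirely. If you spawn workers for CPU-intensive tasks, you must account for them in your container limit: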

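import os from 'node:os';

// Rough sizing estimate; replace the 300 MB and 200 MB placeholders with measurements.
const workerCount = os.cpus().length; // counts host CPUs, not the container's CPU quota
const baseRssEstimate = 300 * 1024 * 1024; // 300 MB base
const workerRssEstimate = 200 * 1024 * 1024; // 200 MB per worker
// Add 30% headroom on top of the estimate
const containerLimit = Math.ceil((baseRssEstimate + workerCount * workerRssEstimate) * 1.3);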
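An estimate like this is easier to trust when each worker's heap is actually capped. The Worker constructor accepts a resourceLimits option for that purpose. The sketch below assumes an even split of a heap budget across workers; the helper names and the 16 MiB young-generation figure are illustrative choices, not recommendations. Note that resourceLimits bounds only the worker's V8 heap and stack, not native allocations made by addons.

```javascript
const { Worker } = require('node:worker_threads');

// Split a heap budget (in MiB) evenly across workers and express it as the
// resourceLimits option accepted by the Worker constructor.
function workerResourceLimits(totalWorkerHeapMiB, workerCount) {
  if (workerCount < 1) throw new RangeError('workerCount must be at least 1');
  return {
    maxOldGenerationSizeMb: Math.floor(totalWorkerHeapMiB / workerCount),
    maxYoungGenerationSizeMb: 16, // assumed value for illustration
    stackSizeMb: 4, // the Node.js default stack size for workers
  };
}

// Start a worker with the given limits. A worker that exceeds its heap cap
// is terminated and emits an 'error' event with code ERR_WORKER_OUT_OF_MEMORY.
function spawnCappedWorker(file, limits) {
  const worker = new Worker(file, { resourceLimits: limits });
  worker.on('error', (err) => {
    console.error(`worker failed (${err.code ?? 'unknown'}): ${err.message}`);
  });
  return worker;
}

console.log(workerResourceLimits(800, 4));
// { maxOldGenerationSizeMb: 200, maxYoungGenerationSizeMb: 16, stackSizeMb: 4 }
```

With a cap in place, a runaway worker fails on its own with a catchable error instead of dragging the whole container over the cgroup limit.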

4. Sidecars stealing the cgroup budget.

Istio proxy, Fluent Bit log shippers, and vault agents run in the same pod, but Kubernetes enforces memory limits per container. A sidecar with no limit of its own is bounded only by the node, so a log shipper that buffers a burst of stderr output can push the node into memory pressure. At that point the node-level OOM killer or kubelet eviction picks a victim, and your Node.js app can be killed in the crossfire.

Always set per-container limits in your deployment spec:

spec:
  containers:
    - name: api
      resources:
        limits:
          memory: "1Gi"
    - name: istio-proxy
      resources:
        limits:
          memory: "256Mi"

If a container has no memory limit, nothing caps it below the node’s capacity (or below a pod-level limit, on clusters that support pod-level resources). It can grow until it starves its neighbors, taking the others down with it.

5. --max-old-space-size mismatched to the cgroup limit.

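V8 sizes its default old-generation limit from the Node.js version and the memory it detects, and that is not guaranteed to match your cgroup limit. Older releases capped it at about 1.5 GB on 64-bit systems regardless of the container. If your Kubernetes limit is 1 GB and V8 believes it may grow to 1.5 GB, it sees no reason to collect aggressively, and the OOM killer stops the process at 1 GB. The result is a process that never reports heap exhaustion, never throws, and simply dies from the kernel.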

Set --max-old-space-size to roughly 75% of your container memory limit:

env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=768"
resources:
  limits:
    memory: "1Gi"

This gives V8 a clear ceiling below the cgroup limit, so the garbage collector has a chance to reclaim memory before the kernel intervenes. The remaining 25% is headroom for external memory, native allocations, and page cache.
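The 75% figure is easy to derive from the limit, and V8 can confirm what it was actually configured with. A small sketch, assuming the limit is known in bytes; suggestedOldSpaceMb is a hypothetical helper name.

```javascript
const v8 = require('node:v8');

// Suggest a --max-old-space-size value (in MiB) as a fraction of the
// container memory limit given in bytes.
function suggestedOldSpaceMb(limitBytes, fraction = 0.75) {
  return Math.floor((limitBytes * fraction) / (1024 * 1024));
}

// A 1 GiB limit yields the value used in the manifest above.
console.log(suggestedOldSpaceMb(1024 ** 3)); // 768

// heap_size_limit is the ceiling V8 is running with right now, so logging
// it at startup confirms whether NODE_OPTIONS was picked up.
const heapLimitMb = Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
console.log(`V8 heap_size_limit: ${heapLimitMb} MiB`);
```

The reported heap_size_limit is usually somewhat higher than the flag value because it also covers the young generation.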

Kernel OOM behavior tuning

You cannot disable the OOM killer without fundamentally changing how Linux handles memory pressure. What you can do is make it less surprising and more informative.

Enable the OOM killer log

Ensure your kernel is configured to log kills (it is by default on most distributions, but verify):

sysctl vm.oom_dump_tasks=1
sysctl vm.oom_kill_allocating_task=0

oom_dump_tasks=1 logs every process in the cgroup when the kill happens, which helps you identify whether a sidecar or the main process was the largest consumer.

oom_kill_allocating_task=0 lets the kernel kill the largest process, which is usually what you want. Setting it to 1 kills whichever process triggered the allocation that crossed the limit, which might be an innocent process that happened to allocate at the wrong moment.

Consider memory overcommit

Linux defaults to vm.overcommit_memory=0, a heuristic mode that refuses only obviously excessive allocations. Mode 1 always allows allocations, and mode 2 enforces strict accounting. This is a node-wide setting, and on Kubernetes nodes it is typically managed by the kubelet, so for containerized Node.js the practical advice is to leave the node’s setting alone and size your limits correctly.

Use memory.min or memory.low in cgroup v2

If your cluster runs cgroup v2, memory.min guarantees a baseline that the kernel will not reclaim from the container, and memory.low gives softer, best-effort protection. Both protect the main container’s memory from being reclaimed when the node is under pressure:

# There is no direct field for memory.min in the Pod spec.
# The MemoryQoS feature gate (alpha since Kubernetes 1.22) derives it from the container's memory request.

For most teams, the simpler fix is accurate sizing and explicit per-container limits.

Preventing OOM: the sizing checklist

Before your next deploy, verify:

--max-old-space-size is set to 70-80% of the container memory limit.
process.memoryUsage().rss and external are exported as metrics, not just heapUsed.
An alert fires when cgroup_usage_ratio exceeds 0.75.
File uploads and large responses use streaming, not in-memory buffering.
Worker thread count and memory usage are included in the container limit estimate.
Every container in the pod has its own memory limit, not just the pod-level limit.
Sidecars are sized explicitly and their logs are checked after any OOM incident.
Native addons are audited for external memory allocation.

The takeaway

The OOM killer is not a bug. It is Linux doing exactly what it was designed to do when memory is exhausted. The problem is that containers create a layer of indirection between your application and the kernel, and the metrics Node.js exposes by default do not show the full picture.

If you are only watching heapUsed, you are flying blind. Start tracking RSS and external memory. Size your V8 heap limit below your cgroup limit. Stream large payloads. Give every sidecar its own budget. And when a pod dies without a log entry, go straight to kubectl describe and journalctl -k before you spend an afternoon looking for a leak that was never there.

A note from Yojji

The difference between a container that survives traffic spikes and one that vanishes without a trace is often not the application code but the resource accounting layer beneath it. Understanding how the Linux kernel, cgroup controllers, and V8 negotiate memory boundaries is the kind of systems-level discipline that separates a functioning deployment from a reliable one.

Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their engineering teams specialize in the JavaScript ecosystem, cloud-native infrastructure on AWS, Azure, and Google Cloud, and the operational rigor that keeps production systems predictable when resources get tight.
