Stop Swallowing Your Errors: Error Cause Chaining in Node.js

The Practical Developer


The Practica · 2026-06-09 · via The Practical Developer

You are debugging a production incident. A user’s payment failed. The logs show a single line: Error: Payment processing failed. No database query. No network call. No upstream service name. No hint about what actually broke. Somewhere in the call stack, a lower-level error was caught, wrapped in a generic message, and rethrown. The original error, the one with the useful stack trace and the actionable message, is gone.

This is not an edge case. It happens in every codebase that grows past a single file. You catch an error from a database driver, wrap it in a business-logic error, and throw it up to the next layer. That next layer wraps it again. By the time the error reaches your global handler, the original cause is buried under layers of generic messages, and you are left staring at Error: Something went wrong with no way to trace back to the SQL query that failed or the HTTP response code that came back wrong.

JavaScript has had a solution since ES2022. It is Error.cause, and it works in Node.js 16.9 and later, in every current major browser, and in TypeScript once target or lib is set to ES2022 (supported since TypeScript 4.6). It is one option on the Error constructor that preserves the full error chain, and most teams still do not use it.

The problem: wrapping errors destroys context

Every production Node.js service eventually does something like this:

async function processOrder(orderId: string) {
  try {
    await chargePayment(orderId);
  } catch (err) {
    throw new Error('Payment processing failed');
  }
}

The caller of processOrder sees Error: Payment processing failed. The err that had the actual details (a timeout from the payment gateway, an invalid response code, a network DNS failure) is gone. The original stack trace is lost. The HTTP status code that would tell you whether to retry is lost. The correlation ID that lives on the original error is lost.

Before Error.cause, teams hacked around this in several creative ways:

Hack 1: String concatenation on the message.

catch (err) {
  throw new Error(`Payment processing failed: ${err.message}`);
}

This preserves the message but destroys the stack trace. You lose the file and line number where the underlying call failed, the call chain that led there, and any structured metadata on the error. If err had a statusCode property or a retryable flag, you lose those too.

Hack 2: Attach the original error as a property.

catch (err) {
  const wrapped = new Error('Payment processing failed');
  wrapped.originalError = err;
  throw wrapped;
}

This works but only if every consumer knows about your custom originalError property. Your logging library does not know about it. Your error monitoring service does not know about it. Your team’s next hire does not know about it. It is a convention that requires documentation, training, and code review to survive. In practice, it survives for about two weeks before someone forgets.

Hack 3: Pass through without wrapping.

async function processOrder(orderId: string) {
  await chargePayment(orderId);
}

This preserves the original error but leaks implementation details. The caller now sees Error: timeout reading from payments.example.com:443 from a function called processOrder. The abstraction is broken. If you later switch payment providers, every caller’s error-handling logic breaks because the error messages changed.

None of these are good. Error.cause solves all of them.

Error.cause is a one-line fix

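The cause option was added to the Error constructor in ES2022. It accepts any value (usually another Error instance) and stores it in a .cause property. The .stack string of the outer error does not include the cause, but console.error and util.inspect in recent Node.js versions print it, and error monitoring tools can follow the property. Sentry's JavaScript SDK, for example, links chained errors through cause by default. For other tools, check the documentation before relying on it.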

Here is the same processOrder function with cause:

async function processOrder(orderId: string) {
  try {
    await chargePayment(orderId);
  } catch (err) {
    throw new Error('Payment processing failed', { cause: err });
  }
}

That is the entire change. One parameter. The original error is preserved intact, with its full stack trace, its message, its custom properties, and any nested cause of its own.

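When you log this error with console.error, recent Node.js versions print the whole chain and label each nested error as [cause]. Simplified, the output looks like this: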

Error: Payment processing failed
    at processOrder (/app/orders.ts:12:11)
    ... (outer stack trace)
  cause: TimeoutError: Connection timed out after 5000ms
      at chargePayment (/app/payments.ts:42:17)
      ... (inner stack trace)
    cause: Error: connect ECONNREFUSED 10.0.1.5:443
        at TLSSocket.onConnectEnd (node:_tls_wrap:1702:19)
        ... (innermost stack trace)

Three layers deep. Each layer preserved. No string formatting. No custom properties. No lost context.
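To make the claim about preserved properties concrete, here is a minimal sketch. The GatewayError class, its statusCode field, and the always-failing chargePayment stub are hypothetical names invented for this illustration, and the code assumes TypeScript with lib set to ES2022 or later. The only point is that the wrapped error arrives intact on .cause.

```typescript
// Hypothetical low-level error carrying structured metadata.
class GatewayError extends Error {
  readonly statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.name = 'GatewayError';
    this.statusCode = statusCode;
  }
}

// Stand-in for a real payment call; it always fails for the demo.
function chargePayment(orderId: string): never {
  throw new GatewayError(`gateway rejected charge for ${orderId}`, 503);
}

function processOrder(orderId: string): void {
  try {
    chargePayment(orderId);
  } catch (err) {
    throw new Error('Payment processing failed', { cause: err });
  }
}

let caught: unknown;
try {
  processOrder('ord_123');
} catch (err) {
  caught = err;
}

// The wrapper carries the domain-level message...
const outer = caught as Error;
console.log(outer.message); // "Payment processing failed"

// ...and the original error, with its own type and fields, is still on .cause.
const inner = outer.cause;
if (inner instanceof GatewayError) {
  console.log(inner.name, inner.statusCode); // "GatewayError 503"
}
```

Because the inner error keeps its class, a caller can still branch on statusCode without the wrapper having to copy it.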

The pattern: wrap at domain boundaries

The real power of Error.cause emerges when you establish a consistent wrapping convention. The rule is simple: wrap errors at every domain boundary, and always use cause.

Domain boundaries in a typical Node.js service include:

Database layer to repository layer: wrap a driver-level error with a persistence error.
Repository layer to service layer: wrap a persistence error with a business-logic error.
Service layer to controller layer: wrap a business error with an HTTP-specific error.
Controller layer to global handler: wrap with request context (route, request ID), keeping the caught error as the cause.
External API client: wrap network or protocol errors with a domain-specific error.

Here is a practical example with a user profile service:

// Database/Repository layer
async function findUserById(id: string): Promise<User> {
  try {
    return await db.query('SELECT * FROM users WHERE id = $1', [id]);
  } catch (err) {
    throw new Error('Database query failed', { cause: err });
  }
}

// Service layer
async function getUserProfile(userId: string): Promise<UserProfile> {
  try {
    const user = await findUserById(userId);
    return { id: user.id, name: user.name, email: user.email };
  } catch (err) {
    throw new Error('Failed to retrieve user profile', { cause: err });
  }
}

// Controller layer
app.get('/api/users/:id', async (req, res) => {
  try {
    const profile = await getUserProfile(req.params.id);
    res.json(profile);
  } catch (err) {
    // err.cause.cause is the original DB error
    reportError(err);
    res.status(500).json({ error: 'Internal server error' });
  }
});

When the database goes down, the error chain reads:

Error: Failed to retrieve user profile
    at getUserProfile
  cause: Error: Database query failed
      at findUserById
    cause: Error: connect ECONNREFUSED /var/run/postgresql/.s.PGSQL.5432
        at ...

Every layer added context. Every layer preserved the previous layer’s information. The controller knows exactly what happened, and if it needs to check whether the root cause is a connection error (to decide whether to retry), it can walk the .cause chain.
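A small helper can do that walk for you. The sketch below is one possible shape, not a standard API: rootCause and its maxDepth safety cap are names and choices made up for this example, and the three errors simply mirror the chain shown above.

```typescript
// Follow .cause links until reaching a value that has no further cause.
// maxDepth is an arbitrary safety limit chosen for this sketch.
function rootCause(error: unknown, maxDepth = 20): unknown {
  let current = error;
  for (let depth = 0; depth < maxDepth; depth++) {
    if (!(current instanceof Error) || current.cause === undefined) {
      return current;
    }
    current = current.cause;
  }
  return current;
}

// Rebuild the three-layer chain from the example by hand.
const dbError = new Error('connect ECONNREFUSED /var/run/postgresql/.s.PGSQL.5432');
const repoError = new Error('Database query failed', { cause: dbError });
const serviceError = new Error('Failed to retrieve user profile', { cause: repoError });

const root = rootCause(serviceError);
console.log(root === dbError); // true
```

The helper returns unknown on purpose, since a cause is allowed to be any value; callers should narrow it with instanceof before reading fields.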

Walking the cause chain programmatically

The real benefit of Error.cause becomes visible when you need to make decisions based on the root cause. A retry policy, for example, should only retry on transient errors. With cause, you can look through the chain:

function isRetryable(error: Error): boolean {
  let current: unknown = error;
  while (current instanceof Error) {
    if (
      current.message.includes('ECONNREFUSED') ||
      current.message.includes('ETIMEDOUT') ||
      current.message.includes('ECONNRESET') ||
      current.name === 'TimeoutError'
    ) {
      return true;
    }
    current = (current as any).cause;
  }
  return false;
}

Or, more robustly, check for a property on the error:

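// Extending Error keeps the type assertion below valid
interface RetryableError extends Error {
  retryable: boolean;
}

function isRetryable(error: unknown): boolean {
  let current: unknown = error;
  while (current instanceof Error) {
    // Only an explicit true counts; errors without the flag are skipped
    if ((current as RetryableError).retryable === true) {
      return true;
    }
    // cause is typed unknown in the ES2022 lib, so no cast is needed
    current = current.cause;
  }
  return false;
}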

This allows a database driver to set retryable: true on its errors, and your retry logic to find that flag no matter how many times the error was wrapped. No fragile string matching. No instanceof checks that break when you swap libraries.
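Here is how that flag might feed a retry loop. This is a sketch under stated assumptions: TransientError, withRetry, and flakyLookup are hypothetical names, the attempt limit is arbitrary, and a production version would add backoff and jitter between attempts.

```typescript
// Hypothetical error type that marks itself as safe to retry.
class TransientError extends Error {
  readonly retryable = true;
}

// True if any error in the cause chain carries retryable === true.
function isRetryable(error: unknown): boolean {
  let current = error;
  while (current instanceof Error) {
    if ((current as { retryable?: unknown }).retryable === true) return true;
    current = current.cause;
  }
  return false;
}

// Call fn up to maxAttempts times, retrying only when the chain is retryable.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (!isRetryable(err)) throw err; // permanent failure: stop immediately
    }
  }
  // Out of attempts: wrap the last failure so the chain stays intact.
  throw new Error(`Gave up after ${maxAttempts} attempts`, { cause: lastError });
}

// Demo: fails twice with a wrapped transient error, then succeeds.
let calls = 0;
async function flakyLookup(): Promise<string> {
  calls++;
  if (calls < 3) {
    // The retryable flag sits on the cause, one layer below the wrapper.
    throw new Error('Lookup failed', { cause: new TransientError('socket reset') });
  }
  return 'ok';
}

const result = withRetry(flakyLookup);
result.then((value) => console.log(value, calls)); // "ok 3"
```

Note that the final give-up error also passes cause, so the log for an exhausted retry still leads back to the last underlying failure.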

Structured logging with cause chains

If you use structured logging (Pino, Bunyan, Winston), you already have correlation IDs on every log line. The missing piece is including the full error chain. Here is a logging utility that extracts the entire cause chain:

import pino from 'pino';

const logger = pino();

// Accepts unknown because catch variables are unknown under strict TypeScript
function flattenErrorChain(error: unknown): object[] {
  const chain: object[] = [];
  let current: unknown = error;
  while (current instanceof Error) {
    chain.push({
      message: current.message,
      name: current.name,
      stack: current.stack,
      ...extractCustomProps(current),
    });
    current = current.cause;
  }
  return chain;
}

function extractCustomProps(error: Error): Record<string, unknown> {
  const props: Record<string, unknown> = {};
  for (const key of Object.getOwnPropertyNames(error)) {
    if (key !== 'message' && key !== 'stack' && key !== 'name' && key !== 'cause') {
      props[key] = (error as any)[key];
    }
  }
  return props;
}

// Usage
try {
  await processOrder('ord_123');
} catch (err) {
  logger.error({ err, causeChain: flattenErrorChain(err) }, 'Order processing failed');
}

The log output includes the full chain as a JSON array, so each layer can be searched and filtered in your log aggregator. Error monitoring tools that understand cause need no extra help: Sentry's JavaScript SDK, for example, follows the property and shows the chained errors together on the same event.

Custom error classes that support cause

For TypeScript projects, the pattern is even cleaner with custom error classes:

export class AppError extends Error {
  public readonly retryable: boolean;

  constructor(
    message: string,
    public readonly statusCode: number = 500,
    // cause is unknown because catch variables are unknown under strict mode
    options?: { cause?: unknown; retryable?: boolean }
  ) {
    // Error itself only reads options.cause; retryable is stored by hand below
    super(message, options);
    this.name = 'AppError';
    this.retryable = options?.retryable ?? false;
  }
}

export class NotFoundError extends AppError {
  constructor(resource: string, options?: { cause?: unknown }) {
    super(`${resource} not found`, 404, options);
    this.name = 'NotFoundError';
  }
}

export class ValidationError extends AppError {
  constructor(
    message: string,
    options?: { cause?: unknown }
  ) {
    super(message, 400, options);
    this.name = 'ValidationError';
  }
}

Now every layer can wrap with typed errors:

async function getAccount(id: string) {
  try {
    return await db.query('SELECT * FROM accounts WHERE id = $1', [id]);
  } catch (err) {
    throw new AppError('Failed to query accounts', 500, {
      cause: err,
      retryable: true,
    });
  }
}

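Retry logic and error monitoring see retryable: true on the outer error, provided AppError stores the option as a property (the built-in Error constructor only reads cause). The cause preserves the database-level stack trace. Everyone wins.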
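Typed errors also make it easy to search the chain by class. The sketch below is one way to do it, not an established utility: findCause and statusFor are hypothetical helper names, and the trimmed-down AppError and NotFoundError are redefined here only so the block runs on its own.

```typescript
// Minimal versions of the classes above, repeated so this block is self-contained.
class AppError extends Error {
  readonly statusCode: number;
  constructor(message: string, statusCode = 500, options?: { cause?: unknown }) {
    super(message, options);
    this.name = 'AppError';
    this.statusCode = statusCode;
  }
}

class NotFoundError extends AppError {
  constructor(resource: string, options?: { cause?: unknown }) {
    super(`${resource} not found`, 404, options);
    this.name = 'NotFoundError';
  }
}

// Return the first error in the chain that is an instance of the given class.
function findCause<T extends Error>(
  error: unknown,
  type: new (...args: any[]) => T,
): T | undefined {
  let current = error;
  while (current instanceof Error) {
    if (current instanceof type) return current;
    current = current.cause;
  }
  return undefined;
}

// Pick the HTTP status from the nearest AppError in the chain, defaulting to 500.
function statusFor(error: unknown): number {
  return findCause(error, AppError)?.statusCode ?? 500;
}

const notFound = new NotFoundError('Account');
const wrapped = new Error('Could not render dashboard', { cause: notFound });

console.log(statusFor(wrapped));              // 404
console.log(statusFor(new Error('unknown'))); // 500
```

A controller can then answer with the right status even when an intermediate layer wrapped the typed error in a plain Error.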

The three traps most teams hit

Trap 1: Circular cause chains.

If a cause chain ever loops back on itself, for example through err.cause = err, any code that walks the chain loops forever. This is rare but devastating. The constructor option cannot create a cycle by itself, because the new error does not exist yet when you pass cause. Cycles come from assigning .cause after construction, so avoid doing that by hand:

catch (err) {
  // Fine: the new error wraps the caught one, so no cycle is possible
  // throw new Error('failed', { cause: err });

  // BAD: assigning cause after construction can point an error at itself
  const myErr = new Error('failed');
  myErr.cause = myErr; // circular!
  throw myErr;
}

JavaScript does not protect you from this. A loop that follows .cause never terminates, and JSON.stringify throws a TypeError when it meets the circular reference.
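One way to protect a chain walker is to remember which errors it has already visited. The sketch below assumes nothing beyond the standard Set; walkCauses is a hypothetical helper name, and the two-error cycle is built by hand purely to exercise the guard.

```typescript
// Collect the errors in a cause chain, stopping as soon as one repeats.
function walkCauses(error: unknown): Error[] {
  const seen = new Set<Error>();
  let current = error;
  while (current instanceof Error && !seen.has(current)) {
    seen.add(current);
    current = current.cause;
  }
  // A Set iterates in insertion order, so this is outermost first.
  return Array.from(seen);
}

const a = new Error('a');
const b = new Error('b', { cause: a });
a.cause = b; // b -> a -> b: a cycle created by assignment

console.log(walkCauses(b).map((e) => e.message)); // [ 'b', 'a' ]
```

The same guard drops into any loop that follows .cause, including a log flattener, at the cost of one Set per call.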

Trap 2: Losing non-Error causes.

The cause option accepts any value. Some libraries pass strings or plain objects:

throw new Error('Validation failed', { cause: 'email is invalid' });

This defeats the purpose. Always pass an Error instance as the cause, so you preserve the stack trace and type information. If you are wrapping a value thrown by a library that does not throw Error instances (some HTTP libraries throw strings or plain objects), normalize it first:

catch (err) {
  const cause = err instanceof Error ? err : new Error(String(err));
  throw new AppError('Request failed', 500, { cause });
}
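If the thrown value is a plain object, String(err) collapses it to "[object Object]". A slightly more careful normalizer might look like the sketch below. toError and describe are hypothetical helper names, and keeping the raw value as the nested cause is a design choice of this sketch rather than an established convention.

```typescript
// Produce a readable description of an arbitrary thrown value.
function describe(value: unknown): string {
  if (typeof value === 'string') return value;
  try {
    // JSON.stringify returns undefined for some inputs (e.g. undefined itself).
    return JSON.stringify(value) ?? String(value);
  } catch {
    return String(value); // circular structures, BigInt, and similar
  }
}

// Turn any thrown value into an Error without discarding the original value.
function toError(thrown: unknown): Error {
  if (thrown instanceof Error) return thrown;
  // The outer value is now a real Error with a stack; the raw value rides along.
  return new Error(`Non-Error value thrown: ${describe(thrown)}`, { cause: thrown });
}

const fromObject = toError({ status: 502 });
console.log(fromObject.message); // Non-Error value thrown: {"status":502}

const original = new TypeError('boom');
console.log(toError(original) === original); // true
```

Everything your own code passes as cause is still an Error instance; only the innermost link holds the raw value, so nothing the library threw is lost.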

Trap 3: Forgetting that AggregateError also supports cause.

The AggregateError constructor also accepts cause, and an AggregateError (the error Promise.any rejects with when every promise fails) can itself be passed as a cause:

try {
  await Promise.any([unreliableService1(), unreliableService2()]);
} catch (err) {
  if (err instanceof AggregateError) {
    throw new Error('All services failed', { cause: err });
  }
  // Anything else is rethrown so it is not silently swallowed
  throw err;
}

The AggregateError.errors array preserves every individual rejection, and the outer cause preserves that as a single node in the chain. When you walk the chain, you need to handle this case:

function flattenErrorChain(error: Error): object[] {
  const chain: object[] = [];
  let current: unknown = error;
  while (current instanceof Error) {
    const entry: any = {
      message: current.message,
      name: current.name,
    };
    if (current instanceof AggregateError) {
      entry.errors = current.errors.map((e: unknown) =>
        e instanceof Error ? flattenErrorChain(e) : e
      );
    }
    chain.push(entry);
    current = (current as any).cause;
  }
  return chain;
}
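For completeness, this is what passing cause to the AggregateError constructor itself looks like. The error messages and variable names are invented for this sketch, and it assumes lib is set to ES2022 so that the three-argument constructor is typed.

```typescript
// Individual failures, as Promise.any would collect them.
const failures = [
  new Error('primary timed out'),
  new Error('replica refused connection'),
];

// A shared underlying reason, attached through the options argument.
const trigger = new Error('network partition detected');
const aggregate = new AggregateError(failures, 'All replicas failed', { cause: trigger });

// The aggregate can in turn be the cause of a domain-level error.
const wrapped = new Error('Could not load profile', { cause: aggregate });

console.log(wrapped.cause === aggregate); // true
console.log(aggregate.errors.length);     // 2
console.log(aggregate.cause === trigger); // true
```

An AggregateError therefore branches in two directions: .errors fans out to the individual failures, while .cause continues the single chain.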

What to do on Monday

The migration path is short enough to do in one afternoon:

Find every catch block that wraps an error. The pattern is catch(err) { throw new SomeError('message') } without passing err. Add { cause: err } to the constructor call.
Standardize your error classes. Create one base AppError class that accepts options including cause. Extend it for each error category. This makes the pattern automatic instead of a judgment call.
Update your logging utility. Add a flattenErrorChain function that extracts the full chain. Include it in every error log.
Check your error monitoring tool. Sentry's JavaScript SDK follows cause by default; for Datadog, New Relic, and others, confirm in their documentation how chained errors are reported. If you use a custom monitoring setup, walk the cause chain in your error reporter and send each layer as a separate span or event.
Remove originalError, innerError, previousError, and similar custom properties once nothing reads them. cause is the standard property, so tools and teammates know where to look.

The one-line habit: every time you write throw new Error(...) in a catch block, add { cause: err }. It adds a few keystrokes and saves hours of debugging. Do not think about whether the consumer will use it. Just do it. The cost is zero. The payoff comes the night a payment fails and the chain tells you exactly which connection dropped.

A note from Yojji

Error handling that works in production (cause chains, structured logging, retry classification, and instrumentation that actually helps you find the root cause) is the difference between a 3am incident call that takes twenty minutes and one that takes twenty seconds. It is the kind of unglamorous backend infrastructure Yojji has been shipping since 2016.

Yojji is an international custom software development company with offices in Europe, the US, and the UK. Their teams work across the JavaScript ecosystem (Node.js, TypeScript, React), cloud platforms, and microservices architecture. They run dedicated senior outstaffed teams alongside full-cycle product engagements covering discovery, design, development, QA, and DevOps.

If your team spends more time debugging error chains than building features, Yojji is worth a conversation.
