Your Node.js HTTP Client Is the Bottleneck: Connection Pool Tuning That Works

The Practical Developer

The Practica · 2026-05-13 · via The Practical Developer

You run a load test against your API. CPU is at 40%, memory is flat, database p95 is under 10 ms. Everything looks healthy. Then the error rate jumps: ConnectTimeoutError, UND_ERR_CONNECT_TIMEOUT, or the generic socket hang up. You scale the API pods from four to eight. The spikes shrink but do not disappear. You scale to twelve. Same pattern, higher bill.

The downstream service you are calling is fast. It handles 10,000 RPS in its own load tests. The problem is not the service. The problem is that your Node.js client opens, negotiates, and closes a TCP connection for every single request, or it exhausts a small default pool and queues requests behind a gate that has nothing to do with your business logic.

This post shows how Node.js manages HTTP connections, how to read the real signals, and the two config lines that fix most pool-related latency spikes.

Where the default behavior hurts you

Node.js 18+ ships global.fetch powered by undici, which uses a connection pool under the hood. Before that, most production code used node:http, axios, or node-fetch with an http.Agent. In every case, there is a pool: a set of reusable TCP connections to the same origin.

The defaults are tuned for browsers, not servers:

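undici’s default connections per origin: null, meaning no cap in current releases. A cap of 6 usually arrives through config copied from browser conventions.
axios with the default agent: Infinity (unbounded, which is its own disaster)
http.globalAgent.maxSockets: Infinity. Keep-alive on the global agent is enabled by default only since Node 19.
undici’s default keepAliveTimeout: 4 seconds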

Six connections per origin sounds fine until your service is a microservice that makes three downstream calls per request, each to a different origin, and you have twelve workers per pod. Under moderate load, eighteen requests hit the same origin simultaneously. Six grab a connection, twelve wait in a FIFO queue. The queue time shows up as latency, not CPU load, so your dashboards lie to you.

The Infinity case is worse. Every concurrent request opens its own TCP connection. Eventually the ephemeral port range fills with TIME_WAIT sockets, and new outbound connections fail on the client with EADDRNOTAVAIL even though the target is healthy.

Neither extreme is right for a backend. You need a bounded pool, sized to your concurrency, with keep-alive tuned to your infrastructure.
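To see why the wait shows up as latency rather than CPU, here is a toy model of the scenario above: eighteen simultaneous requests, six connections, FIFO order. The function name queueWaits and the 50 ms service time are assumptions for illustration, not measurements.

```javascript
// Toy model: `concurrent` requests arrive at the same instant, the pool has
// `poolSize` connections, and every request takes `serviceMs` on the wire.
// FIFO order means request i starts once floor(i / poolSize) earlier waves finish.
function queueWaits(concurrent, poolSize, serviceMs) {
  const waits = [];
  for (let i = 0; i < concurrent; i++) {
    waits.push(Math.floor(i / poolSize) * serviceMs);
  }
  return waits;
}

const waits = queueWaits(18, 6, 50); // 50 ms service time is an assumed figure
const queued = waits.filter((w) => w > 0).length;
const worst = Math.max(...waits);
console.log({ queued, worst }); // { queued: 12, worst: 100 }
```

The downstream answered every call in 50 ms, yet the slowest caller saw 150 ms end to end. None of that extra time burns CPU on either side.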

Measuring before you fix

Do not guess at pool size. Measure concurrent connections from the client side and active sockets on the host.

From inside the Node process, undici exposes live counters on every Pool through its stats property. A dedicated pool for one origin is the simplest way to read them:

import { Pool } from 'undici';

// A Pool is a dispatcher bound to a single origin.
// Send traffic through it with pool.request(...) or fetch(url, { dispatcher: pool }).
const pool = new Pool('https://billing.internal', {
  connections: 64,
  keepAliveTimeout: 30_000,
});

setInterval(() => {
  // stats is recomputed on every read, so sample it on a timer.
  const { connected, free, pending, queued, running } = pool.stats;
  console.log(JSON.stringify({
    origin: 'https://billing.internal',
    connected,
    free,
    pending,
    queued,
    running,
  }));
}, 10_000).unref();

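Watch pending and queued. pending counts requests that have been accepted but not yet written to a socket, which includes requests sitting behind a connection that is still being established. queued counts requests waiting because every connection is busy. If queued is consistently above zero under load, your pool is too small for your concurrency.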

If you cannot instrument undici directly, fall back to operating-system metrics. You want the number of sockets in ESTABLISHED or TIME_WAIT to the downstream IP:

# Count sockets to a specific downstream by state
# -H drops the header row so wc -l gives the exact socket count
ss -Htan state established dst 10.0.4.17 | wc -l
ss -Htan state time-wait dst 10.0.4.17 | wc -l

If time-wait is in the tens of thousands, you are opening too many connections and not reusing them. If established flatlines at a suspicious round number like 6 or 12, you are likely hitting a pool cap.
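If you would rather collect both counts in one pass and ship them to your metrics pipeline, the unfiltered ss -tan output is easy to tally. A rough sketch, assuming the default column order (State, Recv-Q, Send-Q, Local, Peer) and IPv4 peers. countStates is a hypothetical helper and the sample text is illustrative, not captured from a real host.

```javascript
// Count sockets per TCP state for one peer IP from the text `ss -tan` prints.
// Assumes the default layout: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
function countStates(ssOutput, peerIp) {
  const counts = {};
  for (const line of ssOutput.trim().split('\n').slice(1)) { // slice(1) skips the header row
    const cols = line.trim().split(/\s+/);
    if (cols.length < 5) continue;
    const [state, , , , peer] = cols;
    if (!peer.startsWith(`${peerIp}:`)) continue;
    counts[state] = (counts[state] ?? 0) + 1;
  }
  return counts;
}

// Illustrative sample, not real output.
const sample = `
State     Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB     0      0      10.0.1.5:43210     10.0.4.17:443
ESTAB     0      0      10.0.1.5:43212     10.0.4.17:443
TIME-WAIT 0      0      10.0.1.5:43211     10.0.4.17:443
ESTAB     0      0      10.0.1.5:51000     10.0.9.9:5432
`;
console.log(countStates(sample, '10.0.4.17')); // { ESTAB: 2, 'TIME-WAIT': 1 }
```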

The fix: configure undici or the agent

If you are on Node 18+ and using global.fetch, the cleanest fix is a custom undici.Agent registered as the global dispatcher. This replaces the implicit default for every fetch call in the process.

import { Agent, setGlobalDispatcher } from 'undici';

const agent = new Agent({
  connections: 128,
  keepAliveTimeout: 30_000,
  keepAliveMaxTimeout: 30_000,
  connect: {
    timeout: 5_000,
    // TLS verification stays on. rejectUnauthorized: false would disable
    // certificate checks for every fetch in the process.
  },
});

setGlobalDispatcher(agent);

connections: 128 means 128 sockets per origin. Tune this to your peak concurrency per origin, not some abstract multiple. A good starting point: (expected concurrent requests to this origin) × 1.5. If your API handles 200 concurrent requests and each calls the downstream once, 200 × 1.5 = 300. If the downstream is called multiple times per request, multiply accordingly.
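The sizing rule is simple enough to keep next to your config as a function. This is a sketch of the rule of thumb above, not a capacity model. suggestedPoolSize is a hypothetical name, and it assumes the downstream calls within one request overlap in time. Sequential calls need less.

```javascript
// Rule of thumb from the text: peak concurrent requests to the origin, times headroom.
function suggestedPoolSize(concurrentRequests, callsPerRequest = 1, headroom = 1.5) {
  return Math.ceil(concurrentRequests * callsPerRequest * headroom);
}

console.log(suggestedPoolSize(200));    // 300
console.log(suggestedPoolSize(200, 2)); // 600
```

Treat the result as a starting value and let the queued counter tell you whether to move it.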

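keepAliveTimeout: 30_000 keeps idle sockets open for 30 seconds. The default of 4 seconds is conservative. In a server process talking to a fixed set of upstreams, reconnecting after every 4-second lull is pure waste. Keep the value below the upstream's own idle timeout, though: if the server closes idle sockets first (Node's http.Server defaults to 5 seconds), the client reuses a dead socket and the request fails with a reset.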

If you are still on axios or raw http.request, pass an explicit agent:

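import http from 'node:http';
import https from 'node:https';
import axios from 'axios';

const agentOptions = {
  maxSockets: 128,
  maxFreeSockets: 128,
  keepAlive: true,
  keepAliveMsecs: 30_000, // initial delay before TCP keep-alive probes on idle sockets
  timeout: 5_000,         // socket inactivity timeout
};

const client = axios.create({
  baseURL: 'https://billing.internal',
  httpAgent: new http.Agent(agentOptions),   // used for http:// URLs
  httpsAgent: new https.Agent(agentOptions), // used for https:// URLs
  timeout: 5_000,
});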

The key fields:

maxSockets: upper bound on concurrent connections per origin. The default Infinity is dangerous in a server.
maxFreeSockets: how many idle sockets to keep open. Set it equal to maxSockets so you do not throw away warm connections.
keepAlive: without this, every request is a new TCP handshake.
timeout: socket inactivity timeout in milliseconds, distinct from the HTTP-level request timeout that axios applies.
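An http.Agent also tells you when it is the bottleneck. It keeps per-host tables named sockets, freeSockets, and requests, and requests is where calls wait once maxSockets is reached. A minimal sketch that sums them: agentSnapshot is a hypothetical helper, and a stand-in object replaces a live agent here so the example runs on its own.

```javascript
// Sum the per-host arrays an http.Agent keeps in `sockets`, `freeSockets`, and `requests`.
function agentSnapshot(agent) {
  const total = (table) =>
    Object.values(table ?? {}).reduce((sum, list) => sum + list.length, 0);
  return {
    active: total(agent.sockets),   // sockets currently serving a request
    idle: total(agent.freeSockets), // kept-alive sockets waiting for reuse
    queued: total(agent.requests),  // requests waiting because maxSockets was reached
  };
}

// Stand-in with the same shape as an Agent; the host key is made up.
const fakeAgent = {
  sockets: { 'billing.internal:443:': [{}, {}] },
  freeSockets: { 'billing.internal:443:': [{}] },
  requests: {},
};
console.log(agentSnapshot(fakeAgent)); // { active: 2, idle: 1, queued: 0 }
```

In a service you would pass the real agent on a timer and export queued as a gauge, the same way you would for undici.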

When pooling is not enough: TIME_WAIT and ephemeral ports

Even with a correctly sized pool, you can still exhaust ports if connections churn faster than the OS cleans them up. TCP requires the side that closes first to hold the socket in TIME_WAIT for twice the maximum segment lifetime, typically 60 seconds. A socket in TIME_WAIT still occupies an ephemeral port.

The default ephemeral port range on Linux is roughly 32,768–61,000, giving about 28,000 ports. If your process opens and closes 500 connections per second, you burn through 30,000 ports in a minute. New connections fail even though the downstream has capacity.
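The same arithmetic as a function, so you can plug in your own range. This is a back-of-the-envelope sketch: timeWaitBudget is a hypothetical name, the 60-second hold is the figure quoted above, and the budget applies to connections toward a single destination IP and port. The bounds 32768 and 60999 are the usual Linux defaults behind the rounded numbers in the text.

```javascript
// How many connections per second can one client close toward a single
// destination before TIME_WAIT sockets occupy every ephemeral port?
function timeWaitBudget(portLow, portHigh, timeWaitSeconds = 60) {
  const ports = portHigh - portLow + 1;
  return { ports, maxClosesPerSecond: Math.floor(ports / timeWaitSeconds) };
}

console.log(timeWaitBudget(32768, 60999)); // { ports: 28232, maxClosesPerSecond: 470 }
console.log(timeWaitBudget(15000, 65000)); // { ports: 50001, maxClosesPerSecond: 833 }
```

At 500 closes per second you are already over the default budget. Widening the range raises the ceiling but does not remove it.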

Fixes, in order of preference:

Reuse connections. A warm pool with keep-alive should rarely close sockets. If you see high TIME_WAIT, check whether your upstream is sending Connection: close or whether your own keepAliveTimeout is too short.
Enable net.ipv4.tcp_tw_reuse. This lets the kernel reuse TIME_WAIT sockets for outgoing connections when TCP timestamps show it is safe. It is safe on modern kernels and does not break TCP semantics. Do not use tcp_tw_recycle; it broke clients behind NAT and was removed in Linux 4.12.

sysctl -w net.ipv4.tcp_tw_reuse=1

Increase the ephemeral port range. Only if the above is not enough:

sysctl -w net.ipv4.ip_local_port_range="15000 65000"

Run a connection proxy. If you have thousands of short-lived connections, consider a local SOCKS or HTTP proxy that multiplexes, or switch to HTTP/2, where a single TCP connection carries many streams. undici supports HTTP/2 through the allowH2 option.

Do not forget DNS caching

Pools are keyed by the origin string, so connection reuse itself does not depend on DNS. Connection setup does. Node.js does not cache DNS by default, and every new socket to https://billing.internal triggers a getaddrinfo call on the libuv thread pool. With a short keep-alive or high connection churn, that is one lookup per reconnect.

Under load, DNS lookups become a hidden bottleneck. Either run a local resolver like systemd-resolved or dnsmasq on the host, or cache lookups in the process:

import dns from 'node:dns';

const lookupCache = new Map();

// net.connect() calls lookup(hostname, options, callback), so the cache must be
// callback-style. An async function here would leave the socket waiting forever.
function cachedLookup(hostname, options, callback) {
  // `all` changes the result shape (array vs single address), so it is part of the key.
  const key = `${hostname}:${options?.family ?? 0}:${options?.all ? 'all' : 'one'}`;
  const hit = lookupCache.get(key);
  if (hit && Date.now() < hit.expiresAt) {
    // Answer asynchronously, as dns.lookup always does.
    process.nextTick(callback, null, ...hit.args);
    return;
  }
  dns.lookup(hostname, options, (err, ...args) => {
    if (err) return callback(err);
    lookupCache.set(key, { args, expiresAt: Date.now() + 60_000 });
    callback(null, ...args);
  });
}

Pass this to undici via the connect option:

const agent = new Agent({
  connections: 128,
  keepAliveTimeout: 30_000,
  connect: {
    lookup: cachedLookup,
    timeout: 5_000,
  },
});

This removes DNS latency from connection setup: new sockets to a cached hostname skip the resolver, whether it is slow or fast. Keep the TTL short enough that the cache follows the upstream when its address changes.

Putting it together: a production-ready fetch wrapper

Here is a small module you can drop into a service. It wires pool sizing, DNS caching, keep-alive, and reasonable defaults.

// lib/httpClient.js
import { Agent, Pool, setGlobalDispatcher } from 'undici';
import dns from 'node:dns';

const dnsCache = new Map();

// net.connect() calls lookup(hostname, options, callback), so the cache is callback-style.
function makeCachedLookup(ttlMs = 60_000) {
  return function cachedLookup(hostname, options, callback) {
    // `all` changes the result shape (array vs single address), so it is part of the key.
    const key = `${hostname}:${options?.family ?? 0}:${options?.all ? 'all' : 'one'}`;
    const cached = dnsCache.get(key);
    if (cached && Date.now() < cached.expiry) {
      process.nextTick(callback, null, ...cached.args);
      return;
    }
    dns.lookup(hostname, options, (err, ...args) => {
      if (err) return callback(err);
      dnsCache.set(key, { args, expiry: Date.now() + ttlMs });
      callback(null, ...args);
    });
  };
}

export function configureHttpClient(opts = {}) {
  // origin string -> Pool, so stats can be read back later
  const pools = new Map();

  const agent = new Agent({
    connections: opts.connections ?? 128,
    keepAliveTimeout: opts.keepAliveTimeout ?? 30_000,
    keepAliveMaxTimeout: opts.keepAliveMaxTimeout ?? 30_000,
    connect: {
      lookup: makeCachedLookup(opts.dnsTtlMs),
      timeout: opts.connectTimeout ?? 5_000,
    },
    // Build each per-origin pool ourselves and keep a reference to it.
    factory(origin, poolOpts) {
      const pool = new Pool(origin, poolOpts);
      pools.set(new URL(origin).origin, pool);
      return pool;
    },
  });

  setGlobalDispatcher(agent);

  return {
    // Returns null until the first request to that origin has created its pool.
    getPoolStats(origin) {
      return pools.get(origin)?.stats ?? null;
    },
  };
}

Initialize it once at startup:

import { configureHttpClient } from './lib/httpClient.js';

const metrics = configureHttpClient({ connections: 256 });

setInterval(() => {
  const stats = metrics.getPoolStats('https://billing.internal');
  if (stats) {
    console.log('billing_pool_queued', stats.queued);
    console.log('billing_pool_running', stats.running);
  }
}, 15_000).unref();

After this, fetch() anywhere in your code uses the tuned pool. No wrapper needed for every call. No accidental new Agent() in a helper library that erases your config.

Practical takeaway

Connection pool misconfiguration is one of those problems that looks like anything else. Your dashboards show CPU, memory, and database time, but they rarely export http_client_queued_requests. You chase the wrong metric, scale the wrong tier, and spend money on pods that are just waiting for a socket.

The fix is three steps:

Measure. Export pool stats or use ss to find queued requests and TIME_WAIT sockets.
Size the pool. Set connections (or maxSockets) to your peak concurrency per origin, not the default.
Keep connections alive. Set keep-alive to at least 30 seconds, cache DNS, and monitor queued as a first-class metric.

Throwing pods at a pool bottleneck is like adding lanes to a highway that ends in a toll booth with one gate. Fix the gate.

A note from Yojji

The gap between “it works on my machine” and “it stays up under production concurrency” is often not in the business logic. It is in the plumbing: connection pools, DNS caching, and kernel tuning. Yojji’s engineering teams handle these details as a matter of course when they build and scale backends for clients, whether that means Node.js microservices, cloud-native APIs, or infrastructure that does not fall over when traffic doubles.

Yojji is an international custom software development company founded in 2016, with teams across Europe, the US, and the UK. They specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, GCP), and the kind of backend reliability engineering that keeps services responsive when the load test becomes real traffic.
