Nginx Reverse Proxy Patterns For Production Node.js Apps

The Practical Developer

The Practica · 2026-06-11 · via The Practical Developer

Your Node.js app works fine in development. You hit localhost:3000 directly, everything is snappy, and you never think about reverse proxies.

Then you deploy.

Clients time out. WebSocket connections drop after 60 seconds. Logs show client IPs as 10.0.0.1 instead of the real user IP. And that one endpoint that uploads a 15 MB file gets a 413 error that takes three hours to debug.

Every one of these is an Nginx misconfiguration, not a code bug.

This guide covers the five Nginx patterns that make the difference between a proxy that silently degrades your app and one that actively protects it. Each pattern includes the exact config you can copy, the reasoning behind it, and the failure mode it prevents.

Pattern 1: Upstream keepalive so Nginx does not hammer your Node.js process

By default, Nginx speaks HTTP/1.0 to the backend and closes the upstream connection after every response. That means Nginx opens a new TCP connection to your Node.js server for every proxied request (plus a TLS handshake if the upstream itself is HTTPS), and under load it burns through file descriptors.

The fix is an upstream keepalive pool.

http {
  upstream app {
    server 127.0.0.1:3000;

    # Keep up to 256 idle connections to Node.js open (per Nginx worker process)
    keepalive 256;

    # Close an idle upstream connection after 120s without traffic (nginx 1.15.3+)
    keepalive_timeout 120s;

    # Max requests per connection before Nginx recycles it
    keepalive_requests 10000;
  }

  server {
    listen 80;

    location / {
      proxy_pass http://app;

      # Tell Nginx to speak HTTP/1.1 to the backend (required for keepalive)
      proxy_http_version 1.1;
      proxy_set_header Connection "";
    }
  }
}

Three things happen here:

proxy_http_version 1.1 changes Nginx from HTTP/1.0 to HTTP/1.1 when talking to your Node backend. HTTP/1.0 does not support keepalive by default.
proxy_set_header Connection "" clears the Connection: close header that Nginx sends upstream by default, so the backend does not close the socket after the first response.
keepalive 256 keeps a pool of idle, reusable connections in each Nginx worker process. Your Node process handles requests without the TCP handshake tax on every one.
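On the Node side, the idle timeout has to cooperate with Nginx's. Node's http.Server closes an idle keep-alive socket after server.keepAliveTimeout (5 seconds by default), far shorter than the 120s keepalive_timeout above; if Node closes a socket at the moment Nginx reuses it, that request can fail, typically surfacing as a 502. A minimal sketch, assuming the 120s upstream setting above; alignKeepAlive and createAppServer are hypothetical names:

```typescript
import { createServer, type IncomingMessage, type Server, type ServerResponse } from "node:http";

// Assumption: matches `keepalive_timeout 120s;` in the upstream block.
export const NGINX_UPSTREAM_KEEPALIVE_MS = 120_000;

// Hypothetical helper: keep idle sockets open longer than Nginx does, so Nginx
// (not Node) is always the side that closes an idle upstream connection.
export function alignKeepAlive(server: Server, proxyIdleMs: number = NGINX_UPSTREAM_KEEPALIVE_MS): Server {
  server.keepAliveTimeout = proxyIdleMs + 5_000; // Node's default is 5_000 ms
  // Common guidance for Node behind a proxy: keep headersTimeout above keepAliveTimeout.
  server.headersTimeout = server.keepAliveTimeout + 1_000;
  return server;
}

export function createAppServer(): Server {
  const server = createServer((_req: IncomingMessage, res: ServerResponse) => {
    res.writeHead(200, { "content-type": "text/plain" });
    res.end("ok\n");
  });
  return alignKeepAlive(server);
}
```

Calling createAppServer().listen(3000, "127.0.0.1") would then serve the upstream defined above. The 5-second margin is arbitrary; what matters is that Node's idle timeout is longer than Nginx's.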

The failure mode without this: Under load, Nginx opens and closes a connection to your Node.js process for every request. ss -s shows thousands of TIME_WAIT sockets. The event loop spends more time on socket lifecycle than on actual request handling.

Benchmark: A simple Express hello-world behind Nginx with default settings handles about 3,000 req/s on a 2-core machine. With upstream keepalive (pool of 64), that same setup hits 8,000+ req/s. The TCP handshake overhead is not free.

Pattern 2: Buffer tuning against slow-client attacks

Nginx buffers responses from the backend by default. That is good: it lets Nginx read the full response from Node quickly and trickle it to slow clients without tying up your Node process. But the default in-memory buffers (8 buffers of 4 KB or 8 KB, depending on the platform) are small for typical JSON API responses, and the request-body limits deserve a deliberate choice: the default client_max_body_size of 1 MB is what produces the 413 on that 15 MB upload.

Here is the tuned config:

http {
  proxy_buffering on;

  # Response buffer: size per buffer and number of buffers
  proxy_buffer_size    4k;
  proxy_buffers        8 16k;
  proxy_busy_buffers_size 32k;

  # Request body: reject oversized bodies before they reach Node
  client_body_buffer_size 16k;
  client_max_body_size    10m;
  client_body_timeout     30s;
}

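The response buffers let Nginx hold up to 128 KB of response body in memory (8 buffers of 16 KB, while the 4 KB proxy_buffer_size buffer takes the first part of the response, usually the headers), so it can read a typical API response from Node in one go and serve it to the client at whatever speed the client can handle. Larger responses spill to a temporary file instead of stalling the upstream. Your Node process goes back to handling other requests instead of holding a socket open while a hotel Wi-Fi client acknowledges each TCP segment.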
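The 128 KB figure is plain arithmetic on the directives, and nginx also checks at config load that proxy_busy_buffers_size fits between the other settings: at least the larger of proxy_buffer_size and one proxy_buffers buffer, and no more than all proxy_buffers minus one buffer. The hypothetical checkProxyBuffers helper below is a sketch of that arithmetic as we understand nginx's rules; nginx -t remains the authoritative check.

```typescript
// Hypothetical helper: buffer arithmetic for a proxy_buffers setup.
// All sizes are in bytes. `nginx -t` is the authoritative validator.
export interface ProxyBufferConfig {
  bufferSize: number;      // proxy_buffer_size: first part of the response (headers)
  buffersNum: number;      // proxy_buffers <number> <size>: the number
  buffersSize: number;     // proxy_buffers <number> <size>: the size
  busyBuffersSize: number; // proxy_busy_buffers_size
}

export interface ProxyBufferReport {
  bodyBytesInMemory: number; // roughly what fits in RAM before a temp file is used
  problems: string[];
}

export function checkProxyBuffers(c: ProxyBufferConfig): ProxyBufferReport {
  const problems: string[] = [];
  // Lower bound: the larger of proxy_buffer_size and one proxy_buffers buffer.
  const minBusy = Math.max(c.bufferSize, c.buffersSize);
  if (c.busyBuffersSize < minBusy) {
    problems.push(`proxy_busy_buffers_size must be >= ${minBusy}`);
  }
  // Upper bound: all proxy_buffers minus one buffer.
  const maxBusy = (c.buffersNum - 1) * c.buffersSize;
  if (c.busyBuffersSize > maxBusy) {
    problems.push(`proxy_busy_buffers_size must be <= ${maxBusy}`);
  }
  return { bodyBytesInMemory: c.buffersNum * c.buffersSize, problems };
}

const KB = 1024;
// The Pattern 2 settings: proxy_buffer_size 4k, proxy_buffers 8 16k, busy 32k.
export const pattern2 = checkProxyBuffers({
  bufferSize: 4 * KB,
  buffersNum: 8,
  buffersSize: 16 * KB,
  busyBuffersSize: 32 * KB,
});
```

For the Pattern 2 values, bodyBytesInMemory is 131072 (128 KB) and no problems are reported; lowering proxy_busy_buffers_size below 16k would trip the lower bound.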

The critical settings few teams tune: client_body_buffer_size and client_max_body_size.

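client_max_body_size 10m rejects request bodies above 10 MB with a 413 response before Nginx sends a single byte to Node.js (the Nginx default is 1m). Size it to your largest legitimate payload rather than disabling the check with 0: without a limit, a 2 GB upload ties up disk and I/O on the proxy host and then has to be streamed to your backend.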
client_body_buffer_size 16k keeps small request bodies in memory (fast path) and spills larger ones to temp files (slow path). If your API handles JSON payloads under 16 KB, all of them stay in RAM.

The failure mode without response buffering (proxy_buffering off): a single slow client with a small receive window keeps your Node socket open for seconds while response.write() returns false and the unsent bytes pile up in Node's memory. With 500 concurrent slow clients, your processes hold 500 sockets and their pending response data instead of freeing them for new requests. The proxy_buffers config decouples response generation from response delivery.
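Node does not literally block on a slow socket: write() queues the bytes in memory and returns false once the queue reaches its highWaterMark, leaving the caller to wait for 'drain'. The sketch below uses a Writable that never acknowledges anything as a stand-in for a client that has stopped reading (simulateSlowClient is a hypothetical name) to show where the bytes end up:

```typescript
import { Writable } from "node:stream";

export interface SlowClientResult {
  writeResults: boolean[]; // return value of each write() call
  bufferedBytes: number;   // bytes Node is still holding in memory
  needDrain: boolean;      // true once the caller should wait for 'drain'
}

// Hypothetical helper: a Writable that never finishes a write stands in for
// a socket whose peer has stopped reading (a very slow client).
export function simulateSlowClient(chunkBytes: number, chunks: number, highWaterMark: number): SlowClientResult {
  const slowSocket = new Writable({
    highWaterMark,
    write(_chunk, _encoding, _callback) {
      // Deliberately never call _callback: the "network" never finishes this write.
    },
  });

  const writeResults: boolean[] = [];
  for (let i = 0; i < chunks; i++) {
    // write() does not block; it queues the bytes and signals backpressure
    // by returning false once the queue reaches highWaterMark.
    writeResults.push(slowSocket.write(Buffer.alloc(chunkBytes)));
  }
  return {
    writeResults,
    bufferedBytes: slowSocket.writableLength,
    needDrain: slowSocket.writableNeedDrain,
  };
}
```

With simulateSlowClient(10, 3, 16), the first write() returns true, the next two return false, and all 30 bytes stay queued inside the process. Multiply that by every response a slow client is still receiving and the memory cost becomes visible, which is the work Nginx's response buffers take off Node's hands.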

Pattern 3: WebSocket proxy configuration (the one that always bites you)

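WebSocket connections start as HTTP/1.1 upgrade requests. Because Upgrade and Connection are hop-by-hop headers, Nginx does not forward them to the backend unless you set them explicitly. On top of that, the default proxy_read_timeout closes a proxied connection after 60 seconds without data from the backend, which breaks any real-time feature that keeps a quiet connection open for longer.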

map $http_upgrade $connection_upgrade {
  default  upgrade;
  ''       close;
}

server {
  location /ws {
    proxy_pass http://app;
    proxy_http_version 1.1;

    # WebSocket upgrade headers
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;

    # Kill the default 60s proxy timeout
    proxy_read_timeout    86400s;
    proxy_send_timeout    86400s;
  }
}

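The map block sets the Connection header correctly for both kinds of request: "upgrade" when the client asked for one, "close" otherwise. Without it (that is, hard-coding proxy_set_header Connection "upgrade"), Nginx sends Connection: upgrade on every request, not just WebSocket upgrades.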
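On the backend, Node only treats a request as a WebSocket upgrade (and emits the server's 'upgrade' event) when both headers survive the proxy. If the Upgrade and Connection lines are missing from the location block, the handshake reaches Node as a plain GET. The hypothetical isWebSocketUpgrade helper sketches the same check, which can be handy for logging requests where the proxy stripped the headers:

```typescript
import type { IncomingHttpHeaders } from "node:http";

// Hypothetical helper: true only if both hop-by-hop headers a WebSocket
// handshake needs are still present after passing through the proxy.
export function isWebSocketUpgrade(headers: IncomingHttpHeaders): boolean {
  const upgrade = headers["upgrade"];
  const connection = headers["connection"];
  if (typeof upgrade !== "string" || typeof connection !== "string") return false;
  // Connection is a comma-separated token list, e.g. "keep-alive, Upgrade".
  const tokens = connection.split(",").map((t) => t.trim().toLowerCase());
  return upgrade.toLowerCase() === "websocket" && tokens.includes("upgrade");
}
```

A request carrying Upgrade: websocket and Connection: Upgrade passes; the same request after the map has set Connection: close (or with Upgrade dropped entirely) does not.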

The timeouts are the important part. proxy_read_timeout defaults to 60 seconds and is measured between two successive reads from the backend. If the backend sends nothing over the WebSocket for 60 seconds (common in stock ticker apps, chat rooms, or dashboard UIs), Nginx closes the connection. Your client library fires a reconnect event, the app feels flaky, and someone files a bug titled “connection drops randomly.”

Setting both timeouts to 86400 seconds (24 hours) effectively removes the timeout. Your application handles disconnect logic instead.

The failure mode without this: WebSocket connections drop after exactly 60 seconds of idle time. You add a ping-pong heartbeat to your client code, which hides the symptom because every reply from the server resets Nginx's read timer, but the root cause was the Nginx timeout. Teams waste days on this because they assume the disconnect is in the application layer.
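If you would rather not hold timeouts at 24 hours, or you also want dead peers detected, the nginx WebSocket proxying documentation notes that the backend can send ping frames periodically to reset proxy_read_timeout. A minimal sketch on a raw upgraded socket, assuming a known proxy_read_timeout; buildPingFrame, heartbeatIntervalMs, and startHeartbeat are hypothetical names, and libraries such as ws expose their own ping() for the same job:

```typescript
import type { Duplex } from "node:stream";

// RFC 6455 control frame: FIN=1 + opcode 0x9 (ping) = 0x89, payload length 0.
// Server-to-client frames are unmasked, so the whole frame is two bytes.
export function buildPingFrame(): Buffer {
  return Buffer.from([0x89, 0x00]);
}

// Ping well inside Nginx's proxy_read_timeout so upstream bytes always
// cross the proxy before the timer fires.
export function heartbeatIntervalMs(proxyReadTimeoutMs: number): number {
  return Math.floor(proxyReadTimeoutMs / 2);
}

// Hypothetical helper: ping an already-upgraded socket; returns a stop function.
export function startHeartbeat(socket: Duplex, proxyReadTimeoutMs: number): () => void {
  const timer = setInterval(() => {
    if (!socket.destroyed) socket.write(buildPingFrame());
  }, heartbeatIntervalMs(proxyReadTimeoutMs));
  timer.unref(); // heartbeats alone should not keep the process alive
  socket.once("close", () => clearInterval(timer));
  return () => clearInterval(timer);
}
```

With the 60-second default, this pings every 30 seconds; the client's automatic pong also tells the server the peer is still there.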

Pattern 4: Real-IP forwarding and logging the right data

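When Nginx proxies requests, req.ip in Express or Fastify shows Nginx's IP (usually 127.0.0.1), not the client's real IP, unless you tell the framework to trust the proxy (app.set('trust proxy', ...) in Express, the trustProxy option in Fastify). Until then, every log line, rate limiter, and geo-IP middleware returns wrong data.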

The fix is the X-Real-IP and X-Forwarded-For headers, combined with the realip module.

http {
  # Log format that actually helps debugging
  # (log_format is only valid in the http context)
  log_format detailed '$remote_addr - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" '
                      'rt=$request_time uct=$upstream_connect_time '
                      'uht=$upstream_header_time urt=$upstream_response_time '
                      'upstream=$upstream_addr';

  server {
    # Trusted proxies: only these peers may set the client IP via X-Forwarded-For
    set_real_ip_from 127.0.0.1;
    set_real_ip_from 10.0.0.0/8;
    set_real_ip_from 172.16.0.0/12;
    set_real_ip_from 192.168.0.0/16;
    real_ip_header    X-Forwarded-For;
    real_ip_recursive on;

    proxy_set_header X-Real-IP         $remote_addr;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    access_log /var/log/nginx/app-access.log detailed;
    error_log  /var/log/nginx/app-error.log warn;
  }
}

The set_real_ip_from directives tell Nginx which addresses are trusted proxies. When a request arrives from one of those addresses with an X-Forwarded-For header, Nginx replaces $remote_addr with the client IP taken from that header. With real_ip_recursive on, Nginx walks the header from right to left, skips every trusted address, and uses the first untrusted one it finds (the actual client), so a client cannot spoof its IP by prepending fake entries.
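If Node reads X-Forwarded-For itself instead of trusting X-Real-IP, it should apply the rule real_ip_recursive uses: walk the header from right to left and take the first address that is not a trusted proxy. Express's trust proxy setting and Fastify's trustProxy option do this for you; the hypothetical realClientIp function below is a dependency-free sketch of the rule (IPv4 only, trust entries as CIDR blocks or single addresses) so the behavior is easy to see:

```typescript
import { isIPv4 } from "node:net";

function ipv4ToNumber(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// "10.0.0.0/8" or a bare address (treated as /32).
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const bits = bitsStr === undefined ? 32 : Number(bitsStr);
  if (!isIPv4(ip) || !isIPv4(base)) return false;
  // Same network if both addresses land in the same 2^(32 - bits)-sized block.
  const block = 2 ** (32 - bits);
  return Math.floor(ipv4ToNumber(ip) / block) === Math.floor(ipv4ToNumber(base) / block);
}

// Hypothetical helper mirroring `real_ip_recursive on`.
export function realClientIp(remoteAddr: string, xForwardedFor: string | undefined, trusted: string[]): string {
  const isTrusted = (ip: string) => trusted.some((cidr) => inCidr(ip, cidr));
  // Only a trusted peer is allowed to rewrite the client address.
  if (!isTrusted(remoteAddr) || !xForwardedFor) return remoteAddr;
  const hops = xForwardedFor.split(",").map((h) => h.trim()).filter(Boolean);
  // Right to left: skip trusted proxies; the first untrusted hop is the client.
  for (let i = hops.length - 1; i >= 0; i--) {
    if (!isTrusted(hops[i])) return hops[i];
  }
  // Every hop was trusted: fall back to the leftmost entry (assumption of this sketch).
  return hops[0] ?? remoteAddr;
}
```

With trusted = ["127.0.0.1", "10.0.0.0/8"], a request from 127.0.0.1 carrying X-Forwarded-For: 198.51.100.9, 203.0.113.7, 10.1.2.3 resolves to 203.0.113.7: the trusted 10.1.2.3 hop is skipped, and the spoofable leftmost entry is never reached.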

The custom log_format includes upstream timing variables:

upstream_connect_time — how long Nginx took to connect to your Node process
upstream_header_time — time until Nginx received the first response byte
upstream_response_time — total time to receive the full response

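These values tell you where the time went. The upstream timers are cumulative from the moment Nginx starts connecting to the backend. If upstream_connect_time is high, your Node process is saturated and slow to accept connections, or the keepalive pool is too small and connections are being opened constantly. If upstream_header_time is high but upstream_connect_time is low, your Node process is slow to produce the first byte (maybe waiting on a database query). If upstream_response_time is well above upstream_header_time, the response body is large or Node is streaming it slowly. If request_time is much higher than upstream_response_time, the time was spent on the client side: a slow upload or a slow download.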
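Reading those timers by eye during an incident is slow. A small sketch that pulls rt, uct, uht, and urt out of a line in the detailed format and names the phase that took longest; dominantPhase is a hypothetical helper, and it reads only the first value when Nginx logs several comma-separated values after a retry:

```typescript
export type Phase = "connect" | "firstByte" | "body" | "client";

// Read one timer (e.g. "uht=0.480") from a "detailed" log line.
// "-" (no upstream involved) parses to NaN; for "0.001, 0.500" after a retry,
// parseFloat keeps only the first value.
function timer(line: string, name: string): number {
  const matches = [...line.matchAll(new RegExp(`(?:^|\\s)${name}=(\\S+)`, "g"))];
  const last = matches[matches.length - 1]; // the timing fields sit at the end of the line
  return last ? parseFloat(last[1]) : NaN;
}

// Hypothetical helper: which phase dominated this request?
export function dominantPhase(line: string): Phase | null {
  const rt = timer(line, "rt");
  const uct = timer(line, "uct");
  const uht = timer(line, "uht");
  const urt = timer(line, "urt");
  if ([rt, uct, uht, urt].some((v) => Number.isNaN(v))) return null;

  // Upstream timers are cumulative, so each phase is the gap between consecutive timers.
  const phases: Record<Phase, number> = {
    connect: uct,
    firstByte: uht - uct,
    body: urt - uht,
    client: rt - urt, // request upload and/or response delivery
  };
  let worst: Phase = "connect";
  for (const p of Object.keys(phases) as Phase[]) {
    if (phases[p] > phases[worst]) worst = p;
  }
  return worst;
}
```

A line ending in rt=1.250 uct=0.001 uht=0.900 urt=0.950 points at "firstByte" (Node was slow to start answering), while rt=2.000 uct=0.001 uht=0.010 urt=0.020 points at "client".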

The failure mode without this: Rate limiters use 127.0.0.1 as the client key, so all traffic shares one bucket: one noisy client gets everyone throttled, and per-client limits mean nothing. Geo-IP middleware returns the datacenter location. Logs show internal IPs, which makes incident debugging far harder.

Pattern 5: Load balancing that matches your Node.js process model

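If you run multiple Node.js processes that each listen on their own port (for example several PM2 fork-mode instances, or one container per process), Nginx distributes traffic across them. The default round-robin is rarely the best choice for Node.js. Node's built-in cluster module is different: its workers share one port, so Nginx sees a single upstream server.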

upstream app {
  # least_conn: send to the backend with fewest active connections
  # (a non-default balancing method must come before keepalive)
  least_conn;

  # Passive health checks: after max_fails failed attempts within fail_timeout,
  # Nginx stops sending to that server for fail_timeout
  # Option A: multiple processes on one machine (one port per process)
  server 127.0.0.1:3001 max_fails=3 fail_timeout=10s;
  server 127.0.0.1:3002 max_fails=3 fail_timeout=10s;
  server 127.0.0.1:3003 max_fails=3 fail_timeout=10s;
  server 127.0.0.1:3004 max_fails=3 fail_timeout=10s;

  # Option B: multiple containers
  # server app1:3000 max_fails=3 fail_timeout=10s;
  # server app2:3000 max_fails=3 fail_timeout=10s;
  # server app3:3000 max_fails=3 fail_timeout=10s;

  keepalive 256;
}

server {
  location / {
    proxy_pass http://app;

    # Which failures make Nginx retry on the next server
    # (these also count toward max_fails)
    proxy_next_upstream          error timeout http_500 http_502 http_503;
    # Cap the number of tries (0 means no limit)
    proxy_next_upstream_tries    3;
    # Stop retrying once 10s have passed
    proxy_next_upstream_timeout  10s;
  }

  # Active health checks (the health_check directive) require NGINX Plus.
  # Without it, point Docker/K8s probes at the backend directly.
}

least_conn matters because not all requests are equal. A request that hits a slow database query holds a connection for 2 seconds. Round-robin ignores that and keeps handing the same backend its full share of new requests. least_conn sends each new request to the backend with the fewest active connections, which balances by current load rather than by request count.
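A toy simulation makes the difference concrete. It assumes four backends, one request arriving per tick, and every fourth request being slow (20 ticks instead of 1). peakConcurrency is a hypothetical name, and ties in least-conn go to the lowest index here rather than Nginx's weighted round-robin tie-break:

```typescript
type Strategy = "round-robin" | "least-conn";

// Toy model (assumption): returns the highest number of in-flight requests
// any single backend had to hold at once.
export function peakConcurrency(strategy: Strategy, backends = 4, requests = 40): number {
  // For each backend, the tick at which each in-flight request finishes.
  const active: number[][] = Array.from({ length: backends }, () => []);
  let peak = 0;
  for (let t = 0; t < requests; t++) {
    // Retire requests that have finished by this tick.
    for (let b = 0; b < backends; b++) active[b] = active[b].filter((end) => end > t);

    const duration = t % 4 === 0 ? 20 : 1; // every 4th request is slow
    let target: number;
    if (strategy === "round-robin") {
      target = t % backends;
    } else {
      // least_conn: fewest in-flight requests, lowest index wins ties.
      target = 0;
      for (let b = 1; b < backends; b++) {
        if (active[b].length < active[target].length) target = b;
      }
    }
    active[target].push(t + duration);
    peak = Math.max(peak, active[target].length);
  }
  return peak;
}
```

Round-robin lines every slow request up on backend 0, which ends up holding 5 at once; least-conn never lets any backend exceed 2.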

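The proxy_next_upstream settings tell Nginx to retry the request on another backend when the first one fails to connect, times out, or returns 500, 502, or 503. The default already covers error and timeout; listing the HTTP status codes extends retries (and the max_fails count) to application errors. Two caveats: Nginx does not retry non-idempotent requests such as POST or PATCH once they have been sent upstream unless you add non_idempotent, and a retry needs a second server to go to.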

The failure mode without this: Round-robin sends a burst of slow requests to process 1 while processes 2-4 sit idle. The event loop on process 1 lags and its health checks start failing, so the orchestration layer restarts process 1 in the middle of the burst, even though processes 2-4 had spare capacity all along. Connections queue up at Nginx and users see 502 errors.

Putting it all together: a complete production config

Here is a single Nginx config that combines all five patterns:

worker_processes auto;
worker_rlimit_nofile 65535;

events {
  worker_connections 4096;
  multi_accept on;
  use epoll;
}

http {
  include       mime.types;
  default_type  application/octet-stream;

  sendfile        on;
  tcp_nopush      on;
  tcp_nodelay     on;
  keepalive_timeout 65;

  # Log format with upstream timing
  log_format detailed '$remote_addr - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" '
                      'rt=$request_time uct=$upstream_connect_time '
                      'uht=$upstream_header_time urt=$upstream_response_time '
                      'upstream=$upstream_addr';

  access_log /var/log/nginx/access.log detailed;
  error_log  /var/log/nginx/error.log warn;

  # Upstream keepalive (Pattern 1)
  upstream app {
    least_conn;  # Pattern 5 (only matters once you list more than one server)
    server 127.0.0.1:3000;
    keepalive 256;
    keepalive_timeout 120s;
    keepalive_requests 10000;
  }

  # WebSocket upgrade map (Pattern 3)
  map $http_upgrade $connection_upgrade {
    default  upgrade;
    ''       close;
  }

  server {
    listen 80;
    server_name api.example.com;

    # Real-IP (Pattern 4)
    set_real_ip_from 10.0.0.0/8;
    set_real_ip_from 172.16.0.0/12;
    set_real_ip_from 192.168.0.0/16;
    real_ip_header X-Forwarded-For;
    real_ip_recursive on;

    # Buffer tuning (Pattern 2)
    client_body_buffer_size 16k;
    client_max_body_size    10m;
    client_body_timeout     30s;
    proxy_buffering on;
    proxy_buffer_size    4k;
    proxy_buffers        8 16k;
    proxy_busy_buffers_size 32k;

    # Generic proxy headers
    proxy_set_header Host              $host;
    proxy_set_header X-Real-IP         $remote_addr;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # Keepalive to backend (Pattern 1)
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # Timeouts
    proxy_connect_timeout   5s;
    proxy_send_timeout      10s;
    proxy_read_timeout      30s;

    # Retry logic (Pattern 5)
    proxy_next_upstream_tries   3;
    proxy_next_upstream_timeout 10s;
    proxy_next_upstream         error timeout http_500 http_502 http_503;

    # API routes
    location / {
      proxy_pass http://app;
    }

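    # WebSocket endpoint (Pattern 3)
    location /ws {
      proxy_pass http://app;
      proxy_http_version 1.1;
      # proxy_set_header here replaces (not extends) the server-level list,
      # so repeat the forwarding headers next to the upgrade headers
      proxy_set_header Host              $host;
      proxy_set_header X-Real-IP         $remote_addr;
      proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Upgrade           $http_upgrade;
      proxy_set_header Connection        $connection_upgrade;
      proxy_read_timeout    86400s;
      proxy_send_timeout    86400s;
    }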

    # Health check endpoint: no retries, so a failing backend is reported, not masked
    location /health {
      proxy_pass http://app;
      proxy_next_upstream off;
      access_log off;
    }
  }
}

Testing your config

Before you push this to production, validate it:

# Syntax check
nginx -t

# Reload without dropping connections
nginx -s reload

# Smoke-test that requests reach the backend through the proxy
curl -I http://localhost/health

Then run a load test to confirm the keepalive pool is active:

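# Watch upstream connections to Node (established and TIME_WAIT)
ss -tan | grep ':3000'

# Count TIME_WAIT sockets on the upstream port
ss -tan state time-wait | grep -c ':3000'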

# Run a quick benchmark
wrk -t4 -c100 -d30s http://localhost/

Compare TIME_WAIT socket counts before and after the keepalive config. If you see hundreds of TIME_WAIT entries after the config, something is wrong — Nginx is still not using persistent connections to the backend.
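If ss is not available in a minimal container image, the same TIME_WAIT count can be read straight from the kernel. A Linux-only sketch; countTimeWait and timeWaitOnHost are hypothetical helpers, they cover IPv4 sockets only (/proc/net/tcp6 lists IPv6), and they rely on /proc/net/tcp printing ports as uppercase hex and state 06 for TIME_WAIT:

```typescript
import { readFileSync } from "node:fs";

// In /proc/net/tcp the "st" column is a hex TCP state; 06 is TIME_WAIT.
const TIME_WAIT = "06";

// Hypothetical helper: count TIME_WAIT sockets whose local or remote port is `port`.
export function countTimeWait(procNetTcp: string, port: number): number {
  // Ports are printed as 4-digit uppercase hex, e.g. 3000 -> "0BB8".
  const portHex = port.toString(16).toUpperCase().padStart(4, "0");
  let count = 0;
  for (const line of procNetTcp.split("\n").slice(1)) { // first row is the header
    const cols = line.trim().split(/\s+/);
    if (cols.length < 4) continue;
    const [, local, remote, state] = cols; // sl, local_address, rem_address, st
    const localPort = local.split(":")[1];
    const remotePort = remote.split(":")[1];
    if (state === TIME_WAIT && (localPort === portHex || remotePort === portHex)) count++;
  }
  return count;
}

// Reads the live table on the host running Nginx and Node.
export function timeWaitOnHost(port: number): number {
  return countTimeWait(readFileSync("/proc/net/tcp", "utf8"), port);
}
```

Run timeWaitOnHost(3000) before and after enabling upstream keepalive; the count should fall sharply once connections are reused.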

When Nginx is not the right answer

Not every setup needs a bespoke Nginx config. If your app runs on Kubernetes with a service mesh (like Istio or Linkerd), the sidecar proxy handles most of these patterns (keepalive, retries, timeouts, load balancing). Running Nginx as an additional layer inside the mesh adds complexity without benefit.

If you run a single-region, single-instance API with fewer than 100 req/s, the default Nginx config in the official Docker image is good enough. Apply these patterns when you see concrete symptoms: slow clients causing event loop lag, WebSocket drops, or upstream connection exhaustion.

Closing

Nginx is the most boring part of your stack until it is not. The defaults are optimized for static file serving, not for proxying long-lived Node.js API connections. A few targeted config changes — upstream keepalive, response buffering, WebSocket timeouts, real-IP forwarding, and least-conn load balancing — turn Nginx from a passive pass-through into an active reliability layer.

Your Node.js app can handle more traffic with less memory and fewer timeouts when the proxy in front of it stops working against you.

Teams that build and deploy Node.js applications at scale often invest in this kind of infrastructure hygiene as a baseline, not an afterthought. Yojji, for example, treats reverse proxy configuration as part of the core delivery checklist in backend-heavy projects where every millisecond and every dropped connection counts toward the user experience.
