NodeJS vs Bun vs Go

Mon 25 May 2026

Premise:
I came across a video discussing the new Bun.image, and how the developers at Bun gave it native image support that was 'blazing fast'.
This led me to wonder just how much more performant Bun is than NodeJS as a regular HTTP server.

I am a software developer with 4 years of experience, mainly building Node services. I feel safe saying that I have a good understanding of Node's architecture, its limits, and its production capabilities when handling high-throughput network I/O.

However, I am still very new to Bun, so I decided to make this simple benchmark, where I run a simple HTTP server in Node, Bun and Go.

Go is just a control here. Because Go compiles to a native binary with a small runtime, I thought it would be a good control and would help me understand how much overhead Node and Bun carry in their engines and I/O layers (V8 and libuv for Node, JavaScriptCore and Zig for Bun).

The Cloud-Native Battle Royale: Node vs. Bun vs. Go

From Localhost to Cloud Reality—Tracking the Bottlenecks Step-by-Step.

It's easy to confuse localhost performance with that of hosted environments. You run a quick hello-world benchmark on localhost, watch Bun or Node spit out massive numbers inside your laptop's RAM, and declare your favorite runtime the undisputed king of speed.

But localhost is a lie. Over the loopback interface no packet ever leaves the machine, so you are mostly testing how fast your CPU can copy bytes through memory, while the load generator competes with the server for the same cores.

To uncover some of the production truth, I compared three popular backend stacks, sandboxed them strictly inside Docker, and ran them through three controlled setups:

from local memory buses,
over an encrypted Wi-Fi mesh network,
and finally unchained onto enterprise-grade Linux cloud hardware.

Node, Bun and Go all run their stock, out-of-the-box HTTP servers. Each handles incoming traffic on /json and returns a simple JSON response:
{ message: 'Hello from ${Node/ Bun/ Go}' }

Phase 1: The Localhost Baseline

Our journey started on a local Windows development machine. We first pinned the containers to a single core, then gave them 4 cores while keeping the standard single-process configurations, and finally introduced clustering.

Phase 1 Ledger: Local Memory Performance

Benchmark Phase	Node.js	Bun	Go (Optimized Raw Bytes)
1 CPU Core Baseline	~14,000 RPS	~28,000 RPS	~29,000 RPS
4 CPU Cores (No Cluster)	~16,000 RPS (Stuck)	~30,000 RPS (Stuck)	~115,000 RPS 🚀
4 CPU Cores (Clustered)	~110,000 RPS	~170,000 RPS 🏆	N/A (Go handles natively)

The Localhost Plot Twist

The Single-Threaded Wall: In the 4-core run with no clustering, Node and Bun barely moved. Because each runtime executes JavaScript on a single main thread, one process can saturate only about one core, leaving the other 3 cores mostly idle. Meanwhile, Go's scheduler spread its goroutines across all available cores, scaling to ~115,000 RPS out of the box.
The Clustered Beast: The moment we turned on Node's cluster module and spawned 4 parallel instances of Bun, the JavaScript runtimes woke up. Bun rocketed to roughly 170,000 RPS over loopback. On paper, it looked untouchable. Both clustered figures are well over four times their 1-core baselines (roughly 7.9x for Node and 6.1x for Bun), which plain core-count scaling cannot explain, so treat the local numbers as rough.
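
The post does not show how the four Bun instances were started. One way to do it, sketched here as an assumption and not as the setup actually used, is a small launcher that spawns the same server file several times and relies on reusePort: true so that each process can bind port 3000. The file name server.js and the helper launchInstances are hypothetical.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: start `count` copies of the same command and
// return the child process handles.
function launchInstances(command, args, count) {
  const children = [];
  for (let i = 0; i < count; i++) {
    // stdio: 'inherit' forwards each instance's logs to this terminal.
    children.push(spawn(command, args, { stdio: 'inherit' }));
  }
  return children;
}

// Example (assumes a Bun server in ./server.js that sets reusePort: true):
// const instances = launchInstances('bun', ['server.js'], 4);
```

On Linux, SO_REUSEPORT lets the kernel spread new connections across the listening processes, so no primary process sits in the request path.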

Phase 2: The Physical Reality Check (Over Tailscale)

Then came the first major shift. We forced the packets to leave the cozy confines of local RAM and cross a real, physical network. I connected my MacBook over a Tailscale mesh VPN and sent requests over the air through a local Wi-Fi adapter.

The moment traffic hit the air, the 170,000 RPS powerhouses slammed headfirst into a concrete wall.

Phase 2 Ledger: The Encrypted Airwaves (4 Cores)

Metric	Node.js (4 Workers)	Bun (4 Instances)	Go (Optimized Raw Bytes)
Throughput (RPS)	7,954 RPS	12,519 RPS	12,873 RPS 🏆
Average Latency	26.79 ms	16.49 ms	15.69 ms 🏆
Max Outlier Latency	864.53 ms ⚠️	163.24 ms	152.21 ms 🏆
Data Transfer Rate	1.62 MB/s	1.76 MB/s	1.67 MB/s
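
Dividing each transfer rate by its request rate gives a rough bytes-per-response figure, which helps explain why the transfer rates are nearly equal even though the request rates are not. This sketch assumes 1 MB means 1,000,000 bytes; if the tool reported mebibytes, the absolute values shift by about 5% but the ordering stays the same.

```javascript
// Phase 2 figures copied from the table above.
const phase2 = [
  { runtime: 'Node.js', rps: 7954, mbPerSec: 1.62 },
  { runtime: 'Bun', rps: 12519, mbPerSec: 1.76 },
  { runtime: 'Go', rps: 12873, mbPerSec: 1.67 },
];

// Approximate bytes received per response (headers plus body).
function bytesPerResponse({ rps, mbPerSec }) {
  return Math.round((mbPerSec * 1_000_000) / rps);
}

for (const row of phase2) {
  console.log(row.runtime, bytesPerResponse(row));
}
// Node.js 204
// Bun 141
// Go 130
```

Node's responses come out noticeably larger, which would be consistent with Node sending more default response headers. The post does not include a header dump, so treat that explanation as a guess.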

Shifting from CPU to Network-Bound I/O

The runtimes were no longer fighting each other; they were all waiting in line for the Wi-Fi adapter to transmit each packet and for Tailscale to wrap it in encrypted WireGuard UDP.

Go Edged Ahead: Go posted the best throughput and the lowest average and worst-case latencies, but its lead over Bun (12,873 vs 12,519 RPS, under 3%) is small enough to fall within run-to-run noise on a Wi-Fi link.
Node.js Crumbled Under Jitter: Node's worst-case latency shot up to 864 ms. A plausible cause, which I did not verify, is the cluster primary process: it accepts incoming connections and hands them to the workers, which adds a hop when arrivals are uneven.
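
Whether the primary's connection hand-off really causes the latency spikes could be tested instead of assumed. Node's cluster module exposes cluster.schedulingPolicy: with SCHED_RR the primary accepts connections and passes them to workers round-robin (the default everywhere except Windows), while SCHED_NONE leaves distribution to the operating system. A minimal sketch follows; the environment variable BENCH_SCHED_POLICY and the helper selectPolicy are hypothetical names.

```javascript
const cluster = require('cluster');

// Hypothetical helper: map a short name to one of Node's two policies.
function selectPolicy(name) {
  // 'none' -> workers accept directly; the OS decides who gets a connection
  // anything else -> the primary accepts and distributes round-robin
  return name === 'none' ? cluster.SCHED_NONE : cluster.SCHED_RR;
}

// Must be assigned in the primary before the first cluster.fork() call.
cluster.schedulingPolicy = selectPolicy(process.env.BENCH_SCHED_POLICY);
```

Running the same load test once per policy and comparing the worst-case latency would show whether the primary is the bottleneck.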

It is important to note that my PC has a very basic Tenda U10 USB Wi-Fi adapter. It only supports Wi-Fi 3 and was very likely the bottleneck.

This highlighted that systems are often network-bottlenecked, and that it would be unfair to continue testing with my current hardware and environment. So I decided to isolate the tests from my hardware limitations by hosting them on DigitalOcean Droplets.

Phase 3: Unchained on the Cloud

To find the ceiling of our code without Wi-Fi noise or local virtualization layers, I moved the experiment onto Linux cloud servers (DigitalOcean shared-CPU Droplets over a 10 Gbps data center backplane). Shared Droplets are themselves virtual machines on shared hosts, so some run-to-run variance remains.

I deployed a target server VM and a separate attacker VM in the same data center, constrained each container's resources with Docker runtime flags, and re-ran the gauntlet.

Step 3.1: Running the Baseline Cloud Sandboxes (1 CPU)

To run the isolated 1-core tests on the target host, we started each server container with a strict resource quota:

# Target VM - Launching Single Core Baselines (run one at a time: they share port 3000 and the container name)
docker run --rm --cpus="1" -m="512m" -p 3000:3000 --name cloud-bench node-bench
docker run --rm --cpus="1" -m="512m" -p 3000:3000 --name cloud-bench bun-bench
docker run --rm --cpus="1" -m="512m" -p 3000:3000 --name cloud-bench go-bench

Phase 3 Ledger: Cloud-Native 1-CPU Baseline

Metric	Node.js (1 Core)	Go (1 Core - Optimized)	Bun (1 Core)
Throughput (RPS)	11,705 RPS	13,935 RPS	25,444 RPS 🏆
Average Latency	29.13 ms	22.83 ms	7.89 ms 🏆
Max Outlier Latency	2,000.00 ms ⚠️	135.97 ms	93.27 ms
Failed Requests	60 Timeouts ⚠️	0	0

On a single cloud core, Node struggled to sustain about 11.7k requests per second: 60 requests timed out, and the 2,000 ms maximum latency probably reflects a 2-second client timeout and not a measured response time. Bun handled the single-core constraint comfortably, beating Go by a factor of about 1.8 (25,444 vs 13,935 RPS).

Phase 4: Full Multi-Core Cloud Concurrency

Finally, we opened the valves completely. I scaled the target to a 4-CPU allocation, using the clustered variants of the JavaScript servers so that multiple processes could run in parallel:

# Target VM - Launching Multi-Core Clusters
docker run --rm --cpus="4" -m="512m" -p 3000:3000 --name cloud-bench node-bench-c
docker run --rm --cpus="4" -m="512m" -p 3000:3000 --name cloud-bench bun-bench-c
docker run --rm --cpus="4" -m="512m" -p 3000:3000 --name cloud-bench go-bench

The Automation Attack Script

To collect the metrics, the attacker VM ran the wrk load generator inside an Alpine container, using 2 threads, 200 open connections, and a 30-second run:

# Attacker VM - The Native wrk Attack Command
docker run --rm alpine sh -c "apk add --no-cache wrk && wrk -t2 -c200 -d30s http://159.65.6.89:3000/json"

Phase 4 Ledger: Cloud-Native 4-CPU Maximum Performance

Metric	Node.js Cluster (4 Cores)	Go Optimized (4 Cores)	Bun Cluster (4 Cores)
Throughput (RPS)	31,025 RPS	37,617 RPS	53,446 RPS 🏆
Total Requests (30s)	933,074	1,130,171	1,605,818
Average Latency	8.62 ms	5.79 ms	4.04 ms 🏆
Worst-Case Spike	641.25 ms ⚠️	218.91 ms	76.54 ms 🏆
Peak CPU (Docker Stats)	>400% (Thrashing)	340% (Highly Efficient) 🏆	383% (Perfect Scaling)
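
Because wrk keeps a fixed number of connections busy (-c200 in the command above), throughput and average latency are not independent measurements. Little's law says that requests in flight are approximately throughput multiplied by average latency. A quick check against the table's figures, as a sketch:

```javascript
// Phase 4 figures copied from the table above.
const phase4 = [
  { runtime: 'Node.js', rps: 31025, avgLatencyMs: 8.62 },
  { runtime: 'Go', rps: 37617, avgLatencyMs: 5.79 },
  { runtime: 'Bun', rps: 53446, avgLatencyMs: 4.04 },
];

// Little's law: L = lambda * W
// (requests in flight = arrival rate * average time in the system)
function inFlight({ rps, avgLatencyMs }) {
  return Math.round(rps * (avgLatencyMs / 1000));
}

for (const row of phase4) {
  console.log(row.runtime, inFlight(row));
}
// Node.js 267
// Go 218
// Bun 216
```

The match is only approximate, but all three land in the neighborhood of the 200 configured connections. With a closed-loop client like this, a lower average latency is largely the same fact as a higher throughput, not a separate win.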

The Architectural Post-Mortem

1. Bun is the New Raw I/O King (53,446 RPS)

Bun used nearly all of its 4-vCPU allocation, peaking at 383% in docker stats, and delivered the highest throughput and lowest latencies of the three. Its throughput rose only about 2.1x over its 1-core result (25,444 to 53,446 RPS), so the scaling was well short of linear, but the absolute numbers still put it clearly ahead. Bun runs on JavaScriptCore instead of V8 and implements its HTTP server in native code, which likely keeps per-request overhead low; its average latency stayed at 4.04 ms under 200 concurrent connections.

2. Go is an Infrastructure Masterpiece (37,617 RPS)

Go delivered strong throughput while peaking at only 340% CPU, leaving roughly 60% of one vCPU unused.

We achieved this by skipping the default request router (ServeMux) and writing a pre-rendered byte slice to the response, which avoids JSON encoding, reflection, and the allocations that come with them on every request. Go's goroutine scheduler handled the multi-core scaling within a single process, which is a large part of why it has a reputation for resource efficiency and predictability.
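
Note that the Node and Bun handlers in the snippets at the end call JSON.stringify on every request, while the Go handler encodes nothing at request time. The same pre-rendering trick is available in JavaScript. The sketch below is an assumption about how a like-for-like Node handler could look, not code that was benchmarked; JSON_BODY and handleJson are hypothetical names.

```javascript
// Serialize once at startup instead of on every request.
const JSON_BODY = Buffer.from(JSON.stringify({ message: 'Hello from Node!' }));

function handleJson(req, res) {
  if (req.method === 'GET' && req.url === '/json') {
    res.writeHead(200, {
      'Content-Type': 'application/json',
      'Content-Length': JSON_BODY.length,
    });
    res.end(JSON_BODY); // send the pre-rendered bytes as-is
  } else {
    res.writeHead(404);
    res.end();
  }
}

// Usage: require('http').createServer(handleJson).listen(3000);
```

Re-running the Node and Bun tests with a handler like this would show how much of Go's CPU advantage comes from the runtime and how much from skipping serialization.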

3. Node.js Suffers From the "Cluster Tax" (31,025 RPS)

Node bounced back, but it ran hot, sitting at or above the 400% CPU ceiling set by --cpus="4". Node's cluster module runs separate processes, each with its own V8 instance, and by default the primary accepts connections and passes them to the workers over Inter-Process Communication (IPC). That hand-off costs CPU cycles that the other two runtimes spend on serving requests.
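
Putting the 1-core and 4-core cloud tables side by side gives a speedup and a per-core efficiency for each runtime. The calculation assumes both runs used the same wrk settings, which the post does not state for the 1-core run.

```javascript
// Throughput (RPS) copied from the Phase 3 and Phase 4 tables.
const cloud = [
  { runtime: 'Node.js', oneCore: 11705, fourCores: 31025 },
  { runtime: 'Go', oneCore: 13935, fourCores: 37617 },
  { runtime: 'Bun', oneCore: 25444, fourCores: 53446 },
];

// speedup = 4-core RPS / 1-core RPS; efficiency = speedup / cores
function scaling({ oneCore, fourCores }, cores = 4) {
  const speedup = fourCores / oneCore;
  return {
    speedup: Number(speedup.toFixed(2)),
    efficiency: Number((speedup / cores).toFixed(2)),
  };
}

for (const row of cloud) {
  console.log(row.runtime, scaling(row));
}
// Node.js { speedup: 2.65, efficiency: 0.66 }
// Go { speedup: 2.7, efficiency: 0.67 }
// Bun { speedup: 2.1, efficiency: 0.53 }
```

Bun is fastest in absolute terms but gains the least per added core. One possible explanation is that the two-thread wrk client, not the server, was becoming the limit; this data cannot confirm that.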

The Production Takeaway

Bun is no longer just a fast local test runner; in these tests its I/O layer was the fastest of the three, and it looks like a legitimate production option.
Go remains the predictable choice for infrastructure, giving you high throughput with the lowest CPU use of the three.
Node.js works, but under high concurrency its multi-process cluster model made the hardware work harder to produce less output.

Code snippets

Bun - Clustered

Bun.serve({
  port: 3000,
  reusePort: true, // Key flag: allows multiple processes to bind to port 3000
  fetch(request) {
    const url = new URL(request.url);
    if (request.method === 'GET' && url.pathname === '/json') {
      return new Response(JSON.stringify({ message: "Hello from Clustered Bun!" }), {
        headers: { 'Content-Type': 'application/json' },
      });
    }
    return new Response("Not Found", { status: 404 });
  },
});

Node - Clustered

const cluster = require('cluster');
const http = require('http');
const numCPUs = 4; // Matching our --cpus="4" flag

if (cluster.isPrimary) { // isMaster is the deprecated alias (Node 16+)
  // Fork workers for each core
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker) => {
    cluster.fork(); // Simple failover
  });
} else {
  // Workers share the listening port; the primary hands them connections
  http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/json') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ message: "Hello from Clustered Node!" }));
    } else {
      res.writeHead(404);
      res.end();
    }
  }).listen(3000);
}

Go - Optimized

package main

import (
    "fmt"
    "net/http"
)

// Pre-render the exact JSON byte slice at startup so no JSON encoding or reflection happens per request
var jsonResponse = []byte(`{"message":"Hello from Go!"}`)

func jsonHandler(w http.ResponseWriter, r *http.Request) {
    if r.Method != http.MethodGet {
        w.WriteHeader(http.StatusMethodNotAllowed)
        return
    }

    // Set the header and write the pre-rendered bytes to the response
    w.Header().Set("Content-Type", "application/json")
    w.Write(jsonResponse)
}

func main() {
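    // No ServeMux: every request, whatever its path, goes straight to jsonHandler
    server := &http.Server{
        Addr:    ":3000",
        Handler: http.HandlerFunc(jsonHandler),
    }

    fmt.Println("Optimized Go running on port 3000")
    // ListenAndServe only returns on failure (e.g. port already in use), so report it
    if err := server.ListenAndServe(); err != nil {
        fmt.Println("server error:", err)
    }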
}
