Mon 25 May 2026,
Premise:
I came across a video about the new Bun.image API and how the Bun team had given it native image support that was 'blazing fast'.
This led me to wonder just how much faster Bun is than Node.js as a regular HTTP server.
I am a software developer with 4 years of experience, mostly building Node services. I feel safe saying that I have a good understanding of Node's architecture, its limits, and its production capabilities when handling high-throughput network I/O.
However, I am still very new to Bun, so I decided to write a simple benchmark: the same minimal HTTP server in Node, Bun, and Go.
Go is just a control here. Because Go compiles to native machine code and has no JavaScript engine in the request path, I thought it would make a good control and help me understand how much overhead Node and Bun carry in their engines (V8 and JavaScriptCore) and their I/O layers (libuv and Zig).
The Cloud-Native Battle Royale: Node vs. Bun vs. Go
From Localhost to Cloud Reality—Tracking the Bottlenecks Step-by-Step.
It's easy to mistake localhost performance for performance in a hosted environment. You run a quick hello-world benchmark on localhost, watch Bun or Node spit out massive numbers inside your laptop's RAM, and declare your favorite runtime the undisputed king of speed.
But localhost is a lie. Over the loopback interface, packets never touch a real network card, so you aren't really testing software architecture; you're testing how fast your CPU can push bytes through the kernel and local memory.
To uncover some of the production truth, I compared three popular backend stacks, sandboxed them strictly inside Docker, and ran them through three controlled environments:
- on localhost, over local memory,
- over an encrypted Wi-Fi mesh network,
- and finally on enterprise-grade Linux cloud hardware.
Node, Bun, and Go all run their default, out-of-the-box HTTP servers. Each one handles incoming GET requests on /json and returns a simple JSON response:
{ "message": "Hello from <Node | Bun | Go>" }
Phase 1: The Localhost Baseline
Our journey started on a local Windows development machine. We pinned our containers to a single core, scaled up to 4 cores using standard single-threaded configurations, and finally introduced clustering.
Phase 1 Ledger: Local Memory Performance
| Benchmark Phase | Node.js | Bun | Go (Optimized Raw Bytes) |
|---|---|---|---|
| 1 CPU Core Baseline | ~14,000 RPS | ~28,000 RPS | ~29,000 RPS |
| 4 CPU Cores (No Cluster) | ~16,000 RPS (Stuck) | ~30,000 RPS (Stuck) | ~115,000 RPS 🚀 |
| 4 CPU Cores (Clustered) | ~110,000 RPS | ~170,000 RPS 🏆 | N/A (Go handles natively) |
The Localhost Plot Twist
- The Single-Threaded Wall: In the 4-core run without clustering, Node and Bun plateaued. Each runtime executes JavaScript on a single event-loop thread, so they could only ever saturate one core, leaving the other 3 cores largely idle. Meanwhile, Go's native scheduler spread the work across all available cores, scaling to 115,000 RPS out of the box.
- The Clustered Beast: The moment we turned on Node's cluster module and spawned 4 parallel instances of Bun, the JavaScript runtimes woke up. Bun rocketed to a staggering 170,000 RPS over loopback. On paper, it looked completely untouchable.
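Go's "out of the box" multi-core behavior comes from GOMAXPROCS, the number of OS threads that may execute Go code at the same time. It defaults to the number of logical CPUs the process can use, and net/http serves each accepted connection in its own goroutine, which the scheduler spreads across those threads. The minimal sketch below only reports both values; `schedulerInfo` is an illustrative name. On Linux, Go 1.25 and later also lower the default to a container's CPU limit (such as Docker's `--cpus`), while older versions use the full count of logical CPUs visible to the process.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerInfo returns the number of logical CPUs usable by this process and
// the current GOMAXPROCS value (how many OS threads may run Go code at once).
func schedulerInfo() (numCPU, maxProcs int) {
	// GOMAXPROCS(0) only queries the setting; it does not change it.
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	cpus, procs := schedulerInfo()
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d\n", cpus, procs)
}
```

Running it inside a container started with a CPU limit on a larger host and comparing the two numbers shows whether the runtime picked up the limit.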
Phase 2: The Physical Reality Check (Over Tailscale)
Then came the first massive shift. We forced the packets to leave the cozy confines of local laptop RAM and cross a real, physical network. I hooked up my MacBook over a Tailscale virtual private mesh network and sent requests over the air via a local Wi-Fi adapter.
The moment traffic hit the air, the 170,000 RPS powerhouses slammed headfirst into a concrete wall.
Phase 2 Ledger: The Encrypted Airwaves (4 Cores)
| Metric | Node.js (4 Workers) | Bun (4 Instances) | Go (Optimized Raw Bytes) |
|---|---|---|---|
| Throughput (RPS) | 7,954 RPS | 12,519 RPS | 12,873 RPS 🏆 |
| Average Latency | 26.79 ms | 16.49 ms | 15.69 ms 🏆 |
| Max Outlier Latency | 864.53 ms ⚠️ | 163.24 ms | 152.21 ms 🏆 |
| Data Transfer Rate | 1.62 MB/s | 1.76 MB/s | 1.67 MB/s |
Shifting from CPU to Network-Bound I/O
The engines were no longer fighting each other; they were all waiting in line for the Wi-Fi adapter to put packets on the air and for Tailscale to encrypt and decrypt its WireGuard UDP tunnel traffic.
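- Go Reclaimed the Crown: Go posted the highest throughput and the lowest average and outlier latencies, although its lead over Bun (about 3% in RPS) is small enough to be within run-to-run noise on Wi-Fi. Its runtime network poller, which sits directly on the kernel's readiness APIs, likely helped it absorb network jitter.
- Node.js Crumbled Under Jitter: Node had the worst average latency of the three, and its outlier latency skyrocketed to a near-fatal 864 ms. A plausible contributor is the cluster primary process, which accepts incoming connections and hands them to its workers, struggling with uneven arrival times.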
It's important to note that my PC has a very basic Tenda U10 Wi-Fi USB adapter. It only supports Wi-Fi 3 and was very likely the bottleneck.
This highlighted how often systems are network-bound, and that it would be unfair to keep testing with my current hardware and environment. So I decided to isolate the tests from my hardware limitations by hosting them on DigitalOcean Droplets.
Phase 3: Unchained on the Cloud
To find out the absolute ceiling of our code without network noise or local virtualization layers, I moved the experiment onto enterprise-grade Linux cloud hardware (DigitalOcean Shared Droplets over a 10 Gbps data center backplane).
I deployed a target server VM and a separate attacker VM on the same data center network, isolated their resources natively with Docker runtime flags, and re-ran the gauntlet.
Step 3.1: Running the Baseline Cloud Sandboxes (1 CPU)
To run the isolated 1-core tests on the target host, we spun up each server container, one at a time, with strict CPU and memory quotas:
# Target VM - Launching Single Core Baselines
docker run --rm --cpus="1" -m="512m" -p 3000:3000 --name cloud-bench node-bench
docker run --rm --cpus="1" -m="512m" -p 3000:3000 --name cloud-bench bun-bench
docker run --rm --cpus="1" -m="512m" -p 3000:3000 --name cloud-bench go-bench
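Docker's `--cpus` flag does not pin a container to specific cores; it sets a CFS bandwidth quota. `--cpus="1"` is equivalent to `--cpu-period=100000 --cpu-quota=100000`, so the container's threads may use 100 ms of CPU time per 100 ms period, spread across any host cores. On a cgroup v2 host, the container sees this in `/sys/fs/cgroup/cpu.max` as `<quota> <period>` in microseconds, or `max <period>` when unlimited. The sketch below parses that format into an effective CPU count; `effectiveCPUs` is an illustrative helper, and the file path assumes cgroup v2.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// effectiveCPUs parses a cgroup v2 cpu.max value ("<quota> <period>", both in
// microseconds) and returns quota/period. limited is false for "max" (no limit).
func effectiveCPUs(cpuMax string) (cpus float64, limited bool, err error) {
	fields := strings.Fields(cpuMax)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max format: %q", cpuMax)
	}
	if fields[0] == "max" {
		return 0, false, nil // no CPU bandwidth limit configured
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	if period <= 0 {
		return 0, false, fmt.Errorf("non-positive period in %q", cpuMax)
	}
	return quota / period, true, nil
}

func main() {
	// Assumes cgroup v2, common on current Linux distributions.
	raw, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		fmt.Println("cpu.max not readable (not cgroup v2?):", err)
		return
	}
	cpus, limited, err := effectiveCPUs(string(raw))
	switch {
	case err != nil:
		fmt.Println("parse error:", err)
	case !limited:
		fmt.Println("no CPU limit")
	default:
		fmt.Printf("CPU limit: %.2f CPUs\n", cpus)
	}
}
```

Inside the 1-core containers above this should report 1.00, and inside the `--cpus="4"` containers 4.00.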
Phase 3 Ledger: Cloud-Native 1-CPU Baseline
| Metric | Node.js (1 Core) | Go (1 Core - Optimized) | Bun (1 Core) |
|---|---|---|---|
| Throughput (RPS) | 11,705 RPS | 13,935 RPS | 25,444 RPS 🏆 |
| Average Latency | 29.13 ms | 22.83 ms | 7.89 ms 🏆 |
| Max Outlier Latency | 2,000.00 ms ⚠️ | 135.97 ms | 93.27 ms |
| Failed Requests | 60 Timeouts ⚠️ | 0 | 0 |
On a single cloud core, Node's V8-based server choked trying to sustain ~11.7k requests per second alongside its own event-loop bookkeeping, and wrk recorded 60 socket timeouts (the 2,000 ms maximum sits exactly at wrk's default 2-second timeout). Bun's ultra-lightweight Zig event loop handled the single-core constraint with ease, delivering nearly double Go's throughput (25,444 vs. 13,935 RPS, about 1.8x).
Phase 4: Full Multi-Core Cloud Concurrency
Finally, we opened the valves completely. I scaled the target configurations to a 4-CPU allocation, using clustered variants of the JavaScript runtimes so they could schedule work across multiple processes:
# Target VM - Launching Multi-Core Clusters
docker run --rm --cpus="4" -m="512m" -p 3000:3000 --name cloud-bench node-bench-c
docker run --rm --cpus="4" -m="512m" -p 3000:3000 --name cloud-bench bun-bench-c
docker run --rm --cpus="4" -m="512m" -p 3000:3000 --name cloud-bench go-bench
The Automation Attack Script
To collect clean, low-friction metrics, the attacker VM ran the wrk load generator inside an Alpine container with 2 threads (-t2), 200 open connections (-c200), and a 30-second duration (-d30s):
# Attacker VM - The Native wrk Attack Command
docker run --rm alpine sh -c "apk add --no-cache wrk && wrk -t2 -c200 -d30s http://159.65.6.89:3000/json"
Phase 4 Ledger: Cloud-Native 4-CPU Maximum Performance
| Metric | Node.js Cluster (4 Cores) | Go Optimized (4 Cores) | Bun Cluster (4 Cores) |
|---|---|---|---|
| Throughput (RPS) | 31,025 RPS | 37,617 RPS | 53,446 RPS 🏆 |
| Total Requests (30s) | 933,074 | 1,130,171 | 1,605,818 |
| Average Latency | 8.62 ms | 5.79 ms | 4.04 ms 🏆 |
| Worst-Case Spike | 641.25 ms ⚠️ | 218.91 ms | 76.54 ms 🏆 |
| Peak CPU (Docker Stats) | >400% (Thrashing) | 340% (Highly Efficient) 🏆 | 383% (Perfect Scaling) |
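Dividing throughput by the CPU each runtime consumed gives a rough requests-per-core figure for the ledger above. This back-of-the-envelope sketch uses only the table's own numbers; Node's CPU is recorded as ">400%", so it is entered as 400, which makes Node's result an upper bound. `rpsPerCore` is an illustrative helper.

```go
package main

import "fmt"

// rpsPerCore normalizes throughput by CPU consumed. cpuPercent uses the
// docker stats convention, where 100 means one fully busy core.
func rpsPerCore(rps, cpuPercent float64) float64 {
	return rps / (cpuPercent / 100)
}

func main() {
	// Figures copied from the Phase 4 ledger. Node's ">400%" is entered as 400,
	// so its per-core figure is an upper bound.
	runs := []struct {
		name       string
		rps, cpuPc float64
	}{
		{"Node.js cluster", 31025, 400},
		{"Go", 37617, 340},
		{"Bun cluster", 53446, 383},
	}
	for _, r := range runs {
		fmt.Printf("%-16s %6.0f RPS per core\n", r.name, rpsPerCore(r.rps, r.cpuPc))
	}
}
```

With these inputs it works out to roughly 13,955 RPS per core for Bun, 11,064 for Go, and at most about 7,756 for Node. The docker stats peaks are point samples rather than run averages, so treat these as indicative only.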
The Architectural Post-Mortem
1. Bun is the New Raw I/O King (53,446 RPS)
Bun put the 4 vCPUs almost fully to work, hitting 383% CPU utilization in docker stats and more than doubling its single-core throughput (53,446 vs. 25,444 RPS). By using JavaScriptCore instead of V8 and implementing its HTTP server in native code (Zig and C++) rather than JavaScript, Bun handles OS socket interactions with very little orchestration overhead. It appears to lose little time to internal coordination, keeping its average latency at a jaw-dropping 4.04 ms under heavy load.
2. Go is an Infrastructure Masterpiece (37,617 RPS)
Go generated elite throughput while using only 340% CPU, leaving 60% of a vCPU free for the OS to breathe.
We achieved this by skipping the standard ServeMux routing and writing pre-rendered raw bytes straight to the response writer, which removes the reflection and per-request allocations that encoding/json would otherwise add in the handler. Go's native user-space goroutine scheduler handled the multi-core scaling within a single process, which illustrates why it remains the gold standard for resource efficiency and predictability.
3. Node.js Suffers From the "Cluster Tax" (31,025 RPS)
Node bounced back, but it ran hot, constantly pegged at or above 400% CPU utilization. Node's cluster mode runs separate V8 processes: the primary accepts connections and passes the socket handles to its workers over Inter-Process Communication (IPC), and every worker carries its own engine, heap, and garbage collector. That duplication, plus a fifth process (the primary) sharing the same 4-CPU quota, likely burns clock cycles that would otherwise go to serving requests.
The Production Takeaway
- Bun is no longer just a fast local test runner; its cloud-native I/O layer is a legitimate production speed-demon.
- Go remains the rock-solid king of predictable infrastructure, giving you massive throughput with flatlined resource consumption and unparalleled stability.
- Node.js works, but under extreme enterprise concurrency, its legacy multi-process architecture forces your hardware to work significantly harder to produce less output.
Code snippets
Bun - Clustered
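// Run one copy of this process per core (4 in the clustered benchmark).
// reusePort sets SO_REUSEPORT so every copy can bind port 3000, and on Linux
// the kernel spreads incoming connections across them.
Bun.serve({
  port: 3000,
  reusePort: true, // Key flag: allows multiple processes to bind to port 3000
  fetch(request) {
    const url = new URL(request.url);
    if (request.method === 'GET' && url.pathname === '/json') {
      return new Response(JSON.stringify({ message: "Hello from Clustered Bun!" }), {
        headers: { 'Content-Type': 'application/json' },
      });
    }
    return new Response("Not Found", { status: 404 });
  },
});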
Node - Clustered
const cluster = require('cluster');
const http = require('http');

const numCPUs = 4; // Matching our --cpus="4" flag

// cluster.isPrimary replaces the deprecated cluster.isMaster (Node 16+)
if (cluster.isPrimary) {
  // Fork one worker per core
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', () => {
    cluster.fork(); // Simple failover: replace any worker that dies
  });
} else {
  // Workers share the listening port; outside Windows, the primary accepts
  // connections and hands them to workers round-robin by default
  http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/json') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ message: "Hello from Clustered Node!" }));
    } else {
      res.writeHead(404);
      res.end();
    }
  }).listen(3000);
}
Go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// Pre-render the exact JSON byte slice at startup so the handler does no JSON
// encoding at all (no reflection, no per-request marshalling allocations)
var jsonResponse = []byte(`{"message":"Hello from Go!"}`)

func jsonHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		w.WriteHeader(http.StatusMethodNotAllowed)
		return
	}
	// Set headers and write the raw bytes to the (buffered) response writer
	w.Header().Set("Content-Type", "application/json")
	w.Write(jsonResponse)
}

func main() {
	// A bare http.Server with jsonHandler as the root handler: no ServeMux
	// routing, so every path receives the JSON response
	server := &http.Server{
		Addr:    ":3000",
		Handler: http.HandlerFunc(jsonHandler),
	}
	fmt.Println("Optimized Go running on port 3000")
	log.Fatal(server.ListenAndServe())
}