Scaling to 1 Million Users : Load Balancing & Caching Strategies
Emmanuel Onu · 2026-05-26 · via DEV Community

You build a URL shortener in a weekend. It works perfectly. Then it goes viral.

At first it’s just your friends. Pages load instantly, and they love what you’ve built. They share the link with others, and you watch the user count tick towards a hundred. That quiet excitement hits; you’ve made something real. Then more people start using it.

You keep opening your dashboard. The numbers are climbing faster than you can refresh. You are half excited, half terrified. Hundreds is turning into thousands. Then the notifications start: the app is slow, links are not redirecting, users are complaining publicly. The same system that handled a hundred users without blinking is now falling apart under a thousand. You have not changed a single line of code.
So what went wrong?

Nothing went wrong. You just hit the wall that every growing system hits eventually.
The question is:

What do you actually do?

The Scaling Roadmap

Scaling isn’t a single decision; it’s a series of targeted upgrades, each one unlocking the next order of magnitude. Here’s the progression a high-traffic app typically follows:

•Single Server - handles your first few thousand users
•Load Balancer - distributes traffic across multiple servers
•Caching Layer - serves popular data from memory instead of the database
•Content Delivery Network (CDN) - pushes content closer to users globally
•Distributed Cache - spreads cache across multiple machines for millions of users

Single Server : Your first deployment runs everything on one machine: request handling, database queries, link generation, and page serving. This is perfectly fine up to a few thousand users; the pages load fast and the setup is simple. Don’t over-engineer this stage.

Load Balancer : Once you’re in the tens of thousands, a single server starts to buckle. Requests queue up, response times climb, and occasional timeouts start appearing. A load balancer sits in front of your servers and distributes incoming traffic across a pool of app servers, ensuring no single machine becomes a bottleneck. Traffic spikes that would have crashed your app are now absorbed gracefully.

Caching Layer : At hundreds of thousands of users, a pattern becomes obvious: the same short codes are being resolved over and over. Instead of hitting the database every time, a cache layer stores the most frequently accessed mappings in memory. A lookup that previously cost a 40ms database round-trip now completes in under 1ms. Database load drops dramatically, and your app can handle far more concurrent users on the same hardware.

Content Delivery Network (CDN) : Once your users are spread across the globe, physical distance becomes a problem. A CDN places copies of your static assets and cacheable responses at edge locations around the world. A user in Lagos, Berlin, or Sydney gets their redirect served from a nearby edge node rather than your origin server in, say, Virginia. For cached responses, latency can drop from hundreds of milliseconds to single digits.

Distributed Caching : At millions of users, even a single powerful cache server becomes a constraint. A distributed cache, such as Redis Cluster, spreads data across multiple nodes. The most popular short links are served instantly from memory, read throughput scales horizontally, and the system stays fast even under massive, sustained load.

Load Balancing: Distributing Traffic Across Servers

Round-Robin : Round-robin is the simplest traffic distribution strategy: each incoming request is sent to the next server in rotation, cycling back to the start. It works well when servers are equally capable and traffic is fairly uniform. For a URL shortener handling stateless redirect requests, round-robin is a reasonable starting point at modest scale.

But round-robin has a critical blind spot. It knows nothing about data locality. If one server has cached a hot short code in memory, round-robin may send the next request for that code to a different server entirely, causing a cache miss. At scale, this causes unnecessary database pressure and unpredictable latency. Adding or removing servers also reshuffles which server handles which requests, wiping out accumulated cache state.

The Rehashing Problem : Imagine your URL shortener has four servers, each caching a quarter of your popular short codes. You add a fifth server to handle increased load. With naive modulo hashing (hash(short_code) % number_of_servers), roughly 80% of your cache keys now map to different servers, because a key stays put only when its hash gives the same remainder for 4 and for 5. Users experience redirect failures and slowdowns while servers frantically rebuild their caches.
It’s like rearranging a warehouse mid-shipment.
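
The 80% figure is easy to check empirically. The sketch below is illustrative only: the `code0`, `code1`, ... keys are made-up short codes, and MD5 is used simply as a stable hash (Python's built-in `hash()` is salted per process), not because any particular system does it this way.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash of a key (illustrative choice: MD5 prefix)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def server_for(key: str, num_servers: int) -> int:
    """Naive mod-N placement: server index = hash % number of servers."""
    return stable_hash(key) % num_servers

def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose server changes when the pool is resized."""
    moved = sum(1 for k in keys if server_for(k, old_n) != server_for(k, new_n))
    return moved / len(keys)

# Hypothetical short codes, evenly spread by the hash.
keys = [f"code{i}" for i in range(100_000)]
moved = fraction_moved(keys, old_n=4, new_n=5)
print(f"{moved:.1%} of keys changed servers")
```

On a uniformly hashed key set the printed share lands close to 80%, which is why a single scale-up event under mod-N hashing behaves like a near-total cache flush.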

Consistent Hashing: The Production Solution
Consistent hashing solves this cleanly. Picture a ring. Servers occupy fixed positions along the ring, and each short code is hashed to a point on the ring. Each request is routed clockwise from its key’s position to the first server it meets. When you add a new server, only the keys in the arc immediately preceding its position need to migrate: roughly 1/N of total keys, where N is the number of servers. Virtual nodes (multiple positions per server) smooth out load distribution even further.

For your URL shortener, consistent hashing on the short_code ensures that popular links reliably route to the server holding their cache, and that adding capacity during a traffic spike doesn’t cascade into a cache stampede.
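
A minimal sketch of such a ring, assuming 100 virtual nodes per server and hypothetical server names (`app1` to `app5`). It is meant to show the mechanics, not to stand in for a production library.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Map a string to a point on the ring (illustrative choice: MD5 prefix)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers=(), vnodes: int = 100):
        self.vnodes = vnodes      # virtual nodes per server (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        """Place `vnodes` positions for this server on the ring."""
        for i in range(self.vnodes):
            point = ring_hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the next server point."""
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = HashRing(["app1", "app2", "app3", "app4"])
keys = [f"code{i}" for i in range(20_000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("app5")                  # scale out by one server
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved, all to app5")
```

Only keys that now fall just before one of the new server's positions change owner, so every moved key lands on `app5` and the other four caches keep what they already hold.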

Here’s how round-robin looks in an NGINX upstream configuration:

```nginx
# Round-robin is the default when no balancing method is specified.
upstream app_servers {
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}

server {
    location / {
        proxy_pass http://app_servers;
    }
}
```


Algorithm comparison:

| Algorithm | Keys Moved on Change | Used By | Best For |
| --- | --- | --- | --- |
| Round-Robin | N/A (no cache affinity) | NGINX default | Stateless, uniform requests |
| Mod-N Hashing | ~N/(N+1) when a server is added (~80% going from 4 to 5) | Legacy systems | Static server pools only |
| Consistent Hashing | ~1/N (minimum possible) | DynamoDB, Cassandra, Akamai | Dynamic scaling, cache affinity |
| Power of Two Choices | N/A (load-aware) | AWS Lambda, Envoy | Multi-LB environments, service mesh |

Real-world precedent : Netflix applies consistent hashing to route requests to the servers holding cached video segment data. Popular content is served without repeatedly querying origin storage, keeping playback smooth even under massive load. The same principle applies directly to your URL shortener.

HTTP Caching: Making the Web Faster

HTTP caching is built into the web protocol. When configured correctly, browsers and CDN edge nodes store responses locally, eliminating redundant trips to your origin servers. The key headers are:

•Cache-Control - defines how long content should be stored and by whom
•ETag - a fingerprint that lets clients check whether cached content is still fresh
•Vary - specifies which request headers affect the cached response

Understanding Cache-Control : A common misconception: Cache-Control: no-cache does not mean “don’t cache.” It means “cache, but revalidate before serving.” The response can be stored; it just can’t be reused without checking freshness with the origin first. The directive that actually forbids storing is no-store. Understanding this distinction is essential to using caching effectively.

A more powerful pattern is splitting browser and CDN TTLs:

Cache-Control: public, max-age=60, s-maxage=3600

This tells browsers to cache for 60 seconds (so users get fast responses on repeated clicks) and CDNs to cache for an hour (so your origin servers rarely see requests for popular links). Browsers validate frequently; CDNs absorb the bulk of the load.

ETags and Conditional Requests : On the first request, your server returns a response with an ETag header, a hash or version identifier. The browser stores it. On the next request, the browser sends the ETag back in an If-None-Match header. If the content hasn’t changed, the server responds with 304 Not Modified. No body is sent, bandwidth is saved, and the user experiences an instant load. For a URL shortener, this matters for any metadata pages where content changes infrequently.
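
The exchange can be sketched framework-free. In this illustration `respond` is a hypothetical handler that returns a `(status, headers, body)` tuple, and the ETag is simply a truncated SHA-256 of the body; real servers often derive it from a version number or modification time instead.

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body), honoring an If-None-Match header."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=60, s-maxage=3600"}
    if if_none_match is not None:
        candidates = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            if tag.startswith("W/"):      # weak validators compare by their tag
                tag = tag[2:]
            candidates.append(tag)
        if "*" in candidates or etag in candidates:
            return 304, headers, b""      # unchanged: send headers only
    return 200, headers, body

# First visit: full response plus an ETag for the browser to remember.
page = b"<h1>Stats for /abc123</h1>"
status1, headers1, body1 = respond(page)

# Repeat visit: the browser echoes the ETag and gets an empty 304.
status2, headers2, body2 = respond(page, if_none_match=headers1["ETag"])

# Content changed on the server: the old ETag no longer matches.
status3, _, body3 = respond(b"<h1>Updated stats</h1>", if_none_match=headers1["ETag"])
print(status1, status2, status3)
```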

Stale-while-revalidate : stale-while-revalidate allows serving an expired cache entry immediately while fetching a fresh copy in the background. Applied to your URL shortener, this means a redirect response can be served from cache even after its TTL expires, with the cache refreshed transparently. Users never see a delay during high-traffic bursts.
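
As a rough illustration, the helper below assembles a Cache-Control value with all three directives and classifies a cached entry by its age. The function names and the 300-second window are assumptions made for this example, and the state logic is a simplification of what a real cache does.

```python
from typing import Optional

def cache_control(max_age: int, s_maxage: Optional[int] = None,
                  stale_while_revalidate: Optional[int] = None) -> str:
    """Assemble a public Cache-Control header value from TTLs in seconds."""
    parts = ["public", f"max-age={max_age}"]
    if s_maxage is not None:
        parts.append(f"s-maxage={s_maxage}")
    if stale_while_revalidate is not None:
        parts.append(f"stale-while-revalidate={stale_while_revalidate}")
    return ", ".join(parts)

def cache_state(age: int, freshness_lifetime: int, stale_while_revalidate: int) -> str:
    """Classify an entry by age (seconds) the way a cache would treat it."""
    if age < freshness_lifetime:
        return "fresh"                    # serve from cache, no origin contact
    if age < freshness_lifetime + stale_while_revalidate:
        return "stale-but-servable"       # serve now, refresh in the background
    return "expired"                      # must fetch from origin before serving

header = cache_control(60, s_maxage=3600, stale_while_revalidate=300)
print(header)

# From the CDN's point of view (s-maxage=3600, 300 s of allowed staleness):
for age in (10, 3700, 4000):
    print(age, cache_state(age, freshness_lifetime=3600, stale_while_revalidate=300))
```

With these values an entry that is 3,700 seconds old is still served immediately while the edge refreshes it, and only after 3,900 seconds does a request have to wait for the origin.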

The Vary Header Trap
Vary: User-Agent forces caches to store a separate copy for every distinct User-Agent string, and there are thousands of them across browsers, versions, and devices. This silently destroys cache efficiency: every variation gets its own slot, and cache hit rates collapse. Avoid broad Vary headers unless you’re genuinely serving different content per device.

CDN Architecture: Bringing Content Closer to Users

At its core, a CDN is a distributed HTTP cache. Instead of routing every request back to your origin server, copies of your content live at dozens of edge locations worldwide. For a URL shortener, this means viral links, the small fraction that receive massive traffic, can be served entirely from the edge, with zero database involvement.

Pull CDN vs Push CDN

Pull CDN : lazily fetches content from your origin only when a user first requests it. The cache fills naturally over time. Ideal for dynamic or unpredictable content, like short codes whose popularity you can’t know in advance.

Push CDN : requires you to proactively upload content to edge nodes. Best for static resources or pre-generated redirect tables for your most popular links.

Real-world precedent : Netflix Open Connect achieves a 98% CDN cache hit rate for video streams. Nearly every video chunk is served from the edge, not from Netflix’s origin data centers. The same model applies directly to a URL shortener: the top 0.1% of links can be handled entirely at the edge, leaving your database untouched.

Cache invalidation strategies:

| Strategy | How It Works | Speed | Use Case |
| --- | --- | --- | --- |
| TTL Expiration | Content expires automatically after N seconds | Delayed (waits for TTL) | Slow-changing content: blog posts, product pages |
| Purge API | Manual API call instantly removes cached content | Fastly: 150ms global | News, e-commerce inventory, breaking content |
| Surrogate Keys | Tag responses; purge all tagged objects at once | Same as purge | Complex relationships: purge all product-123 pages |
| Soft Purge | Mark stale, serve old while refreshing in background | Immediate serve | High-traffic pages where downtime is unacceptable |

CDN provider comparison:

| Feature | Cloudflare | AWS CloudFront | Fastly |
| --- | --- | --- | --- |
| PoPs | 330+ cities | 750+ PoPs + 1,140 embedded | ~200 strategic |
| Routing | Anycast (single IP, BGP routing) | DNS-based (+ Anycast option) | Anycast |
| Purge Speed | Sub-150ms global | Seconds to minutes | 150ms global (since 2011) |
| Edge Compute | Workers: V8 Isolates, <1ms cold start | Lambda@Edge or CloudFront Functions | Compute@Edge: WebAssembly |
| Cache Invalidation | Purge API + Cache Rules | API (slow) + versioned URLs | Surrogate keys: best in class |
| Free Tier | Generous: unlimited bandwidth | Pay per GB from first byte | No free tier |

Redis: Application-Level Caching

Beyond the HTTP layer, your application needs its own in-memory cache. Redis is the industry standard: it stores data in RAM rather than on disk, making look-ups orders of magnitude faster than a database query. For a URL shortener, Redis is the layer that makes redirect responses feel instantaneous.

Cache-Aside: The Recommended Pattern
When a user clicks a short link, your app checks Redis first. If the mapping is there, it’s returned immediately. If not, the app queries the database, returns the result, and stores it in Redis for future requests. Most subsequent clicks on that link never touch the database.

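```python
CACHE_TTL_SECONDS = 3600  # assumed TTL; tune to your traffic

def get_short_url(short_code):
    # `cache` (redis-py style client) and `db` are assumed to exist elsewhere.
    url = cache.get(short_code)            # Step 1: check the cache
    if url is None:                        # Step 2: cache miss
        url = db.query(short_code)         #   fall back to the database
        if url is not None:                # Step 3: populate the cache,
            cache.set(short_code, url, ex=CACHE_TTL_SECONDS)  # skipping unknown codes
    return url
```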


Write-Through vs Write-Behind

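Write-Through : writes to both the cache and the database as part of the same operation, and acknowledges the write only once both have succeeded. This keeps the two consistent, but every write pays for the database round-trip on top of the cache write. Use this when data correctness is non-negotiable.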

Write-Behind : writes to cache first and flushes to the database asynchronously. Faster writes, but risks data loss if the cache crashes before the flush completes. Use this for high-throughput analytics where some loss is acceptable.
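
The difference is easiest to see side by side. In this sketch plain dictionaries stand in for Redis and the database, and the class names, the `clicks:abc123` key, and the explicit `flush()` call are all illustrative assumptions; a real write-behind cache would flush from a background worker.

```python
from collections import deque

class WriteThroughCache:
    """Every put updates the database and the cache before returning."""
    def __init__(self, db: dict):
        self.db = db
        self.cache = {}

    def put(self, key, value) -> None:
        self.db[key] = value          # durable write happens inline
        self.cache[key] = value

class WriteBehindCache:
    """Puts hit the cache immediately; the database catches up on flush()."""
    def __init__(self, db: dict):
        self.db = db
        self.cache = {}
        self.pending = deque()        # writes not yet persisted

    def put(self, key, value) -> None:
        self.cache[key] = value
        self.pending.append((key, value))

    def flush(self) -> int:
        """Persist queued writes in order; returns how many were written."""
        written = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
            written += 1
        return written

# Write-through: the mapping is in the database as soon as put() returns.
links_db = {}
links = WriteThroughCache(links_db)
links.put("abc123", "https://example.com/landing")

# Write-behind: click counts live only in the cache until the next flush.
analytics_db = {}
clicks = WriteBehindCache(analytics_db)
clicks.put("clicks:abc123", 1)
clicks.put("clicks:abc123", 2)
lost_if_crashed_now = len(clicks.pending)   # writes that exist only in memory
flushed = clicks.flush()
print(lost_if_crashed_now, flushed, analytics_db)
```

Anything still sitting in `pending` when the cache process dies is gone, which is why this pattern suits click counters better than the link mappings themselves.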

Cache Stampede: The Failure Mode You Must Plan For
A cache stampede happens when a popular cache key expires and thousands of concurrent requests simultaneously find a miss. Each one fires a database query. The database buckles under the load. For a URL shortener, a single viral link expiring at the wrong moment can trigger exactly this scenario.

Three defences:

1.TTL jitter: Randomize expiry times slightly so keys don’t expire simultaneously
2.Distributed lock (Redis SET NX EX): Only one request rebuilds the cache; others wait
3.XFetch (probabilistic early refresh): Refresh hot keys shortly before they expire, preventing the miss entirely
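
The first and third defences fit in a few lines each. This is a sketch under assumptions: the 10% jitter spread, the `beta` default of 1.0, and the function names are illustrative, and the random source is injectable only so the behaviour can be checked deterministically. The lock-based defence is omitted because it needs a running Redis server.

```python
import math
import random

def jittered_ttl(base_ttl: int, spread: float = 0.1, rng=random.random) -> int:
    """Return base_ttl shifted randomly by up to +/- spread (10% by default)."""
    offset = (rng() * 2 - 1) * spread * base_ttl
    return max(1, round(base_ttl + offset))

def should_refresh_early(now: float, expiry: float, recompute_cost: float,
                         beta: float = 1.0, rng=random.random) -> bool:
    """XFetch-style check: True means 'rebuild this key now, before it expires'.

    recompute_cost is how long a rebuild takes (seconds). The closer `now`
    is to `expiry`, the more likely a request volunteers to refresh.
    """
    r = 1.0 - rng()                   # in (0, 1], so log(r) is always defined
    return now - recompute_cost * beta * math.log(r) >= expiry

# Keys written together no longer expire together.
ttls = [jittered_ttl(3600) for _ in range(5)]
print(ttls)

# Far from expiry, an early refresh is very unlikely; at expiry it is certain.
far = should_refresh_early(now=0.0, expiry=1000.0, recompute_cost=0.05, rng=lambda: 0.5)
due = should_refresh_early(now=1000.0, expiry=1000.0, recompute_cost=0.05)
print(far, due)
```

Because each request rolls its own dice, one of them usually rebuilds the key a little early while the rest keep reading the still-valid cached value.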

Memory Optimization

Real-world precedent: Instagram needed to keep about 300 million ID mappings (media ID to user ID) in Redis. Stored as plain string keys, they would have taken roughly 21 GB of memory. By bucketing the keys into Redis hashes small enough to use the compact ziplist encoding, they brought that down to about 5 GB, a 76% reduction. For your URL shortener, similar techniques (efficient serialization, compact data structures) can dramatically cut infrastructure costs at scale.

Eviction Policy
allkeys-lru (Least Recently Used) is the right default for general workloads. If your traffic follows an 80/20 pattern, with 20% of links generating 80% of clicks, then allkeys-lfu (Least Frequently Used) keeps your hottest links in memory while evicting cold ones. Choosing the right policy ensures cache performance holds under sustained load.
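
To see why the two policies diverge on skewed traffic, here is a toy version of each. These classes are exact, in-process illustrations with made-up keys; Redis itself uses sampled approximations of LRU and LFU, so treat this as a model of the idea rather than of Redis internals.

```python
from collections import Counter, OrderedDict

class LRUCache:
    """Evicts the entry that was read or written least recently."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)            # mark as most recently used
        return self.data[key]

    def put(self, key, value) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)     # drop the oldest entry

class LFUCache:
    """Evicts the entry with the fewest accesses."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = {}
        self.hits = Counter()

    def get(self, key):
        if key not in self.data:
            return None
        self.hits[key] += 1
        return self.data[key]

    def put(self, key, value) -> None:
        if key not in self.data and len(self.data) >= self.capacity:
            coldest = min(self.data, key=lambda k: self.hits[k])
            del self.data[coldest]
            del self.hits[coldest]
        self.data[key] = value
        self.hits[key] += 1

def replay(cache):
    """One viral link gets many clicks, then two one-off links arrive."""
    cache.put("viral", "https://example.com/viral")
    for _ in range(5):
        cache.get("viral")
    cache.put("once-a", "https://example.com/a")
    cache.put("once-b", "https://example.com/b")
    return cache.get("viral")

lru_result = replay(LRUCache(capacity=2))
lfu_result = replay(LFUCache(capacity=2))
print("LRU kept viral link:", lru_result is not None)
print("LFU kept viral link:", lfu_result is not None)
```

In this replay the LRU cache drops the viral link as soon as two newer keys arrive, while the LFU cache sacrifices a one-off link instead.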

Everything Applied: One Request, End to End

So let’s say a user in Nigeria clicks a short link in a tweet. Here’s exactly what happens across the full stack:

Step 1: CDN Edge (~5ms)

The request hits the nearest CDN edge location. For the top 0.1% of viral links, the ones cached at the edge via s-maxage, the redirect response is returned immediately. The request never reaches your servers. TTL jitter ensures popular links don’t expire in sync, preventing coordinated cache misses.

Step 2: Redis / Cache-Aside (~10ms)

If the CDN doesn’t have the link, the request reaches your app servers. Cache-Aside checks Redis for the short_code. A hit returns the mapping in under 10ms. A miss triggers the database path. Distributed locks or XFetch prevent simultaneous misses from cascading into a stampede.

Step 3: Database (~40ms)

On a cache miss, the app queries the sharded database (consistent hashing routes the query to the correct shard), retrieves the mapping, writes it back to Redis, and responds. Ziplist encoding and appropriate eviction policies keep the cache lean and performant for the next request.

Step 4: Redirect Response: 301 vs 302

Whichever layer answers, the response is an HTTP redirect, and the status code matters. Why 302 and not 301? Bitly famously uses 302 because click analytics are their core product. A 301 can be cached by the browser indefinitely, so future clicks skip your servers and become invisible to tracking. A 302 ensures every click is recorded. For your URL shortener, the answer depends on whether analytics matter more than marginal performance gains.

Performance at scale, back of the envelope:

| Metric | Calculation | Result |
| --- | --- | --- |
| New URLs created | 100M per month / 30 days / 86,400 sec | ~40 writes/second |
| URL redirects (100:1 read ratio) | 40 writes/sec × 100 | 4,000 reads/sec (40K at peak) |
| Short code space (7 chars, Base62) | 62^7 | 3.52 trillion combinations (~2,900 years at 100M per month) |
| Storage (5 years) | 100M × 12 months × 5 years × 500 bytes | ~3 TB before replication |
| Redis hot cache (top 1% = 90% of traffic) | 1% of daily URLs (~33K) × 500 bytes | ~17 MB caches 90%+ of reads |
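
The first four rows can be reproduced in a few lines, which makes it easy to plug in your own traffic assumptions. The inputs are the ones stated in the table (100M new URLs per month, a 100:1 read ratio, 500 bytes per record); the variable names are just for this sketch.

```python
SECONDS_PER_DAY = 86_400

urls_per_month = 100_000_000
read_write_ratio = 100
bytes_per_record = 500

# Throughput: spread a month of writes evenly over 30 days.
writes_per_sec = urls_per_month / 30 / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * read_write_ratio

# Capacity: how many distinct 7-character Base62 codes exist.
code_space = 62 ** 7

# Storage: five years of records, before replication or indexes.
storage_bytes = urls_per_month * 12 * 5 * bytes_per_record

print(f"writes/sec: {writes_per_sec:.1f}")
print(f"reads/sec:  {reads_per_sec:.0f}")
print(f"code space: {code_space:,}")
print(f"storage:    {storage_bytes / 1e12:.1f} TB")
```

The exact write rate comes out slightly under 39 per second, which the table rounds up to 40 for easier mental arithmetic.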

What breaks at each scale stage:

| Scale | What Breaks | The Fix |
| --- | --- | --- |
| 0-1K users | Nothing | Single server, SQLite or MySQL, no Redis needed |
| 1K-10K users | Database read bottleneck | Add Redis cache-aside, add read replica |
| 10K-100K users | App server CPU ceiling | Load balancer + 2-3 app servers (Round-Robin) |
| 100K-500K users | Cache miss spikes overwhelming DB | CDN for redirects, TTL jitter, Redis cluster |
| 500K-1M users | Database write throughput ceiling | Sharding with consistent hashing, async analytics |
| 1M+ users | Single-region latency for global users | Multi-region, GeoDNS routing, regional Redis clusters |

Key Trade-Offs

| Decision | Option A | Option B | Choose Based On |
| --- | --- | --- | --- |
| Redirect type | 301 Permanent (browser caches; no return trips) | 302 Temporary (every click reaches your servers) | Need analytics? Use 302. Click data is the product for companies like Bitly. |
| Short code generation | Auto-increment + Base62 encode (predictable, zero collisions) | Hash (MD5 truncated; collision risk) | At scale, auto-increment + XOR obfuscation beats hash complexity. |
| Database choice | NoSQL (DynamoDB/Cassandra): horizontal sharding native | SQL (MySQL): simpler, vertical scaling ceiling | At 4,000 reads/sec, NoSQL with consistent hashing wins. |
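
For the short code generation row, here is a minimal sketch of auto-increment IDs turned into Base62 codes with XOR obfuscation. The alphabet order and the `SECRET_MASK` value are arbitrary assumptions for illustration; XOR only hides the sequence from casual observers and is not a security measure.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)              # 62
SECRET_MASK = 0x5A17C0DE          # hypothetical constant; keep it fixed forever

def encode_base62(number: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def decode_base62(code: str) -> int:
    """Decode a Base62 string back to its integer."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number

def id_to_code(row_id: int) -> str:
    """Turn a database auto-increment ID into a less guessable short code."""
    return encode_base62(row_id ^ SECRET_MASK)

def code_to_id(code: str) -> int:
    """Recover the database ID from a short code."""
    return decode_base62(code) ^ SECRET_MASK

# Consecutive IDs still map to unique codes, and every code decodes back.
for row_id in (1, 2, 3):
    code = id_to_code(row_id)
    print(row_id, code, code_to_id(code))
```

Because XOR with a fixed mask is a one-to-one mapping, the zero-collision property of auto-increment IDs is preserved, and no lookup table is needed to reverse a code.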

The Engineering Mindset

The best engineers are not the ones who know every tool. They are the ones who understand trade-offs deeply enough to make the right call for their specific system, their specific constraints, and their specific users.

For example, as a developer in Nigeria, your constraints are real: users are on expensive data plans, connectivity drops without warning, and servers are geographically far away. Every caching decision you make is an act of empathy for the person on a 3G connection in Kano trying to load your app. Build accordingly. The goal isn’t to over-engineer early; it’s to design systems that evolve as demand increases.

Scaling to one million users is less about powerful hardware and more about smart architecture. Three principles drive it:

•Distribute traffic using load balancing
•Reduce repeated computation through caching
•Move content closer to users using CDNs

When combined, these strategies dramatically reduce database load, lower infrastructure costs, and keep your application fast, from your first hundred users to your first million.

Resources

•Designing Data-Intensive Applications. Kleppmann (chapters 5-6)
•RFC 9111. HTTP Caching (current standard)
•ByteByteGo YouTube. Alex Xu; visual system design
•Instagram Engineering Blog. Redis memory optimization
•Scaling Memcache at Facebook. NSDI 2013 (Nishtala et al.)
•Redis Official Docs. Caching patterns & eviction reference

What’s Next?

Once caching and load balancing are in place and your system is serving one million users reliably from cache, the next frontier is real-time communication at scale. Technologies like WebSockets and Server-Sent Events introduce a fundamentally different set of constraints: persistent connections, event fan-out, and stateful session management.

The next post will cover WebSockets, HTTP polling, and Server-Sent Events. Follow or subscribe so you don’t miss it.