Why Your Treasure Hunt Engine Kept Crashing at 1.2M Concurrent Connections
Lisa Zulu · 2026-05-28 · via DEV Community

The problem we were actually solving was not how to make the treasure hunt more fun, but how to keep the leaderboard from exploding the heap when 1.2 million players hammered the Redis cluster at exactly 3:17 PM every Tuesday. The marketing team called this peak engagement; I called it a memory avalanche. We were running Veltrix's open-source treasure-hunt engine on a 32 GB RAM instance, and every spike turned the node into a swap-to-death zombie. The leaderboard tier used an in-memory sorted set that Redis advertises as O(log N) per operation, but at N=1.2 M the constant factor was high enough that the Lua scripts were spending more time context-switching than updating scores. We hit 400 MB of resident memory per process, and once the Go garbage collector paused for 420 ms, the TCP backlog overflowed and dropped 37k ZADD requests. That was the first time the CEO noticed the word "cache".

What we tried first (and why it failed)

We started with the obvious: bump the Redis instance to 128 GB, move the leaderboard to a separate node, and slap a read replica in front. The operator docs called this horizontal scaling. What they did not mention was that Redis Cluster splits the sorted set across slots, so a single player's score update might fan out to three different primaries. When players near the top of the leaderboard updated their scores, the cluster saw a surge of cross-slot traffic that turned the network card into a bottleneck. We were pushing 8 Gbps of intra-cluster traffic with only 3 Gbps of actual game updates. The Redis cluster bus protocol started dropping gossip messages, and the cluster lost the view of slot ownership for 11 seconds. Those 11 seconds were enough for two dozen nodes to start a new election cycle, and the leaderboard froze while the slot map reconverged.
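To make the slot discussion concrete, here is a minimal sketch of the key-to-slot mapping described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key, modulo 16384, with optional `{hash tag}` handling. The key names are hypothetical and are not taken from the engine. Note that the mapping is computed per key, so an update fans out to several primaries when it touches several keys that hash to different slots.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0.
// This is the variant the Redis Cluster specification names for key hashing.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keySlot returns the hash slot (0..16383) for a key. If the key contains a
// non-empty {hash tag}, only the text inside the first pair of braces is hashed.
func keySlot(key string) int {
	if open := strings.IndexByte(key, '{'); open >= 0 {
		if end := strings.IndexByte(key[open+1:], '}'); end > 0 {
			key = key[open+1 : open+1+end]
		}
	}
	return int(crc16([]byte(key))) % 16384
}

func main() {
	// Hypothetical key names, for illustration only.
	keys := []string{"leaderboard:global", "leaderboard:{eu}:daily", "leaderboard:{eu}:weekly"}
	for _, k := range keys {
		fmt.Printf("%-26s -> slot %d\n", k, keySlot(k))
	}
}
```

The two `{eu}` keys land in the same slot because only the tag is hashed, which is the usual way to keep related keys on one primary.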

The architecture decision

We ripped out the Redis sorted set entirely and built a two-tiered leaderboard in Go. The first tier is a write-through sharded map: 64 shards, each an in-memory radix tree keyed by player ID. Each shard is a goroutine that batches writes into a 4 KB buffer and flushes asynchronously to a single PostgreSQL table partitioned by shard ID and date. We chose PostgreSQL not because it was fast (it absolutely is not) but because it gave us a real transaction log. If the process crashes, the WAL guarantees we can restore the last committed batch in under 100 ms. The second tier is a materialized view rebuilt every 30 seconds by a worker that runs a windowed SQL query over the last hour's partitions. We serve reads from the materialized view, so leaderboard queries never touch the in-memory tree during the spike.
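The first tier can be sketched roughly as follows. This is not the production code: a plain map stands in for the radix tree, the batch limit is counted in entries rather than bytes, the flush is a callback invoked inline instead of an asynchronous PostgreSQL write, and the FNV-1a routing and all type and function names are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 64 // shard count from the article

// Update is one score write. Field names are illustrative.
type Update struct {
	PlayerID string
	Score    int64
}

// Shard state is touched by a single goroutine, so the map needs no lock.
// A plain map stands in for the article's radix tree.
type Shard struct {
	in     chan Update
	scores map[string]int64
	batch  []Update
	flush  func([]Update) // stand-in for the PostgreSQL batch write
}

// run applies updates in arrival order and hands off a batch when it is full.
func (s *Shard) run(limit int, wg *sync.WaitGroup) {
	defer wg.Done()
	for u := range s.in {
		s.scores[u.PlayerID] = u.Score
		s.batch = append(s.batch, u)
		if len(s.batch) >= limit {
			s.flush(s.batch)
			s.batch = nil // start a fresh slice; the flushed one is not reused
		}
	}
	if len(s.batch) > 0 { // drain the remainder on shutdown
		s.flush(s.batch)
	}
}

// shardFor maps a player ID to a shard index with FNV-1a (an assumed choice).
func shardFor(playerID string) int {
	h := fnv.New32a()
	h.Write([]byte(playerID))
	return int(h.Sum32() % numShards)
}

// runDemo routes n updates through the shards and returns how many rows and
// batches reached the flush callback.
func runDemo(n, limit int) (rows, batches int) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	shards := make([]*Shard, numShards)
	for i := range shards {
		s := &Shard{in: make(chan Update, 128), scores: map[string]int64{}}
		s.flush = func(b []Update) {
			mu.Lock()
			rows += len(b)
			batches++
			mu.Unlock()
		}
		shards[i] = s
		wg.Add(1)
		go s.run(limit, &wg)
	}
	for i := 0; i < n; i++ {
		id := fmt.Sprintf("player-%d", i)
		shards[shardFor(id)].in <- Update{PlayerID: id, Score: int64(i)}
	}
	for _, s := range shards {
		close(s.in)
	}
	wg.Wait()
	return rows, batches
}

func main() {
	rows, batches := runDemo(10000, 100)
	fmt.Printf("flushed %d rows in %d batches\n", rows, batches)
}
```

Because a given player ID always routes to the same shard, updates for one player are applied in order without any cross-shard coordination.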

We added one more trick: a small LRU cache in front of the radix trees. Cache entries have a 5-minute TTL, but we use a version vector embedded in the score itself. When a player's score changes by any amount, we increment the version and invalidate the cache entry atomically in the same PostgreSQL transaction that commits the new score. The cache hit rate hovers between 72 % and 78 %, but even on a miss the read path is a single radix lookup plus a single hash lookup instead of scanning the entire set.
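A minimal sketch of a versioned LRU cache with a TTL is shown below. It simplifies the scheme described above in one important way: instead of invalidating inside the PostgreSQL transaction, it compares the cached version with the caller's current version at read time and drops the entry on a mismatch. A single counter stands in for the version vector, and all type and method names are hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
	"time"
)

// Scored pairs a score with a version that is bumped on every change.
type Scored struct {
	Score   int64
	Version uint64
}

type entry struct {
	playerID string
	val      Scored
	expires  time.Time
}

// Cache is a small LRU with a TTL. It is not safe for concurrent use; a real
// deployment would add a mutex or keep one cache per shard goroutine.
type Cache struct {
	capacity int
	ttl      time.Duration
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // player ID -> element in order
}

func NewCache(capacity int, ttl time.Duration) *Cache {
	return &Cache{capacity: capacity, ttl: ttl, order: list.New(), items: map[string]*list.Element{}}
}

// Put stores v and evicts the least recently used entry when over capacity.
func (c *Cache) Put(id string, v Scored, now time.Time) {
	if el, ok := c.items[id]; ok {
		el.Value = entry{id, v, now.Add(c.ttl)}
		c.order.MoveToFront(el)
		return
	}
	c.items[id] = c.order.PushFront(entry{id, v, now.Add(c.ttl)})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(entry).playerID)
	}
}

// Get returns a hit only if the entry is unexpired and its version matches
// the caller's current version; stale entries are dropped.
func (c *Cache) Get(id string, current uint64, now time.Time) (Scored, bool) {
	el, ok := c.items[id]
	if !ok {
		return Scored{}, false
	}
	e := el.Value.(entry)
	if now.After(e.expires) || e.val.Version != current {
		c.order.Remove(el)
		delete(c.items, id)
		return Scored{}, false
	}
	c.order.MoveToFront(el)
	return e.val, true
}

// demo exercises three outcomes: a fresh hit, a miss after a version bump,
// and a miss after the TTL has passed.
func demo() (fresh, bumped, expired bool) {
	c := NewCache(2, 5*time.Minute) // 5-minute TTL from the article
	t0 := time.Unix(0, 0)
	c.Put("alice", Scored{Score: 100, Version: 1}, t0)
	_, fresh = c.Get("alice", 1, t0)
	_, bumped = c.Get("alice", 2, t0) // score changed, version is now 2
	c.Put("bob", Scored{Score: 50, Version: 7}, t0)
	_, expired = c.Get("bob", 7, t0.Add(6*time.Minute))
	return fresh, bumped, expired
}

func main() {
	fresh, bumped, expired := demo()
	fmt.Println("fresh hit:", fresh, "| after version bump:", bumped, "| after TTL:", expired)
}
```

Passing `now` explicitly keeps the TTL logic deterministic and easy to test; production code would typically call `time.Now()` at the call site.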

What the numbers said after

After the change, the 99th-percentile leaderboard latency dropped from 420 ms to 3 ms. The peak memory footprint per shard stayed under 200 MB, and the Go process's GC pauses averaged 1.2 ms even under 1.4 M concurrent connections. The write throughput increased from 2.1k ops/sec to 220k ops/sec, and the cross-shard traffic vanished because every update stayed on one PostgreSQL connection. PostgreSQL CPU utilization peaked at 35 % during the Tuesday spike, which gave us headroom for the next 3× growth without touching the cluster again.

What I would do differently

I would not have trusted the Redis Cluster slot map to stay consistent during network hiccups. If I had to do it again, I would replace the PostgreSQL worker with a streaming job that consumes the write-ahead log directly through PostgreSQL logical decoding (for example, with the pglogical extension) and builds the materialized view incrementally. That way the view never lags more than a few hundred milliseconds behind the sharded in-memory trees, and we avoid the 30-second staleness window.
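As a rough illustration of what "incrementally" could mean here, the sketch below folds one decoded row change at a time into a ranked view, so reads never trigger a full rebuild. The `Change` type and its fields are hypothetical; a real consumer would parse them from the replication stream, and a large leaderboard would likely use a balanced tree or skip list rather than a slice.

```go
package main

import (
	"fmt"
	"sort"
)

// Change is one decoded row change from the score table (hypothetical shape).
type Change struct {
	PlayerID string
	Score    int64
}

type row struct {
	id    string
	score int64
}

// View keeps every player in rank order so that reads never re-sort.
type View struct {
	ranked []row            // highest score first, ties broken by ID
	scores map[string]int64 // current score per player
}

func NewView() *View { return &View{scores: map[string]int64{}} }

// before reports whether a ranks ahead of b.
func before(a, b row) bool {
	if a.score != b.score {
		return a.score > b.score
	}
	return a.id < b.id
}

// position returns the index where r is stored, or where it would be inserted.
func (v *View) position(r row) int {
	return sort.Search(len(v.ranked), func(i int) bool { return !before(v.ranked[i], r) })
}

// Apply folds one change into the view: remove the old row, insert the new one.
func (v *View) Apply(c Change) {
	if old, ok := v.scores[c.PlayerID]; ok {
		i := v.position(row{c.PlayerID, old})
		v.ranked = append(v.ranked[:i], v.ranked[i+1:]...)
	}
	r := row{c.PlayerID, c.Score}
	i := v.position(r)
	v.ranked = append(v.ranked, row{})
	copy(v.ranked[i+1:], v.ranked[i:])
	v.ranked[i] = r
	v.scores[c.PlayerID] = c.Score
}

// Top returns the IDs of the n highest-ranked players.
func (v *View) Top(n int) []string {
	if n > len(v.ranked) {
		n = len(v.ranked)
	}
	ids := make([]string, 0, n)
	for _, r := range v.ranked[:n] {
		ids = append(ids, r.id)
	}
	return ids
}

func main() {
	v := NewView()
	for _, c := range []Change{{"alice", 10}, {"bob", 20}, {"carol", 15}, {"alice", 30}} {
		v.Apply(c)
	}
	fmt.Println(v.Top(3))
}
```

Each change costs a binary search plus a slice shift, which is linear in the worst case but avoids the periodic full-window query.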

I would also instrument the radix tree from the start with a Prometheus histogram that records the number of pointer dereferences per lookup. Once we had that metric, it told us the trees were compact enough that we rarely walked more than six levels, and the histogram made it obvious when a hot shard started growing unpredictably. Without that visibility we would have missed the 4 % of queries that were doing 12 levels of traversal after a few weeks of churn.