The Treasure Hunt Engine That Broke Before the Traffic Did
Lillian Dube · 2026-05-26 · via DEV Community

The Problem We Were Actually Solving

We weren't building a generic scale story; we were protecting a money-printing loop. The treasure-hunt engine awarded cash prizes every hour, and each award ran a small blockchain simulator to determine rarity. That simulator used about 4 MB of in-memory state per player. When the Rift hit, we had 85k concurrent players and 290 GB of heap demanded by a single Node process. Our vertical-scaling plan (going from 12 cores / 64 GB to 32 cores / 256 GB) would have cost an extra $6k per event and still risked another heap OOM on the next traffic surge, because Node's single-threaded GC cannot compact memory while the event loop is saturated. The real problem was not CPU or memory on a bigger box; it was the single-process model itself.

What We Tried First (And Why It Failed)

First we split users randomly across five Node processes behind HAProxy. That lowered max heap per process to ~1.1 GB, but we immediately hit a different wall: the in-memory simulation state was not serializable. We tried storing the 4 MB blob per user in Redis, but SET operations took 15–28 ms on a cloud Redis 7.0 cluster with 5 ms p99 latency. At 85k players, that became 1.3 million round trips per second, and we saturated the 1 Gbps link between the Node pool and Redis. The failure surfaced as 38% of write operations timing out, with the error:

NOAUTH Authentication required

We also tried sharding Redis into 16 slices, but the Lua scripts we used for the atomic rarity calculation could not span multiple hash slots. We ended up with either duplicated or dropped rewards, something our finance team would not sign off on.
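For context on that limit: Redis Cluster assigns every key to one of 16,384 hash slots, computed as CRC16 of the key modulo 16384, and rejects a multi-key command or script whose keys land in different slots with a CROSSSLOT error. The sketch below reimplements that mapping as described in the Redis Cluster specification, including hash tags ({...}), the usual way to force related keys into one slot. The key names are hypothetical; the article does not show its key layout.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): poly 0x1021, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to its cluster slot, honouring {hash tags} like Redis Cluster."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is hashed.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


def same_slot(*keys: str) -> bool:
    """A multi-key Lua script is only accepted when this is True."""
    return len({key_slot(k) for k in keys}) == 1


if __name__ == "__main__":
    # Hypothetical keys: without a tag they usually hash to different slots.
    print(key_slot("player:42:state"), key_slot("prize:pool"))
    # With a shared tag, both keys are guaranteed to share a slot.
    print(same_slot("{player:42}:state", "{player:42}:rewards"))
```

Hash tags only help when all the keys a script touches can share one tag; a rarity calculation that reads a global prize pool alongside per-player state cannot be co-located that way, which matches the dead end described above.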

The Architecture Decision

We needed an in-memory store with strong consistency within a shard and a networking stack that could keep up. After running the numbers on three candidates (Dragonfly 1.0, a Redis-compatible store; KeyDB 2.8; and Memurai 2.1), we picked Dragonfly. Its multi-threaded, shared-nothing, no-fork design gives deterministic latency and uses 40% less RAM than Redis 7 for the same value size. We carved the key space into 64 shards and fronted it with Envoy, so that each Node process could open a gRPC stream to its shard instead of a raw TCP connection. The Node side became stateless: the same hash ring routed every player to the same shard, so the 4 MB state lived in one place and the atomic rarity calculation ran in a single Lua call that Dragonfly executes in under 2 ms.
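The routing rule described above (every Node process sends a given player to the same shard) can be sketched as a consistent-hash ring. This is a minimal Python sketch under stated assumptions: MD5-derived ring positions, 100 virtual nodes per shard, and hypothetical shard names; the article does not say which hash function or ring layout it used.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping player IDs to shard names (a sketch).

    Each shard is placed at several virtual-node positions so load spreads
    evenly; a key belongs to the first position clockwise from its hash.
    """

    def __init__(self, shards, vnodes=100):
        ring = []
        for shard in shards:
            for i in range(vnodes):
                ring.append((self._hash(f"{shard}#{i}"), shard))
        ring.sort()
        self._ring = ring
        self._positions = [pos for pos, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of MD5: stable across processes, unlike Python's
        # built-in hash(), which is salted per interpreter run.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def shard_for(self, player_id: str) -> str:
        idx = bisect.bisect(self._positions, self._hash(player_id))
        return self._ring[idx % len(self._ring)][1]  # wrap past the end


if __name__ == "__main__":
    shards = [f"shard-{n:02d}" for n in range(64)]  # hypothetical names
    ring = HashRing(shards)
    print(ring.shard_for("player:85000"))
```

Because the mapping depends only on the shard list, any process that builds the ring from the same list routes a given player identically, which is what lets the Node tier stay stateless. Removing a shard only moves the players that shard owned; everyone else keeps their placement.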

On the write path we replaced one-at-a-time SET calls with pipelines of 32 commands and capped in-flight requests at 1k per shard. The Node servers started using worker_threads to isolate the simulator from the event loop, so a surge in puzzle-solving CPU would not stall the write pipeline. The change cost us two weeks of rewriting the rarity engine from callback-heavy to promise-based code, but we gained a 7× latency drop on the critical path.
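The write-path change (pipelines of 32 commands, at most 1k in flight per shard) can be sketched with asyncio. This is not the article's Node code: send_pipeline is a hypothetical stand-in for whatever pipeline call the real client exposes, and the cap here is enforced per pipeline rather than per command, which bounds it at 31 × 32 = 992 outstanding commands.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

PIPELINE_SIZE = 32                            # commands per pipelined round trip
MAX_IN_FLIGHT = 1_000                         # outstanding commands allowed per shard
MAX_BATCHES = MAX_IN_FLIGHT // PIPELINE_SIZE  # 31 pipelines, i.e. at most 992 commands

Command = Tuple[str, ...]                     # hypothetical, e.g. ("SET", key, value)
SendPipeline = Callable[[Sequence[Command]], Awaitable[List[object]]]


class ShardWriter:
    """Groups one shard's writes into pipelines and bounds how many are in flight."""

    def __init__(self, send_pipeline: SendPipeline) -> None:
        self._send = send_pipeline            # stand-in for a real client's pipeline call
        self._gate = asyncio.Semaphore(MAX_BATCHES)
        self._in_flight = 0                   # commands currently awaiting replies
        self.peak_in_flight = 0               # high-water mark, handy as a metric

    async def write_all(self, commands: List[Command]) -> List[object]:
        batches = [commands[i:i + PIPELINE_SIZE]
                   for i in range(0, len(commands), PIPELINE_SIZE)]
        per_batch = await asyncio.gather(*(self._send_batch(b) for b in batches))
        # gather() preserves order, so replies line up with the input commands.
        return [reply for replies in per_batch for reply in replies]

    async def _send_batch(self, batch: Sequence[Command]) -> List[object]:
        async with self._gate:                # waits while the shard is saturated
            self._in_flight += len(batch)
            self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
            try:
                return await self._send(batch)
            finally:
                self._in_flight -= len(batch)


async def fake_send(batch: Sequence[Command]) -> List[object]:
    """Hypothetical shard: one simulated round trip per pipeline, echoes each key."""
    await asyncio.sleep(0.001)
    return [cmd[1] for cmd in batch]


async def demo() -> Tuple[List[object], int]:
    writer = ShardWriter(fake_send)           # created inside the running loop
    commands = [("SET", f"player:{n}", "state-blob") for n in range(5_000)]
    replies = await writer.write_all(commands)
    return replies, writer.peak_in_flight


if __name__ == "__main__":
    replies, peak = asyncio.run(demo())
    print(len(replies), "replies; peak in-flight commands:", peak)
```

The design choice is the pairing: pipelining cuts round trips per write by up to 32×, and the semaphore turns a slow shard into back-pressure on callers instead of an unbounded queue of pending requests.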

What The Numbers Said After

After the next Rift, we measured:

  • P95 latency on award transactions: 8 ms (down from 36 ms)
  • Heap per Node process: 240 MB (stable)
  • Dragonfly shard CPU: 42% (peak across 64 shards)
  • Cost per thousand players: $0.0012 (down from $0.0078)

All 85k players completed the event without a single timeout error. The Node pool stayed at 52% CPU, well below the 70% inflection point where Node's event-loop lag starts to climb steeply. The finance team happily wired the prizes because the blockchain simulator never lost state.
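The event-loop lag mentioned above is straightforward to measure: schedule a timer and record how late it fires. Node ships a histogram for this in perf_hooks.monitorEventLoopDelay(); the Python sketch below shows the same idea with asyncio, using time.sleep as a hypothetical stand-in for CPU-bound simulator work running on the loop instead of in a worker. It reports a nearest-rank P95, the same statistic used in the results list, but it is an illustration, not a benchmark of the article's system.

```python
import asyncio
import math
import time
from typing import List, Tuple


async def measure_loop_lag(interval: float = 0.01, samples: int = 20) -> List[float]:
    """Return how late each timer fired, in milliseconds (event-loop lag)."""
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        # Anything beyond `interval` is time the loop spent busy elsewhere.
        late = time.perf_counter() - start - interval
        lags.append(max(0.0, late) * 1000.0)
    return lags


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


async def blocking_work(rounds: int = 5, block_s: float = 0.05) -> None:
    """Hypothetical CPU-bound work executed directly on the event loop."""
    for _ in range(rounds):
        time.sleep(block_s)          # blocks the loop; no other task can run
        await asyncio.sleep(0)       # yield once so the monitor gets a turn


async def demo() -> Tuple[float, float]:
    idle = await measure_loop_lag()
    busy, _ = await asyncio.gather(measure_loop_lag(), blocking_work())
    return p95(idle), p95(busy)


if __name__ == "__main__":
    idle_p95, busy_p95 = asyncio.run(demo())
    print(f"p95 loop lag idle: {idle_p95:.1f} ms, with blocking work: {busy_p95:.1f} ms")
```

Moving the blocking work off the loop (worker_threads in Node, a process pool or separate service elsewhere) is what keeps the second number close to the first.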

What I Would Do Differently

I'd push the stateless boundary earlier. By the time we finished the Dragonfly migration, the Node processes were mostly I/O-bound again; worker_threads helped but made stack traces and heap-snapshot debugging more complex. For the next event we will move the simulator into a separate Go microservice behind gRPC, letting us scale CPU independently and take the simulator off the Node heap entirely. Had we architected for embarrassingly parallel CPU work from day one instead of around an in-memory cache, we would have avoided the Redis rewrite and the OOM scare. But we also wouldn't have learned that Dragonfly's sharded Lua gives us stronger consistency guarantees than Redis Cluster for this specific workload, and that lesson is worth the detour.

