Surviving the region you run in: failover on Aurora DSQL, and what the demo proves
Jonathan · 2026-06-15 · via DEV Community

The thesis Quorum is built on is uncomfortable and true: the tools a team uses to coordinate an incident often live in the same region as the thing that is failing. When the region goes, the incident response goes with it. You are now coordinating a region outage over a status page that the region outage took down.

Quorum is an incident command plane designed to survive a region loss. This post is about how the failover works, what the live demo does and does not prove, and where the survival story currently ends, because a database audience will ask all three and they deserve a straight answer.

What DSQL gives you

A multi-region DSQL cluster in the US region set spans three regions: two full regions, which for Quorum are us-east-1 and us-east-2, and a log-only witness in us-west-2 that has no cluster endpoint of its own. Both full-region endpoints present a single logical database with strong consistency, and the architecture is designed for 99.999% multi-region availability with no single point of failure and automated failure recovery.

The behavior that matters for an incident tool is stated plainly in the GA announcement: applications can keep reading and writing with strong consistency even when they are unable to connect to a region's cluster endpoint, and the third region acts as a log-only witness with no cluster resource or endpoint. The survivor keeps serving; the witness holds the log so the surviving region keeps commit quorum. Quorum is, in effect, a live demonstration of that reference behavior with an incident-command product wrapped around it.

Quorum's failover layer

AWS's guidance for multi-region DSQL is to put routing in front of the endpoints: either DNS-based routing with Route 53, or application-level routing logic, so traffic redirects automatically when an endpoint becomes unreachable. This is laid out in the AWS post "Implement multi-Region endpoint routing for Amazon Aurora DSQL." Quorum, a Next.js app on Vercel, does the application-level version: it detects an unreachable region and routes reads and writes to the healthy endpoint.
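The application-level routing described above can be sketched in a few lines. This is a minimal illustration, not Quorum's code: the `Endpoint` type, the `pick_endpoint` function, the placeholder hostnames, and the simulated probe are all assumptions made for the example. A real probe would attempt a short-timeout connection or a trivial query against each endpoint.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Set


@dataclass(frozen=True)
class Endpoint:
    region: str
    host: str  # placeholder hostname, not a real DSQL endpoint


def pick_endpoint(
    endpoints: Sequence[Endpoint],
    is_reachable: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the first reachable endpoint in preference order, or None."""
    for endpoint in endpoints:
        try:
            if is_reachable(endpoint):
                return endpoint
        except OSError:
            # A probe that errors out (timeout, refused connection)
            # is treated the same as an unreachable endpoint.
            continue
    return None


# Hypothetical endpoints in preference order.
ENDPOINTS = [
    Endpoint("us-east-1", "east-1.example.invalid"),
    Endpoint("us-east-2", "east-2.example.invalid"),
]


def probe_with(down_regions: Set[str]) -> Callable[[Endpoint], bool]:
    """Build a simulated probe that reports the given regions as down."""
    def probe(endpoint: Endpoint) -> bool:
        return endpoint.region not in down_regions
    return probe


chosen = pick_endpoint(ENDPOINTS, probe_with({"us-east-1"}))
print(chosen.region if chosen else "no region can serve")  # prints "us-east-2"
```

Returning `None` instead of raising keeps the "no endpoint reachable" outcome explicit, so the caller has to decide what to show when nothing can serve.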

The piece I am most satisfied with is that the health panel is itself failover-protected. A monitor Lambda re-validates failover on a schedule and writes a status snapshot through DSQL. So the component that tells you about the outage reads from the same database that survives the outage. The status display cannot become a casualty of the thing it is reporting on, which is the failure mode that makes most status pages useless at the exact moment you need them.

Ingestion works the same way. A CloudWatch alarm fires, EventBridge routes it, and an ingest Lambda writes the signal into DSQL as an event. Monitoring events become incidents through a path that does not hinge on a single region's data layer.
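As a rough sketch of the ingest step, the function below turns an EventBridge-delivered alarm event into a flat record ready to be written as an event. The `Signal` type and `signal_from_event` are hypothetical names, and the sample payload is invented; the field names follow the general shape of a CloudWatch alarm state change event and should be checked against the actual payloads before being relied on.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Signal:
    event_id: str     # the delivered event's id, usable as an idempotency key
    alarm_name: str
    state: str
    region: str
    observed_at: str


def signal_from_event(event: Mapping[str, Any]) -> Signal:
    """Extract the fields an incident tool needs from an alarm event."""
    if event.get("detail-type") != "CloudWatch Alarm State Change":
        raise ValueError("not a CloudWatch alarm state change event")
    detail = event["detail"]
    return Signal(
        event_id=event["id"],
        alarm_name=detail["alarmName"],
        state=detail["state"]["value"],
        region=event["region"],
        observed_at=event["time"],
    )


# Invented sample payload, trimmed to the fields used above.
sample = {
    "id": "c4c1c1c9-6542-e61b-6ef0-8c4d36933a92",
    "detail-type": "CloudWatch Alarm State Change",
    "source": "aws.cloudwatch",
    "time": "2026-06-15T12:00:00Z",
    "region": "us-east-1",
    "detail": {"alarmName": "api-5xx-rate", "state": {"value": "ALARM"}},
}

signal = signal_from_event(sample)
print(signal.alarm_name, signal.state)  # prints "api-5xx-rate ALARM"
```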

The capstone is recursive. Running a failover drill inside the product opens a real sev1 incident, "us-east-1 region impairment," which you then coordinate from the surviving region, in the same war room, on the same database, and resolve when the region restores. The drill exercises the exact flow a real region failure would. Because the event UUID is the idempotency key, the drill is safe to run repeatedly without leaving residue.
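The idempotency property can be illustrated with a small stand-in. In this sketch SQLite takes the place of DSQL so the example runs anywhere, and the `events` table, its columns, and the helper names are assumptions rather than Quorum's schema. The point is only that keying the insert on the event UUID makes a replayed drill a no-op.

```python
import sqlite3
import uuid


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the event table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_id TEXT PRIMARY KEY, kind TEXT NOT NULL, detail TEXT NOT NULL)"
    )
    return conn


def record_event(conn: sqlite3.Connection, event_id: uuid.UUID,
                 kind: str, detail: str) -> None:
    """Insert the event once; a repeat with the same UUID changes nothing."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events (event_id, kind, detail) VALUES (?, ?, ?) "
            "ON CONFLICT (event_id) DO NOTHING",
            (str(event_id), kind, detail),
        )


def event_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


conn = open_store()
drill_id = uuid.uuid4()
for _ in range(3):
    # Replaying the same drill event three times leaves a single row.
    record_event(conn, drill_id, "drill", "us-east-1 region impairment")
print(event_count(conn))  # prints 1
```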

What the demo proves, and what it does not

Now the precise part, because precision here is the whole point, and it cuts in my favor before it cuts against.

Here is what the demo proves, and it is all real: the application detects an unreachable region and routes to the survivor, the incident record does not fork under contention, the recursive drill opens and coordinates a real incident from the surviving side, and the health panel keeps reading because it reads through DSQL. Those are measured live, on the click. Recovery point is effectively zero, because strong consistency means a failover loses no committed data.

Here is the boundary. The chaos toggle simulates a region's endpoint becoming unreachable, which is AWS's own framing of the failure scenario, and it exercises the application-layer failover. It does not partition DSQL's internal commit quorum, because I cannot safely destroy a real AWS region to film a demo and you should not trust a tool that claimed to.

So the latencies the demo shows are happy-path latencies. On the happy path a commit needs a majority of the three cluster members, so it commits as soon as the two fastest acknowledge and never waits on the slowest. With one full region gone, the commit must reach the surviving region and the us-west-2 witness specifically, so it can no longer hide the slowest link behind the quorum, and commit latency in that degraded state runs higher than the demo's numbers.

That degraded-quorum behavior is AWS's to guarantee, and Marc Brooker, who led the DSQL team, documents how it stays consistent and available on the majority side of a partition. The application-layer survival is mine to demonstrate, and that is what the demo does. The number is real; what it measures is application failover, not DSQL committing through a degraded quorum.
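The quorum arithmetic can be made concrete with a toy model. The latencies below are invented placeholders, not measurements from Quorum or from DSQL, and `commit_latency_ms` is a hypothetical helper. It models only the one idea in the text: a commit completes when the quorum-th fastest acknowledgement arrives.

```python
from typing import Mapping


def commit_latency_ms(ack_latencies_ms: Mapping[str, float], quorum: int) -> float:
    """Time at which the quorum-th fastest acknowledgement arrives."""
    if quorum < 1 or quorum > len(ack_latencies_ms):
        raise ValueError("quorum cannot be met by the reachable members")
    return sorted(ack_latencies_ms.values())[quorum - 1]


QUORUM = 2  # majority of the three cluster members

# Invented acknowledgement latencies as seen by a writer in us-east-2.
healthy = {"us-east-2": 1.0, "us-east-1": 12.0, "us-west-2 (witness)": 65.0}

# Same writer with us-east-1 unreachable: only two members can acknowledge.
degraded = {name: ms for name, ms in healthy.items() if name != "us-east-1"}

print(commit_latency_ms(healthy, QUORUM))   # prints 12.0: the slowest link is hidden
print(commit_latency_ms(degraded, QUORUM))  # prints 65.0: the witness is on the critical path
```

With all three members up, the slowest acknowledgement never matters. With one full region gone, the quorum needs every remaining member, so the slowest remaining link sets the commit latency.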

The both-regions-down case

There is a state most demos would quietly fake: both full regions unreachable at once. Quorum does not fake it. When no region can serve, the product says so, and it says the true thing: committed data stays safe via the witness's journal, and writes resume when a region recovers. The proofs in the demo that would normally issue a write step aside rather than claim a commit that cannot happen. A coordination tool that lies about its own state in the failure case is worse than no tool, because it lies precisely when you are relying on it most.
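One way to keep that state honest in code is to make "no region can serve" a first-class outcome that blocks writes instead of a silent fallback. The sketch below is an illustration under assumed names (`ServingState`, `NoRegionAvailable`, `write_event`) and an in-memory list standing in for the database. It is not Quorum's implementation.

```python
from enum import Enum
from typing import Dict, List


class ServingState(Enum):
    HEALTHY = "healthy"          # every full region reachable
    FAILED_OVER = "failed_over"  # at least one down, at least one serving
    UNAVAILABLE = "unavailable"  # no full region reachable


class NoRegionAvailable(RuntimeError):
    """Raised instead of pretending a write committed."""


def serving_state(reachable: Dict[str, bool]) -> ServingState:
    up = [region for region, ok in reachable.items() if ok]
    if not up:
        return ServingState.UNAVAILABLE
    if len(up) == len(reachable):
        return ServingState.HEALTHY
    return ServingState.FAILED_OVER


def write_event(reachable: Dict[str, bool], event: str, store: List[str]) -> None:
    """Append the event only when some region can actually accept it."""
    if serving_state(reachable) is ServingState.UNAVAILABLE:
        raise NoRegionAvailable(
            "No region can serve. Committed data is safe; "
            "writes resume when a region recovers."
        )
    store.append(event)


store: List[str] = []
write_event({"us-east-1": False, "us-east-2": True}, "note: failing over", store)
try:
    write_event({"us-east-1": False, "us-east-2": False}, "note: lost", store)
except NoRegionAvailable as err:
    print(err)
print(store)  # prints ['note: failing over']
```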

Where the survival ends, for now

The same honesty applies to the architecture, not only the demo. What survives a region loss today is the data plane. DSQL's multi-region cluster keeps the incident record available and strongly consistent on the surviving side, with no data loss. The application tier does not yet match it: the Vercel functions and the ingestion and monitor Lambdas, as deployed, run in a single region, so a real loss of that region would take the serving layer down even though the data underneath it survives.

AWS names the fix directly, in its writeup on DSQL for global-scale financial transactions: pair an active-active data layer with active-active application tiers so the full stack absorbs a regional disruption. That is the next step here: deploy the functions and the Lambdas across regions so the serving layer fails over the way the data already does. The hard part, strongly consistent coordination state that does not fork across regions, is done. The remaining part is standard multi-region deployment. I would rather name that boundary than imply the whole stack already survives.


Break it yourself at https://quorum-h0.vercel.app: run a drill and watch it fail over, then take both regions down and see the honest state. The source and the full decision log are on GitHub at https://github.com/hocmemini/quorum.

This post was created for the purposes of entering the H0 "Hack the Zero Stack" hackathon. It is one of three: a companion post covers the event-sourced data model and optimistic concurrency, and a third covers how the system was built by directing an AI agent under an append-only decision log. #H0Hackathon