
Your Runbook Is Written. Nobody Runs It.
Jono Herring · 2026-05-14 · via DEV Community

The runbook was pinned. Owners in the footer. Audit passed. Then a routine change hits, the on-call opens the page, step four does not match what production actually is, and the sane move is to close the tab and do what the last hero did ... because nobody has treated "bring the doc back to truth" as release-critical work. That is not a missing wiki problem. That is a trust problem wearing a Confluence costume. The link is green. The organization is lying to itself in polite font.

The obvious failure is nobody writes it down after the firefight. Embarrassing. Loud. You can fix it with shame.

The dangerous failure is somebody did write it down, the template looks healthy, leadership pastes a URL like they just saved the company ... and the team learns the real rule anyway. The real rule is the DM thread. The real rule is whoever was on call last time. The real rule is whatever gets the release out before the executive thread wakes up again. Paper compliance feels adult. It is still a lie, and you have sat in a review where everyone nods while knowing it.


I used to think reliability was mostly capture. Turns out capture is the easy half. The hard half is enforcement when nobody is clapping.

Midnight in the incident channel ... everyone is a saint. Tuesday at 2 PM is where your religion gets tested. The sprint is tight. A PM is asking for a small exception. The runbook update is fifteen minutes nobody thinks they have. Nobody is confused about what gets rewarded in that moment. Skip the doc. Ship the thing. Be the hero who unblocks.

The incident thread gets executive attention. The runbook ticket floats. Everyone agrees it matters in principle. Nobody loses their job when it slips in practice. So the system does what systems always do when behavior is not backed by consequence. It stores risk in humans again. Then one person goes on PTO and the whole room learns what "distributed" never meant.

A runbook nobody runs is a liability dressed up as maturity.

Half a bridge

Documentation answers a shallow question ... can we describe how this works? A behavior contract answers the harder one ... will we behave the same way next week when nobody is watching?

If your reliability strategy stops at documentation, you built half a bridge. The other half is what leadership treats as non-negotiable during normal work, not only during incident theater. That is why I care whether decision records ever leave the doc and why I separate real debt from strong opinions dressed up as risk. Words that do not change behavior are expensive storytelling. Words linked to gates, owners, and verification change behavior.

Behavior contracts show up in boring places. A release checklist that cannot be marked done without a link to the updated runbook section. A named owner who verifies steps in staging (not only in a postmortem doc). A leadership review that asks for the diff in the runbook the same way it asks for the diff in code. If those are missing, your runbook is not operational memory. It is a comfort object you hug when auditors show up. I say that with love. I have hugged a few.
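One way to make that first contract concrete, as a minimal sketch rather than a description of any particular tool: a release checklist item that refuses to count as done until it carries a link to the updated runbook section and the name of the owner who verified the steps in staging. The `ChecklistItem` fields and both helper functions are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChecklistItem:
    name: str
    # Hypothetical field: URL of the runbook section updated for this change.
    runbook_link: Optional[str] = None
    # Hypothetical field: named owner who ran the steps in staging.
    verified_by: Optional[str] = None


def gate_failures(items: List[ChecklistItem]) -> List[str]:
    """Return plain-language reasons the checklist cannot be marked done."""
    failures = []
    for item in items:
        if not item.runbook_link:
            failures.append(f"{item.name}: no link to the updated runbook section")
        if not item.verified_by:
            failures.append(f"{item.name}: no named owner verified the steps in staging")
    return failures


def can_mark_done(items: List[ChecklistItem]) -> bool:
    """The checklist is done only when every item passes the gate."""
    return not gate_failures(items)
```

The point of returning reasons instead of a bare boolean is that the refusal reads like a sentence a release lead can act on, not a red X.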

Same tax, slower burn

If you want the human shape of this pattern, it is the same Brent-shaped week where one person on PTO can stall three lanes before lunch. Heroics hide missing operational memory. Runbooks nobody runs are the slower version of the same tax. You do not feel it until the strong engineer is out and the org discovers it has been mistaking access to a URL for distributed capability.

Here is the social dynamic that quietly murders you. The firefight earns status because it is visible and it is fast and it makes everybody feel like something important happened. Prevention earns patience, and patience is always the first thing that gets cut when the roadmap shows up with a quarter-shaped deadline.

So you get this weird split. The outage gets the war room energy. The runbook follow-up gets a ticket that sits there while people "get back to real work." Leadership can praise mitigation in a standup because mitigation is a clean story. Recurrence work is messier, so it never wins a real slot on anyone's schedule, and after a while the org starts saying "we learned a lot" like that phrase is magic. If nothing in the system changed, you did not learn. You narrated. And the next release is not obligated to applaud your narrative.

The confession I owe

Earlier in my career, I fed this beast without meaning to.

Incident response felt like leadership because it felt alive. I showed up in channels. I asked good questions. I also rewarded velocity in planning forums in ways that made the acceptable answer obvious. The team optimized for what I treated as urgent. Prevention work is rarely urgent until it is catastrophic, so it kept getting deferred, and deferred work always looks rational until the bill arrives.

The ugly truth is I confused presence with leadership. I confused motion with maturity.

The fix is not guilt. The fix is changing what gets treated as incomplete.

What actually moved the needle

The shift that moved behavior for us was embarrassingly small on paper and expensive in discipline. We stopped treating incident closure as a single state. Mitigated meant customer impact was controlled. Closed meant prevention was verified and shipped. Most teams collapse those into one checkbox, and that is where follow-through dies. The ticket closes when the pager quiets down, the harder prevention work becomes a "later" task, and later rarely survives the next priority wave.

Splitting those states forced the conversation nobody wants in the moment and everybody wants in retrospect. No one could claim completion without prevention evidence. Release leads could see open risk in plain language. Leadership reviews stopped rewarding fast mitigation alone.
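The split can be sketched as a tiny state machine. This is an illustration under stated assumptions, not the ticketing setup described above: the `State` names, the `prevention_evidence` list, and the `open_risk` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class State(Enum):
    OPEN = "open"
    MITIGATED = "mitigated"  # customer impact is controlled
    CLOSED = "closed"        # prevention is verified and shipped


@dataclass
class Incident:
    title: str
    state: State = State.OPEN
    # Hypothetical field: links or notes proving the prevention work shipped.
    prevention_evidence: List[str] = field(default_factory=list)

    def mitigate(self) -> None:
        if self.state is not State.OPEN:
            raise ValueError("only an open incident can be mitigated")
        self.state = State.MITIGATED

    def close(self) -> None:
        if self.state is not State.MITIGATED:
            raise ValueError("only a mitigated incident can be closed")
        if not self.prevention_evidence:
            raise ValueError("cannot close without prevention evidence")
        self.state = State.CLOSED


def open_risk(incidents: List[Incident]) -> List[str]:
    """List every incident that is not closed, in plain language."""
    return [
        f"{i.title}: {i.state.value}, prevention not yet verified"
        for i in incidents
        if i.state is not State.CLOSED
    ]
```

A quiet pager moves the incident to `MITIGATED` and no further. Getting to `CLOSED` requires somebody to attach evidence, which is exactly the conversation the single checkbox lets everyone skip.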

If you want every step on the habit side, start with mitigated versus actually closed. Reliability is not a documentation problem. It is a courage problem about what you enforce when the room is tired ... and who gets embarrassed when you say "we are not done."

Twenty minutes before you ship

If you want something practical that does not require a new platform, run a twenty-minute reliability follow-through review before release. Ask what failed last time in a way that could repeat, what changed in the runbook and in ownership, which action is still open, and whether this release should be gated until that is done.

If you never ask the gating question, you are doing ceremony. If you ask it and never hold the line, you are doing theater. Theater is expensive. It buys calm meetings and expensive weekends.

Speed without recurrence control is fake speed. You are borrowing from future stability, and the interest rate is ugly.

If you only visit reliability in a deck

If you are a director or VP who has not personally verified that your teams run their operational docs under normal load, not only during audits, you do not get to claim "we are mature about reliability." Maturity is what happens when the boring step survives contact with a busy Tuesday.

This is the same posture as staying close enough to the system that your judgment is not theoretical. I still read pull requests to understand how teams think across regions. I still care what happens after incidents because that is where culture becomes real. If your only relationship to reliability is a dashboard and a quarterly review, you are not managing reliability. You are managing narrative. Narrative does not reboot production.

If the same failure can happen twice, treat the second time like a leadership signal, not bad luck dressed up as engineering mystery.

What I measure when I do not trust vibes

Pick thirty days and track three numbers without drama: repeat incidents by service area, the percent of incidents with runbook updates completed before the next release, and the percent of follow-through actions closed by their due date. If repeat rate is flat and closure rate is low, your incident program is still response-heavy. If repeat rate drops while closure rate rises, your leadership habits are improving, and reliability is finally compounding. Numbers do not replace judgment. They stop leadership from lying to itself with heroic language.
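Those three numbers are simple enough to compute from a spreadsheet export. A minimal sketch, assuming hypothetical record shapes (`IncidentRecord`, `Action`) and one labeled assumption about what counts as a repeat: the same failure mode showing up more than once in the same service area within the window.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class IncidentRecord:
    service_area: str
    failure_mode: str  # hypothetical label used to decide what "repeat" means
    runbook_updated_before_next_release: bool


@dataclass
class Action:
    due: date
    closed_on: Optional[date] = None  # None means still open


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole if whole else 0.0


def repeat_incidents_by_area(incidents: List[IncidentRecord]) -> Dict[str, int]:
    """Count recurrences: every occurrence after the first of the same
    failure mode in the same service area (an assumed definition)."""
    seen = Counter((i.service_area, i.failure_mode) for i in incidents)
    repeats: Counter = Counter()
    for (area, _mode), count in seen.items():
        if count > 1:
            repeats[area] += count - 1
    return dict(repeats)


def runbook_update_rate(incidents: List[IncidentRecord]) -> float:
    updated = sum(1 for i in incidents if i.runbook_updated_before_next_release)
    return percent(updated, len(incidents))


def on_time_closure_rate(actions: List[Action]) -> float:
    on_time = sum(
        1 for a in actions if a.closed_on is not None and a.closed_on <= a.due
    )
    return percent(on_time, len(actions))
```

Actions that are still open count against the on-time rate here. That is a design choice, and it is the stricter one: an action nobody closed is not "pending", it is late.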

Fix the foundation before you polish the toy

If your team is debating another model wrapper while the foundation squeaks, fix the system before you polish the prompt. AI will happily help you ship more change into a brittle operational surface. It will not volunteer to be the adult in the room when your runbook is ornamental. The problems have not changed, only the marketing around them ... and "more intelligence" does not substitute for "we agreed how we behave when nobody is watching."

When the humans align first

My team learned a version of this the hard way during the AI tooling wave, before we deserved to talk about agents like adults. We rolled tooling out, pull requests got bigger, review time did not magically grow to match, and the codebase started collecting inconsistent patterns like a junk drawer with a CI/CD pipeline. That is not a metaphor I use for cuteness. It is what it feels like when output accelerates faster than shared taste.

The fix was not a lecture. It was boring alignment work ... patterns, logging expectations, error handling defaults, architecture notes in markdown, the kind of stuff nobody wants to present at a town hall. Once humans agreed what "good" meant in our repo, the tools had something to amplify besides chaos. Guardrails stopped being a vibe and became something you could actually review. If you skip that step, your runbook is not the only ornamental artifact in the building. Your whole engineering culture is.

This week

Link the runbook to the release gate, or stop pretending it is operational memory. Borrow the habit that mattered when we split mitigated from actually closed. If critical context still lives in one person's head, your runbook is probably theater no matter how pretty the wiki is.

You do not need more templates. You need fewer stories you tell yourself while the next on-call discovers that the sanctioned path and the real path are not the same thing.

So pick the fight you have been avoiding. The boring one. The one that does not look good on LinkedIn. That is the one that keeps you out of the ditch.

