Synthetic Monitoring Best Practices: What to Monitor and How Often
DevHelm · 2026-06-20 · via DEV Community

Most synthetic monitoring setups fail in one of a few predictable ways. They monitor everything and alert on nothing useful. They assert on status code 200 and miss the empty response body. They run flaky browser checks that page someone at 2 AM for a problem that fixed itself by 2:01. Or they go stale: the checkout flow changed three months ago, and the check has been failing and ignored ever since.

These are not exotic failures. They are the default outcome of setting up synthetic monitoring without a discipline. Here is the discipline.

1. Monitor the journeys that cost money, not everything

Every browser check costs compute and, more importantly, costs maintenance. A check on a path that does not matter is worse than no check — it generates noise that trains your team to ignore alerts.

Rank your journeys by cost of silent failure and monitor the top of the list:

  • Authentication — login, signup. The gate to everything else.
  • The revenue path — checkout, upgrade, add payment method.
  • The core product action — the one thing your product exists to do.
  • Critical third-party handoffs — OAuth redirects, payment iframes, SSO.

Leave static pages, read-only endpoints, and admin screens to cheaper uptime and API checks. A good rule: if a path breaking would not generate a support ticket or lose revenue, it does not need a browser check.

2. Assert on what the user sees, not just the status code

The entire point of synthetic monitoring is catching the failure that a 200 OK hides. So your assertions have to go past the status code.

import { test, expect } from "@playwright/test";

test("checkout completes", async ({ page }) => {
  await page.goto("https://shop.example.com/checkout");

  // Weak: passes even when the page renders an error
  expect(page.url()).toContain("/checkout");

  // Strong: asserts the user can actually complete the action
  await page.getByRole("button", { name: "Pay now" }).click();
  // Wait up to 10 seconds for the confirmation to appear
  await expect(page.getByText("Order confirmed")).toBeVisible({
    timeout: 10000,
  });
  await expect(page.getByTestId("order-number")).not.toBeEmpty();
});

For API checks, the same principle applies: assert on the response body and JSON paths, not just the status code. For example, check that data.user.role equals "admin", that an expected array is non-empty, and that the token is present. A status code tells you the server answered; an assertion tells you it answered correctly.
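As a minimal sketch of body-level assertions for an API check, the function below collects every failed assertion instead of stopping at the status code. The response shape (`token`, `data.user.role`, `data.items`) and the name `checkApiResponse` are assumptions made for illustration, not the API of any particular monitoring tool.

```typescript
// Hypothetical response shape, assumed for this example only.
interface ApiResponse {
  token?: string;
  data?: {
    user?: { role?: string };
    items?: unknown[];
  };
}

// Returns the list of failed assertions; an empty list means the check passed.
function checkApiResponse(status: number, body: ApiResponse): string[] {
  const failures: string[] = [];

  // The status code only proves that the server answered.
  if (status !== 200) failures.push(`expected status 200, got ${status}`);

  // The body assertions prove that it answered correctly.
  if (body.data?.user?.role !== "admin") {
    failures.push('data.user.role is not "admin"');
  }
  const items = body.data?.items;
  if (!items || items.length === 0) {
    failures.push("data.items is missing or empty");
  }
  if (!body.token) failures.push("token is missing");

  return failures;
}
```

A 200 response with an empty `items` array and no token would pass a status-only check but produce two failures here.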

3. Set the interval to your tolerance for silent failure

Your check interval sets your worst-case detection latency. A 5-minute interval means a broken deploy can bleed for up to five minutes before anything notices. For revenue-critical journeys, a 30-second interval is a common choice.

But faster is not automatically better, because interval drives cost. A browser check every 30 seconds from three regions is roughly 259,200 runs per month — for one check. On metered pricing that is real money, and a misconfigured 10-second check can produce a surprise four-figure bill. Match the interval to the journey: 30 seconds for the money path, 1–5 minutes for secondary flows, and reserve sub-30-second intervals for the handful of checks where every second of downtime is quantifiably expensive.
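The run count above can be reproduced with a few lines of arithmetic. This sketch assumes a 30-day month and that every region runs the check on every interval; the function name `runsPerMonth` is illustrative.

```typescript
// Number of runs per month for a single check.
// Assumes each region executes the check once per interval.
function runsPerMonth(
  intervalSeconds: number,
  regions: number,
  daysInMonth = 30,
): number {
  const secondsPerMonth = daysInMonth * 24 * 60 * 60;
  return Math.floor(secondsPerMonth / intervalSeconds) * regions;
}
```

With these assumptions, `runsPerMonth(30, 3)` gives 259,200, and dropping the interval to 10 seconds triples that to 777,600 runs for the same check.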

4. Run checks from multiple regions

Failures are often regional. A CDN edge certificate expires in one region; DNS propagates unevenly; a deploy rolls out zone by zone; an SSL chain is misconfigured on one edge. A single-origin check is blind to all of these.

Run each critical check from at least two or three regions that match where your users are. Multi-region also disambiguates incidents: if a check fails from one region but passes from the others, you have a regional problem, not a global outage — a distinction that changes who you wake up and how hard you panic.
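The regional-versus-global distinction can be sketched as a small classifier over per-region results. The region names used in the usage note and the function name `classifyFailure` are hypothetical.

```typescript
type FailureScope = "healthy" | "regional" | "global";

// Takes a map of region name to whether the check passed there.
function classifyFailure(results: Record<string, boolean>): FailureScope {
  const regions = Object.keys(results);
  const failed = regions.filter((region) => !results[region]);

  if (failed.length === 0) return "healthy";
  // Every region failing points at a global outage; anything less is regional.
  return failed.length === regions.length ? "global" : "regional";
}
```

For example, `classifyFailure({ "us-east": true, "eu-west": false, "ap-south": true })` returns `"regional"`. Note that with a single region, any failure is classified as global, which is exactly the blindness a single-origin check has.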

5. Engineer out flakiness before it trains your team to ignore alerts

A flaky check is worse than no check, because it teaches your team that the alert is noise. The three biggest sources of flakiness and their fixes:

  • Hard waits. Never waitForTimeout(3000). Wait for a condition — an element visible, a network response received, a URL reached. Conditional waits adapt to real timing; fixed sleeps race against it.
  • Single-sample failures. A single failed run caused by a brief blip should not page anyone. Use confirm-on-failure: when a check fails, immediately re-run it (ideally from another region) before declaring an incident. This filters out most transient false positives and delays detection of a real outage only by the duration of one extra run.
  • Shared mutable state. Two checks that log in as the same user and mutate the same cart will trip over each other. Give each check its own isolated test account and idempotent steps.
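Confirm-on-failure can be sketched as a wrapper around any check function. This is a simplified illustration, not how a specific platform implements it: the `Check` type, the region arguments, and the three result labels are assumptions.

```typescript
// A check takes a region name and resolves to true when it passes.
type Check = (region: string) => Promise<boolean>;

// Runs a check and treats a thrown error as a failed run.
async function passes(check: Check, region: string): Promise<boolean> {
  try {
    return await check(region);
  } catch {
    return false;
  }
}

// Runs the check from the primary region. On failure, re-runs it once from a
// second region and reports an incident only if both runs fail.
async function confirmOnFailure(
  check: Check,
  primaryRegion: string,
  confirmRegion: string,
): Promise<"ok" | "transient" | "incident"> {
  if (await passes(check, primaryRegion)) return "ok";
  const stillFailing = !(await passes(check, confirmRegion));
  return stillFailing ? "incident" : "transient";
}
```

Only the `"incident"` result would page someone; a `"transient"` result is worth logging so that recurring blips remain visible.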

6. Keep checks as code, in version control

Synthetic checks are infrastructure, and infrastructure that lives only in a vendor's web UI rots. Define your checks as code — a Playwright spec, a YAML config — committed to your repository alongside the application they test.

The payoff is concrete: when a developer changes the checkout button's label, the check that depends on it is right there in the same pull request, so it gets updated in the same change instead of silently breaking in production. Config-as-code also gives you code review, history, and the ability to recreate your entire monitoring setup from scratch. This is the same monitoring-as-code discipline that keeps the rest of your reliability tooling honest.
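One way to picture checks as code is a typed definition file plus a small validation step that could run in CI on every pull request. The `CheckDefinition` schema, the check names, and the region names below are all hypothetical; real tools define their own config formats.

```typescript
// Hypothetical schema for a check definition kept in the repository.
interface CheckDefinition {
  name: string;
  kind: "browser" | "api" | "uptime";
  intervalSeconds: number;
  regions: string[];
  severity: "critical" | "warning";
}

const checks: CheckDefinition[] = [
  {
    name: "checkout",
    kind: "browser",
    intervalSeconds: 30,
    regions: ["us-east", "eu-west", "ap-south"],
    severity: "critical",
  },
  {
    name: "reports-page",
    kind: "uptime",
    intervalSeconds: 300,
    regions: ["us-east", "eu-west"],
    severity: "warning",
  },
];

// Returns a list of problems; an empty list means the definitions are valid.
function validateChecks(defs: CheckDefinition[]): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  for (const def of defs) {
    if (seen.has(def.name)) errors.push(`duplicate check name: ${def.name}`);
    seen.add(def.name);
    if (def.regions.length < 2) errors.push(`${def.name}: needs at least two regions`);
    if (def.intervalSeconds <= 0) errors.push(`${def.name}: interval must be positive`);
  }
  return errors;
}
```

Because the definitions are ordinary code, a change to the checkout flow and the matching change to the `checkout` check can be reviewed in the same diff.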

7. Use test data safely

Synthetic checks run against production, repeatedly, forever. That has consequences:

  • Use dedicated synthetic accounts, never a real customer's. Tag them so they are excluded from analytics and billing.
  • Make steps idempotent or self-cleaning. A checkout check that creates a real order every 30 seconds will pollute your data and possibly charge a real card. Use a test payment token and a path that does not commit real state, or clean up after each run.
  • Never hard-code real secrets in a check. Use the platform's secret storage; a check definition in Git must not leak credentials.
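A minimal sketch of the account and secret rules above: credentials come from an environment map supplied at run time, and the loader refuses anything that does not look like a synthetic account. The variable names `SYNTHETIC_USER_EMAIL` and `SYNTHETIC_USER_PASSWORD` and the `synthetic+` address prefix are assumptions chosen for this example.

```typescript
interface SyntheticCredentials {
  email: string;
  password: string;
}

// Reads credentials from an injected environment map, so no secret is
// committed alongside the check definition.
function loadSyntheticCredentials(
  env: Record<string, string | undefined>,
): SyntheticCredentials {
  const email = env["SYNTHETIC_USER_EMAIL"];
  const password = env["SYNTHETIC_USER_PASSWORD"];
  if (!email || !password) {
    throw new Error("synthetic credentials are not configured");
  }
  // Guard against pointing a check at a real customer account by mistake.
  // The "synthetic+" prefix is a hypothetical tagging convention.
  if (email.indexOf("synthetic+") !== 0) {
    throw new Error("refusing to run with a non-synthetic account");
  }
  return { email, password };
}
```

In a Node.js runner this would typically be called as `loadSyntheticCredentials(process.env)`, with the values populated from the platform's secret storage.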

8. Route alerts by severity and correlate with dependencies

Not every failed check deserves the same response. A failed checkout check is a wake-someone-up event; a failed check on a secondary report page is a business-hours ticket. Map check severity to routing so the right alerts reach the right channels — and tie it to your incident severity levels so the response is consistent.
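Severity-based routing can be as simple as a lookup table. The severity levels and channel names below are hypothetical and would map onto whatever paging and ticketing tools a team already uses.

```typescript
type Severity = "critical" | "warning" | "low";
type Channel = "pager" | "chat" | "ticket";

// Hypothetical routing table: one channel per severity level.
const routes: Record<Severity, Channel> = {
  critical: "pager", // wake someone up
  warning: "chat", // visible to the team right away, but no page
  low: "ticket", // handled during business hours
};

interface RoutedAlert {
  channel: Channel;
  message: string;
}

function routeAlert(checkName: string, severity: Severity): RoutedAlert {
  return {
    channel: routes[severity],
    message: `[${severity}] synthetic check "${checkName}" failed`,
  };
}
```

Under this table, a failed checkout check marked `critical` goes to the pager, while a failed report-page check marked `low` becomes a ticket.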

Then correlate. A checkout check that fails because Stripe is degraded is a vendor incident, not your bug. Grouping dependent checks and subscribing to the relevant vendor status feeds means a third-party outage shows up next to your failing checks, so you spend the first five minutes fixing instead of diagnosing whose fault it is. That correlation is the difference between a low MTTR and a long one.
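The correlation step can be sketched as matching a failed check's declared dependencies against the vendors currently reporting degradation. How the degraded-vendor list is obtained (for example, from status feeds) is outside this sketch; the `FailedCheck` shape and function names are assumptions.

```typescript
interface FailedCheck {
  name: string;
  dependsOn: string[]; // vendor names this journey relies on
}

// Returns the degraded vendors that a failed check depends on.
function degradedDependencies(
  check: FailedCheck,
  degradedVendors: string[],
): string[] {
  return check.dependsOn.filter(
    (vendor) => degradedVendors.indexOf(vendor) !== -1,
  );
}

// Produces a one-line hint for whoever picks up the alert.
function triageHint(check: FailedCheck, degradedVendors: string[]): string {
  const hits = degradedDependencies(check, degradedVendors);
  return hits.length > 0
    ? `${check.name}: likely vendor incident (${hits.join(", ")})`
    : `${check.name}: no degraded dependency, investigate internally`;
}
```

The hint is only a starting point: a degraded vendor makes a vendor incident likely, but it does not rule out a coincident bug of your own.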

9. Treat checks as living code — they rot

The single most common failure of a mature synthetic setup is staleness. The product changes; the check does not; the check starts failing; someone mutes it "temporarily"; six weeks later the journey is genuinely broken and the muted check never said a word.

Prevent it with the same hygiene you apply to tests: review checks when the flow they cover changes, fail loudly rather than allowing silent mutes, and periodically audit which checks have been red-and-ignored. A check you do not trust is a check you do not have.
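A periodic audit for red-and-ignored checks can be sketched as a filter over check status records. The `CheckStatus` shape and the seven-day default threshold are assumptions for illustration; the right threshold depends on how often your flows change.

```typescript
interface CheckStatus {
  name: string;
  failingSince: Date | null; // null when the check is currently passing
  mutedSince: Date | null; // null when the check is not muted
}

// True when the given timestamp is older than the limit, relative to "now".
function olderThan(since: Date | null, now: Date, limitMs: number): boolean {
  return since !== null && now.getTime() - since.getTime() > limitMs;
}

// Returns the names of checks that have been failing or muted for too long.
function findStaleChecks(
  statuses: CheckStatus[],
  now: Date,
  maxDays = 7,
): string[] {
  const limitMs = maxDays * 24 * 60 * 60 * 1000;
  return statuses
    .filter(
      (s) =>
        olderThan(s.failingSince, now, limitMs) ||
        olderThan(s.mutedSince, now, limitMs),
    )
    .map((s) => s.name);
}
```

Running this on a schedule and posting the result to the owning team turns a silent "temporary" mute into something that gets revisited.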

Start with the foundation

Best practices compound from the bottom up: get your endpoint and uptime coverage right first — multi-region, real assertions, severity-routed alerts — then layer browser journeys on top. For tool selection see the best synthetic monitoring tools in 2026, and for turning an existing test suite into monitors see Playwright monitoring.

Set up multi-region uptime and API checks with config-as-code, severity-based alert routing, and a status page that updates from the same data at app.devhelm.io — your first monitor is live in about 60 seconds, no credit card.


Originally published on DevHelm.