Our VP Said AI Would Test Itself. I Raised My Hand. I Got Reassigned. Day 3 Cost $2.8M. I Had the Screenshots Ready.
xulingfeng · 2026-06-07 · via DEV Community

Based on real software development trends. About a VP of Engineering who believed AI would verify its own output, 47 TODOs that shipped to production, and a $2,800,000 discount calculation error that nobody caught.

This story is based on a submission from a community member. If you have a similar story or something you need to get off your chest — reach out. The next one could be yours.


Act 1 · The Tech Meeting

"Starting today — no more hand-written code."

Marcus, the new VP of Engineering, put a slide up on the big screen.

Five words: WRITING BY HAND IS OVER.

I was sitting in the back row, against the wall. Seven years at this company. Three core modules that I'd built from scratch. Two production systems that ran the company's primary revenue stream. Now someone was telling me — don't write anymore.

The room went quiet for about five seconds. Then people started whispering. Someone pulled out a phone and took a picture of the slide.

Marcus added: "AI coding isn't optional — it's a mandatory development standard. We benchmarked this. AI writes code 400% faster than humans. Anyone still typing manually is wasting the company's time."

I raised my hand.

"Who reviews the code?"

"AI reviews it."

"Who writes the tests?"

"AI tests itself."

"What if AI writes something wrong?"

Marcus laughed. Not a polite laugh. The kind of laugh you give someone whose question you've already decided doesn't matter.

"Let me ask you something."

He paused.

"Do you really think you, one person, have more training data than Orion-7?"

People started laughing. Not supportive laughter. Pile-on laughter.

"Or do you think the world's AI companies — hundreds of billions in investment, tens of thousands of GPUs — built something that's less reliable than one backend developer?"

Nobody was looking at me anymore. Everyone was watching him, waiting for the kill shot.

He didn't take it. He just smiled. "Starting next sprint, it's AI across the board. Anyone who has concerns — my door's open."


Act 2 · The Reassignment

I didn't go to his door.

HR notified me the next day: I was being moved to the Legacy Systems team.

Three people. Three projects that hadn't been touched in five years. No new requirements. No QA handoffs. No deadlines — because nobody used the software.

I handed off my work and sat down at my new desk.

A beat-up laptop sat there. Screen cracked in the bottom-right corner.

I opened my email — three new messages. All from the repo: my write access had been revoked on all three legacy repositories. Review-only.

I went to find Marcus.

"About the repo permissions —"

"Oh, right." He looked like he'd been waiting for me. "New system code runs under the new policy. You're not familiar with the AI workflow, so I set you up with read-only access to the legacy repos."

He leaned back in his chair.

"Oh, and I cancelled your Copilot license. Those three legacy projects aren't worth the GPU cycles for AI. Write it by hand. You don't trust AI anyway."

He smiled. He knew exactly what that line would do. He was waiting to see if I'd say something.

I didn't.

I went back to my desk, opened the cracked laptop, connected to the internal network, and cloned the new system's main repo. Public read-only access. He'd forgotten to lock it down.


Act 3 · Building a Case

The new team delivered the order module in two weeks.

The CEO posted in the company-wide Slack channel:

"New order module is live. AI-generated code end-to-end. Zero-defect delivery. Let's give the team a round of applause."

The channel filled with clapping emojis. Someone @'d Marcus: "That's insane efficiency."

I was in the channel. I didn't say anything.

What the CEO called "live" wasn't a full cutover. It was a canary — the new module running alongside the old system. Low traffic. Simple orders. Nothing that would hit an edge case.

But the code was sitting on the server.

I opened a terminal in the new system's code directory and ran one line:

pytest tests/ --tb=short -q

Output:

====== 8 failed, 15 passed in 2.87s ======

15 passed, 8 failed.

I ran another line:

grep -rn "TODO" src/ | wc -l

47.

One more — I hit the API:

curl -s https://new-api.xxx.com/v2/order/123456 | jq .

Response:

{
  "order_id": "ORD-2026-0614-0123456",
  "subtotal": 12500.00,
  "line_items": [
    {"sku": "A-200", "qty": 10, "price": 800.00},
    {"sku": "B-150", "qty": 5, "price": 900.00}
  ],
  "discounts": [
    {"type": "promo_early", "amount": 500.00},
    {"type": "member_gold", "amount": 1875.00},
    {"type": "volume_tier_3", "amount": 1250.00}
  ],
  "discount_detail": null
}

The API docs said: "discount_detail: object, required. Contains breakdown of all applied discounts." It returned null.
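A response-contract check is one way to catch this on the very first call. The sketch below is hypothetical: the `REQUIRED_FIELDS` table and the `check_contract` helper are illustrative names, and apart from `discount_detail` (documented as a required object) the field types are assumptions read off the sample response.

```python
# Hypothetical response contract: field name -> expected Python type.
# Only "discount_detail: object, required" comes from the API docs; the other
# entries are assumptions based on the sample response.
REQUIRED_FIELDS = {
    "order_id": str,
    "subtotal": float,
    "line_items": list,
    "discounts": list,
    "discount_detail": dict,  # JSON object -> Python dict
}


def check_contract(payload: dict) -> list[str]:
    """Return one message per contract violation (an empty list means OK)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"{field}: missing")
        elif payload[field] is None:
            problems.append(f"{field}: null, but the contract says required")
        elif not isinstance(payload[field], expected):
            problems.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return problems


if __name__ == "__main__":
    # The response shown above, as json.loads() would parse it.
    response = {
        "order_id": "ORD-2026-0614-0123456",
        "subtotal": 12500.00,
        "line_items": [
            {"sku": "A-200", "qty": 10, "price": 800.00},
            {"sku": "B-150", "qty": 5, "price": 900.00},
        ],
        "discounts": [
            {"type": "promo_early", "amount": 500.00},
            {"type": "member_gold", "amount": 1875.00},
            {"type": "volume_tier_3", "amount": 1250.00},
        ],
        "discount_detail": None,
    }
    print(check_contract(response))
    # → ['discount_detail: null, but the contract says required']
```

A real project would more likely describe the contract with a schema language such as JSON Schema than with a hand-rolled table; the loop just makes the rule explicit.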

I stared at the screen for thirty seconds.

Then I created a folder.

Called it evidence/.

I opened my email. Wrote to Marcus with the subject "New Order Module — Test Failures Found." Attached the pytest output, the TODO count, and the null response.

He didn't reply.

For the next month, nobody touched that code. Marcus was preparing the Q3 board deck. The AI team was writing prompts for the next module. And me — I was on Legacy Systems. My job was to not touch anything.


Act 4 · The Glory

The new order module ran canary for a month. Low traffic. Simple orders. Zero incidents.

The CEO was happy. Marcus asked for a product demo.

The screen in the conference room showed a perfect happy path. Clean data. Discount calculations stacking correctly. Coupons applying. One-click ordering. Everything worked.

The CEO said: "You're two months ahead of my timeline."

Marcus smiled.

"Because I used AI. Some people are still writing code by hand. You can see the difference, right?"

He glanced at the back row.

I was in the back row. Everyone in the room knew what that look meant.

I stood up.

"The module — "

Marcus cut me off.

"— has a problem? Zero incidents in canary. Full green on the test dashboard. You've been on legacy — have you even looked at the new code?"

The CEO glanced at me. Then turned back.

"Marcus — run through the full Q3 cutover plan again."

I stood there for maybe three seconds. Nobody looked at me. Nobody asked if I had more to say.

I sat down.

The CEO made the call: "Full Q3 cutover. Start sunsetting the legacy systems."

Marcus got his scope expanded — he now owned the entire engineering org. No more CTO sign-off needed.

People patted him on the shoulder on the way out. "Well deserved." He said thanks. He walked right past me — he didn't need to look at me anymore.


Act 5 · The Crash

Day 3 of the full cutover.

4:27 PM. The CFO walked straight into the CEO's office.

Not a complaint. Worse.

"A major client called me directly. Every order we've processed in the last three days has the wrong discount amount. Not some of them: every single one."

Root cause analysis:

AI-generated code was computing compound discounts
with no defined calculation order.
The prompt didn't specify:
  "calculate line-item discounts first → then subtotal → then member tier."
The generated code never pinned an order down.
It applied discounts in whatever order the request payload
listed them, and that order varied from request to request.

Canary hadn't caught it, because canary traffic only hit simple orders with a single discount, where the sequence can't matter. On day one of full production, bulk orders from major clients arrived with several stacked discounts, and each request applied them in whatever order it happened to carry.
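Why would the sequence change the total at all? Flat amounts alone, or percentages alone, give the same result in any order (ignoring per-step rounding); the total only diverges once the two are mixed on a running balance. A minimal sketch, assuming a hypothetical `kind`/`value` shape for discount records and reusing two figures from the earlier API response ($500 off, and 15% member pricing on a $12,500 subtotal):

```python
from decimal import Decimal
from itertools import permutations

# Hypothetical discount records. The figures come from the API response
# above (500 off; member_gold was 15% of the 12,500 subtotal); the
# "kind"/"value" shape is an assumption for illustration.
DISCOUNTS = [
    {"name": "promo_early", "kind": "fixed", "value": Decimal("500")},
    {"name": "member_gold", "kind": "percent", "value": Decimal("15")},
]


def apply_in_sequence(subtotal: Decimal, discounts) -> Decimal:
    """Apply discounts one by one to a running total, in the order given."""
    total = subtotal
    for d in discounts:
        if d["kind"] == "fixed":
            total -= d["value"]  # flat amount off the running total
        else:
            total -= total * d["value"] / 100  # percentage of the running total
    return max(total, Decimal("0"))


if __name__ == "__main__":
    subtotal = Decimal("12500")
    for seq in permutations(DISCOUNTS):
        label = " -> ".join(d["name"] for d in seq)
        print(f"{label}: {apply_in_sequence(subtotal, seq)}")
    # promo_early -> member_gold: 10200
    # member_gold -> promo_early: 10125
```

The same two discounts on the same cart land $75 apart depending only on which one the loop meets first.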

Impact: 3 consecutive days. $2,800,000 in unreconciled billing.

| Metric | Value |
|---|---|
| Orders affected | 1,247 |
| Average discrepancy per order | $2,246 |
| Largest single-order discrepancy | $18,740 |
| Enterprise clients impacted | 16 |
| Longest uncorrected window | 72 hours |

The CEO called Marcus into his office. I wasn't in the room, but someone told me later he came out looking pale. The CEO had HR schedule a postmortem.


Act 6 · The Reckoning

Postmortem. The CEO chaired. Marcus sat in the front row. The CTO, CFO, finance team, and the enterprise account managers filled the room.

I was there too. Back row.

Marcus opened: "The prompt missed a business rule. The team is fixing it now."

The CTO asked: "Canary ran for a month. How did nobody catch this?"

A pause. "The canary environment didn't have complex discount data."

"So your logic is — if the data wasn't in canary, it wouldn't be a problem in production?"

The room went quiet.

I raised my hand.

"Can I share something?"

Marcus turned and looked at me.

The CTO nodded. "Put it on screen."

I walked to the front. Plugged my laptop into the projector.

Slide 1:

# src/orders/discount_calculator.py:127

def apply_compound_discount(order: dict, user: dict) -> float:
    """
    Apply tiered discounts to an order.
    TODO: handle compound discount ordering — currently iterates
          over discount fields in JSON insertion order, which means
          the calculation sequence depends on how the client serialized
          the payload. This is not guaranteed to be consistent across
          requests. Needs explicit ordering: line_item → subtotal → member.
    """
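    total = order["subtotal"]
    # Iterates in JSON insertion order (dicts preserve it since Python 3.7).
    for rule in order.get("discounts", {}).values():
        if "pct" in rule:
            total *= 1 - rule["pct"] / 100  # percentage of the running total
        else:
            total -= rule["amount"]  # flat amount
        # With flat and percentage rules mixed, the result depends on this
        # loop order, i.e. on how the client serialized the payload.
    return max(total, 0.0)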

"This isn't a bug. It's a TODO. There were 47 of these in the codebase when it shipped to production. Distributed across every critical path in the module."

Slide 2:

# tests/test_discount_complex/test_discount_order.py — last modified 2019

def test_compound_discount_sequence():
    """Verify discount application order: line_item → subtotal → member.

    This sequence was confirmed with Finance in 2019 and must not
    be changed without cross-team approval.
    """
    order = sample_order(
        items=[{"sku": "A", "price": 100.00, "qty": 5}],
        subtotal_discount_pct=10,
        member_tier="gold",
    )
    result = legacy_discount_engine.apply(order)

    # Expected: 500 * 0.9 (subtotal) * 0.85 (member) = 382.50
    assert abs(result.final - 382.50) < 0.01
    assert result.application_log == [
        "line_item",
        "subtotal",
        "member",
    ]

"The old system had this test. Written seven years ago."
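The fix the TODO asks for is an explicit sequence, and the 2019 test already spells it out. A minimal sketch under stated assumptions: `apply_discounts_in_order`, `STAGE_ORDER`, and the `stage`/`kind`/`value` fields are hypothetical names, not the real module's API. Discounts are sorted by business stage before they are applied, so payload order no longer matters across stages, and the function returns an application log in the spirit of the legacy test's `application_log`.

```python
from decimal import Decimal

# Business order from the 2019 test: line_item -> subtotal -> member.
STAGE_ORDER = {"line_item": 0, "subtotal": 1, "member": 2}


def apply_discounts_in_order(
    subtotal: Decimal, discounts: list[dict]
) -> tuple[Decimal, list[str]]:
    """Apply discounts in a fixed stage order, whatever order the payload used.

    Each discount is assumed to look like
    {"stage": "member", "kind": "percent" or "fixed", "value": Decimal(...)}.
    An unknown stage raises KeyError instead of being silently guessed.
    Returns (final_total, stages_in_application_order).
    """
    total = subtotal
    applied = []
    # sorted() is stable: discounts that share a stage keep their payload
    # order, but the sequence across stages is always the same.
    for d in sorted(discounts, key=lambda item: STAGE_ORDER[item["stage"]]):
        if d["kind"] == "fixed":
            total -= d["value"]  # flat amount
        else:
            total -= total * d["value"] / 100  # percentage of the running total
        applied.append(d["stage"])
    return max(total, Decimal("0")), applied


if __name__ == "__main__":
    payload = [
        {"stage": "member", "kind": "percent", "value": Decimal("15")},
        {"stage": "subtotal", "kind": "fixed", "value": Decimal("500")},
    ]
    # Same discounts, two serializations: one result.
    print(apply_discounts_in_order(Decimal("12500"), payload))
    print(apply_discounts_in_order(Decimal("12500"), payload[::-1]))
    # → (Decimal('10200'), ['subtotal', 'member']) twice
```

Fed the 2019 test's inputs (a 10% subtotal discount and 15% member pricing on $500), this sketch returns a total of 382.5, matching the legacy expectation of 382.50. Within a single stage the result can still depend on payload order, so a real rule set would need a tiebreak there too.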

Slide 3:

vibe-coding-take-down/evidence/
├── screenshot-01-unit-test-failures.png  (15 pass, 8 fail)
├── screenshot-02-todo-list.png           (47 TODOs)
└── screenshot-03-api-null.png            (API returning null)

Generated: 27 days before production cutover.

"Twenty-seven days before this went to production — I ran the tests. Fifteen passed, eight failed. Forty-seven TODOs. I screenshotted every single one."

I let that sit.

"Twenty-seven days. These tests sat in CI failing for almost a month. Did anyone open them?"

Silence.

I asked again: "Did anyone look at the test report?"

Marcus didn't look at me. He looked at the CTO.

"During canary… we were focused on business metrics. Unit tests weren't fully executed. The timeline was tight."

I didn't answer him. I put up slide four.

Slide 4: a screenshot of a Slack message.

Marcus's own message, sent before AI development began.

"Don't spend too much time on testing. AI generates its own code — it'll verify itself. Let's ship first, we'll backfill tests later."

Nobody moved. Nobody spoke.

I closed my laptop.

"Marcus. You said AI would test itself. Forty-seven TODOs unaddressed. Eight failing tests. Two point eight million dollars lost. What exactly did your AI verify?"

Marcus didn't answer.

The CEO spoke. He wasn't looking at Marcus. He was looking at me.

"When did you build that folder?"

"Twenty-seven days before production."

"Why didn't you say anything?"

"I did. At the demo. I stood up and tried to say something. Marcus cut me off. Nobody let me finish."


Act 7 · The Aftermath

The CEO sent a company-wide email the next morning:

"Effective immediately: all AI-generated code must pass a line-by-line review by a senior engineer before merging to main. Test coverage requirements are non-negotiable. This policy cannot be overridden."

Marcus kept his title.

But the same day that email went out, HR released an updated org chart. A new Code Review Board, independent from Engineering, reporting directly to the CTO. I was put in charge.

Three people on the board. The other two came with me from Legacy Systems — solid engineers who never learned to play the room.

Marcus's team no longer had final say on what got merged.

Someone asked me later: "You waited a month for one meeting? Was it worth it?"

I said: "I didn't wait a month for a chance to prove I was right."

"I waited for him to say 'AI will test itself' in front of the whole company. I wanted to watch him swallow every word."

I deleted the evidence folder later. Didn't need it anymore.

The next day I walked past Marcus's office. The door was open. He was sitting at his monitor, reading a document about the new code review process — the one I'd written.


Forty-seven TODOs. Not a single one was ever treated as a bug. Where no rule was specified, the AI never pinned down a calculation order, so the order changed from request to request. When the process itself doesn't exist, AI will help you prove that "doesn't exist" is good enough.


Have you ever seen someone package a process failure as a technology belief? What happened next?

It took me thirty seconds to create an evidence/ folder, and one second to delete it. Forty-seven TODOs taught me one thing: the only calculation order I trust is the one my coffee cup follows.

Follow for more stories about AI testing, quality engineering, and what happens when the code is generated but not verified.