Your AI Coding ROI Is Disappearing and Your Dashboard Won't Tell You
Keith MacKay · 2026-05-28 · via DEV Community

The dashboard looks great. The delivery numbers don't.


Your AI coding dashboard looks great. Acceptance rate up. Lines generated up. Developer satisfaction scores up. Your team is thrilled. Management is impressed. The slide deck practically writes itself.

Now ask a different question: has your cycle time improved? Has your post-merge defect rate gone down? Has your review burden per PR decreased?

If you don't know the answers to those questions, you don't know if AI is helping. You know your team feels good. That's not the same thing.

Engineering leaders are measuring AI coding ROI with the wrong instruments. The metrics that are easy to capture look great. The metrics that would tell you whether the AI is actually making your team more effective are mostly going unmeasured. And that gap is where AI investments are disappearing.


The Metrics Everyone Uses (And Why They're Misleading)

Lines of code generated and autocomplete acceptance rate are the default starting points for most AI coding dashboards. They're easy to pull, easy to trend, and easy to show in a QBR. They are also almost entirely useless as productivity signals.

These metrics reward volume, not quality. You can 10x both numbers and slow your team down. Bigger is not better when it comes to code (unless it's "lines of code removed from the codebase"). More lines means more surface area to review, more places for bugs to hide, and more cognitive load for every engineer who touches the code after the author. More to maintain. More to refactor later. The AI doesn't know it's supposed to be frugal. It is, by definition, generative (it's in the name!). Measuring how much it generates and celebrating when the number goes up is like measuring how many ingredients your chef used and calling it a restaurant review. The best dishes are all about quality ingredients and phenomenal execution -- so too with code.

Developer satisfaction is the sneakiest misleading metric of the three. People love feeling fast. The sensation of code appearing faster than you can type it is genuinely mind-blowing (and addictive... I've considered starting a 12-step program for coding agent users, and I'm not joking. I've counted 17 open terminal windows on my desktop, working on DIFFERENT projects, and it's not rare for me to want to start 'just one more' process long after I should be sleeping. Definitely a convo for another post!). It feels like productivity, but it often isn't.

There's a well-documented cognitive bias at play here: when a tool makes early-stage work feel effortless, people systematically rate their overall productivity higher, even when downstream costs eat the gains. The DORA 2025 data makes this concrete at scale: teams nearly doubled their PR merge rate and reported high enthusiasm about AI tools, while organizational delivery metrics stayed flat [1]. Satisfaction scores captured the feeling. The delivery numbers told a different story.

Time to first commit is the third common trap. It measures the wrong finish line. A commit that took 10 minutes to generate but 3 hours to review and 2 days to hunt down and fix the bugs it introduced did not save time. It shifted costs downstream and made them invisible to the metric that was being tracked. You look fast on the front end. The system slows down on the back end. Nobody connects the two. I wrote about this "waterbed problem" some months ago -- I'll include the link at the end of the article if you'd like to read further.


The Numbers You're Ignoring

The research is not subtle about this problem.

DORA's 2025 State of AI-Assisted Software Development report found that AI tools increased tasks completed by 21% and PRs merged by 98% [1]. Those are the numbers that end up in the AI vendor case study. Here's what doesn't: organizational delivery metrics stayed flat. More PRs merged. Same delivery performance. The throughput increased. The outcomes didn't follow.

That finding deserves a moment. Organizations nearly doubled their PR merge rate and saw no improvement in delivery. Something in the system was absorbing all the gains. The code was moving faster into the pipeline ... but the pipeline wasn't getting faster.

On quality: CodeRabbit analyzed 470 real-world PRs in December 2025 and found that AI-generated code produces 1.7 times more issues overall and 1.4 times more critical issues than human-authored code [2]. Veracode's data is sharper: AI-generated code contains 2.74 times more security vulnerabilities, with a 45% security flaw rate overall and 72% when just the Java code was reviewed [3].

And on confidence: only 3.8% of developers report both low hallucination rates and high confidence shipping AI-generated code without human review [4]. The other 96.2% are, at minimum, uncertain. Many are doing substantial review work that isn't being measured anywhere.


The PR Size Problem Nobody Is Talking About

DORA 2025 found that AI tools consistently increased PR size by 154% [1].

That is important -- PR size is not a neutral variable. Larger PRs are harder to review. Review quality degrades as PR size increases. Reviewers shift from actually understanding the changes to pattern-matching for obvious errors. Bugs slip through not because reviewers are bad at their jobs but because human attention has limits and a 600-line PR is a different cognitive task than a 400-line one.

You code faster but your pipeline chokes. The AI generates more code per session. That code lands in larger PRs. Those PRs take longer to review and are reviewed less carefully. More issues make it through to merge. Post-merge defect rates climb. Incident rates follow.

This is a systems problem. You optimized one node in the pipeline and degraded the downstream nodes. The metric you were watching (lines generated, PRs merged) went up. The metric you should have been watching (cycle time, defect rate) didn't.

The bottleneck didn't disappear. It moved. And most teams don't have the measurement infrastructure to see where it went.


What to Measure Instead

Four metrics. These aren't exotic. Most engineering teams can instrument them.

Cycle time, commit to deploy. Not commit to commit, not task started to PR opened. Commit to deploy. This captures the full pipeline cost including review time, CI/CD wait time, and any rework loops. If AI is genuinely accelerating delivery, this number should move. If it's flat or growing while PR volume increases, you have the same problem DORA documented.
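A minimal sketch of the commit-to-deploy calculation, assuming you can export a first-commit timestamp and a production-deploy timestamp for each change. The record layout and sample values below are hypothetical, not taken from any particular tool.

```python
from datetime import datetime
from statistics import median


def cycle_time_hours(first_commit_at: str, deployed_at: str) -> float:
    """Hours between a change's first commit and its production deploy."""
    start = datetime.fromisoformat(first_commit_at)
    end = datetime.fromisoformat(deployed_at)
    return (end - start).total_seconds() / 3600


# Hypothetical export: (first commit time, deploy time) per change.
changes = [
    ("2026-05-04T09:00:00", "2026-05-04T17:00:00"),
    ("2026-05-05T10:00:00", "2026-05-07T10:00:00"),
    ("2026-05-06T08:30:00", "2026-05-07T08:30:00"),
]

# The median is less sensitive than the mean to one change that sat for a week.
median_cycle_time = median(cycle_time_hours(c, d) for c, d in changes)
print(f"median commit-to-deploy: {median_cycle_time:.1f} h")
```

Trend this number week over week next to PR volume. If volume climbs while the median stays flat or grows, the gains are being absorbed somewhere between commit and deploy.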

Post-merge defect rate, segmented by AI-assisted versus human-authored code. This is the quality signal that autocomplete acceptance rate completely misses. Track bugs filed against features and fixes, tag the originating PRs, and compare defect rates across code origin. The CodeRabbit and Veracode numbers suggest you will find a meaningful difference. That difference has a cost you can now put a number on.
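One way to sketch the segmentation, assuming each merged PR already carries an origin tag and a count of bugs later traced back to it. The field names `origin` and `defects` and the sample data are illustrative assumptions.

```python
from collections import defaultdict


def defect_rate_by_origin(prs):
    """Return {origin: post-merge defects per merged PR}."""
    merged = defaultdict(int)
    defects = defaultdict(int)
    for pr in prs:
        merged[pr["origin"]] += 1
        defects[pr["origin"]] += pr["defects"]
    return {origin: defects[origin] / merged[origin] for origin in merged}


# Hypothetical data: bugs filed after merge and linked back to the PR.
prs = [
    {"id": 1, "origin": "ai-assisted", "defects": 2},
    {"id": 2, "origin": "ai-assisted", "defects": 0},
    {"id": 3, "origin": "human-authored", "defects": 1},
    {"id": 4, "origin": "human-authored", "defects": 0},
    {"id": 5, "origin": "human-authored", "defects": 0},
    {"id": 6, "origin": "human-authored", "defects": 0},
]

rates = defect_rate_by_origin(prs)
print(rates)
```

With samples this small the comparison means little. The structure is what matters: the same counting applied to each origin bucket, so the rates are comparable.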

Review burden per PR. Time to first review, number of review iterations, and reviewer time spent. This tells you whether the code landing in review is ready to review. If AI-generated PRs are consuming disproportionate reviewer attention, that's a real cost that isn't showing up anywhere in your current dashboard.
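The three review signals can be rolled up roughly like this. It is a sketch that assumes reviewer time is self-reported or estimated, and the `ReviewRecord` type is a hypothetical shape, not something your code host exports directly.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class ReviewRecord:
    opened_at: datetime        # when the PR was opened
    first_review_at: datetime  # when the first review was submitted
    iterations: int            # review rounds before approval
    reviewer_minutes: float    # self-reported or estimated (assumption)


def review_burden(records):
    """Average the three review-burden signals over a set of PRs."""
    waits = [
        (r.first_review_at - r.opened_at).total_seconds() / 3600
        for r in records
    ]
    return {
        "hours_to_first_review": mean(waits),
        "review_iterations": mean(r.iterations for r in records),
        "reviewer_minutes": mean(r.reviewer_minutes for r in records),
    }


records = [
    ReviewRecord(datetime(2026, 5, 4, 9), datetime(2026, 5, 4, 11), 1, 20.0),
    ReviewRecord(datetime(2026, 5, 4, 9), datetime(2026, 5, 4, 15), 3, 70.0),
]
burden = review_burden(records)
print(burden)
```

Run it once per origin bucket (AI-assisted, human-authored) and compare the three averages side by side.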

Rework rate within 30 days. How much AI-generated code gets substantially rewritten within a month of merge? Code that has to be redone isn't a cost savings. It's a deferral. The initial PR looked like velocity. The rewrite is where you pay it back, with interest.
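Here is a rough sketch of a rework-rate calculation. It assumes you already know, for each merged PR, how many of its added lines were later changed and when (for example from blame or diff tooling). The 50% cutoff for "substantially rewritten" is an assumption chosen for illustration, as are the field names.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)
THRESHOLD = 0.5  # assumption: half the added lines changed counts as rework


def is_reworked(pr) -> bool:
    """True if enough of the PR's added lines were rewritten inside the window."""
    cutoff = pr["merged_on"] + WINDOW
    rewritten = sum(n for day, n in pr["later_edits"] if day <= cutoff)
    return rewritten / pr["lines_added"] >= THRESHOLD


def rework_rate(prs) -> float:
    """Share of PRs that were substantially rewritten within the window."""
    return sum(is_reworked(p) for p in prs) / len(prs)


# Hypothetical PRs: later_edits holds (date, lines of this PR rewritten).
prs = [
    {"merged_on": date(2026, 4, 1), "lines_added": 200,
     "later_edits": [(date(2026, 4, 10), 120)]},
    {"merged_on": date(2026, 4, 1), "lines_added": 100,
     "later_edits": [(date(2026, 5, 20), 90)]},  # outside the 30-day window
    {"merged_on": date(2026, 4, 5), "lines_added": 50,
     "later_edits": [(date(2026, 4, 20), 10)]},
    {"merged_on": date(2026, 4, 10), "lines_added": 80,
     "later_edits": [(date(2026, 4, 12), 30), (date(2026, 4, 25), 20)]},
]

print(f"rework rate: {rework_rate(prs):.0%}")
```

Pick the threshold once and keep it fixed. The trend across months is more useful than the absolute value.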


Implementing the Shift

This doesn't require a new platform. It requires tagging.

Start by tagging PRs by AI involvement. The simplest version: developers mark PRs as AI-assisted, AI-generated, or human-authored. You don't need perfect granularity to start seeing signal.

Then run a 60-day baseline on the four metrics above, segmented by those tags. You will probably see what the research predicts: AI-assisted code moves faster into the pipeline and creates more downstream work. The net effect on cycle time will depend on how your specific team and codebase absorb that tradeoff.
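The tagging and baseline steps could be sketched like this, assuming the tags live in PR labels. The label names, the `cycle_hours` field, and the sample PRs are hypothetical. PRs without a tag go into their own bucket so missing tags don't quietly inflate the human-authored numbers.

```python
from collections import defaultdict
from statistics import median

# Hypothetical label names matching the three tags described above.
ORIGIN_LABELS = ("ai-generated", "ai-assisted", "human-authored")


def origin_tag(labels):
    """Return the first origin label found on a PR, or 'untagged'."""
    for label in ORIGIN_LABELS:
        if label in labels:
            return label
    return "untagged"


def baseline(prs, metric):
    """Median of one metric per origin tag."""
    groups = defaultdict(list)
    for pr in prs:
        groups[origin_tag(pr["labels"])].append(pr[metric])
    return {tag: median(values) for tag, values in groups.items()}


prs = [
    {"labels": ["ai-assisted", "backend"], "cycle_hours": 30},
    {"labels": ["ai-assisted"], "cycle_hours": 50},
    {"labels": ["human-authored"], "cycle_hours": 20},
    {"labels": ["bugfix"], "cycle_hours": 12},  # no origin label
    {"labels": ["ai-generated"], "cycle_hours": 72},
]

print(baseline(prs, "cycle_hours"))
```

The same `baseline` call works for any per-PR metric you add later, as long as every PR record carries that field.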

The point isn't to prove AI doesn't work. Some teams will find it does, clearly and measurably. The point is to get honest about where the value is and where the costs are landing. Right now most engineering leaders are flying on instruments that measure activity, not outcomes. You can't optimize what you're not measuring.

Stop celebrating PR volume. Start measuring what happens after the PR.

One practical starting point: pick one team, one sprint, and instrument cycle time and post-merge defects by PR tag. You'll have more signal from that one experiment than from three months of acceptance rate data.

Two more things to track over the same timeframe are token volume and cost. Track both: cost per token has dropped, but that trajectory could change soon as OpenAI gears up to go public and the business model of subsidized tokens grows less and less tenable. Tracking costs allows legitimate ROI conversations. Tracking token count allows comparison over time as cost metrics change.
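A small sketch of tracking both numbers together, assuming your provider's billing export gives token counts and spend per period. The field names and figures are made up for illustration.

```python
def token_summary(usage, merged_prs):
    """Summarize token volume and spend, normalized per merged PR."""
    total_tokens = sum(u["tokens"] for u in usage)
    total_cost = sum(u["cost_usd"] for u in usage)
    return {
        "tokens": total_tokens,
        "cost_usd": total_cost,
        # Unit price: shows when pricing moves even if usage does not.
        "usd_per_million_tokens": total_cost / (total_tokens / 1_000_000),
        # Volume per unit of output: comparable across price changes.
        "tokens_per_merged_pr": total_tokens / merged_prs,
        "usd_per_merged_pr": total_cost / merged_prs,
    }


# Hypothetical billing export for two weeks and the PRs merged in that span.
usage = [
    {"week": "2026-W18", "tokens": 4_000_000, "cost_usd": 20.0},
    {"week": "2026-W19", "tokens": 6_000_000, "cost_usd": 30.0},
]

summary = token_summary(usage, merged_prs=25)
print(summary)
```

Keeping tokens per PR alongside dollars per PR separates "we are using more" from "it costs more", which matters once pricing shifts.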


The Bottom Line

The metrics most teams are using to measure AI coding ROI are measuring effort and sentiment. They are not measuring delivery performance. They are not measuring quality. They are not measuring whether the system your engineers are embedded in is getting faster or slower, and are not tracking whether any actual improvements have measurable ROI.

DORA found that the PR merge rate nearly doubled while delivery outcomes stayed flat [1]. CodeRabbit found 1.7 times more issues in AI-generated code [2]. Veracode found 2.74 times more security vulnerabilities [3]. Developer satisfaction scores climbed while cycle time stayed flat. The dashboard looked great. The numbers didn't lie--they just measured the wrong things.

Measure cycle time. Measure post-merge defects. Measure review burden. Measure rework. If AI is helping your team deliver better software faster, those numbers will tell you. If it's helping your team feel productive while shifting costs downstream, those numbers will tell you that too.

Measure token count and cost. This is the only way to determine actual ROI.

The dashboard that tells you what you want to hear is not a monitoring system. It's a press release.



References

  1. 2025 DORA State of AI-Assisted Software Development — Google/DORA
  2. State of AI vs. Human Code Generation Report — CodeRabbit
  3. GenAI Code Security Report 2025 — Veracode
  4. State of AI Code Quality 2025 — Qodo

What metrics are you using to evaluate AI coding tools in your org? Curious whether teams are seeing the same disconnect between activity metrics and delivery outcomes. Drop your experience in the comments.


Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with an AI collaborator.