If you’re an engineering or product leader, you’re probably already getting the question: “Are AI tools getting us the 30% productivity boost that is happening in other organizations?”
You likely don’t have a good, honest answer to that question. Getting one takes a bit of patience and means facing an age-old problem in software engineering: how do we measure it?
One caution at the start: let adoption mature. In almost every rollout I’ve seen, the first 3-6 months are a period of rapid improvement.
AI tool adoption is the biggest change in knowledge and skills that engineers and engineering teams have faced in any of our careers. Competence takes time. Early on, your measures should focus on adoption and usage so you can coach effectively, rather than pushing hard on other metrics. But that doesn’t get you off the hook for figuring out how to answer the measurement question. Side note: if you haven’t yet incorporated AI coding tools into your SDLC, check out our recent blog post “2-week spike to ramp up on AI Coding Tools.”
Want to learn more? We’re hosting a special two-hour deep dive for engineering and product leaders about how to measure the real impact of AI coding tools, what metrics actually matter, and how high-performing teams are handling the transition.
AI Coding Tool Metrics: DORA and CTOs Deep Dive
Friday, January 9, 2026 • 8–10 AM PST / 11 AM–1 PM EST
Can’t attend live? Register anyway and we’ll send you the full session recording.
This is the first time the LA CTO Forum has opened one of its online sessions to a broader audience. Don’t miss this opportunity!
Once you’re past the initial rollout, most orgs end up tracking some subset of these:
That said, you quickly run into the same problem we’ve always had with developer measurement, and AI coding tools just layer more complexity on top.
I’ll also point out that this problem, together with the fact that you are likely measuring immature adoption, plays directly into why the studies you read vary so widely.
The other trap is that a lot of the best AI use cases don’t include code generation and may not affect “throughput” numbers:
Using an assistant to explain logs, propose hypotheses, and narrow in on fixes is incredibly valuable. The final fix might be three lines of code, but the time saved in root cause analysis is where the win lives.
Having an agent walk an engineer through modules, data flows, and edge cases is gold for onboarding and cross-team work, and for day-to-day work as well. The output might be a short design note, a diagram, or just a better mental model, but often no code at all.
Turning fuzzy business goals into crisp acceptance criteria, edge cases, migration plans, and trade-off analyses is real engineering work. Good use of AI here usually means more iterating and more thinking up front. This work itself is not yet code.
AI can act as a second set of eyes: flagging missing tests, odd edge cases, or inconsistencies with past patterns. It may not change the size of the diff, but it can quietly improve quality and shorten the path from PR to deployment.
If you rely too heavily on lines of code produced, you will fall into all the old traps, and you will especially undervalue these use cases.
Even when AI tools are helping, they create some early friction that can make metrics look worse before they look better:
Once engineers get good with AI, they tend to ask more – and better – questions about requirements and acceptance criteria. Tickets that used to be “good enough” start getting challenged. That’s healthy, but in the short term it can make cycle times look longer and frustrate product managers who weren’t expecting that level of scrutiny.
If you think of AI as multiplying your number of junior developers, your junior-to-senior ratio just shifted dramatically. You now have far more “entry-level” code being submitted for review. Without changes to review practices and guardrails, senior and mid-level engineers get swamped in AI-generated diffs and everything slows down.
This is why you can’t just stare at velocity charts and “% AI-generated code” and call it a day. You have to look at the whole system: how long work takes end-to-end, how quality and incidents move, how much time seniors spend reviewing, and whether the non-code work (requirements, debugging, comprehension) is getting easier.
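To make the “whole system” view a little more concrete, here is a minimal Python sketch that summarizes one period of delivered work on three of the dimensions above: end-to-end cycle time, how often changes lead to incidents, and how much senior review time each item consumes. The `WorkItem` fields, the `summarize` helper, and the sample records are hypothetical assumptions for illustration; you would populate them from your own ticketing, deployment, and incident data.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class WorkItem:
    # Hypothetical record shape (an assumption): map these from your own tools.
    started: datetime           # when work on the ticket began
    deployed: datetime          # when the change reached production
    caused_incident: bool       # whether the deployment was linked to an incident
    senior_review_hours: float  # review time spent by senior/mid-level engineers


def summarize(items: list[WorkItem]) -> dict[str, float]:
    """Whole-system summary for one period: end-to-end time, quality, review load."""
    if not items:
        raise ValueError("need at least one work item")
    # End-to-end cycle time in days, from start of work to production.
    cycle_days = [(i.deployed - i.started).total_seconds() / 86400 for i in items]
    return {
        "median_cycle_days": median(cycle_days),
        # Share of items linked to an incident (True counts as 1 in sum()).
        "incident_rate": sum(i.caused_incident for i in items) / len(items),
        # Average senior review hours per delivered item.
        "senior_review_hours_per_item": sum(i.senior_review_hours for i in items) / len(items),
    }


# Made-up sample records for a single period, purely illustrative.
before = [
    WorkItem(datetime(2025, 3, 3), datetime(2025, 3, 7), False, 1.0),
    WorkItem(datetime(2025, 3, 4), datetime(2025, 3, 10), True, 2.0),
    WorkItem(datetime(2025, 3, 5), datetime(2025, 3, 10), False, 1.5),
]

print(summarize(before))
```

The point of the design is to read these numbers side by side and across periods, rather than optimizing any one of them: a drop in cycle time that comes with a rising incident rate or ballooning senior review hours is not a win.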
If you’re getting pressure to “show me the numbers,” a reasonable stance looks like: