Why Story Points Don’t Work in the AI Era, And What Should Take Their Place Instead.
Sriramprabhu · 2026-05-25 · via DEV Community

Imagine this scenario at your next sprint review: your velocity graph looks good, but half your team is quietly struggling. Estimation has devolved into Russian roulette. One "2-point" ticket is done in 45 minutes. Another "2-point" ticket takes 4 days because AI-written code introduced a nasty bug that went unnoticed until staging.

This isn't an effort problem; it's a predictability problem. Story points were designed to predict the output of humans working at a stable pace, and they have no way to handle this situation.

I believe it's time to ditch story points. Let me explain why, and what I'd use instead.

Why I Think the Old System Is Defective

Story points are a commitment to the idea that "this task is about twice as hard as that task." That idea rests on two assumptions:

  1. A developer's capacity stays roughly constant from task to task
  2. Complexity correlates well with duration

AI breaks both assumptions.

Throughput is no longer fixed. Same developer, same work, same day: one tweak to the model and suddenly they finish in 40% less time. Or the model goes wrong, and they spend two days unwinding a hallucinated abstraction.

Complexity is no longer linked to duration. A very difficult task can become easy if the AI gets it right, and a very simple task can become difficult if the AI gets it wrong. The spread of outcomes is now wider than the average itself, and story points capture only the average.

Here's what I'd expect a typical team to see after a couple of sprints of working with AI:

Task Type         | Pre-AI Avg  | Post-AI Range
------------------|-------------|----------------
API endpoint      | 2 days      | 3 hours – 2.5 days
DB migration      | 3 days      | 1 day – 4 days  
UI component      | 1.5 days    | 30 min – 3 days
Legacy refactor   | 5 days      | 2 days – 8 days


When the range is wider than the estimate itself, the estimate is just noise.
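To make that rule concrete, here is a small Python sketch that applies it to the four rows of the table above. It assumes an 8-hour working day so days and hours share one scale; that conversion, and the names `TaskType` and `is_noise`, are my own illustration rather than part of any tool.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 8  # assumption: one working day = 8 hours


@dataclass
class TaskType:
    name: str
    pre_ai_avg_h: float   # pre-AI average duration, in hours
    post_ai_min_h: float  # fastest post-AI duration, in hours
    post_ai_max_h: float  # slowest post-AI duration, in hours

    @property
    def range_h(self) -> float:
        """Width of the post-AI range, in hours."""
        return self.post_ai_max_h - self.post_ai_min_h

    def is_noise(self) -> bool:
        """Rule from the text: a range wider than the estimate makes it noise."""
        return self.range_h > self.pre_ai_avg_h


D = HOURS_PER_DAY
# Rows copied from the table above, converted to hours.
TASKS = [
    TaskType("API endpoint", 2 * D, 3, 2.5 * D),
    TaskType("DB migration", 3 * D, 1 * D, 4 * D),
    TaskType("UI component", 1.5 * D, 0.5, 3 * D),
    TaskType("Legacy refactor", 5 * D, 2 * D, 8 * D),
]

if __name__ == "__main__":
    for t in TASKS:
        verdict = "noise" if t.is_noise() else "usable"
        print(f"{t.name:<16} avg={t.pre_ai_avg_h:4.1f}h range={t.range_h:4.1f}h -> {verdict}")
```

Under that assumption, three of the four rows come out as noise. The DB migration's 24-hour range exactly equals its 24-hour average, so it sits right on the line.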


The Hidden Cost That No One Is Estimating

Here's my hypothesis about what most teams fail to estimate: verifying AI output is the most expensive activity in almost any task.

A realistic breakdown of where the time goes might look something like this:

  • 20% of the time: figuring out what to build
  • 15% of the time: the AI building it
  • 65% of the time: going through it, looking for problems

That third item, the curation tax, is invisible to almost everyone. Teams estimate the time spent on construction, not on curation. That's like planning a home renovation based only on how fast someone can swing a hammer.

If teams started treating review and validation as a real part of their estimates, I'm convinced their accuracy would improve drastically.
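Here is a minimal Python sketch of what counting the curation tax could look like in numbers. It uses the illustrative 20/15/65 split from the breakdown and assumes that a typical estimate only covers planning and the AI build; both the split and that assumption are hypotheses, and `full_estimate` and `breakdown` are names I made up.

```python
from __future__ import annotations

import math

# Illustrative split from the text: planning, AI construction, review/curation.
PHASE_SHARES = {"planning": 0.20, "ai_build": 0.15, "review": 0.65}
assert math.isclose(sum(PHASE_SHARES.values()), 1.0)


def full_estimate(counted_hours: float, counted_phases=("planning", "ai_build")) -> float:
    """Scale an estimate that only covers `counted_phases` up to the full task.

    Assumption: the phases keep the proportions in PHASE_SHARES.
    """
    counted_share = sum(PHASE_SHARES[p] for p in counted_phases)
    return counted_hours / counted_share


def breakdown(total_hours: float) -> dict[str, float]:
    """Split a total estimate into hours per phase."""
    return {phase: total_hours * share for phase, share in PHASE_SHARES.items()}


if __name__ == "__main__":
    total = full_estimate(7.0)  # a hypothetical 7-hour "build-only" estimate
    print(f"full estimate: {total:.1f}h")
    for phase, hours in breakdown(total).items():
        print(f"  {phase:<9} {hours:4.1f}h")
```

With these numbers, a 7-hour build-only estimate becomes about 20 hours once review is counted, roughly 13 of which go to review.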


What I Suggest to Replace Story Points

1. Confidence-Tagged Estimates

What I suggest: every ticket carries two things, a time estimate and a confidence tag.

ticket: INDE-002 — Migrate auth service to new SDK
estimate: 1.5 days
confidence: low
reason: "New SDK, AI hasn't seen our auth patterns before"
action: spike first (0.5 day), then re-estimate


The confidence tag is the important part. It tells the PM whether to trust the number or treat it as a hypothesis.

Three bands:

| Tag | What it means | How to plan |
|-----|---------------|-------------|
| high | Done this before with AI, know the variance | Plan on the estimate |
| medium | Familiar territory, some unknowns | Buffer 2x |
| low | Novel task or AI-unfriendly domain | Spike first, don't commit |

The rule I recommend: always spike a ticket tagged low before adding it to a sprint. My guess is that this single rule can prevent almost all sprint meltdowns on AI-driven teams.
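The three bands and the spike rule translate directly into a planning check. This Python sketch is one way to encode them; `Ticket`, `planned_days`, and `split_for_sprint` are hypothetical names, and the ticket keys other than INDE-002 are made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class Ticket:
    key: str
    estimate_days: float
    confidence: Confidence
    reason: str = ""

    def planned_days(self) -> float | None:
        """Days to reserve in the sprint, or None if the ticket must be spiked first."""
        if self.confidence is Confidence.HIGH:
            return self.estimate_days        # plan on the estimate
        if self.confidence is Confidence.MEDIUM:
            return self.estimate_days * 2    # buffer 2x
        return None                          # low: spike first, don't commit


def split_for_sprint(tickets: list[Ticket]) -> tuple[list[Ticket], list[Ticket]]:
    """Apply the rule: low-confidence tickets are spiked before they enter a sprint."""
    committed = [t for t in tickets if t.planned_days() is not None]
    needs_spike = [t for t in tickets if t.planned_days() is None]
    return committed, needs_spike


if __name__ == "__main__":
    backlog = [
        Ticket("INDE-002", 1.5, Confidence.LOW, "New SDK, AI hasn't seen our auth patterns before"),
        Ticket("DEMO-101", 1.0, Confidence.HIGH),    # hypothetical ticket
        Ticket("DEMO-102", 1.0, Confidence.MEDIUM),  # hypothetical ticket
    ]
    committed, needs_spike = split_for_sprint(backlog)
    print("commit:", [(t.key, t.planned_days()) for t in committed])
    print("spike first:", [t.key for t in needs_spike])
```

Returning `None` for low tickets, rather than some padded number, makes it impossible to sum them into sprint capacity by accident.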


2. "Free Second Time" Paradigm

I keep noticing the same pattern: the first time you do a task with AI, it's expensive. The second time, the cost drops by about 60%. By the third time, it's close to negligible.

Why? The first attempt forces you to develop a specific workflow: how to structure the prompt, how to configure the context window, and a map of everything that can go wrong during execution.

So I believe cost estimation should work differently:

First instance of task type:  estimate × 2 (you're building the workflow)
Second instance:              estimate × 0.8 (refining the workflow)
Third+ instance:              estimate × 0.4 (executing the workflow)


Example: migrating 12 microservices to a new observability SDK.

  • Service 1: 6 hours (working out the approach, writing prompts)
  • Service 2: 2.5 hours (refining the workflow, handling edge cases)
  • Services 3-12: ~45 minutes each (run in batches using the established procedure)

Old estimate: 12 × 4 hours = 48 hours. This approach: ~16 hours (6 + 2.5 + 10 × 0.75). But only if you invest the effort in service 1 rather than rushing through it.
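Here's a small Python sketch of the novelty multipliers, set next to the worked example. The function name is hypothetical, and the 4-hour baseline is the one from the old estimate. The two are separate illustrations: applying the multipliers to that baseline gives 27.2 hours, while the example's own timings add up to 16 hours. Both land far below the naive 48.

```python
import math

# Multipliers from the "Free Second Time" rule: first, second, third+ instance.
FIRST, SECOND, LATER = 2.0, 0.8, 0.4


def novelty_adjusted_total(base_hours: float, count: int) -> float:
    """Total hours for `count` instances of one task type, each estimated at `base_hours`."""
    total = 0.0
    for i in range(count):
        multiplier = FIRST if i == 0 else SECOND if i == 1 else LATER
        total += base_hours * multiplier
    return total


# Timings from the 12-microservice example, in hours.
observed = [6.0, 2.5] + [0.75] * 10

if __name__ == "__main__":
    print(f"naive:       {12 * 4.0:.1f}h")
    print(f"multipliers: {novelty_adjusted_total(4.0, 12):.1f}h")
    print(f"observed:    {sum(observed):.1f}h")
```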


3. Review-Weighted Sizing

I don't think tickets should be sized by "how difficult will this be to build?" They should be sized by "how difficult will this be to verify?"

The easiest pieces to generate are often the hardest to review (large refactors, verbose migrations), while pieces that are hard to generate are often easy to review (small algorithmic fixes with explicit test cases).

The sizing rubric needs to be inverted:

| Old thinking | New thinking |
|-------------|-------------|
| "Lots of code = big ticket" | "Lots of code to review = big ticket" |
| "Complex logic = big ticket" | "Ambiguous correctness criteria = big ticket" |
| "New framework = big ticket" | "AI-unfamiliar patterns = big ticket" |

A 500-line boilerplate migration should be sized large, not because it's difficult to generate (an AI can do that in minutes) but because checking 500 lines for subtle differences is genuinely costly.
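One way to operationalize review-weighted sizing is a small scoring function built on the three "new thinking" rows of the rubric: code to review, ambiguity of correctness criteria, and AI familiarity. The weights and thresholds below are hypothetical placeholders that a team would need to calibrate; only the three criteria come from the text.

```python
def review_weighted_size(lines_to_review: int,
                         explicit_criteria: bool,
                         ai_familiar: bool) -> str:
    """Return "S", "M", or "L" based on how hard the change is to verify.

    Hypothetical weights: 1 point per 100 lines to review,
    +2 if correctness criteria are ambiguous, +2 if the pattern is AI-unfamiliar.
    """
    score = lines_to_review / 100
    if not explicit_criteria:
        score += 2
    if not ai_familiar:
        score += 2
    if score < 1.5:
        return "S"
    if score < 4:
        return "M"
    return "L"


if __name__ == "__main__":
    # 500-line boilerplate migration: trivial to generate, costly to check.
    print(review_weighted_size(500, explicit_criteria=True, ai_familiar=True))  # -> L
    # Small algorithmic fix with explicit test cases.
    print(review_weighted_size(30, explicit_criteria=True, ai_familiar=True))   # -> S
```

Note that nothing in the score measures how hard the code is to write; that omission is the point of the inversion.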


How This Changes The PM Conversation

The hardest part of any estimation paradigm shift isn’t technical. It’s explaining the change.

Old conversation:

"This epic is 34 story points. At our velocity of 21/sprint, it’ll take ~1.6 sprints."

Where I think the conversation needs to go:

"This epic has 8 tickets. Five are high confidence (we expect to hit those estimates), 2 are medium confidence (we've doubled their estimates), and 1 is low confidence (we need a spike day before we can be sure about it). Optimistic: 1 sprint. Pessimistic: 1.5 sprints. And if the low-confidence ticket turns out to be a problem? 2.5 sprints."

Longer? Yes. More useful? Absolutely. Now the PM has a real choice to make: "Pull the low-confidence ticket and ship everything else on time" becomes a conversation you can actually have.
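A forecast like that can be produced mechanically from the tags. This Python sketch is one possible reading, not the author's formula: optimistic takes every estimate at face value, pessimistic doubles medium tickets and adds a spike day per low ticket, and the blowup case also lets each low ticket overrun by a factor I picked arbitrarily. The ticket sizes and sprint capacity in the example are made up, so the output will not match the quote.

```python
from dataclasses import dataclass

SPIKE_DAYS = 1.0          # "we need a spike day" per low-confidence ticket
LOW_BLOWUP_FACTOR = 3.0   # hypothetical: a low ticket ends up taking 3x its estimate


@dataclass
class Ticket:
    estimate_days: float
    confidence: str  # "high", "medium", or "low"


def forecast_sprints(tickets, team_days_per_sprint: float):
    """Return (optimistic, pessimistic, blowup) forecasts, in sprints."""
    face = sum(t.estimate_days for t in tickets)
    # Doubling a medium ticket adds one more copy of its estimate.
    medium_extra = sum(t.estimate_days for t in tickets if t.confidence == "medium")
    lows = [t for t in tickets if t.confidence == "low"]
    optimistic = face
    pessimistic = face + medium_extra + SPIKE_DAYS * len(lows)
    blowup = pessimistic + sum(t.estimate_days * (LOW_BLOWUP_FACTOR - 1) for t in lows)
    return tuple(round(days / team_days_per_sprint, 2)
                 for days in (optimistic, pessimistic, blowup))


if __name__ == "__main__":
    # Hypothetical epic shaped like the quote: 5 high, 2 medium, 1 low.
    epic = ([Ticket(1.0, "high")] * 5
            + [Ticket(1.5, "medium")] * 2
            + [Ticket(2.0, "low")])
    print(forecast_sprints(epic, team_days_per_sprint=10.0))
```

The value of the three numbers is less their precision than the fact that the gap between them is attributable to specific tickets.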


Metrics Worth Tracking Instead

I'd scrap velocity in favor of more useful measures. Here's what I propose tracking:

  • Curation rate – ratio of review time to creation time. Goal: below 3:1.
  • Confidence success rate – proportion of 'high' tickets that land within their estimate.
  • Process reuse rate – how often a similar second task reuses an existing workflow instead of building one from scratch.
  • Spike conversion rate – how often a 'low' ticket becomes 'medium' or 'high' after its spike.

These measures show how well the team is working with AI, rather than how 'fast' it appears to be going.
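The four metrics are simple ratios over completed tickets. This Python sketch assumes each ticket records its review and creation time, its original tag, and a couple of flags; those field names are hypothetical, and treating "within the estimate" as actual time less than or equal to the estimate is my own reading.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DoneTicket:
    confidence: str                     # tag at planning time: "high", "medium", or "low"
    estimate_h: float
    actual_h: float
    creation_h: float                   # time spent generating or writing the change
    review_h: float                     # time spent reviewing and validating it
    repeat_task: bool = False           # a second-or-later task of a known type
    reused_workflow: bool = False       # ...that reused the existing workflow
    tag_after_spike: str | None = None  # re-estimated tag for spiked "low" tickets


def _ratio(num: float, den: float) -> float | None:
    """Safe division: None when there is nothing to measure."""
    return num / den if den else None


def curation_rate(ts: list[DoneTicket]) -> float | None:
    """Review time per unit of creation time (goal from the text: below 3.0)."""
    return _ratio(sum(t.review_h for t in ts), sum(t.creation_h for t in ts))


def confidence_success_rate(ts: list[DoneTicket]) -> float | None:
    """Share of 'high' tickets whose actual time stayed within the estimate."""
    highs = [t for t in ts if t.confidence == "high"]
    return _ratio(sum(t.actual_h <= t.estimate_h for t in highs), len(highs))


def process_reuse_rate(ts: list[DoneTicket]) -> float | None:
    """Share of repeat tasks that reused a workflow instead of building a new one."""
    repeats = [t for t in ts if t.repeat_task]
    return _ratio(sum(t.reused_workflow for t in repeats), len(repeats))


def spike_conversion_rate(ts: list[DoneTicket]) -> float | None:
    """Share of spiked 'low' tickets that became 'medium' or 'high'."""
    spiked = [t for t in ts if t.confidence == "low" and t.tag_after_spike is not None]
    return _ratio(sum(t.tag_after_spike in ("medium", "high") for t in spiked), len(spiked))


if __name__ == "__main__":
    # Hypothetical sprint data, for illustration only.
    done = [
        DoneTicket("high", 4, 3.5, creation_h=1, review_h=2.5),
        DoneTicket("high", 4, 6, creation_h=1.5, review_h=4,
                   repeat_task=True, reused_workflow=True),
        DoneTicket("low", 8, 10, creation_h=2, review_h=6, tag_after_spike="medium"),
    ]
    print(f"curation rate:           {curation_rate(done):.2f}")
    print(f"confidence success rate: {confidence_success_rate(done):.2f}")
    print(f"process reuse rate:      {process_reuse_rate(done):.2f}")
    print(f"spike conversion rate:   {spike_conversion_rate(done):.2f}")
```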


TL;DR – The Replacement Kit

If you're still Fibonacci-estimating stories in 2026 and wondering why sprints feel like Russian roulette, here's my suggestion:

  1. Tag every estimate with a confidence level: the confidence matters more than the number itself
  2. Estimate curation effort, not just construction effort: curation is where the hard work is
  3. Track novelty for each task type: new tasks cost 2-3 times as much as recurring ones
  4. Size tasks by how hard they are to review, not to generate: invert the complexity rubric
  5. Spike before taking on high-uncertainty tasks: one simple rule that massively reduces blowups

In my mind, though, the fundamental shift is this: estimation used to focus on how long it takes to build something. Now it needs to focus on how long it takes to validate it. Change that paradigm, and sprint planning suddenly starts reflecting reality.


This is obviously just one way of seeing things; plenty of brilliant people have figured out how to make story points work with AI-driven adjustments. Where do you stand: patching the old system or building a new one from scratch?