惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

F
Full Disclosure
Recorded Future
Recorded Future
T
Tenable Blog
S
Securelist
C
CERT Recently Published Vulnerability Notes
T
Threatpost
S
Schneier on Security
A
Arctic Wolf
The Hacker News
The Hacker News
C
CXSECURITY Database RSS Feed - CXSecurity.com
Know Your Adversary
Know Your Adversary
P
Privacy International News Feed
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
The Register - Security
The Register - Security
Cisco Talos Blog
Cisco Talos Blog
AWS News Blog
AWS News Blog
K
Kaspersky official blog
T
True Tiger Recordings
T
Threat Research - Cisco Blogs
V
Vulnerabilities – Threatpost
P
Palo Alto Networks Blog
T
The Exploit Database - CXSecurity.com
小众软件
小众软件
B
Blog
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
Microsoft Azure Blog
Microsoft Azure Blog
Cyberwarzone
Cyberwarzone
C
Cybersecurity and Infrastructure Security Agency CISA
T
Tor Project blog
Spread Privacy
Spread Privacy
Malwarebytes
Malwarebytes
P
Proofpoint News Feed
F
Fox-IT International blog
F
Fortinet All Blogs
P
Privacy & Cybersecurity Law Blog
G
GRAHAM CLULEY
量子位
Latest news
Latest news
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
博客园 - 叶小钗
Project Zero
Project Zero
T
Tailwind CSS Blog
N
Netflix TechBlog - Medium
Martin Fowler
Martin Fowler
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
I
Intezer
博客园_首页
腾讯CDC
H
Hackread – Cybersecurity News, Data Breaches, AI and More
D
Darknet – Hacking Tools, Hacker News & Cyber Security

Hacker News - Newest: "AI"

The Vibe Coding Era: Why AI Won't Replace Software Engineers [video] AI agents are scrambling power users' brains Ask HN: Has AI affected negatively the job market for devs? Show HN: I built a tool to auto-accept AI slop and bigtech devs loves it OpenAI admits AI hallucinations are mathematically inevitable, not just engineering flaws starlette - secwest.net - secure virtual engagement Shopify's AI Developer twitter.com Robotics giant Figure AI demonstrates its robots to the world Bay Area mom out thousands after scammers use AI to mimic daughter's voice in fake kidnapping iPhone 版“Today” - App Store 6 Million Fake GitHub Stars: How to Vet Open-Source AI Tools Before You Bet on Them Why AI's Biggest Deals Price Assets Before Revenue AI chatbots show bias toward Catholicism, researchers say LMIM OS – an offline AI ecosystem. Voice, RAG, WhatsApp. ++ One file. 0 setup Authors versus AI and the risks to government public sector push There's at Least One Job That AI Isn't Killing AskMingLi: AI-assisted BaZi chart readings AI Isn't Management. Try Explaining That to Matthew Prince Who Wants to Be Hired? (May 2026) – AI Engineer (Python, RAG, Agentic Workflows) twitter.com The AI Industry Just Walked Into the Vatican Humanize – two LLM-agnostic skills to rewrite and detect AI text HypeScribe – AI-powered transcription, summaries, and search for any audio/video GitHub - NikhilSKashyap/interviewsignal: AI-native broad-interviewing. Share a code, capture thought process, auto-grade on submit. pip install, zero setup cost, pure signal. FlowLink: MCP proxy blocking destructive AI agent commands Blitzy AI charges by LOC generated AI-Related Issues in Securities Cases: Privilege Pitfalls, 'AI Washing' Claims AI is killing All About Berlin Pheno: AI-Powered Personalized Health Platform GitHub - rishavsunny12/harvestGuard: Lets see how claude code creatively creates a project for me NES, SNES, Genesis, VirtualBoy, and PSX | A journey with AI and Recompilation The Rise of the AI Script Kiddie Stack Overflow's forum is dead thanks to AI SpaceX's AI Pursuits Have yet to Take Off Do AI Risks Require Extraordinary Government Intervention? GitHub - Dylanchess0320/LuckyD-Code: LuckyD Code - Terminal AI Assistant / Discord - https://discord.gg/ApEKKUuKd I applied to YC with an AI-native IDE for hardware prototyping AI may be fuelling U.S. business creation, but few signs of a similar trend in Canada A Board Game agent built using Sanity Context and Vercel's AI SDK | Sanity Microsoft’s GitHub was positioned to win the AI coding race. Outages got in the way Too dangerous to release: is Mythos the start of the restricted-AI era? Show HN: Audiogen – a new take on generative music AI ScribeItLocal — Free Local Video & Audio Transcription The Three-Cylinders Problem — When AI Models Choose Beauty Over Truth Show HN: MurrDB: A RocksDB-based NVMe/S3 cache for AI inference workloads The rise of the -10x engineer: The negative side of AI productivity Safe Ways to Use AI Agents Programming Is Real Engineering, And AI Proves It What AI race? China and U.S. AI are tightly connected High-VRAM GPUs aren't the future of local AI GitHub - mbbill/mind-expander: A shared visual workspace for understanding and steering code with AI agents. Show HN: We made a cinematic heist trailer with 4 AI models for $60 Release shield-v0.7.0 · AperionAI/shield AI Startup Says It Will Pay People $2,000 a Month to Masturbate—Yes, Really MCP: Security Design Considerations for AI-Driven Automation by NSA [pdf] Rethinking organizational design in the age of agentic AI Client Challenge GitHub - takshd15/Laptop-AI GitHub - SynapCores/synapcores-agent: Real, framework-free AI support agent where SynapCores is the brain — memory, RAG, tool routing, generation in one database. Browser chat widget + live Brain debug sidebar. Fork and run in 30s. The Math Changed AI-Augmented Software Development Manifesto Whisper by Remskill — AI Voice Assistant for Desktop AI tools lead to 'clear racial disparities' in job hiring Excerpts from Pope Leo XIV's manifesto on humanity and AI | AP News GitHub - StackOneHQ/stack-nudge ‘BusPatrol’ Put AI Cameras in Tens of Thousands of School Buses. Now They Want to Give Cops Access AI Killed Stack Overflow (and why that sucks) AI-Powered Cyber Attacks in 2026: How Adversaries Are Evolving Rogue states are putting AI agents to work on sanctions evasion Show HN: Treats Human and AI the Same Seventy years of mathematics built the thing we call AI Genre glitches and unexpected promotional phrases as a sign of AI writing Reverse centaurs and the failure of AI (2021) HVTracker – trust registry for open-source AI agents The Inevitability: Why AI Cannot Be Stopped, Slowed, or Resisted WebBridge - Let Kimi Agent Drive Your Browser | Kimi RTMH: Pope Leo’s Magnifica Humanitas on AI — LessWrong GitHub - SkepticCTO/decoding_the_language_machine: Documentation, Prompts, and Media for the "Decoding the Language Machine" series Block open-sourced Goose, an AI agent that scaled to 60% of the company Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization GitHub - compuficial/apery: Synthetic Data Generator for Agents Will AI cause a job apocalypse? 3 AIs Answer Why AI Agents Should Be State Machines Show HN: I built a tool to estimate AI agent costs before you ship GitHub - aws-samples/sample-well-architected-skills-and-steering: Reusable skills and steering that teach AI coding agents how to apply the AWS Well-Architected Framework. One set of playbooks, 12 supported tools. Spotify says its AI remix tool protects artists from unregulated ‘slop’ Taming the agentic influx: a blueprint for AI business observability BurnKit – Stop being the human event loop for your AI coding sessions Release BoquilaHUB 0.5 · boquila/boquilahub Neo-Capital — Local-first bookkeeping GitHub - microsoft/agent-governance-toolkit: AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10. Vendorlobby — Vendor pitches, on autopilot AiAffList — The Biggest AI Affiliate Programs List Sam Altman: I was wrong, AI unlikely to lead to jobs apocalypse Typerion: The coherence system for software development AI speeds up discovery of next-gen computer chips and electronic materials Daily links from Cory Doctorow Microsoft and Uber Are Running into an AI Cost Problem GitHub - JustVugg/judicex: Open-source Legal AI workspace for evidence-grounded legal drafting, matter analysis and verifiable answers.
Twelve Ways to Be Wrong About AI-Assisted Coding
calcifer · 2026-05-21 · via Hacker News - Newest: "AI"

Posted

Suppose your manager asks you next week to demonstrate that the AI coding tools your company signed up for are worth the subscription cost. Would you measure lines of code generated, or tickets closed? Or would you send out a survey asking whether developers feel more productive? Each of those approaches is flawed in a different way; the sections below explain why.

Note: this post is about how people are assessing AI, not at LLM-assisted coding itself; with a little rewording, these criticisms could be applied to a lot of the claims that have been made about agile development, test-driven development, and other practices. If I’ve learned anything in the last twenty years, it’s that software engineering would be a lot further ahead today if we had been willing to let our peers in the human sciences teach us how to study these kinds of things properly.

Also, if you’d like a one-day introduction to the research methods you should use to avoid making these errors, please reach out. I’m not qualified to teach it, but I know people who are, and I could probably talk them into doing it…

Counting Lines of Code Generated

Proxy metrics stand in for concepts that are hard to measure directly, and lines of code is one of the oldest. LLMs generate more code, but not necessarily better outcomes: a team that sees a 40% increase in lines of code per developer after adopting LLM tools has measured verbosity, not productivity. Deleting 2000 lines of tangled logic and replacing it with 200 clean ones is an improvement that looks like a loss on this metric [Sadowski2019]. More code also means more to read, maintain, and debug, and AI’s contribution to that future burden does not appear in the line count.

Timing Artificial Tasks

A widely cited study found that developers who used GitHub Copilot completed a task 55% faster than those who did not [Peng2023]. The task was implementing an HTTP server in JavaScript from scratch, in ninety minutes; the developers had no other obligations that day. Real software development involves navigating a large codebase you did not write, understanding a requirement described ambiguously in a ticket, coordinating with colleagues, and attending meetings. Speed on a greenfield toy task does not predict speed on any of that. A randomized controlled trial with experienced open-source developers found the opposite of what participants themselves predicted: giving them access to AI tools increased task completion time by 19% [Becker2025].

Before/After With No Control Group

You start using LLMs in January; by June, pull requests are shipping faster, so the tools must be working, right? But between January and June you hired twelve engineers, refactored the CI pipeline, and switched your cloud provider. Without a group that did not adopt the tools, you cannot separate the effect of LLMs from any of the other changes that happened at the same time. Internal validity requires a credible counterfactual, i.e., some way of knowing what would have happened otherwise.

Asking Developers If They Feel More Productive

Survey results like “87% of developers report feeling more productive with AI tools” are regularly cited as evidence that the tools work [Liang2024], but three things make self-report systematically misleading:

  • The Hawthorne effect means people work differently when they know they are being observed and evaluated;

  • The novelty effect means new tools feel faster because they are novel, and that feeling typically fades within weeks; and

  • Social desirability bias means respondents tend to say what they believe the survey wants to hear, especially when management chose the tool.

Counting Commits, Pull Requests, and Tickets

In 2023, McKinsey proposed measuring individual developer productivity using counts of commits, pull requests, code reviews, and similar activities [McKinsey2023]. Goodhart’s Law states that when a measure becomes a target, it ceases to be a good measure [Goodhart1984]. When developers know their commit count is tracked, they make more, smaller commits; when ticket counts are tracked, tickets get split. The numbers improve while the underlying work does not [Beck2023]. Activity is not output; output is not value.

Measuring Only the Easy Half

LLMs make code generation faster, and that half is easy to measure. The other half is harder: time spent reviewing LLM-generated code for correctness, time lost debugging confidently wrong suggestions, security vulnerabilities introduced by plausible-looking but insecure code, and technical debt from suggestions that solved the immediate problem while ignoring the surrounding design. A study of GitHub Copilot’s code found that a substantial fraction of generated code contained security vulnerabilities, and that developers under time pressure accepted insecure suggestions at higher rates [Pearce2022]. A 2025 evaluation of five major LLMs found that none produced web application code meeting industry security standards [Dora2025]. A large-scale analysis of over 300,000 AI-authored commits found that more than 15% introduce at least one quality issue, and nearly a quarter of those issues persist in the codebase long-term [Liu2026]. Measuring only the inputs that go up while ignoring the costs that also rise is not measurement; it is marketing.

Treating Adoption Rate as a Success Metric

“We have achieved 90% AI tool adoption across engineering” is a procurement outcome, not a productivity outcome. Adoption measures whether the tool is installed and opened; it says nothing about whether suggestions are useful, whether developers accept them thoughtlessly, or whether the accepted suggestions are correct. High adoption combined with low suggestion quality produces a workforce spending time managing a tool rather than benefiting from one. A study of IBM’s enterprise AI coding assistant found that while the tool often provided net productivity increases, those gains were not experienced uniformly across its user base [Weisz2025]. But adoption is easier to measure than benefit, which is exactly why it gets reported instead.

Comparing Volunteers to Non-Volunteers

Studies that compare developers who chose to use LLMs against those who did not are comparing two different populations, not two conditions. Early adopters differ from late adopters and non-adopters in ways that directly predict productivity: they are more motivated to experiment, more comfortable with new tooling, and more likely to already be high performers. Selection bias means any observed difference between the groups may be a property of the person rather than the tool. This is the most common design flaw in industry AI productivity reports, because it is the cheapest study to run. A two-year longitudinal study of Copilot use at a large IT organization found that developers who used the tool were consistently more active than non-users even before it was introduced [Stray2026].

Measuring the Individual Instead of the System

Individual coding speed is the easiest thing to measure, so it gets measured. But if AI tools help developers write code 30% faster and the team’s time from ticket to production does not change, the bottleneck was not writing code. More code generated also means more code to review: if AI increases code volume without increasing review capacity, cycle time may worsen [Forsgren2021]. An empirical study of professional developers found that while AI tools boosted output for less-experienced contributors, senior developers experienced a 19% decline in their own productivity as they absorbed a 6.5% increase in code review load from AI-generated code [Xu2025]. Optimizing one stage of a pipeline while ignoring the others is a systems-thinking failure dressed as a productivity study.

Measuring During the Novelty Period

A four-week study that finds a productivity boost has found a four-week productivity boost. The novelty effect is real: developers are more engaged with new tools during the initial period, which inflates observed performance relative to the long-run baseline. Effects that actually matter emerge over months, not weeks, including skill atrophy for tasks now delegated to the AI, accumulation of technical debt from wrong suggestions, or changes in how teams collaborate. A study designed to detect a short-term benefit, has not told you anything about what happens after the study ends. An analysis of 807 open-source repositories adopting Cursor found exactly this pattern: adoption produced a large but transient increase in development velocity alongside a substantial and persistent increase in code complexity and static analysis warnings [He2026].

Treating Suggestion Acceptance Rate as a Quality Signal

LLM coding assistants commonly report what fraction of their suggestions developers accept, and higher acceptance rates are presented as evidence that the tool is useful. Acceptance measures whether the generated code looked plausible enough for a developer to press Tab; it does not measure whether the code was correct, secure, or maintainable. Developers under time pressure accept more suggestions, including insecure ones [Pearce2022], so a tight deadline makes acceptance rate rise for exactly the wrong reasons. An enterprise study of 400 developers found a 33% average acceptance rate alongside high developer satisfaction, but tracked no measure of the correctness or security of accepted code [Bakal2025]. A metric that rewards looking good enough is not a metric that rewards being good.

Comparing AI to Nothing

Studies that compare AI-assisted developers to a control group using nothing have chosen a baseline that does not exist in practice. Developers without LLM assistants use documentation, colleagues, and the time they would otherwise spend thinking through the problem themselves. The relevant question is whether LLM tools outperform the alternatives developers already have, and that comparison is rarely made [Peng2023]. Choosing a weak baseline makes any tool look good; it does not make the tool useful.

I am profoundly grateful to everyone who has taken the time over the years to explain this stuff to me. Any errors or over-simplifications in what I’ve written are entirely my fault.

Bibliography

Bakal2025
Gal Bakal, Ali Dasdan, Yaniv Katz, Michael Kaufman, and Guy Levin: “Experience with GitHub Copilot for Developer Productivity at Zoominfo.” arXiv:2501.13282, 2025.
Becker2025
Joel Becker, Nate Rush, Elizabeth Barnes, and David Rein: “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.” arXiv:2507.09089, 2025.
Beck2023
Kent Beck: “Measuring Developer Productivity: Real-World Examples.” Medium, 2023. https://tidyfirst.substack.com/p/measuring-developer-productivity
Dora2025
Swaroop Dora, Deven Lunkad, Naziya Aslam, S. Venkatesan, and Sandeep Kumar Shukla: “The Hidden Risks of LLM-Generated Web Application Code: A Security-Centric Evaluation of Code Generation Capabilities in Large Language Models.” arXiv:2504.20612, 2025.
Flournoy2025
John C. Flournoy, Carol S. Lee, Maggie Wu, and Catherine M. Hicks: “No Silver Bullets: Why Understanding Software Cycle Time is Messy, Not Magic.” arXiv:2503.05040, 2025.
Forsgren2021
Nicole Forsgren, Margaret-Anne Storey, Chandra Maddila, Thomas Zimmermann, Brian Houck, and Jenna Butler: “The SPACE of Developer Productivity.” ACM Queue, 19(1), 2021.
Goodhart1984
Charles Goodhart: “Problems of Monetary Management: The U.K. Experience.” In Anthony Courakis (ed.), Inflation, Depression, and Economic Policy in the West, Rowman and Littlefield, 1984.
He2026
Hao He, Courtney Miller, Shyam Agarwal, Christian Kästner, and Bogdan Vasilescu: “Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity in Open-Source Projects.” Proc. MSR ‘26, 2026.
Hicks2025
Catherine M. Hicks, Carol Lee, and Kristen Foster-Marks: “The New Developer: AI Skill Threat, Identity Change & Developer Thriving in the Transition to AI-Assisted Software Development.” https://osf.io/preprints/psyarxiv/2gej5_v2, 2025, https://doi.org/10.31234/osf.io/2gej5_v2.
Liang2024
Jenny T. Liang, Chenyang Yang, and Brad A. Myers: “A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges.” Proc. ICSE 2024, 2024.
Liu2026
Yue Liu, Ratnadira Widyasari, Yanjie Zhao, Ivana Clairine Irsan, Junkai Chen, and David Lo: “Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild.” arXiv:2603.28592, 2026.
McKinsey2023
Nora Elsayed, Tarek Elhounsri, and Sven Blumberg: “Yes, You Can Measure Software Developer Productivity.” McKinsey & Company, 2023. https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/yes-you-can-measure-software-developer-productivity
Pearce2022
Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri: “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions.” Proc. IEEE S&P 2022, 2022, doi:10.1109/SP46214.2022.9833571.
Peng2023
Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer: “The Impact of AI on Developer Productivity: Evidence from GitHub Copilot.” arXiv:2302.06590, 2023.
Sadowski2019
Caitlin Sadowski and Thomas Zimmermann (eds.): Rethinking Productivity in Software Engineering. Apress, 2019, 978-1484242209.
Stray2026
Viktoria Stray, Elias Goldmann Brandtzæg, Viggo Tellefsen Wivestad, Astri Barbala, and Nils Brede Moe: “Developer Productivity With and Without GitHub Copilot: A Longitudinal Mixed-Methods Case Study.” Proc. HICSS-59, 2026.
Weisz2025
Justin D. Weisz, Shraddha Kumar, Michael Muller, Karen-Ellen Browne, Arielle Goldberg, Ellice Heintze, and Shagun Bajpai: “Examining the Use and Impact of an AI Code Assistant on Developer Productivity and Experience in the Enterprise.” Proc. CHI‘25, 2025.
Xu2025
Feiyang Xu, Poonacha K. Medappa, Murat M. Tunc, Martijn Vroegindeweij, and Jan C. Fransoo: “AI-Assisted Programming Decreases the Productivity of Experienced Developers by Increasing the Technical Debt and Maintenance Burden.” arXiv:2510.10165, 2025.