Why The Cheapest AI Stack Becomes The Most Expensive At Scale
Ben Gutkovic · 2026-05-21 · via Forbes - Innovation

Co-Founder of Superlinked, building enterprise-grade open-source inference for production-scale AI search and document processing.

Most AI infrastructure projects don't fail on quality. They fail on economics. Somewhere between the first scaling event and the Nth, the cost curve detaches from the usage curve. The finance team notices three months later, but by then, the product team has pulled two features, deferred a third and spent weeks rebuilding the part of the stack that made the original economics work.

This silent detachment is one of the most underestimated risks in AI infrastructure, and it reshapes the product before it ever appears on the finance team's radar.

Pilot economics aren't production economics.

Teams model AI cost by taking pilot metrics and multiplying them by expected traffic. The math is clean in a spreadsheet and totally wrong in production.

Pilot traffic is narrow, repetitive and predictable in its query distribution. Production traffic, on the other hand, is wide, spiky and disproportionately expensive in the long tail. A system that costs four cents per query at pilot can cost many times that in the tail, where the most valuable queries to the business tend to live.
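The gap between the two estimates can be sketched in a few lines of Python. This is an illustration only: the query classes, traffic shares and every per-query cost other than the four-cent pilot figure are hypothetical numbers, not measurements from any real system.

```python
# Hypothetical query mixes: class name -> (share of traffic, cost per query in USD).
# Only the $0.04 pilot figure comes from the article; the rest are assumptions.
PILOT_MIX = {"simple": (1.00, 0.04)}
PRODUCTION_MIX = {
    "simple": (0.80, 0.04),
    "moderate": (0.15, 0.12),
    "long_tail": (0.05, 0.90),
}


def expected_cost_per_query(mix):
    """Traffic-weighted average cost per query for a given mix."""
    return sum(share * cost for share, cost in mix.values())


def monthly_cost(mix, queries_per_month):
    """Total monthly spend for a mix at a given volume."""
    return expected_cost_per_query(mix) * queries_per_month


def cost_share_by_class(mix):
    """Fraction of total spend attributable to each query class."""
    total = expected_cost_per_query(mix)
    return {name: share * cost / total for name, (share, cost) in mix.items()}


if __name__ == "__main__":
    volume = 1_000_000  # assumed monthly query volume
    print(f"pilot cost x volume:   ${monthly_cost(PILOT_MIX, volume):,.0f}")
    print(f"distribution-weighted: ${monthly_cost(PRODUCTION_MIX, volume):,.0f}")
    for name, fraction in cost_share_by_class(PRODUCTION_MIX).items():
        print(f"{name}: {fraction:.0%} of spend")
```

Under these assumed numbers, the spreadsheet estimate is $40,000 a month while the distribution-weighted estimate is $95,000, and the long tail, at 5% of traffic, accounts for roughly 47% of spend.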

Treating AI infrastructure cost as a linear function of volume sits behind most AI budgeting errors, but the budget rarely breaks first. Usually, the first victim is the roadmap because the team starts making product decisions to defend the unit economics rather than to serve the user.

Average cost hides the shape underneath it.

Per-token pricing and blended monthly invoices are smooth metrics. They produce clean lines in the budget. They're also the wrong instrument for noticing the failure mode described above.

A blended number smooths over latency variance, retry cost, cold-start spikes and the true distribution of cost across query types. Teams see the invoice total, not the distribution underneath.

However, the distribution is where the business consequences live. A small fraction of queries that are slow, expensive or cold-started will drive most of the user-facing latency that matters. A reliability issue that affects a small share of volume but overlaps with the high-converting queries becomes a revenue problem, not an infrastructure problem. An averaged bill keeps that shape invisible until a team actually goes looking for it.
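A minimal sketch of what "going looking for it" might involve, assuming a team can export per-query costs. The function names and the sample data are hypothetical; the point is only that the mean, the median and the tail tell different stories about the same bill.

```python
import statistics


def percentile(ordered, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer from 1 to 100."""
    rank = (len(ordered) * pct + 99) // 100  # integer ceiling of n * pct / 100
    return ordered[max(rank, 1) - 1]


def cost_profile(costs, top_percent=2):
    """Summarize per-query costs beyond the blended average."""
    ordered = sorted(costs)
    total = sum(ordered)
    top_n = max(1, len(ordered) * top_percent // 100)
    return {
        "mean": total / len(ordered),            # what a blended invoice implies
        "median": statistics.median(ordered),    # what a typical query costs
        "p99": percentile(ordered, 99),          # what the tail costs
        "top_share": sum(ordered[-top_n:]) / total,  # spend from the costliest queries
    }


# Hypothetical per-query costs in USD: mostly cheap, with a small expensive tail.
SAMPLE_COSTS = [0.02] * 90 + [0.10] * 8 + [2.00] * 2

if __name__ == "__main__":
    for key, value in cost_profile(SAMPLE_COSTS).items():
        print(f"{key}: {value:.3f}")
```

With this assumed sample, the blended average is about 6.6 cents per query, the typical query costs 2 cents, and the most expensive 2% of queries account for roughly 61% of spend.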

The roadmap usually breaks before the budget does.

The industry frames inference cost as a procurement problem. In my experience, however, the harder problem is how the cost structure affects the product decisions a team still gets to make.

When inference dominates the AI budget, the team stops shipping features that could improve the product. Embedding refresh cadence slows down. Long-context scenarios are cut. Custom models are rejected in favor of whatever fits within the catalog that an inference provider happens to support. The most expensive queries are throttled or downgraded, which is usually a euphemism for a degraded user experience.

The bill breaks the team's ability to keep innovating and improving the user experience.

When does cost become a positioning problem?

In our work on open-source inference infrastructure, we saw production systems where a small fraction of query types consumed a disproportionate share of the inference budget. In one case, a company building a semantic search product spent most of its inference budget on the most complex queries. The team's first instinct was to cap query complexity, but that would've removed their primary differentiator in the category.

The cost problem had migrated into a positioning problem before anyone noticed. Inference bills had reshaped what the product could offer, and the reshaping was invisible to leadership until the competitive gap showed up in a deal review.

What do the teams that see the problem early do differently?

Three actions separate teams that model AI economics correctly from teams that discover the economics through a meeting with the finance team.

1. Instrument unit cost by query class, not just aggregate spend. Latency, error rate and retry cost belong on the same dashboard as dollars. If the financial view of the system is the only view, the roadmap will surprise everyone.

2. Model cost as a function of query distribution, not volume. Most of the business value and most of the infrastructure cost live in the tail. Teams that treat the tail as an edge case discover it is not, usually when that discovery is most expensive.

3. Preserve optionality. A deployment choice that's cheap to reverse is worth more than a deployment choice that's a few percent cheaper per query. Lock-in is a liability that quietly accrues interest.
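The first action can be sketched as a small aggregation over request logs. This is a sketch under stated assumptions, not a prescribed schema: the QueryRecord fields and the query class labels are hypothetical, and a real system would need its own way of assigning queries to classes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryRecord:
    """One logged request (hypothetical schema)."""
    query_class: str   # e.g. "lookup" or "long_context"; labels are assumptions
    cost_usd: float    # total cost of the request, including any retries
    latency_ms: float
    retries: int
    failed: bool


def summarize_by_class(records):
    """Put dollars, latency, errors and retries side by side for each query class."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.query_class].append(record)

    total_cost = sum(record.cost_usd for record in records)
    summary = {}
    for name, rows in grouped.items():
        count = len(rows)
        cost = sum(row.cost_usd for row in rows)
        summary[name] = {
            "queries": count,
            "cost_per_query": cost / count,
            "share_of_spend": cost / total_cost if total_cost else 0.0,
            "avg_latency_ms": sum(row.latency_ms for row in rows) / count,
            "error_rate": sum(row.failed for row in rows) / count,
            "retries_per_query": sum(row.retries for row in rows) / count,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical log entries, for illustration only.
    sample = [
        QueryRecord("lookup", 0.02, 120.0, 0, False),
        QueryRecord("lookup", 0.02, 140.0, 0, False),
        QueryRecord("lookup", 0.02, 100.0, 0, False),
        QueryRecord("long_context", 0.50, 2400.0, 2, True),
    ]
    for name, stats in summarize_by_class(sample).items():
        print(name, stats)
```

Keeping the per-class view in one structure means the same table can feed both the finance report and the product review, so neither side is looking at a number the other cannot see.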

Inference is a product decision, not a procurement line.

Teams that treat AI infrastructure as a budget line eventually make product decisions to defend the budget. Teams that treat AI infrastructure as part of the product keep the optionality to make the budget work in whichever direction the product needs to go.

The real cost of inference is measured in the space of features a team can still imagine shipping. Defining that space is a leadership decision, and it shouldn't be outsourced.


Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.