


















By Brian Tristam Williams
AI token costs are becoming harder to treat as a rounding error, as agentic coding tools and enterprise AI workflows push usage from simple prompts into long, multi-step inference jobs.
Goldman Sachs Research says agentic AI could drive a 24-fold increase in token consumption by 2030, reaching 120 quadrillion tokens per month as consumer and enterprise adoption grows. The bank’s analysis, published earlier this month, argues that the same trend could improve the economics of hyperscalers and model providers if inference costs keep falling faster than demand rises.
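The two Goldman Sachs figures imply a starting point that the summary above does not spell out. The sketch below backs it out: 120 quadrillion divided by 24 gives roughly 5 quadrillion tokens per month as the baseline. The baseline year is not stated here, so the four-year and five-year spans used for the annualised growth rate are assumptions for illustration only, as are the function names.

```python
# Figures taken from the Goldman Sachs projection cited in the article.
TARGET_TOKENS_PER_MONTH = 120e15  # 120 quadrillion tokens per month by 2030
GROWTH_MULTIPLE = 24              # the projected 24-fold increase


def implied_baseline(target: float, multiple: float) -> float:
    """Monthly token volume before the projected increase."""
    return target / multiple


def implied_annual_growth(multiple: float, years: float) -> float:
    """Constant yearly growth factor that compounds to `multiple` over `years`."""
    return multiple ** (1.0 / years)


baseline = implied_baseline(TARGET_TOKENS_PER_MONTH, GROWTH_MULTIPLE)
print(f"Implied baseline: {baseline / 1e15:.0f} quadrillion tokens per month")

# ASSUMPTION: the span of the projection is not given in the article.
for years in (4, 5):
    rate = implied_annual_growth(GROWTH_MULTIPLE, years)
    print(f"Over {years} years: about {rate:.2f}x per year")
```

Under these assumed spans, consumption would have to roughly double every year, which is the pace against which falling unit costs need to be judged.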
The problem for customers is that lower unit costs do not automatically mean lower bills. Agentic tools can call models repeatedly, review context, generate code, run checks, and revise their own output. That turns a single developer request into a chain of token-consuming actions. This is why token-based billing is becoming a practical issue for engineering organisations rather than a narrow cloud-infrastructure concern.
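A rough sketch shows why a chain of actions costs more than the sum of its visible steps. It assumes a simplified agent that re-reads its entire accumulated context on every step and adds each step's output to that context. Real tools may cache, truncate, or summarise context, so the token counts and the `agent_run_tokens` helper are hypothetical and not a description of any particular product.

```python
def agent_run_tokens(initial_context: int, output_per_step: int, steps: int) -> dict:
    """Total tokens for a run in which every step re-reads all prior context.

    ASSUMPTION: no caching, truncation, or summarisation of context.
    """
    context = initial_context
    total_input = 0
    total_output = 0
    for _ in range(steps):
        total_input += context        # the model reads the whole context again
        total_output += output_per_step
        context += output_per_step    # this step's output joins the context
    return {"input": total_input, "output": total_output}


# Illustrative sizes only: a 2,000-token request, 500 tokens written per step.
single = agent_run_tokens(initial_context=2_000, output_per_step=500, steps=1)
chained = agent_run_tokens(initial_context=2_000, output_per_step=500, steps=10)

single_total = single["input"] + single["output"]
chained_total = chained["input"] + chained["output"]
print(f"One-shot prompt: {single_total:,} tokens")
print(f"Ten-step agent run: {chained_total:,} tokens")
print(f"Ratio: {chained_total / single_total:.0f}x")
```

With these illustrative numbers, ten steps consume 19 times the tokens of a single prompt rather than 10 times, because input tokens grow with every pass over the lengthening context.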
Uber has become one of the more visible examples. The company is reassessing parts of its AI spending after reports that its 2026 AI budget had been exhausted within the first few months of the year. Uber president and COO Andrew Macdonald has said the company does not yet see a clear link between higher token consumption and more useful consumer-facing features. That does not mean the tools are useless, but it does make the cost-benefit argument less automatic.
Microsoft is facing a related issue inside its own engineering operations. The company is reportedly winding down most internal Claude Code licences for parts of its Experiences + Devices group and steering developers towards GitHub Copilot CLI by the end of June. Separately, GitHub has announced that Copilot plans will move to usage-based billing from 1 June 2026, with GitHub AI Credits consumed according to token usage across input, output, and cached tokens.
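For engineering teams trying to forecast a bill of this kind, the arithmetic is a weighted sum over the three token categories. The sketch below is a generic illustration only: the per-million-token rates are invented, the result is in arbitrary currency units rather than GitHub AI Credits, and the `Rates`, `Usage`, and `usage_cost` names are hypothetical. Actual rates would have to come from the provider's published pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """HYPOTHETICAL prices per million tokens, in arbitrary currency units."""
    input_per_million: float
    output_per_million: float
    cached_per_million: float


@dataclass(frozen=True)
class Usage:
    """Token counts for one billing period."""
    input_tokens: int
    output_tokens: int
    cached_tokens: int


def usage_cost(usage: Usage, rates: Rates) -> float:
    """Weighted sum of the three token categories."""
    total = (
        usage.input_tokens * rates.input_per_million
        + usage.output_tokens * rates.output_per_million
        + usage.cached_tokens * rates.cached_per_million
    )
    return total / 1_000_000


# ASSUMPTION: invented rates and volumes, chosen only to show the calculation.
rates = Rates(input_per_million=3.0, output_per_million=15.0, cached_per_million=0.3)
month = Usage(input_tokens=1_000_000, output_tokens=200_000, cached_tokens=4_000_000)
print(f"Estimated cost: {usage_cost(month, rates):.2f} units")
```

Separating the categories matters because agentic tools tend to generate large volumes of cached and repeated input tokens, which a single blended rate would hide.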
The Goldman Sachs view is not simply bearish. It expects semiconductor providers to cut inference cost per token by 60% to 70% per year through chip and architecture improvements. It also expects chip supply to remain constrained for the next 12 to 18 months as production capacity catches up with the pace of new AI use cases.
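Those percentages set a break-even point for buyers. A minimal sketch, assuming the hardware-level cost decline passes through in full to the price a customer pays (which is not a given): if cost per token falls by a fraction d in a year, spending stays flat only while usage grows by no more than 1 / (1 - d). The function names are illustrative.

```python
def annual_spend_multiplier(usage_growth: float, cost_decline: float) -> float:
    """Year-on-year change in spend given usage growth and per-token cost decline.

    usage_growth: e.g. 3.0 means three times as many tokens as last year.
    cost_decline: e.g. 0.60 means each token costs 60% less than last year.
    """
    return usage_growth * (1.0 - cost_decline)


def break_even_usage_growth(cost_decline: float) -> float:
    """Usage growth at which total spend is unchanged."""
    return 1.0 / (1.0 - cost_decline)


# The 60% and 70% declines come from the Goldman Sachs range cited above.
for decline in (0.60, 0.70):
    threshold = break_even_usage_growth(decline)
    print(f"{decline:.0%} cheaper per token: spend is flat at {threshold:.2f}x usage growth")

# ASSUMPTION: a team whose token usage grows fivefold in a year.
print(f"5x usage at 60% decline: spend changes by {annual_spend_multiplier(5.0, 0.60):.1f}x")
```

On this arithmetic, usage growing by more than about 2.5 to 3.3 times a year outruns even the projected cost decline, and a team that moves from occasional prompts to always-on agents can exceed that threshold easily.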
That makes the story relevant well beyond software procurement. If agents become a default interface for coding, customer service, search, and enterprise workflow automation, the load shifts back into datacentre silicon, networking, memory, storage, and power infrastructure. As previously reported by eeNews Europe when ARM set out its datacentre CPU plans, agentic AI is already being used to justify new processor strategies for AI datacentres.
The immediate lesson is more prosaic. Businesses are being pushed to measure AI against shipped features, resolved support cases, reduced engineering time, or revenue impact, not against token volume. AI token costs may fall at the hardware level, but agentic workflows can easily spend the savings before finance teams see them.