

















Within just a few days, two Chinese AI providers have permanently and drastically cut their API prices per token, pushing the global price war over large language models into a new phase. Xiaomi is rolling out price cuts of up to 99 percent for its MiMo-V2.5 series starting today, May 27, 2026. DeepSeek, in parallel, is making permanent the discount campaign introduced the previous month for its flagship V4-Pro, keeping usage costs at one quarter of the original level.
Both moves target the same market segment: paying enterprise customers and developers who process billions of tokens daily and for whom the price per million tokens has become the decisive operational metric.
With the MiMo-V2.5 series, Xiaomi is positioning itself unmistakably as a price disruptor. The Chinese conglomerate, which under CEO Lei Jun intends to invest at least $8.7 billion in AI by 2029, announced several measures in a single package.
On third-party platforms such as OpenRouter, MiMo-V2.5-Pro is currently listed at $0.435 per million input tokens and $0.87 per million output tokens. The model features a context window of one million tokens and, according to Xiaomi, positions itself against the Western top tier on benchmarks such as SWE-bench Pro and ClawEval.
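To make the listed prices tangible, the following minimal sketch estimates a daily bill from the two OpenRouter rates quoted above. The workload size (one billion input tokens and 200 million output tokens per day) and the function name `workload_cost_usd` are assumptions chosen purely for illustration, not figures from Xiaomi or OpenRouter.

```python
from decimal import Decimal

# OpenRouter list prices for MiMo-V2.5-Pro as quoted in the article (USD per million tokens).
INPUT_USD_PER_M = Decimal("0.435")
OUTPUT_USD_PER_M = Decimal("0.87")
MILLION = Decimal(1_000_000)


def workload_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of a workload where input and output tokens are billed at separate rates."""
    return (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    ) / MILLION


# Hypothetical workload: 1 billion input tokens and 200 million output tokens per day.
daily_cost = workload_cost_usd(1_000_000_000, 200_000_000)
print(f"Estimated daily cost: ${daily_cost:.2f}")
```

Under these assumed volumes the bill comes to roughly $609 per day, which illustrates why heavy users track the per-million-token rate so closely.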
DeepSeek, for its part, is making its time-limited discount campaign permanent. API prices for V4-Pro remain in a range of 0.025 to 6 yuan per million tokens, or approximately $0.0035 to $0.83. Before the adjustment, the range was 0.1 to 24 yuan. The exact rate depends on whether the tokens are billed as text input or as the significantly more compute-intensive generated output.
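The "one quarter" figure can be checked directly from the two price ranges. The sketch below does that and converts the new range to US dollars; the exchange rate of 7.2 yuan per dollar is an assumption picked to roughly match the article's dollar figures, and the helper names are hypothetical.

```python
from decimal import Decimal

# DeepSeek V4-Pro price ranges from the article (yuan per million tokens).
OLD_RANGE_CNY = (Decimal("0.1"), Decimal("24"))
NEW_RANGE_CNY = (Decimal("0.025"), Decimal("6"))

# Assumed exchange rate for illustration only; real rates fluctuate.
CNY_PER_USD = Decimal("7.2")


def discount_ratios(old, new):
    """Ratio of new to old price at each end of the range."""
    return tuple(n / o for o, n in zip(old, new))


def to_usd(price_cny: Decimal) -> Decimal:
    """Convert a yuan price to US dollars at the assumed rate."""
    return price_cny / CNY_PER_USD


ratios = discount_ratios(OLD_RANGE_CNY, NEW_RANGE_CNY)
low_usd = round(to_usd(NEW_RANGE_CNY[0]), 4)
high_usd = round(to_usd(NEW_RANGE_CNY[1]), 2)
print(ratios, low_usd, high_usd)
```

Both ends of the range come out at exactly one quarter of the earlier level, consistent with the permanent discount described above.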
Also notable is the parallel capital raise: the lab founded by hedge fund billionaire Liang Wenfeng is opening its cap table to external investors for the first time. According to reports from the Financial Times, Bloomberg, and the South China Morning Post, the company is seeking a round of three to four billion US dollars at a valuation of up to $50 billion. The round would be led by the state-owned Chinese semiconductor fund “Big Fund III,” with participation from Tencent, Alibaba, and Hillhouse. It would be the first known investment by Big Fund in a Chinese LLM provider, and a political signal that Beijing is positioning DeepSeek as a national champion.
The key question for Western providers and investors is: how can Chinese providers offer prices that are a fraction of what OpenAI or Anthropic charge, without structurally running at a loss? The answer lies in a combination of three layers — hardware, software, and political economy.
1. Inference Optimization at the Software Level. In its announcement, Xiaomi is surprisingly open about where the lever lies. The conglomerate’s inference team has fundamentally rebuilt the KV cache architecture, the memory mechanism that stores the attention keys and values of tokens already processed so that they do not have to be recomputed during generation. The team uses SGLang HiCache in combination with Sliding Window Attention (SWA). HiCache organizes the KV cache across three levels, following the principle of modern CPU cache hierarchies: GPU memory as L1, host memory as L2, and distributed storage as L3. According to Xiaomi, this reduces the volume of data transferred between memory levels to approximately one-seventh of the previous value, while the number of cacheable tokens increases by a factor of five. In practical terms, for recurring requests with similar prefixes, such as in coding agents or multi-turn conversations, the model needs to recompute far less often.
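The three-level idea can be illustrated with a toy cache. The sketch below is not SGLang HiCache or Xiaomi's implementation; it is a simplified, hypothetical model in which each tier is a least-recently-used (LRU) store, evicted entries are demoted to the next slower tier, and a hit in a slower tier promotes the entry back to the fastest one. The class name, capacities, and string keys standing in for request prefixes are all assumptions.

```python
from collections import OrderedDict


class TieredPrefixCache:
    """Toy three-tier cache: tier 0 ~ GPU memory, 1 ~ host memory, 2 ~ distributed storage."""

    def __init__(self, capacities=(2, 4, 8)):
        self.capacities = capacities
        # One LRU-ordered dict per tier; the oldest entry sits at the front.
        self.tiers = [OrderedDict() for _ in capacities]

    def _insert(self, level, key, value):
        if level >= len(self.tiers):
            return  # evicted from the last tier: the entry is dropped
        tier = self.tiers[level]
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > self.capacities[level]:
            old_key, old_value = tier.popitem(last=False)  # least recently used
            self._insert(level + 1, old_key, old_value)    # demote to slower tier

    def put(self, key, value):
        # Drop any stale copy so each key lives in exactly one tier.
        for tier in self.tiers:
            tier.pop(key, None)
        self._insert(0, key, value)

    def get(self, key):
        """Return (value, tier_index) on a hit and promote to tier 0; (None, None) on a miss."""
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier.pop(key)
                self._insert(0, key, value)
                return value, level
        return None, None


cache = TieredPrefixCache(capacities=(1, 1, 1))
for name in ("prefix-a", "prefix-b", "prefix-c"):
    cache.put(name, f"kv-for-{name}")
print(cache.get("prefix-a"))  # oldest entry, found in the slowest tier
print(cache.get("prefix-a"))  # now served from the fastest tier
```

A hit in any tier avoids recomputing the cached prefix; the slower tiers trade latency for capacity, which is the general reason a hierarchy can hold more cacheable tokens than GPU memory alone.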
2. Domestic Hardware Strategy. For V4, DeepSeek consistently relies on Huawei Ascend 950 chips instead of Nvidia GPUs, which are already difficult for Chinese customers to obtain due to US export controls. The company has indicated that infrastructure costs will continue to fall once the so-called supernodes of the Ascend series are deployed more broadly in the second half of 2026. The combination of DeepSeek and Huawei is regarded strategically as the core of an independent Chinese AI stack. What began as a workaround against export restrictions is developing into structural cost arbitrage: Ascend chips are cheaper to procure in China and are billed without US margin markups.
3. Political Economy. With the entry of “Big Fund III” — should it be confirmed — DeepSeek would effectively become a state co-financed champion. This fundamentally changes the business logic: a company that does not primarily need to be profitable in the short term, but is instead meant to capture strategic market share in a geopolitically contested sector, can offer prices that are barely economically viable for purely privately financed competitors. Xiaomi, too, finances its AI division from the cash flow of a profitable consumer electronics conglomerate with announced cross-subsidized investments of $8.7 billion.
For Western providers such as OpenAI and Anthropic, the situation is becoming increasingly uncomfortable. Both companies typically charge several multiples of what DeepSeek and Xiaomi are now offering per million tokens for their top models. For pure commodity workloads — classification, translation, simple extraction — the switching barrier will continue to fall. The picture is different for complex reasoning, agent, and coding workloads, where model quality, security tooling, and enterprise integration remain the differentiating factors.
For startups and developers, the short-term upshot is above all one thing: capable reasoning models are increasingly being priced like infrastructure, not like premium software. Anyone validating an idea with an AI backend today can do so at unit costs that would have been unthinkable two years ago.
In the medium to long term, the more strategic question arises: if inference costs are pushed further toward marginal cost, value creation shifts away from the pure model and toward data integration, tooling, security, and vertical expertise. That is precisely where Western providers want to defend their pricing power. Whether they succeed will also depend on how aggressively Chinese providers extend their cost advantage into Western markets — and how quickly open-source alternatives continue to close the gap on the model side.