DeepSeek enters the fight for token volume, Anthropic continues to dominate spend
Authors · 2026-06-08 · via Vercel News

Every month, AI Gateway routes tens of trillions of tokens between production applications and AI labs, giving us visibility into what AI usage actually looks like, separate from leaderboards and benchmarks. We publish the data monthly in the AI Gateway production index.

May 2026 summary

  • Total AI Gateway tokens grew 20% MoM, and total spend grew 43% MoM, so customers paid almost 20% more per token on average than in April.

  • DeepSeek’s share of tokens jumped from under 1% to 17% in a single month, while its share of spend stayed near 1%.

  • Anthropic’s share of spend grew from 61% to 65% in May, and it held 70–80% of spend across every high-stakes use case (AI app generation, back-office agents, and coding agents).

  • Cost-consciousness meant smarter routing between low-cost and frontier models. Customers got more deliberate about which model did which work, while overall usage kept climbing.
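The "almost 20%" per-token figure follows directly from the two growth rates. Average price per token is spend divided by tokens, so its growth factor is the spend growth factor divided by the token growth factor. The minimal Python check below shows that arithmetic. The function name is illustrative and not part of any Vercel API.

```python
def per_token_price_change(token_growth: float, spend_growth: float) -> float:
    """Fractional change in average price per token between two months.

    Growth rates are fractions (0.20 means +20%). Average price per token is
    spend / tokens, so its growth factor is (1 + spend_growth) / (1 + token_growth).
    """
    return (1 + spend_growth) / (1 + token_growth) - 1


# May 2026 month-over-month figures from the summary above.
change = per_token_price_change(token_growth=0.20, spend_growth=0.43)
print(f"Average price per token changed by {change:+.1%}")  # +19.2%
```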

Last month, headlines about blown token budgets dominated tech news: Uber burned through its annual Claude Code budget shortly after Q1, and Amazon shut down KiroRank to curb unproductive tokenmaxxing. While runaway cost is a real problem, this month’s report shows that spend on production use cases still increased.

Two insights emerged from AI Gateway data in May:

  • Low-cost models entered production: New models shipped at price points that made the established labs look even more expensive, and they proved capable enough to earn a place in production model mixes.

  • Spend is increasing, but with smarter model mixes: Teams are still increasing token budgets, but they are implementing smarter routing strategies to get more value out of every dollar.

Low-cost models saw significant production volume for the first time

From February to April, the distribution of volume across labs on AI Gateway changed slowly, but in May the launch of DeepSeek V4 reshuffled token share. DeepSeek, which barely registered in April, became AI Gateway’s third-largest provider by volume in May without significantly affecting overall spend.

In April, DeepSeek accounted for less than 1% of AI Gateway tokens and less than 0.2% of spend. In May, its volume share jumped to 17% of tokens, putting it in third place, ahead of OpenAI. Almost all of the volume comes from two models: deepseek/deepseek-v4-flash and deepseek/deepseek-v4-pro, both released in May.

The spend picture tells the other half of the story. Even though DeepSeek’s token share grew to 17% in a single month, its cost share stayed near 1%.

DeepSeek V4 Flash launched at $0.14 input / $0.28 output per million tokens, roughly 20–50× lower than comparable Anthropic models and 8–12× lower than other value-tier flagships like Qwen 3.6 Plus and Kimi K2.6. With a savings gap that big, teams adopted V4 Flash quickly.
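To make the per-million pricing concrete, the sketch below prices a hypothetical monthly workload at the V4 Flash list prices quoted above. The token counts are invented for illustration. The 20× and 50× lines only scale the result by the quoted range for comparable Anthropic models. That scaling assumes the multiple applies equally to input and output tokens, so it approximates the gap rather than quoting real Anthropic prices.

```python
# DeepSeek V4 Flash list prices quoted in the report, in USD per million tokens.
V4_FLASH_INPUT_PER_M = 0.14
V4_FLASH_OUTPUT_PER_M = 0.28


def workload_cost(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """USD cost of a workload priced separately for input and output tokens."""
    return (input_tokens / 1_000_000) * input_per_m + (output_tokens / 1_000_000) * output_per_m


# Hypothetical workload: 1 billion input tokens and 200 million output tokens.
cost = workload_cost(1_000_000_000, 200_000_000,
                     V4_FLASH_INPUT_PER_M, V4_FLASH_OUTPUT_PER_M)
print(f"V4 Flash: ${cost:,.2f}")  # 140 + 56 = $196.00

# Rough cost of the same workload at the 20x and 50x multiples quoted above.
for multiple in (20, 50):
    print(f"{multiple}x: ${cost * multiple:,.2f}")
```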

Price alone wouldn’t have shifted DeepSeek’s volume that much in a month, which suggests that teams testing DeepSeek V4 against their existing evals found the output good enough to ship, not just cheap enough to try.

Value-tier models have always been available on AI Gateway but have never captured token share at this scale, which suggests that DeepSeek V4 is the first model at its price point to clear the quality bar for production work.

Frontier labs continued to capture a majority of new spend

Even as the low-cost end of the market grew fastest in volume, the expensive end grew faster in dollars.

Anthropic’s token share grew from 26% to 32%, and its spend share from 61% to 65%. OpenAI’s token share held near 13%, but its spend share ticked up from 12% to 13% on a much larger total, so customers were paying more per OpenAI token in May.
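The "paying more per token" conclusion can be checked from the share figures and the overall growth rates in the summary (tokens +20%, spend +43%). A provider's tokens equal its token share times total tokens, and the same holds for spend. Its price per token therefore scales by its spend growth divided by its token growth. The sketch below assumes OpenAI's token share was exactly 13% in both months. Because the shares are rounded whole percentages, the OpenAI result in particular shifts a lot with small rounding changes, so read it for direction rather than magnitude.

```python
def provider_price_factor(token_share_before: float, token_share_after: float,
                          spend_share_before: float, spend_share_after: float,
                          total_token_growth: float = 0.20,
                          total_spend_growth: float = 0.43) -> float:
    """Growth factor of one provider's average price per token, month over month.

    A provider's tokens are token_share * total tokens (likewise for spend),
    so its price per token scales by its spend growth / its token growth.
    """
    token_factor = (1 + total_token_growth) * token_share_after / token_share_before
    spend_factor = (1 + total_spend_growth) * spend_share_after / spend_share_before
    return spend_factor / token_factor


anthropic = provider_price_factor(0.26, 0.32, 0.61, 0.65)
# Assumption: OpenAI's token share was exactly 13% in both months ("held near 13%").
openai = provider_price_factor(0.13, 0.13, 0.12, 0.13)
print(f"Anthropic price per token: {anthropic - 1:+.1%}")  # +3.2%
print(f"OpenAI price per token: {openai - 1:+.1%}")        # +29.1%
```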

The average token got more expensive in May, even with DeepSeek pulling the average down. That increase happened because the work that demands frontier models grew faster than the work that doesn’t. The AI coding agent use case shows the low-cost/frontier split most clearly:

  • DeepSeek drove 49% of the segment’s token volume, but only 4% of the cost.

  • Anthropic drove 28% of tokens and 70% of the cost.
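Dividing each provider's cost share by its token share gives its price per token relative to the segment average. The sketch below applies that to the coding-agent figures above. The resulting gap of roughly 31× falls inside the 20–50× list-price range quoted earlier. It is only a blended figure, though, because each provider's share mixes several models.

```python
def relative_price_per_token(token_share: float, cost_share: float) -> float:
    """Price per token relative to the segment average (1.0 = average)."""
    return cost_share / token_share


# Coding-agent segment, May 2026.
deepseek_rel = relative_price_per_token(token_share=0.49, cost_share=0.04)
anthropic_rel = relative_price_per_token(token_share=0.28, cost_share=0.70)
print(f"DeepSeek: {deepseek_rel:.2f}x segment average")    # 0.08x
print(f"Anthropic: {anthropic_rel:.2f}x segment average")  # 2.50x
print(f"Anthropic vs DeepSeek per token: {anthropic_rel / deepseek_rel:.0f}x")  # 31x
```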

Lower-cost models are now a meaningful part of production workflows, but frontier model use is still growing, driving the increase in overall spend.

The frontier is getting more expensive per token, and customers are still paying. Anthropic continues to lead on spend, taking 65% of all gateway spend in May, and 70–80% of spend across every high-stakes use case.

Cost discipline became a routing strategy

Increased overall spend showed that demand for AI continued to grow in May, but teams applied more precision to their budgets through routing. They sent the cheap, high-volume work to lower-priced models and used frontier models where quality mattered most. Slow adoption of Google's latest Flash model is a clear example.

Gemini 3.5 Flash launched in May at a higher price point than Gemini 3.0 Flash, but migration didn’t happen at scale. By month-end, 3.5 held only 7% of the Flash family’s tokens while 3.0 held 90%.

Compared with the rapid adoption of Gemini 3.1 Pro in February and March, the slow migration to 3.5 Flash suggests that teams happy with 3.0 Flash aren’t yet willing to pay the higher price.
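The routing pattern described in this section sends cheap, high-volume work to low-cost models and reserves frontier models for work where quality matters most. It can be sketched as a simple rule-based router. This is a hypothetical illustration, not AI Gateway's API and not any customer's actual routing logic. `deepseek/deepseek-v4-flash` is named in the report. `FRONTIER_MODEL`, the task kinds, and the `Task` type are placeholders invented for the example.

```python
from dataclasses import dataclass

# "deepseek/deepseek-v4-flash" is named in the report; FRONTIER_MODEL is a
# hypothetical placeholder, not a real model identifier.
LOW_COST_MODEL = "deepseek/deepseek-v4-flash"
FRONTIER_MODEL = "example-lab/frontier-model"

# Hypothetical task kinds treated as cheap, high-volume work.
LOW_RISK_KINDS = {"classification", "extraction", "summarization"}


@dataclass
class Task:
    kind: str          # e.g. "summarization" or "coding"
    high_stakes: bool  # True when a wrong answer is expensive


def pick_model(task: Task) -> str:
    """Route low-risk, high-volume work to the low-cost model; everything else to the frontier model."""
    if task.kind in LOW_RISK_KINDS and not task.high_stakes:
        return LOW_COST_MODEL
    return FRONTIER_MODEL


print(pick_model(Task("summarization", high_stakes=False)))  # deepseek/deepseek-v4-flash
print(pick_model(Task("coding", high_stakes=True)))          # example-lab/frontier-model
```

In practice the `high_stakes` flag is the hard part. It encodes the same judgment the report sees in use-case cost shares: how expensive a wrong answer is.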

Conclusion: Cost-effective, capable options mean smarter model mixes

This month's report signals increased pricing sensitivity in the market, even as overall spend and token volume grow. That means developers are looking for ways to get more out of every dollar.

Data revealed two optimization strategies:

  1. Using DeepSeek's cheap but capable V4 family for lower-risk, high-volume tasks

  2. Choosing to delay model family upgrades until the ROI makes sense
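One way to frame "until the ROI makes sense" is a break-even check: upgrade only when the estimated monthly value of better output exceeds the extra monthly cost. The sketch below assumes a single blended price per million tokens, which simplifies real input/output pricing. All numbers are hypothetical and are not Gemini's actual prices. The value estimate would come from a team's own evals.

```python
def upgrade_pays_off(monthly_tokens_m: float, current_price_per_m: float,
                     new_price_per_m: float, monthly_value_gain: float) -> bool:
    """Return True if a newer model's estimated value gain covers its extra cost.

    monthly_tokens_m: expected monthly volume, in millions of tokens.
    *_price_per_m: blended USD price per million tokens (a simplification;
        real pricing usually differs for input and output).
    monthly_value_gain: estimated USD value of the quality improvement per
        month, e.g. fewer retries or escalations measured in your own evals.
    """
    extra_cost = monthly_tokens_m * (new_price_per_m - current_price_per_m)
    return monthly_value_gain > extra_cost


# Hypothetical: 500M tokens/month, $0.10 -> $0.30 per 1M tokens (about $100/month extra).
print(upgrade_pays_off(500, 0.10, 0.30, monthly_value_gain=60))   # False: stay on the current model
print(upgrade_pays_off(500, 0.10, 0.30, monthly_value_gain=150))  # True: upgrade
```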

Routing gives teams the ability to adjust their model mix, and budget, in real time as the labs compete for different layers of production AI workloads.

Appendix

Token vs cost share by B2B classification

B2B applications run fewer, more expensive calls, while B2C applications run many cheap ones. On a per-token basis, B2B tokens cost roughly 60% more than B2C tokens in May.

Agent tool use across tokens and requests

Just under a quarter of requests end in a tool call, but those requests carry well over half of all tokens. Both metrics are roughly flat month-over-month.
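When a minority of requests carries a majority of tokens, each of those requests must be much larger than average. The sketch below quantifies that gap. The report gives only "just under a quarter" and "well over half", so the inputs 24% and 60% are illustrative assumptions, not reported figures.

```python
def tokens_per_request_ratio(request_share: float, token_share: float) -> float:
    """Average tokens per request inside a subset, relative to requests outside it."""
    inside = token_share / request_share
    outside = (1 - token_share) / (1 - request_share)
    return inside / outside


# Illustrative inputs only: the report does not give exact figures.
ratio = tokens_per_request_ratio(request_share=0.24, token_share=0.60)
print(f"A tool-call request carries about {ratio:.2f}x the tokens of one without")  # 4.75x
```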

Model diversity distribution by request volume

The more requests an app serves, the more models it runs in production. Single-model setups dominate the lowest-volume tier, while at 1M+ requests the majority of apps route across 11 or more models.

Cost vs volume share by use case

Use-case cost share indicates how expensive a wrong answer is, not how many tokens it burns. Personal assistants and coding agents run cheap per token, while back-office and recruiting work costs far more.

Previous reports

Read the April 2026 AI Gateway production index.

About this data

This analysis is based on anonymized, aggregate routing data from the Vercel AI Gateway through May 2026.

A few notes on measurement:

  • Spend uses market-rate pricing (published list price) to provide a normalized view across teams that bring their own API keys.

  • Volume counts tokens routed through AI Gateway.

  • B2C, B2B, and use-case classifications are aggregate. No individual team or workload is identified.
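The measurement notes above imply two different shares for each provider: token share (volume) and spend share (tokens priced at list price, even for bring-your-own-key traffic). The sketch below computes both from per-record token counts. It assumes "volume" counts input plus output tokens, and the record layout is hypothetical. The V4 Flash prices come from the report. `example-lab/frontier-model` and its prices are invented. The output shows how a provider can hold a large token share with a small spend share, as DeepSeek did in May.

```python
from collections import defaultdict

# (provider, model, input_tokens, output_tokens); hypothetical records.
Record = tuple[str, str, int, int]


def shares(records: list[Record],
           list_prices: dict[str, tuple[float, float]]) -> tuple[dict[str, float], dict[str, float]]:
    """Token share and spend share per provider, with spend at list price.

    list_prices maps model -> (input USD per 1M tokens, output USD per 1M tokens).
    Every record is priced at list price, whoever's API key paid for it.
    """
    tokens: dict[str, int] = defaultdict(int)
    spend: dict[str, float] = defaultdict(float)
    for provider, model, inp, out in records:
        in_price, out_price = list_prices[model]
        tokens[provider] += inp + out
        spend[provider] += inp / 1e6 * in_price + out / 1e6 * out_price
    total_tokens = sum(tokens.values())
    total_spend = sum(spend.values())
    return ({p: t / total_tokens for p, t in tokens.items()},
            {p: s / total_spend for p, s in spend.items()})


prices = {
    "deepseek/deepseek-v4-flash": (0.14, 0.28),  # from the report
    "example-lab/frontier-model": (3.00, 15.00),  # hypothetical
}
records = [
    ("deepseek", "deepseek/deepseek-v4-flash", 800_000_000, 200_000_000),
    ("example-lab", "example-lab/frontier-model", 400_000_000, 100_000_000),
]
token_share, spend_share = shares(records, prices)
for provider in token_share:
    print(f"{provider}: {token_share[provider]:.1%} of tokens, {spend_share[provider]:.1%} of spend")
```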