AI Gateway production index
Harpreet Arora (Product, AI infrastructure), Yvonne Zhou (Product, Mone) · 2026-05-12 · via Vercel News

Ask which AI model is best, and the answer changes before the ink dries. That's what happens in an industry where new models are released weekly.

Every benchmark measures a different race, and every race crowns its own winner, but Vercel has a unique view of the industry through production workloads. AI Gateway serves tens of trillions of tokens across hundreds of models through real applications and agents.

What we're seeing:

  • Anthropic leads in spend despite a higher unit price; Google leads in volume

  • OSS models are gaining traction, but there is no loyalty to specific labs

  • OpenAI spend share is growing quickly after recent model updates

  • High-volume workloads route across 30+ distinct models on average

  • Agentic workloads carry 59% of all token volume (up nearly 2x over six months)

This report draws on seven months of production traffic through AI Gateway, with usage from more than 200K unique teams.

Anthropic leads in spend; Google leads in volume

Cost and volume rankings disagree because they measure two different workloads, even for the same customer.

By spend in April 2026, Anthropic took 61%, Google 21%, and OpenAI 12%.

By token volume, the picture flipped. 38% of April traffic through AI Gateway routed to Google, 26% to Anthropic, 13% to OpenAI, and 10% to xAI. Smaller labs split the rest.

Some models are positioned to win by being cheap enough per token to carry huge volume, while others are priced high enough to make sense only for quality-critical work. The different models are not competing for the same call. In aggregate the same customer base sits on both leaderboards, with premium reasoning calls landing on Claude Opus and cheap fast calls landing on Gemini Flash. Spend follows the high-stakes calls, and volume follows the low-stakes ones, with the labs each holding a different layer of the same applications.
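As a rough illustration of why the two leaderboards disagree, dividing each lab's spend share by its token share gives a relative price index, where 1.0 means the lab's blended price per token matches the gateway-wide average. The shares below are the April 2026 figures quoted above; the index itself and the names `spend_share`, `token_share`, and `relative_price_index` are illustrative assumptions, not metrics from the report.

```python
# April 2026 shares quoted in the text (percent of gateway total).
spend_share = {"Anthropic": 61, "Google": 21, "OpenAI": 12}
token_share = {"Anthropic": 26, "Google": 38, "OpenAI": 13}


def relative_price_index(spend: dict, tokens: dict) -> dict:
    """Spend share divided by token share for each lab.

    Equivalent to the lab's blended price per token divided by the
    average price per token across all traffic.
    """
    return {lab: spend[lab] / tokens[lab] for lab in spend if lab in tokens}


index = relative_price_index(spend_share, token_share)
for lab, value in sorted(index.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lab}: {value:.2f}x the average price per token")
```

Under this reading, a lab can top the spend chart with a minority of tokens as long as its index is well above 1.0, and top the volume chart with an index well below it.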

Volume-vs-spend also changes quickly at the lab level. A few specific signals:

  • Gemini Flash helped Google take the lead on volume at a smaller share of spend

  • Claude Opus helps Anthropic lead on spend with less volume than Google

  • OpenAI's spend share tripled from March to April after the GPT-5.4/5.5 releases

  • Google's spend share climbed from 8% in March to 21% in April as Gemini Flash usage scaled

Spend follows the cost of being wrong

The same cost/volume divide exists at a finer grain inside specific kinds of workloads:

  • Personal assistants account for 20% of cost on 40% of token volume

  • Coding agents sit roughly balanced at 22% of cost on 20% of tokens

  • Back office agents run at 6% of cost on 15% of tokens

  • App generation runs at 7% of cost on 11% of tokens

What a workload spends per token is a function of how expensive a wrong answer is to the use case. Personal assistants can run on cheap, fast models because mistakes only impact individual users and are quickly corrected. Back-office workflows pay for stronger reasoning because errors can trigger legal, financial, or operational risks that outweigh the per-call savings. The per-token economics are a stake map: applications spend more per token when mistakes cost more.

The same pattern holds in a broader B2C/B2B split. B2C applications generate many low-cost calls, while B2B applications run fewer, more expensive ones. On a per-token basis, B2B costs roughly twice as much as B2C.

No single provider wins across use cases

Cutting the data by use case shows a fragmented provider landscape:

  • Anthropic notably leads in software building

  • Google over-indexes in consumer

  • OpenAI is the most evenly distributed

  • xAI and others are split across coding, consumer, and long-tail use cases

Anthropic's pattern is concentration at the high-stakes layer. As the workload moves from back office to consumer, Anthropic's token share drops from 71% down to 7%. Its cost share follows a much shallower curve and keeps the lead through three of the four categories. The revenue concentrates wherever the answer has to be right, regardless of how much volume passes through.

Google is the inverse shape. Its footprint concentrates in consumer, where Gemini Flash carries 28% of tokens at 15% of cost, and barely appears on the cost chart outside it. The position is a single-SKU bet that rises and falls with Flash adoption.

xAI is a price wedge. Grok carries 20% of building tokens and 18% of outreach tokens at materially smaller cost shares in each. xAI wins on price-to-quality fit, and whoever matches the price closes the wedge.

OpenAI is the most balanced of the four at 6% of building cost, 18% of consumer cost, and 28% of outreach cost. No single layer is load-bearing for OpenAI's overall share, which makes the company the least exposed of the four to disruption in any one layer.

Open-weights families like Kimi, MiniMax, and GLM rotate through the consumer and building tiers where the cost ceiling is lowest. Their cost share stays small, and their token share inside consumer and building is large enough that any cost-only view of the market understates them.

There is no single dominant provider across the whole market because there is no single dominant use case. The right question is not "Who is winning AI?" but "Which models are winning the use case I care about?" The labs that look closest to even on a blended chart are competing for different layers of the same stack.

Apps are becoming more agentic

The shape of production AI requests has changed underneath all of this. In April 2026, 22.2% of AI Gateway requests ended with a tool call, up from 11.4% in October 2025. Measured by tokens, the shift is bigger in absolute terms: 58.9% of all tokens are now in tool-call requests, up from 31.6% six months ago.

By both measures the agentic share roughly doubled in half a year, but the more telling number is the gap between the two shares. 22.2% of requests carry 58.9% of tokens, which means a tool-using request carries about 2.65× the tokens of an average request and roughly 5× the tokens of a request without a tool call. The cost surface of AI has shifted from chat-shaped to agent-shaped, while headline request counts barely budged.
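The gap between the two shares can be converted into a per-request token intensity. The sketch below assumes every request either ends with a tool call or does not, and uses the April 2026 shares quoted above; the function name `token_intensity` is hypothetical. It reports the intensity two ways, because "relative to the average request" and "relative to the remaining requests" give different multiples.

```python
def token_intensity(request_share: float, token_share: float) -> tuple[float, float]:
    """Return (vs_average, vs_rest) token intensity for a segment of traffic.

    request_share and token_share are fractions strictly between 0 and 1.
    vs_average: tokens per request in the segment, relative to all requests.
    vs_rest:    tokens per request in the segment, relative to requests
                outside the segment.
    """
    vs_average = token_share / request_share
    rest_vs_average = (1 - token_share) / (1 - request_share)
    return vs_average, vs_average / rest_vs_average


vs_avg, vs_rest = token_intensity(request_share=0.222, token_share=0.589)
print(f"vs. the average request: {vs_avg:.2f}x")
print(f"vs. non-tool requests:   {vs_rest:.2f}x")
```

The first figure is the plain ratio of the two shares; the second is larger because the remaining 77.8% of requests share only 41.1% of the tokens.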

Every kind of round trip bills against the same meter, whether it's a function execution, an API call, a database query, or a code run, so an agent shipping ten tool calls bills roughly ten times the tokens a chat would. Where a chat bills one round trip per prompt, an agent bills a chain.

Leaderboards rank one model, but production teams use 35+ at scale

At scale, multi-model stops being a choice and becomes standard agent architecture.

Teams running 1K to 10K requests averaged 3 distinct models. By the 10M+ requests bucket, the average is 35 models in regular use. The jump from 18 models in the 1M to 10M bucket to 35 in the 10M+ bucket is the inflection point.

A 35-model fleet runs as a routing graph, with a cheap classifier for intent detection, a frontier model for the reasoning step, an embedding model for retrieval, a fast model for summarization, and a vision model for screenshots. Every one of those models is swappable. If a provider raises prices, degrades quality, or has an outage, traffic redistributes across the rest in hours. At the scale that produces most of the spend on the leaderboards, switching between labs is closer to a config change than to a vendor migration, and the standard story about lab lock-in inverts the higher you go on the request-volume curve.
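A minimal sketch of what "switching is a config change" could look like, assuming a routing table keyed by pipeline step. The `Router` class, the step names, and every model ID are placeholders invented for illustration; this is not AI Gateway's API or any team's actual configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Maps a pipeline step to an ordered list of interchangeable model IDs."""
    routes: dict[str, list[str]] = field(default_factory=dict)

    def pick(self, step: str, unavailable: frozenset[str] = frozenset()) -> str:
        """Return the first configured model for `step` not marked unavailable."""
        for model in self.routes[step]:
            if model not in unavailable:
                return model
        raise LookupError(f"no available model for step {step!r}")

    def swap(self, step: str, models: list[str]) -> None:
        """Replace the candidates for one step without touching the others."""
        self.routes[step] = list(models)


# Placeholder model IDs (hypothetical); one entry per role named in the text.
router = Router({
    "intent": ["cheap-classifier-a", "cheap-classifier-b"],
    "reasoning": ["frontier-a", "frontier-b"],
    "retrieval": ["embedding-a"],
    "summarize": ["fast-a", "fast-b"],
    "vision": ["vision-a"],
})

print(router.pick("reasoning"))                                     # normal route
print(router.pick("reasoning", unavailable=frozenset({"frontier-a"})))  # provider down
router.swap("summarize", ["fast-c"])                                # price change
print(router.pick("summarize"))
```

Because each step only sees a list of candidates, replacing a provider touches one entry in the table rather than the application code around it.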

New models are adopted rapidly

The same fleet design explains how fast new releases get absorbed. When a new version ships inside a model family, traffic moves to it within weeks.

Claude Sonnet 4.6 absorbed most of the Sonnet family's share by its first full month after launch.

The Opus family is moving through the same shape now, with Claude Opus 4.7 taking share from Opus 4.6 on a near-identical curve.

Predecessor models stayed live and routable on AI Gateway throughout both windows, but teams moved anyway. The migration is a config change, and the labs no longer set the upgrade timeline of their own product lines.

Provider outages have a hidden cost

Roughly 3.5% of requests on AI Gateway complete after a fallback. That means the initial route hit an error, a rate limit, or a timeout, and the gateway reissued the request to a healthy alternative fast enough that the user still got a successful response.
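The fallback behavior described above can be sketched as an ordered retry across routes. This is a simplified illustration, not the gateway's implementation: `ProviderError`, `call_with_fallback`, and the two stand-in providers are all hypothetical, and a real system would also handle timeouts, backoff, and per-provider request formats.

```python
class ProviderError(Exception):
    """Stand-in for an error, rate limit, or timeout from a provider."""


def call_with_fallback(prompt, providers):
    """Try each (name, call) pair in order.

    Returns (name, response, used_fallback), where used_fallback is True
    when the response came from any route other than the first.
    """
    last_error = None
    for position, (name, call) in enumerate(providers):
        try:
            return name, call(prompt), position > 0
        except ProviderError as err:
            last_error = err  # remember the failure and try the next route
    raise RuntimeError("all routes failed") from last_error


def flaky(prompt):
    # Simulates a primary route that is currently rate limited.
    raise ProviderError("rate limited")


def healthy(prompt):
    # Simulates an alternative route that answers normally.
    return f"echo: {prompt}"


name, response, rescued = call_with_fallback(
    "hi", [("primary", flaky), ("backup", healthy)]
)
print(name, response, rescued)
```

The `used_fallback` flag is the piece that makes a rescue rate measurable: counting it across requests yields the kind of figure quoted above.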

Measured in tokens the rescue rate runs at 5.1%, and in dollars at 4.9%. The token-weighted and cost-weighted rates run higher than the request-weighted rate because the requests that get rescued are, on average, bigger and more expensive than the ones that don't. Long context windows hit rate limits more often than short ones, multi-step agent runs accumulate failure across steps, and heavy reasoning calls time out under sustained load. Each of those failure modes targets the expensive end of the workload, which is why the dollar rate sits higher than the request rate.
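To see how the three rates can diverge, the sketch below computes request-, token-, and cost-weighted rescue rates over a small made-up log. The `Request` record, the `rescue_rates` helper, and the sample numbers are assumptions for illustration only; they are not the report's data.

```python
from dataclasses import dataclass


@dataclass
class Request:
    tokens: int
    cost: float
    rescued: bool  # True if the request completed after a fallback


def rescue_rates(requests):
    """Return the (request-, token-, cost-weighted) share of rescued traffic."""
    rescued = [r for r in requests if r.rescued]
    by_request = len(rescued) / len(requests)
    by_tokens = sum(r.tokens for r in rescued) / sum(r.tokens for r in requests)
    by_cost = sum(r.cost for r in rescued) / sum(r.cost for r in requests)
    return by_request, by_tokens, by_cost


# Made-up log: 19 small requests succeed directly, 1 large request is rescued.
log = [Request(tokens=1_000, cost=0.01, rescued=False) for _ in range(19)]
log.append(Request(tokens=4_000, cost=0.04, rescued=True))

by_request, by_tokens, by_cost = rescue_rates(log)
print(f"{by_request:.1%} of requests, {by_tokens:.1%} of tokens, {by_cost:.1%} of cost")
```

Whenever rescued requests are larger than average, the token- and cost-weighted rates exceed the request-weighted one, which is the direction of the gap reported above.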

A provider's SLA measures request-level uptime, but a production application experiences cost-weighted uptime, and the two come apart on exactly the calls that paid for the model.

Conclusion: Build for workload, not the lab

Production workloads are designed for efficiency, reliability, and flexibility, not to match the latest model leaderboards.

Across six cuts of the same data, the shape underneath stays the same. Different labs win different layers of the same applications, and the architecture that handles those layers is the one production teams at scale have already built for.

This echoes the early cloud era. Teams expanded compute first (more instances, regions, redundancy) and squeezed per-unit cost later. The 35-model fleets visible at the top of the spend curve are the same pattern at a faster cadence; the optimization that follows happens at the routing layer.

For anyone shipping AI today:

  • Plan for multiple models across providers

  • Assume the need for fallbacks to optimize for uptime and cost

  • Design routing as a core unit of architecture from the beginning

We expect to revisit this data on a recurring cadence as the patterns shift. Live model rankings are available on the AI Gateway Leaderboards.

About this data

This analysis is based on anonymized, aggregate routing data from the Vercel AI Gateway through April 2026.

A few notes on measurement:

  • Spend uses market-rate pricing (published list price) to provide a normalized view across teams that bring their own API keys.

  • Volume counts tokens routed through AI Gateway.

  • B2C, B2B, and use-case classifications are aggregate. No individual team or workload is identified.