惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

S
Securelist
Schneier on Security
Schneier on Security
Cloudbric
Cloudbric
S
Security @ Cisco Blogs
Webroot Blog
Webroot Blog
Attack and Defense Labs
Attack and Defense Labs
G
GRAHAM CLULEY
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
S
Schneier on Security
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
Latest news
Latest news
C
CXSECURITY Database RSS Feed - CXSecurity.com
D
Darknet – Hacking Tools, Hacker News & Cyber Security
H
Heimdal Security Blog
I
Intezer
GbyAI
GbyAI
T
The Blog of Author Tim Ferriss
罗磊的独立博客
O
OpenAI News
D
Docker
Cisco Talos Blog
Cisco Talos Blog
S
Secure Thoughts
S
Security Affairs
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
The Last Watchdog
The Last Watchdog
L
LINUX DO - 热门话题
AI
AI
B
Blog
C
Cybersecurity and Infrastructure Security Agency CISA
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
H
Help Net Security
爱范儿
爱范儿
博客园 - 司徒正美
Scott Helme
Scott Helme
博客园_首页
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Blog — PlanetScale
Blog — PlanetScale
Simon Willison's Weblog
Simon Willison's Weblog
Google DeepMind News
Google DeepMind News
N
News and Events Feed by Topic
A
About on SuperTechFans
T
Threat Research - Cisco Blogs
P
Proofpoint News Feed
Y
Y Combinator Blog
C
CERT Recently Published Vulnerability Notes
T
Tenable Blog
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
V
V2EX - 技术
The Register - Security
The Register - Security

Vercel News

Vercel Open Source Program: Winter 2026 cohort How Notion Workers run untrusted code at scale with Vercel Sandbox How we run Vercel's CDN in front of Discourse From idea to secure checkout in minutes with Stripe Building Slack agents can be easy Scaling redirects to infinity on Vercel Advancing Python typing Gamma builds design-first agents with Vercel How Avalara turns pipe dreams into patent-pending with v0 Keeping community human while scaling with agents How OpenEvidence built a healthcare AI that physicians actually trust Security boundaries in agentic architectures Skills Night: 69,000+ ways agents are getting smarter Video Generation with AI Gateway We Ralph Wiggumed WebStreams to make them 10x faster How Stably ships AI testing agents in hours, not weeks How we built AEO tracking for coding agents Anyone can build agents, but it takes a platform to run them Introducing Geist Pixel The Vercel AI Accelerator is back with $6m in credits Making agent-friendly pages with content negotiation The Vercel OSS Bug Bounty program is now available Introducing the new v0 Run untrusted code with Vercel Sandbox, now generally available How Stripe built a game-changing app in a single flight with v0 How Sensay went from zero to product in six weeks AGENTS.md outperforms skills in our agent evals Agent skills explained: An FAQ Testing if "bash is all you need" AWS databases are now live on the Vercel Marketplace and v0 Use Perplexity Web Search with Vercel AI Gateway Introducing: React Best Practices Nick Bogaty joins Vercel as Chief Revenue Officer How Mux shipped durable video workflows with their @mux/ai SDK How to build agents with filesystems and bash How we made v0 an effective coding agent Stopping the slow death of internal tools Building AI-Generated Pixel Trading Cards with Vercel AI Gateway We removed 80% of our agent’s tools AI SDK 6 Our $1 million hacker challenge for React2Shell Cline now runs on Vercel AI Gateway How to prompt v0 Build smarter workflows with Notion and v0 Vercel launches partner certification Inside Workflow DevKit: How framework integrations work React2Shell Security Bulletin | Vercel Knowledge Base Billions of requests: Black Friday-Cyber Monday 2025 Investing in the Python ecosystem AWS Databases coming to the Vercel Marketplace How we built the v0 iOS app Workflow Builder: Build your own workflow automation platform Security through design: Creating the improved Firewall experience Vercel Open Source Program: Fall 2025 cohort Self-driving infrastructure Vercel collaborates with Google for Gemini 3 Pro Preview launch Vercel: The anti-vendor-lock-in cloud How Nous Research used BotID to block automated abuse at scale How AI Gateway runs on Fluid compute What we learned building agents at Vercel Build and deploy data applications on Snowflake with v0 BotID Deep Analysis catches a sophisticated bot network in real-time Vercel Agent can now run AI investigations Vercel achieves TISAX AL2 compliance to serve automotive partners Bun runtime on Vercel Functions David Totten Joins Vercel to Lead Global Field Engineering Vercel Ship AI 2025 recap You can just ship agents AI agents and services on the Vercel Marketplace Built-in durability: Introducing Workflow Development Kit Zero-config backends on Vercel AI Cloud Introducing Vercel Agent: Your new Vercel teammate Update regarding Vercel service disruption on October 20, 2025 Agents at work, a partnership with Salesforce and Slack Running Next.js in ChatGPT: How to Build ChatGPT Apps Talha Tariq joins Vercel as CTO of Security Just another (Black) Friday Server rendering benchmarks: Fluid Compute and Cloudflare Workers Towards the AI Cloud: Our Series F Collaborating with Anthropic on Claude Sonnet 4.5 to power intelligent coding agents Preventing the stampede: Request collapsing in the Vercel CDN BotID uncovers hidden SEO poisoning How we made global routing faster with Bloom filters What you need to know about vibe coding Scale to one: How Fluid solves cold starts Addressing security & quality issues with MCP tools - Vercel AI agents at scale: Rox’s Vercel-powered revenue operating system Helly Hansen migrated to Vercel and drove 80% Black Friday growth Introducing Vercel Drains: Complete observability data, anywhere Introducing x402-mcp: Open protocol payments for MCP tools MongoDB Atlas is now available on the Vercel Marketplace The second wave of MCP: Building for LLMs, not developers A more flexible Pro plan for modern teams Critical npm supply chain attack response - September 8, 2025 Stress testing Biome's noFloatingPromises lint rule Open SDK strategy Preparing for the worst: Our core database failover test AI-powered prototyping with design systems - Vercel – Vercel AI Gateway: Production-ready reliability for your AI apps - Vercel – Vercel Rethinking prototyping, requirements, and project delivery at Code and Theory - Vercel – Vercel
Protecting against token theft
Malte UblCTO, VercelEric DoddsContent Engineer · 2026-05-29 · via Vercel News

HTTP requests are inexpensive. Vercel charges ~$2/million, a fraction of a cent per call. But a single prompt to an agent on a frontier model can cost $2, making AI a million times more expensive, and inference theft one of the highest-margin businesses an attacker can run. We have seen this type of attack on our own APIs.

If you have AI endpoints exposed to the internet, the risk of abuse is high and can easily run up bills in the tens of thousands of dollars or more.

Protecting those endpoints requires verification to run on every AI request, not on the session or signup. Rate limits and auth walls aren't sufficient on their own because checks that run once per session get amortized away across thousands of stolen calls.

At Vercel, we gate every AI request through BotID deep analysis, and you can do the same on your own endpoints with a few lines of code.

Link to headingWhat inference theft is

Inference theft is the unauthorized use of someone else's paid AI inference, either for free consumption or downstream resale. The operator pays per AI call; the attacker pays nothing for inference and then resells the tokens at a discount. This goes beyond rate-limit abuse to actual resale of a stolen resource in a market.

Link to headingWhich AI endpoints are at risk?

Any internet-facing endpoint that gives a caller meaningful control over an LLM prompt is a target. The more general the endpoint, the higher the payout per stolen call.

AI playgrounds, like the AI SDK Playground, are the most dangerous shape because the caller has maximum control over the prompt, the model, and often the parameters. Stolen calls land cleanly into any standard client.

Support bots and documentation assistants are less exposed when system prompts are fixed server-side, but attackers have learned how to talk the models around system prompts cheaply enough to make resale viable.

Resale value tracks how easily the stolen calls can be dropped into a provider-compatible client.

Link to headingWhy web defenses don't mitigate inference theft

IP rate limits and auth walls were built to defend against attacks with dramatically lower per-call economics, where gaming IPs and accounts weren't worth the cost.

The payoff from stolen inference is high enough that attackers will procure residential proxy IPs by the thousands and register throwaway accounts at whatever scale it takes to defeat your gate. Rate limits get diluted across the fleet of IP addresses, and real accounts pass authentication.

Link to headingThe architecture of abuse

Sophisticated attackers wrap your custom AI endpoint in an OpenAI- or Anthropic-compatible adapter and fan calls out through residential proxies.

The adapter is the key component. It is a one-time engineering cost that presents the victim's idiosyncratic API as OpenAI- or Anthropic-compatible, so stolen inference can drop into any standard coding agent or SDK. Reselling at even five to ten percent of the list price, with zero marginal inference cost, can make for a generous-margin business.

A recent example is Chipotlai Max, a forked coding agent that ships with a proxy turning Chipotle's customer-support chatbot into an OpenAI-compatible endpoint. The project openly solicits help in porting the same inference-theft approach to Home Depot, Lowe's, Target, and Starbucks.

The adapter also serves as the session boundary for the attacker's downstream users. They authenticate to the adapter, not to your endpoint. By the time a call hits your API, it has already crossed the boundary you were planning to defend. The check has to run on the call the adapter proxies, not on the session it sits behind.

Link to headingThe shape of a real attack on our own endpoint

On April 12, 2026, traffic to the Vercel docs AI chat endpoint spiked to roughly ten times normal volume on Anthropic's Claude Haiku 4.5 model. Traffic rose to 1,300 requests per minute at peak, which would have translated to an inference cost run rate of over ten thousand dollars per day.

The attack came in through residential proxies that obscured the real client IPs. Across hundreds of thousands of bot requests over two days, standard per-IP rate limits had nothing useful to act on.

Link to headingHow to defend against inference theft

Protecting AI endpoints against inference theft requires verification of every request. We use Vercel's BotID with deep analysis, called inside the route handler before the AI request lands.

Link to headingVerification has to run on every AI request

If our gate had run at session start instead of per request, the attacker would have paid the bypass cost once and walked away with hundreds of thousands of stolen calls. Any check that runs per session amortizes the attacker's bypass cost across every subsequent inference call. Per-request gates force that ratio down to one, and even at high inference prices, defeating a check on every call isn't worth the cost.

This is where the cost asymmetry works in the defender's favor. Inference is the most expensive resource per call that the attacker steals, but verification is one of the cheapest protection costs per call.

Link to headingImplementing request verification with BotID deep analysis

Traditional image CAPTCHAs no longer hold up against modern attackers because the same AI models that make inference worth stealing can easily bypass them.

We deploy Vercel BotID on our AI endpoints, gating every request. BotID is an invisible CAPTCHA with deep analysis powered by Kasada that uses client-side machine learning to distinguish humans from bots without a visible challenge, so it can run on every request rather than only at session start.

BotID deep analysis detected and blocked more than ten thousand bot requests in the first minutes of the spike. Within twenty-four hours, request volume on the endpoint was flat at normal levels.

Server-side, checkBotId() runs inside the route handler and returns a classification for the request currently being served.

// app/api/ai-chat/route.ts

import { checkBotId } from 'botid/server';

import { NextRequest, NextResponse } from 'next/server';

export async function POST(request: NextRequest) {

const verification = await checkBotId();

if (verification.isBot) {

return NextResponse.json({ error: 'Access denied' }, { status: 403 });

}

// Your existing AI SDK call path

}

The route also has to be declared on the client. Without this, checkBotId() fails because BotID doesn't attach the challenge headers to the request:

// instrumentation-client.ts

import { initBotId } from 'botid/client/core';

initBotId({

protect: [{ path: '/api/ai-chat', method: 'POST' }],

});

See the BotID docs for the next.config.ts wrapper and the full setup.

Link to headingProtect inference, not just access

Inference will remain orders of magnitude more expensive than the requests it carries, so resale will remain profitable, and attackers will keep iterating.

To protect your AI endpoints:

  • Audit which of your AI endpoints are exposed

  • Prioritize by attack likelihood: more caller prompt control means an easier target

  • Gate every endpoint on every request

Protect your AI endpoints with Vercel BotID

Stop bots from draining your AI budget: see how to gate your endpoints with Vercel BotID in a few steps.

Read the guide