How we built AEO tracking for coding agents
Eric Dodds, Content Engineer · Allen Zhou, DX Engineer, Vercel · 2026-02-09 · via Vercel News

AI has changed the way that people find information. For businesses, this means it's critical to understand how LLMs search for and summarize their web content.

We're building an AI Engine Optimization (AEO) system to track how models discover, interpret, and reference Vercel and our sites.

For end users on our marketing team, responses are consistently formatted across coding agents.

This started as a prototype focused only on standard chat models, but we quickly realized that wasn’t enough. To get a complete picture of visibility, we needed to track coding agents.

For standard models, tracking is relatively straightforward. We use AI Gateway to send prompts to dozens of popular models (e.g. GPT, Gemini, and Claude) and analyze their responses, search behavior, and cited sources.

Coding agents, however, behave very differently. Many Vercel users interact with AI through their terminal or IDE while actively working on projects. In early sampling, we found that coding agents perform web searches in roughly 20% of prompts. Because these searches happen inline with real development workflows, it’s especially important to evaluate both response quality and source accuracy.

Measuring AEO for coding agents requires a different approach than model-only testing. Coding agents aren’t designed to answer a single API call. They’re built to operate inside a project and expect a full development environment, including a filesystem, shell access, and package managers.

That creates a new set of challenges:

  1. Execution isolation: How do you safely run an autonomous agent that can execute arbitrary code?

  2. Observability: How do you capture what the agent did when each agent has its own transcript format, tool-calling conventions, and output structure?

The coding agent AEO lifecycle

Coding agents are typically accessed at some level through CLIs rather than APIs. Even if you’re only sending prompts and capturing responses, the CLI still needs to be installed and executed in a full runtime environment.

Vercel Sandbox solves this by providing ephemeral Linux MicroVMs that spin up in seconds. Each agent run gets its own sandbox and follows the same six-step lifecycle, regardless of the CLI it uses.

  1. Create the sandbox. Spin up a fresh MicroVM with the right runtime (Node 24, Python 3.13, etc.) and a timeout. The timeout is a hard ceiling, so if the agent hangs or loops, the sandbox kills it.

  2. Install the agent CLI. Each agent ships as an npm package (e.g., @anthropic-ai/claude-code or @openai/codex). The sandbox installs it globally so it's available as a shell command.

  3. Inject credentials. Instead of giving each agent a direct provider API key, we set environment variables that route all LLM calls through Vercel AI Gateway. This gives us unified logging, rate limiting, and cost tracking across every agent, even though each agent uses a different underlying provider. (The system also accepts direct provider keys.)

  4. Run the agent with the prompt. This is the only step that differs per agent. Each CLI has its own invocation pattern, flags, and config format. But from the sandbox's perspective, it's just a shell command.

  5. Capture the transcript. After the agent finishes, we extract a record of what it did, including which tools it called, whether it searched the web, and what it recommended in the response. This is agent-specific (covered below).

  6. Tear down. Stop the sandbox. If anything went wrong, the sandbox is still stopped so we don't leak resources.

In the code, the lifecycle looks like this.

import { Sandbox } from "@vercel/sandbox";

// Step 1: Create the sandbox
const sandbox = await Sandbox.create({
  resources: { vcpus: 2 },
  timeout: 10 * 60 * 1000, // hard ceiling: 10 minutes, in milliseconds
});

try {
  // Step 2: Install the agent CLI
  for (const setupCmd of agent.setupCommands) {
    await sandbox.runCommand("sh", ["-c", setupCmd]);
  }

  // Step 3: Inject AI Gateway credentials (via env vars in step 4)

  // Step 4: Run the agent
  const fullCommand = `AI_GATEWAY_API_KEY='${aiGatewayKey}' ${agent.command}`;
  const result = await sandbox.runCommand("sh", ["-c", fullCommand]);

  // Step 5: Capture transcript (agent-specific, see next section)
} finally {
  // Step 6: Tear down, even if an earlier step threw
  await sandbox.stop();
}

Agents as config

Because the lifecycle is uniform, each agent can be defined as a simple config object. Adding a new agent to the system means adding a new entry, and the sandbox orchestration handles everything else.

export const AGENTS: Agent[] = [
  {
    id: "anthropic/claude-code",
    name: "Claude Code",
    setupCommands: ["npm install -g @anthropic-ai/claude-code"],
    buildCommand: (prompt) => `echo '${prompt}' | claude --print`,
  },
  {
    id: "openai/codex",
    name: "OpenAI Codex",
    setupCommands: ["npm install -g @openai/codex"],
    buildCommand: (prompt) => `codex exec -y -S '${prompt}'`,
  },
];

runtime, not shown in the entries above, determines the base image for the MicroVM. Most agents run on Node, but the system supports Python runtimes too.

setupCommands is an array because some agents need more than a global install. For example, Codex also needs a TOML config file written to ~/.codex/config.toml.

buildCommand is a function that takes the prompt and returns the shell command to run. Each agent's CLI has its own flags and invocation style.
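
The buildCommand examples interpolate the prompt directly into a single-quoted shell string, so a prompt containing an apostrophe would break the command. Below is a minimal sketch of one way to guard against that, assuming a POSIX shell inside the sandbox. The names shellQuote and buildClaudeCommand are hypothetical and not part of the system described here.

```typescript
// Hypothetical helper (not from the original system): wrap a string in single
// quotes for a POSIX shell, escaping any single quotes it contains.
function shellQuote(value: string): string {
  // ' becomes '\'' : close the quote, emit an escaped quote, reopen the quote.
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Same shape as the Claude Code entry above, with the prompt quoted safely.
const buildClaudeCommand = (prompt: string): string =>
  `echo ${shellQuote(prompt)} | claude --print`;

console.log(buildClaudeCommand("What's the best way to deploy Next.js?"));
```

Quoting in one shared helper keeps each agent's buildCommand focused on flags and invocation style.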

Using the AI Gateway for routing

We wanted to use the AI Gateway to centralize management of cost and logs. This required overriding the provider’s base URLs via environment variables inside the sandbox. The agents themselves don’t know this is happening and operate as if they are talking directly to their provider.

Here’s what this looks like for Claude Code:

const claudeResult = await sandbox.runCommand(
  'claude',
  ['-p', '-m', options.model, '-y', options.prompt],
  {
    env: {
      ANTHROPIC_BASE_URL: AI_GATEWAY.baseUrl,
      ANTHROPIC_AUTH_TOKEN: options.apiKey,
      ANTHROPIC_API_KEY: '', // intentionally blank as AI Gateway handles auth
    },
  }
);

ANTHROPIC_BASE_URL points to AI Gateway instead of api.anthropic.com. The agent's HTTP calls go to Gateway, which proxies them to Anthropic.

ANTHROPIC_API_KEY is set to empty string on purpose — Gateway authenticates via its own token, so the agent doesn't need (or have) a direct provider key.

This same pattern works for Codex (override OPENAI_BASE_URL) and any other agent that respects a base URL environment variable. Provider API credentials can also be used directly.
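
One way to keep these overrides in a single place is a small function that returns the environment for each agent. This is a sketch under stated assumptions. The Claude Code variables mirror the example above and OPENAI_BASE_URL is named in the text, but using OPENAI_API_KEY to carry the gateway token is an assumption. gatewayEnv is a hypothetical name, and the URL in the usage line is a placeholder.

```typescript
type EnvOverrides = Record<string, string>;

// Hypothetical helper: build the env vars that route an agent through a gateway.
function gatewayEnv(agentId: string, baseUrl: string, gatewayToken: string): EnvOverrides {
  switch (agentId) {
    case "anthropic/claude-code":
      return {
        ANTHROPIC_BASE_URL: baseUrl,
        ANTHROPIC_AUTH_TOKEN: gatewayToken,
        ANTHROPIC_API_KEY: "", // left blank so the gateway token is used
      };
    case "openai/codex":
      // ASSUMPTION: the gateway token is passed where the provider key normally goes.
      return { OPENAI_BASE_URL: baseUrl, OPENAI_API_KEY: gatewayToken };
    default:
      // Fail loudly rather than silently running an agent with no routing.
      throw new Error(`No gateway routing configured for agent: ${agentId}`);
  }
}

// Placeholder URL and token, for illustration only.
console.log(Object.keys(gatewayEnv("openai/codex", "https://gateway.example.com", "token")));
```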

The transcript format problem

Once an agent finishes running in its sandbox, we have a raw transcript, which is a record of everything it did.

The problem is that each agent produces them in a different format. Claude Code writes JSONL files to disk. Codex streams JSON to stdout. OpenCode also uses stdout, but with a different schema. They use different names for the same tools, different nesting structures for messages, and different conventions.

We needed all of this to feed into a single brand extraction pipeline, so we built a four-stage normalization layer:

  1. Transcript capture: Each agent stores its transcript differently, so this step is agent-specific.

  2. Parsing: Each agent has its own parser that normalizes tool names and flattens agent-specific message structures into a single unified event type.

  3. Enrichment: Shared post-processing that extracts structured metadata (URLs, commands) from tool arguments, normalizing differences in how each agent names its args.

  4. Summary and brand extraction: Aggregate the unified events into stats, then feed into the same brand extraction pipeline used for standard model responses.

Stage 1: Transcript capture

This happens while the sandbox is still running (step 5 in the lifecycle from the previous section).

Claude Code writes its transcript as a JSONL file on the sandbox filesystem. We have to find and read it out after the agent finishes:

async function captureTranscript(sandbox) {
  const workdir = sandbox.getWorkingDirectory();
  // The project directory name is the working directory with "/" replaced by "-"
  const projectPath = workdir.replace(/\//g, '-');
  const claudeProjectDir = `~/.claude/projects/${projectPath}`;

  // Find the most recent .jsonl file
  const findResult = await sandbox.runShell(
    `ls -t ${claudeProjectDir}/*.jsonl 2>/dev/null | head -1`
  );

  const transcriptPath = findResult.stdout.trim();
  return await sandbox.readFile(transcriptPath);
}

Codex and OpenCode both output their transcripts to stdout, so capture is simpler — filter the output for JSON lines:

function extractTranscriptFromOutput(output: string) {
  // Keep only lines that look like a complete JSON object
  const lines = output.split('\n').filter(line => {
    const trimmed = line.trim();
    return trimmed.startsWith('{') && trimmed.endsWith('}');
  });
  return lines.join('\n');
}

The output of this stage is the same for all agents: a string of raw JSONL. But the structure of each JSON line is still completely different per agent, and that's what the next stage handles.
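
Before an agent-specific parser can run, the raw JSONL string has to become a list of objects. The sketch below shows one tolerant way to do that. parseJsonl is a hypothetical name, and the choice to count malformed lines instead of throwing is an assumption, made because agent output can mix log lines with JSON.

```typescript
interface JsonlResult {
  records: Record<string, unknown>[];
  skipped: number; // lines that were not a valid JSON object
}

// Hypothetical helper: parse raw JSONL, skipping anything that is not a JSON object.
function parseJsonl(raw: string): JsonlResult {
  const records: Record<string, unknown>[] = [];
  let skipped = 0;
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // blank lines are not counted as errors
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        records.push(parsed as Record<string, unknown>);
      } else {
        skipped += 1; // valid JSON, but not an object (e.g. a number or array)
      }
    } catch {
      skipped += 1; // not valid JSON at all
    }
  }
  return { records, skipped };
}

const parsedSample = parseJsonl('{"type":"tool_call"}\nnot json\n{"type":"message"}');
console.log(parsedSample.records.length, parsedSample.skipped);
```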

Stage 2: Parsing tool names and message shapes

We built a dedicated parser for each agent that does two things at once: normalizes tool names and flattens agent-specific message structures into a single formatted event type.

Tool name normalization

The same operation has different names across agents:

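| Operation      | Claude Code | Codex      | OpenCode |
| -------------- | ----------- | ---------- | -------- |
| Read a file    | Read        | read_file  | read     |
| Write a file   | Write       | write_file | write    |
| Edit a file    | StrReplace  | patch_file | patch    |
| Run a command  | Bash        | shell      | bash     |
| Search the web | WebFetch    | (varies)   | (varies) |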

Each parser maintains a lookup table that maps agent-specific names to ~10 canonical names:

export type ToolName =
  | 'file_read' | 'file_write' | 'file_edit'
  | 'shell' | 'web_fetch' | 'web_search'
  | 'glob' | 'grep' | 'list_dir'
  | 'agent_task' | 'unknown';

const claudeToolMap = {
  Read: 'file_read', Write: 'file_write', Bash: 'shell',
  WebFetch: 'web_fetch', Glob: 'glob', Grep: 'grep', /* ... */
};

const codexToolMap = {
  read_file: 'file_read', write_file: 'file_write', shell: 'shell',
  patch_file: 'file_edit', /* ... */
};

const opencodeToolMap = {
  read: 'file_read', write: 'file_write', bash: 'shell',
  rg: 'grep', patch: 'file_edit', /* ... */
};
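
A lookup against these tables needs a fallback for tool names the table has never seen, which matters when agent CLIs add tools between releases. Here is a minimal sketch, assuming unmapped names fall back to 'unknown'. normalizeToolName is a hypothetical name, and the Map holds only the Codex entries listed above.

```typescript
type ToolName =
  | "file_read" | "file_write" | "file_edit"
  | "shell" | "web_fetch" | "web_search"
  | "glob" | "grep" | "list_dir"
  | "agent_task" | "unknown";

// A Map avoids accidental hits on inherited object keys such as "constructor".
const codexTools = new Map<string, ToolName>([
  ["read_file", "file_read"],
  ["write_file", "file_write"],
  ["shell", "shell"],
  ["patch_file", "file_edit"],
]);

// Hypothetical helper: map an agent-specific tool name to its canonical name.
function normalizeToolName(table: Map<string, ToolName>, originalName: string): ToolName {
  return table.get(originalName) ?? "unknown";
}

console.log(normalizeToolName(codexTools, "patch_file"));
console.log(normalizeToolName(codexTools, "brand_new_tool"));
```

Keeping the original name alongside the canonical one, as the event type in this post does, makes it easy to spot which 'unknown' entries need a new table row.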

Message shape flattening

Beyond naming, the structure of events varies across agents:

  • Claude Code nests messages inside a message property and mixes tool_use blocks into content arrays.

  • Codex has Responses API lifecycle events (thread.started, turn.completed, output_text.delta) alongside tool events.

  • OpenCode bundles tool call + result in the same event via part.tool and part.state.

The parser for each agent handles these structural differences and collapses everything into a single TranscriptEvent type:

export interface TranscriptEvent {
  timestamp?: string;
  type: 'message' | 'tool_call' | 'tool_result' | 'thinking' | 'error';
  role?: 'user' | 'assistant' | 'system';
  content?: string;
  tool?: {
    name: ToolName;        // Canonical name
    originalName: string;  // Agent-specific name (for debugging)
    args?: Record<string, unknown>;
    result?: unknown;
  };
}

The output of this stage is a flat TranscriptEvent[] array, which has the same shape regardless of which agent produced it.
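
To make the flattening concrete, here is a sketch of what a parser for one Claude Code transcript line might look like. The input shape is an assumption based on the description above (a message property whose content array mixes text and tool_use blocks). The real transcript format has more fields and may differ. flattenClaudeLine, ClaudeLine, and ContentBlock are hypothetical names.

```typescript
// Subset of the canonical tool names, enough for this sketch.
type ToolName = "file_read" | "file_write" | "shell" | "web_fetch" | "unknown";

// Trimmed-down version of the TranscriptEvent interface.
interface TranscriptEvent {
  timestamp?: string;
  type: "message" | "tool_call";
  role?: "user" | "assistant" | "system";
  content?: string;
  tool?: { name: ToolName; originalName: string; args?: Record<string, unknown> };
}

// ASSUMED input shape for one transcript line.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: Record<string, unknown> };

interface ClaudeLine {
  timestamp?: string;
  message?: { role: "user" | "assistant"; content: string | ContentBlock[] };
}

const claudeTools = new Map<string, ToolName>([
  ["Read", "file_read"], ["Write", "file_write"],
  ["Bash", "shell"], ["WebFetch", "web_fetch"],
]);

// One nested line can yield several flat events: one per content block.
function flattenClaudeLine(line: ClaudeLine): TranscriptEvent[] {
  const message = line.message;
  if (!message) return []; // lines without a message carry nothing to flatten
  const base = { timestamp: line.timestamp, role: message.role };
  if (typeof message.content === "string") {
    return [{ ...base, type: "message", content: message.content }];
  }
  const events: TranscriptEvent[] = [];
  for (const block of message.content) {
    if (block.type === "text") {
      events.push({ ...base, type: "message", content: block.text });
    } else if (block.type === "tool_use") {
      events.push({
        ...base,
        type: "tool_call",
        tool: {
          name: claudeTools.get(block.name) ?? "unknown",
          originalName: block.name,
          args: block.input,
        },
      });
    }
  }
  return events;
}
```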

Stage 3: Enrichment

After parsing, a shared post-processing step runs across all events. This extracts structured metadata from tool arguments so that downstream code doesn't need to know that Claude Code puts file paths in args.path while Codex uses args.file:

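// events: the TranscriptEvent[] produced by stage 2
for (const event of events) {
  if (!event.tool) continue; // only tool events carry arguments
  const args = event.tool.args ?? {};

  if (['file_read', 'file_write', 'file_edit'].includes(event.tool.name)) {
    const path = extractFilePath(args);
    if (path) event.tool.args = { ...args, _extractedPath: path };
  }

  if (event.tool.name === 'web_fetch') {
    const url = extractUrl(args);
    if (url) event.tool.args = { ...args, _extractedUrl: url };
  }
}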
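
The snippet above calls extractFilePath and extractUrl without showing them. The sketch below gives one plausible shape for these helpers, not the actual implementation. The path and file keys come from the text, while the url key and the http(s) check are assumptions. firstString is a hypothetical helper.

```typescript
type ToolArgs = Record<string, unknown>;

// Hypothetical helper: return the first listed key whose value is a non-empty string.
function firstString(args: ToolArgs, keys: string[]): string | undefined {
  for (const key of keys) {
    const value = args[key];
    if (typeof value === "string" && value.length > 0) return value;
  }
  return undefined;
}

// Claude Code uses args.path and Codex uses args.file (per the text above).
function extractFilePath(args: ToolArgs): string | undefined {
  return firstString(args, ["path", "file"]);
}

// ASSUMPTION: fetch tools pass the target under "url"; only http(s) URLs are kept.
function extractUrl(args: ToolArgs): string | undefined {
  const candidate = firstString(args, ["url"]);
  return candidate !== undefined && /^https?:\/\//.test(candidate) ? candidate : undefined;
}

console.log(extractFilePath({ file: "src/index.ts" }));
console.log(extractUrl({ url: "https://vercel.com/docs" }));
```

Supporting a new agent then means adding its argument name to the key list instead of touching downstream code.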

Stage 4: Summary and brand extraction

The enriched TranscriptEvent[] array gets summarized into aggregate stats (total tool calls by type, web fetches, errors) and then fed into the same brand extraction pipeline used for standard model responses. From this point forward, the system doesn't know or care whether the data came from a coding agent or a model API call.
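
The aggregation step could look something like the sketch below. It counts the stats named above (tool calls by type, web fetches, errors). The summarize function and the TranscriptSummary shape are hypothetical, and the event type is trimmed to the fields the summary needs.

```typescript
type EventType = "message" | "tool_call" | "tool_result" | "thinking" | "error";

// Only the fields the summary reads.
interface SummaryInput {
  type: EventType;
  tool?: { name: string };
}

interface TranscriptSummary {
  totalToolCalls: number;
  toolCallsByName: Record<string, number>;
  webFetches: number;
  errors: number;
}

// Hypothetical aggregation over a unified event array.
function summarize(events: SummaryInput[]): TranscriptSummary {
  const summary: TranscriptSummary = {
    totalToolCalls: 0,
    toolCallsByName: {},
    webFetches: 0,
    errors: 0,
  };
  for (const event of events) {
    if (event.type === "error") summary.errors += 1;
    if (event.type !== "tool_call" || !event.tool) continue;
    const name = event.tool.name;
    summary.totalToolCalls += 1;
    summary.toolCallsByName[name] = (summary.toolCallsByName[name] ?? 0) + 1;
    if (name === "web_fetch") summary.webFetches += 1;
  }
  return summary;
}

console.log(summarize([
  { type: "tool_call", tool: { name: "shell" } },
  { type: "tool_call", tool: { name: "web_fetch" } },
  { type: "error" },
]));
```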

Orchestration with Vercel Workflow

This entire pipeline runs as a Vercel Workflow. When a prompt is tagged as "agents" type, the workflow fans out across all configured agents in parallel and each gets its own sandbox:

export async function probeTopicWorkflow(topicId: string) {
  "use workflow";

  // topicData, run, and totalQueries come from setup code omitted from this excerpt

  // Fan out: one sandboxed run per configured agent, all in parallel
  const agentPromises = AGENTS.map((agent, index) => {
    const command = agent.buildCommand(topicData.text);
    return queryAgentAndSave(topicData.text, run.id, {
      id: agent.id,
      name: agent.name,
      setupCommands: agent.setupCommands,
      command,
    }, index + 1, totalQueries);
  });

  const results = await Promise.all(agentPromises);
}

What we’ve learned

  • Coding agents contribute a meaningful amount of traffic from web search. Early tests on a random sample of prompts showed that coding agents execute search around 20% of the time. As we collect more data we will build a more comprehensive view of agent search behavior, but these results made it clear that optimizing content for coding agents was important.

  • Agent recommendations have a different shape than model responses. When a coding agent suggests a tool, it tends to produce working code with that tool, like an import statement, a config file, or a deployment script. The recommendation is embedded in the output, not just mentioned in prose.

  • Transcript formats are a mess. And they are getting messier as agent CLI tools ship rapid updates. Building a normalization layer early saved us from constant breakage.

  • The same brand extraction pipeline works for both models and agents. The hard part is everything upstream: getting the agent to run, capturing what it did, and normalizing it into a structure you can grade.

What’s next

  • Open sourcing the tool. We're planning to release an OSS version of our system so other teams can track their own AEO evals, both for standard models and coding agents.

  • Deep dive on methodology. We are working on a follow-up post covering the full AEO eval methodology: prompt design, dual-mode testing (web search vs. training data), query-as-first-class-entity architecture, and Share of Voice metrics.

  • Scaling agent coverage. Adding more agents as the ecosystem grows and expanding the types of prompts we test (not just "recommend a tool" but full project scaffolding, debugging, etc.).