Cut your AI search costs without sacrificing quality
Monica White · 2026-05-21 · via The New Stack | DevOps, Open Source, and Cloud Native News

The cost that’s driving your AI search bill

Every organization running AI-powered search faces the same hidden cost driver: query embeddings. Documents are embedded once. Queries are embedded continuously for every user, every search, every second. At scale, this quickly becomes one of the largest line items in your AI infrastructure budget.

Together, Vespa AI and Voyage AI have solved this problem with a technique called asymmetric retrieval. Use the best embedding model available for your documents (once, at indexing time), then embed queries for free using a tiny, locally running model. Voyage AI’s voyage-4 model family is built for exactly this. All four models share a common vector space, making the split practical without any reindexing or architectural changes.

Bottom line for decision-makers: Your query embedding bill effectively goes to zero and your search path becomes more resilient, all without replacing your existing search infrastructure.

The problem: Symmetry is expensive

The conventional approach uses the same embedding model for both documents and queries. It’s simple, but it ignores a critical asymmetry in how those two operations work.

| | Document embedding | Query embedding |
|---|---|---|
| Frequency | Once per document | Every single request |
| Latency sensitivity | None, no user is waiting | On the critical path, 24/7 |
| Cost @ 10K QPS | Amortized, negligible | ~$15,500/month |

At 10,000 queries per second with ~30-token queries, you generate roughly 777 billion tokens per month, all routed through an external API at real cost.
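
The token volume above follows from simple arithmetic, sketched here in Python. The 30-day month is an assumption, and the article does not state a per-token price; the script only derives the price implied by the quoted ~$15,500/month figure.

```python
# Back-of-the-envelope check of the query embedding volume quoted above.
QPS = 10_000                              # queries per second (from the article)
TOKENS_PER_QUERY = 30                     # approximate query length (from the article)
SECONDS_PER_MONTH = 30 * 24 * 60 * 60     # assumption: a 30-day month
MONTHLY_COST_USD = 15_500                 # approximate monthly cost (from the article)

tokens_per_month = QPS * TOKENS_PER_QUERY * SECONDS_PER_MONTH

# Price per million tokens implied by the quoted cost; not an official price.
implied_price_per_million = MONTHLY_COST_USD / (tokens_per_month / 1_000_000)

print(f"{tokens_per_month / 1e9:.1f} billion tokens/month")
print(f"implied price: ${implied_price_per_million:.4f} per million tokens")
```

Under these assumptions the volume comes to 777.6 billion tokens per month, and the quoted cost implies a price of roughly $0.02 per million tokens.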

The solution: Asymmetric retrieval with Voyage AI + Vespa

Voyage AI’s voyage-4 family introduces four models (voyage-4-large, voyage-4, voyage-4-lite, and voyage-4-nano) that all produce embeddings in a shared vector space. You can embed documents with the most powerful model and query with the smallest, and they remain fully compatible.

Vespa now has native support for this workflow, running voyage-4-nano locally inside its container nodes, with no API calls, no rate limits, and no additional cost.

How it works

Step 1: index time: documents → voyage-4-large (API)

Embed each document once with Voyage AI’s top-tier model. This yields the highest accuracy, with no latency pressure. Cost is fully amortized over the document’s lifetime.

Step 2: query time: queries → voyage-4-nano (local)

Embed every user query with a tiny model running inside Vespa. Runs in single-digit milliseconds on CPU. Zero external API dependency. Zero cost.
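
The two steps can be sketched structurally in Python. This is a minimal illustration, not Vespa's or Voyage AI's API: `toy_embed`, `CountingEmbedder`, and `AsymmetricIndex` are hypothetical names, and a single hashed bag-of-words function stands in for both models. In the real setup the two models differ but emit vectors in a shared space; here the point is only that the "remote" embedder is called once per document and never at query time.

```python
import math
import zlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]
DIM = 1024  # toy size; real embedding models define their own dimensionality


def toy_embed(text: str) -> Vector:
    """Hashed bag-of-words, L2-normalised. A stand-in, not a real model."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class CountingEmbedder:
    """Wraps an embedding function and counts how often it is called."""

    def __init__(self, fn: Callable[[str], Vector]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, text: str) -> Vector:
        self.calls += 1
        return self.fn(text)


class AsymmetricIndex:
    """Documents use one embedder (index time), queries another (query time)."""

    def __init__(self, doc_embedder: Callable[[str], Vector],
                 query_embedder: Callable[[str], Vector]) -> None:
        self.doc_embedder = doc_embedder
        self.query_embedder = query_embedder
        self.vectors: Dict[str, Vector] = {}

    def add(self, doc_id: str, text: str) -> None:
        # Step 1: embed once, store the vector for the document's lifetime.
        self.vectors[doc_id] = self.doc_embedder(text)

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        # Step 2: embed the query with the cheap model, score by dot product.
        q = self.query_embedder(query)
        scored = [(doc_id, sum(a * b for a, b in zip(q, v)))
                  for doc_id, v in self.vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


remote_large = CountingEmbedder(toy_embed)  # stand-in for the API-hosted model
local_nano = CountingEmbedder(toy_embed)    # stand-in for the local model

index = AsymmetricIndex(remote_large, local_nano)
index.add("doc-1", "how to reset your password")
index.add("doc-2", "quarterly revenue report")
index.add("doc-3", "kubernetes cluster upgrade guide")

for q in ["reset password", "revenue report", "upgrade kubernetes"]:
    print(q, "->", index.search(q, k=1)[0][0])

print("remote calls:", remote_large.calls, "local calls:", local_nano.calls)
```

However many queries arrive, the remote call count stays equal to the number of documents indexed, which is the cost structure the two steps describe.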

Read the full technical blog.

Business impact at a glance

| Metric | Symmetric (traditional) | Asymmetric (Vespa + Voyage AI) |
|---|---|---|
| Query embedding cost @ 10K QPS | ❌ ~$15,500 / month | ✅ $0 / month |
| Query embedding latency | ❌ API round-trip (10–80ms) | ✅ <5ms on CPU (local) |
| Retrieval quality vs. OpenAI v3 Large | Baseline | ✅ +14.05% NDCG@10 |
| API dependency on the critical path | ❌ Yes, outages affect search | ✅ No, fully self-contained |
| Re-indexing to upgrade the query model | ❌ Required | ✅ Not required |
| Multi-tier document quality | ❌ Not supported | ✅ Supported |

Why operational resilience matters

Eliminating the external API from the query path is more than a cost optimization; it’s a reliability decision.

| Risk | Traditional architecture | Asymmetric architecture |
|---|---|---|
| API outage | Search goes down | No impact, fully local |
| Rate limiting | Dropped/delayed requests on traffic spikes | No rate limits |
| Scaling | Days to negotiate a higher API quota | Minutes to add Vespa container nodes |

With asymmetric retrieval, the query path is entirely self-contained. Search works regardless of third-party API status.

Advanced: two-phase ranking for maximum accuracy

Vespa combines this architecture with a two-phase ranking strategy that delivers both speed and precision at large scale.

Vespa stores document vectors in two forms: compact binary embeddings (16× smaller, held in memory) for fast first-phase retrieval, and full-precision bfloat16 vectors (on disk) for accurate second-phase reranking. The result is binary-speed search with full-precision accuracy.

Phase 1: full index scan

Hamming distance on binary vectors. ~1 billion distance calculations per second. Retrieves the top 2,000 candidates from the entire corpus in milliseconds.

Phase 2: precision reranking

Bfloat16 dot-product on top candidates only. Full-precision vectors are paged from disk for the top 2,000 results. Accurate, and bounded in compute.
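
The two phases can be illustrated with a small pure-Python sketch. It is not Vespa's implementation: the function names are hypothetical, vectors are plain Python floats held in memory rather than bfloat16 paged from disk, and the toy corpus keeps 50 first-phase candidates out of 500 documents instead of 2,000 out of a full corpus.

```python
import random
from typing import List, Tuple


def binarize(vec: List[float]) -> int:
    """Pack sign bits into one Python int (bit i is set when vec[i] > 0)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed vectors."""
    return bin(a ^ b).count("1")


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_phase_search(query: List[float],
                     docs_float: List[List[float]],
                     docs_bits: List[int],
                     first_phase_k: int,
                     final_k: int) -> List[Tuple[int, float]]:
    q_bits = binarize(query)
    # Phase 1: cheap Hamming-distance scan over every binary vector.
    candidates = sorted(range(len(docs_bits)),
                        key=lambda i: hamming(q_bits, docs_bits[i]))[:first_phase_k]
    # Phase 2: exact dot product, computed only for the surviving candidates.
    scored = [(i, dot(query, docs_float[i])) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:final_k]


random.seed(0)
DIM, N_DOCS = 128, 500
docs = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_DOCS)]
docs_bits = [binarize(d) for d in docs]

# Query: a slightly perturbed copy of document 42, so 42 should rank first.
query = [x + random.gauss(0, 0.1) for x in docs[42]]

results = two_phase_search(query, docs, docs_bits, first_phase_k=50, final_k=5)
print(results[0][0])
```

The design point is that the expensive full-precision scoring runs on a fixed number of candidates, so its cost stays bounded no matter how large the corpus grows.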

Binary quantization also reduces storage: a 2,048-dimension vector shrinks from 4,096 bytes (2 bytes per dimension in bfloat16) to 256 bytes (1 bit per dimension), a 16× reduction, with negligible impact on final ranking quality.
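
The byte counts can be checked with a short sketch. `pack_sign_bits` is a hypothetical helper that keeps one sign bit per dimension; the exact binarization rule Vespa applies may differ.

```python
from typing import List

DIMS = 2048
bfloat16_bytes = DIMS * 2   # bfloat16 uses 2 bytes per dimension
binary_bytes = DIMS // 8    # 1 bit per dimension, 8 dimensions per byte


def pack_sign_bits(vec: List[float]) -> bytes:
    """Pack one sign bit per dimension, most significant bit first.

    Assumes len(vec) is a multiple of 8.
    """
    out = bytearray(len(vec) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            out[i // 8] |= 1 << (7 - i % 8)
    return bytes(out)


packed = pack_sign_bits([0.5] * DIMS)
print(bfloat16_bytes, len(packed), bfloat16_bytes // len(packed))
```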

Designed for enterprise scale

Vespa separates stateless container nodes (where embedding runs) from content clusters (where data lives), so query embedding capacity and document storage scale independently. Multi-tenant deployments can mix document embedding tiers within the same index, using voyage-4-large for premium customers and voyage-4-lite for cost-sensitive tiers, while all tenants share the same local query model.

When to use this architecture

| Scenario | Recommendation |
|---|---|
| High QPS (>1,000 queries/sec) | ✅ Strong fit, savings scale linearly |
| Large document corpus | ✅ Strong fit, document embedding cost is amortized |
| Latency-sensitive applications | ✅ Strong fit, local inference eliminates network round-trips |
| Multi-tenant platforms | ✅ Strong fit, per-tier quality/cost control |
| Low volume (<100 QPS), latency-tolerant | Single model may be simpler at this scale |
| Maximum quality, cost not a concern | Symmetric voyage-4-large for both is still an option |

A joint solution from two AI search leaders

Vespa AI provides the industry’s leading open-source search and recommendation platform, powering AI applications at Spotify, Yahoo, and Perplexity.

Voyage AI delivers state-of-the-art embedding models. At the time of writing, voyage-4-large is ranked #1 on the RTEB benchmark across 29 retrieval datasets, outperforming Gemini Embedding 001 by +3.87%, Cohere Embed v4 by +8.20%, and OpenAI v3 Large by +14.05%.

Get started

| Resource | Link |
|---|---|
| Full runnable notebook (pyvespa) | Voyage AI Embeddings on Vespa Cloud |
| Voyage 4 model announcement | Voyage AI Blog |
| Vespa embedding documentation | docs.vespa.ai/en/embedding |
| Binary quantization guide | Binarizing Vectors in Vespa |
| Phased ranking documentation | Phased Ranking in Vespa |
| Vespa community Slack | Join vespatalk.slack.com |
