惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

酷 壳 – CoolShell
酷 壳 – CoolShell
H
Hacker News: Front Page
P
Palo Alto Networks Blog
T
ThreatConnect
Apple Machine Learning Research
Apple Machine Learning Research
博客园_首页
T
True Tiger Recordings
P
Privacy & Cybersecurity Law Blog
B
Blog
IT之家
IT之家
Last Week in AI
Last Week in AI
F
Full Disclosure
Hacker News: Ask HN
Hacker News: Ask HN
C
Comments on: Blog
Microsoft Azure Blog
Microsoft Azure Blog
C
Cybersecurity and Infrastructure Security Agency CISA
Microsoft Security Blog
Microsoft Security Blog
博客园 - 【当耐特】
N
News and Events Feed by Topic
NISL@THU
NISL@THU
腾讯CDC
雷峰网
雷峰网
Security Latest
Security Latest
李成银的技术随笔
M
Microsoft Research Blog - Microsoft Research
L
LangChain Blog
L
Lohrmann on Cybersecurity
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
C
Check Point Blog
Y
Y Combinator Blog
Recent Announcements
Recent Announcements
博客园 - Franky
N
News | PayPal Newsroom
V
V2EX
A
About on SuperTechFans
The Register - Security
The Register - Security
月光博客
月光博客
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Google Online Security Blog
Google Online Security Blog
MyScale Blog
MyScale Blog
Cisco Talos Blog
Cisco Talos Blog
Vercel News
Vercel News
WordPress大学
WordPress大学
C
Cyber Attacks, Cyber Crime and Cyber Security
The Hacker News
The Hacker News
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
爱范儿
爱范儿
A
Arctic Wolf
L
LINUX DO - 最新话题
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More

AI

100 things we announced at I/O 2026 We’re announcing new community investments in Missouri. A new experiment brings better group meetings to Google Beam A new era for AI Search Gemini 3.5: frontier intelligence with action Everything new in our Google AI subscriptions, fresh from I/O 2026 I/O 2026: Welcome to the agentic Gemini era New ways to create and get things done in Google Workspace How AI Mode is changing the way people search in the U.S. I/O 2026 The new AI-powered Google Finance is expanding to Europe. See what happens when creative legends use AI to make ads for small businesses 5 gardening tips you can try right in Search Google is partnering with XPRIZE and Range Media Partners on the $3.5 million Future Vision film competition. The latest AI news we announced in April 2026 Reduce friction and latency for long-running jobs with Webhooks in Gemini API Celebrating 20 years of Google Translate: Fun facts, tips and new features to try Join the new AI Agents Vibe Coding Course from Google and Kaggle 8 Gemini tips for organizing your space (and life) Here’s how our TPUs power increasingly demanding AI workloads. Elevating Austria: Google invests in its first data center in the Alps. We're launching two specialized TPUs for the agentic era. 3 new ways Ads Advisor is making Google Ads safer and faster 7 ways to travel smarter this summer, with help from Google A new way to explore the web with AI Mode in Chrome New ways to create personalized images in the Gemini app Gemini 3.1 Flash TTS: the next generation of expressive AI speech Turn your best AI prompts into one-click tools in Chrome Bringing people together at AI for the Economy Forum Create, edit and share videos at no cost in Google Vids We’re creating a new satellite imagery map to help protect Brazil’s forests. The latest AI news we announced in March 2026 Build with Veo 3.1 Lite, our most cost-effective video generation model Watch James Manyika talk AI and creativity with LL COOL J. Transform your headphones into a live personal translator on iOS. Gemini 3.1 Flash Live: Making audio AI more natural and reliable Search Live is expanding globally Lyria 3 Pro: Create longer tracks in more Google products Build with Lyria 3, our newest music generation model Bringing the power of Personal Intelligence to more people Our latest investment in open source security for the AI era How AI is helping improve heart health in rural Australia Gemini in Google Sheets just achieved state-of-the-art performance. How our open-source AI model SpeciesNet is helping to promote wildlife conservation Ask a Techspert: How does AI understand my visual searches? The latest AI news we announced in February Use Canvas in AI Mode to get things done and bring your ideas to life, right in Search. Create new worlds in Project Genie with these 4 tips
New ways to balance cost and reliability in the Gemini API
2026-04-02 · via AI

Apr 02, 2026

Introducing Flex and Priority inference: advanced controls for developers to optimize costs and reliability through a single, unified interface.

Hussein Hassan Harrirou

Engineering, Gemini API

Your browser does not support the audio element.

Listen to article

This content is generated by Google AI. Generative AI is experimental

[[duration]] minutes

Today, we are adding two new service tiers to the Gemini API: Flex and Priority. These new options give you granular control over cost and reliability through a single, unified interface.

As AI evolves from simple chat into complex, autonomous agents, developers typically have to manage two distinct types of logic:

  • Background tasks: High-volume workflows like data enrichment or "thinking" processes that don't need instant responses.
  • Interactive tasks: User-facing features like chatbots and copilots where high reliability is needed.

Until now, supporting both meant splitting your architecture between standard synchronous serving and the asynchronous Batch API. Flex and Priority help to bridge this gap. You can now route background jobs to Flex and interactive jobs to Priority, both using standard synchronous endpoints. This eliminates the complexity of async job management while giving you the economic and performance benefits of specialized tiers.

Flex Inference: scale innovation for 50% less

Flex Inference is our new cost-optimized tier, designed for latency-tolerant workloads without the overhead of batch processing.

  • 50% price savings: Pay half the price of the Standard API by downgrading criticality of your request (making them less reliable, and adding latency).
  • Synchronous simplicity: Unlike the Batch API, Flex is a synchronous interface. You use the same familiar endpoints without managing input/output files or polling for job completion.
  • Ideal use cases: Background CRM updates, large-scale research simulations, and agentic workflows where the model "browses" or "thinks" in the background.

Get started fast by simply configuring the service_tier parameter in your request:

Flex tier will be available for all paid tiers and is available for GenerateContent and Interactions API requests.

Priority Inference: Highest reliability for critical apps

The new Priority Inference tier offers our highest level of assurance at a premium price point. This helps to ensure your most important traffic is not preempted, even during peak platform usage.

  • Highest criticality: Priority requests get highest criticality leading to higher reliability, even during peak load.
  • Graceful downgrade: If your traffic exceeds your Priority limits, overflow requests are automatically served at the Standard tier instead of failing. This keeps your application online and helps to ensure business continuity.
  • Transparent response: The API response indicates which tier served your request, giving you full visibility into your performance and billing.
  • Ideal use cases: Real-time customer support bots, live content moderation pipelines, and time-sensitive requests.

To use Priority Inference, simply set the service_tier parameter accordingly:

Priority inference will be available to users with Tier 2 / 3 paid projects across the `GenerateContent` API and Interactions API endpoints.

Visit the Gemini API documentation to see the full pricing breakdown and start optimizing your production tiers today. To see it in action, check out the cookbook for runnable code examples.

Related stories

.