Google shipped three Gemini "Flash" models. Picking the wrong one could 6× your AI bill
chintanonweb · 2026-05-23 · via DEV Community

This is a submission for the Google I/O 2026 Writing Challenge


I opened Google AI Studio right after the Google I/O 2026 keynote to try the new model everyone was talking about — and got hit with a small wave of confusion. I went looking for "the new Flash model" and found three of them sitting in the same dropdown, names so similar I had to read them twice:

  • Gemini 3.5 Flash
  • Gemini 3.1 Flash Lite
  • Gemini 3 Flash Preview

Three different version numbers. All called "Flash." And when I read their price tags, I found something the keynote didn't dwell on: the gap between the cheapest and the priciest is 6×. Pick the wrong one for your workload and you don't get a slightly bigger bill — you get a 6× bigger bill, for tasks that didn't need it.

Here's the lineup decoded with real numbers, the 6× trap explained, and a decision guide for which "Flash" you should actually reach for.

[Screenshot: the AI Studio "Model selection" panel showing all three Flash models stacked together.]

The three Flash models, decoded

Pricing is per 1 million tokens, from Google's official Gemini API pricing:

| Model | Built for | Input ($/1M tokens) | Output ($/1M tokens) | Released |
|---|---|---|---|---|
| Gemini 3.1 Flash Lite 🆕 | High-volume, translation, simple data processing | $0.25 | $1.50 | May 7, 2026 |
| Gemini 3 Flash Preview | Speed + frontier intelligence; keeps Computer Use | $0.50 | $3.00 | Dec 17, 2025 |
| Gemini 3.5 Flash 🆕 | Frontier agentic + coding | $1.50 | $9.00 | May 19, 2026 (I/O day) |

Read that again and the naming makes no sense as a guide: the highest number (3.5) is the most expensive and newest, "Lite" (3.1) is the cheap workhorse, and the lowest number (3) is actually the oldest of the three — a December 2025 preview that's somehow priced in the middle. Only two of them (3.5 Flash and 3.1 Flash Lite) are the genuinely new I/O-era models. The version number tells you nothing about recency or price — you have to read every card.

The 6× trap, in plain terms

Compare the two ends. Gemini 3.5 Flash costs 6× as much as 3.1 Flash Lite on both input and output. Output is where it bites: on all three models, output tokens are priced at six times input tokens, so every reply, every summary, and every generated line of code is billed at $9.00 per million on 3.5 Flash versus $1.50 on Lite.

Run the math on a modest chatbot producing 50M output tokens a month:

  • 3.5 Flash: 50M × $9.00/1M = $450/month
  • 3.1 Flash Lite: 50M × $1.50/1M = $75/month

Same volume. $375/month — $4,500/year — purely from which "Flash" you clicked. If your tasks are translation, classification, or simple extraction, you're paying 6× for "frontier coding intelligence" you never use.
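The arithmetic above is easy to script so you can plug in your own volumes. A minimal sketch, assuming the per-1M-token prices from the table (hardcoded here, and likely to change) and a workload described only by its monthly input and output token counts:

```python
# Per-1M-token prices (USD) from the pricing table; verify against Google's current pricing page.
PRICES = {
    "Gemini 3.5 Flash": {"input": 1.50, "output": 9.00},
    "Gemini 3 Flash Preview": {"input": 0.50, "output": 3.00},
    "Gemini 3.1 Flash Lite": {"input": 0.25, "output": 1.50},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the monthly USD cost for the given monthly token volumes."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    out = 50_000_000  # the 50M output tokens/month chatbot from the example
    flash = monthly_cost("Gemini 3.5 Flash", 0, out)
    lite = monthly_cost("Gemini 3.1 Flash Lite", 0, out)
    print(f"3.5 Flash:      ${flash:,.2f}/month")
    print(f"3.1 Flash Lite: ${lite:,.2f}/month")
    print(f"Difference:     ${flash - lite:,.2f}/month, ${(flash - lite) * 12:,.2f}/year")
```

Run as a script, it reproduces the $450, $75, and $375/month ($4,500/year) figures above. Input tokens are left at zero only to match the output-only example; pass your real input volume too.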

But "cheaper" isn't always "right" — the benchmarks

Lite isn't just a price cut; it's a different capability tier. Google's published benchmark numbers for the two new models:

  • 3.1 Flash Lite — LMArena Elo ~1432, GPQA Diamond 86.9%. Genuinely strong for the price, but tuned for throughput.
  • 3.5 Flash — SWE-Bench Pro 55.1%, Terminal-Bench 2.1 76.2%. Built to hold up across long, multi-step agentic and coding tasks where one wrong step compounds.

So the real question isn't "which is cheaper" — it's "does my task actually need the frontier coding model, or am I overpaying for headroom I don't use?"

Which Flash should you actually use?

The decision guide the model picker should have come with:

Use Gemini 3.1 Flash Lite (the $0.25/$1.50 one) for: classification, tagging, extraction, translation, and simple summaries, i.e. high-volume work with a clear right answer. At one-sixth the price of 3.5 Flash, it should handle most production traffic.

Use Gemini 3.5 Flash (the $1.50/$9.00 one) for: real agentic workflows and code generation where quality compounds and a wrong early step ruins everything downstream. Pay for it when the output is high-value — and only after you've tested that Lite isn't good enough.

Use Gemini 3 Flash Preview (the $0.50/$3.00 one) when: you need Computer Use, i.e. controlling a browser or UI. Notably, 3.5 Flash dropped Computer Use, so for that specific capability Google says to stick with 3 Flash Preview. Just remember that a "Preview" model can change or disappear.

The meta-rule: default to Lite, upgrade only when you can prove you need to. Most teams will do the opposite — grab the highest version number, ship it, and quietly overpay 6× forever.
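The meta-rule can be sketched as a tiny selector: given pass rates you measured on your own eval set, return the cheapest Flash model that clears your bar, and restrict the choice to 3 Flash Preview when Computer Use is required. This is a hypothetical helper, not a Google API. The model names are display names rather than API model IDs, and marking only 3 Flash Preview as Computer Use capable follows the advice above rather than a full capability matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlashModel:
    name: str             # display name, not an API model ID
    output_price: float   # USD per 1M output tokens, from the pricing table
    computer_use: bool    # assumption: only 3 Flash Preview is treated as Computer Use capable


MODELS = [
    FlashModel("Gemini 3.1 Flash Lite", 1.50, computer_use=False),
    FlashModel("Gemini 3 Flash Preview", 3.00, computer_use=True),
    FlashModel("Gemini 3.5 Flash", 9.00, computer_use=False),
]


def pick_model(eval_scores: dict[str, float], min_score: float,
               needs_computer_use: bool = False) -> str:
    """Return the cheapest model whose measured pass rate meets min_score.

    eval_scores maps model name -> pass rate (0..1) on YOUR tasks.
    Models you have not evaluated are skipped, so nothing is upgraded on faith.
    """
    for model in sorted(MODELS, key=lambda m: m.output_price):
        if needs_computer_use and not model.computer_use:
            continue
        score = eval_scores.get(model.name)
        if score is not None and score >= min_score:
            return model.name
    raise ValueError("No Flash model met the bar; revisit the task or the threshold.")
```

For example, with made-up scores {"Gemini 3.1 Flash Lite": 0.93, "Gemini 3.5 Flash": 0.97}, a 0.90 bar returns Lite and a 0.95 bar returns 3.5 Flash. Sorting by output price alone is a simplification; it gives the same order as input price here because every model prices output at 6× input.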

Two cost levers nobody mentioned

The price-per-token is only half the bill. Two settings move it a lot:

1. Caching is a 10× input discount. Gemini 3.5 Flash's cached input is $0.15 vs $1.50 — ten times cheaper. If your prompts share a big fixed chunk (a system prompt, a document, a schema), caching it slashes input cost. Most people never turn it on.

2. The "Thinking level" dial controls how hard, and how expensively, the model reasons. Gemini 3.x replaces the old token-budget setting with a thinkingLevel of minimal / low / medium / high. More thinking means better results on hard problems, but also more time and more tokens. The defaults differ by model (3.5 Flash defaults to medium, Flash Lite to minimal), and Google notes that routing the bulk of your calls to low/minimal thinking can cut spend 50–70%. So your bill depends not only on which model you pick but on how hard you let it think. Match the effort to the task.
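Both levers can be sketched in a few lines. The first function estimates 3.5 Flash input cost for a workload that repeats a fixed prefix on every call, under two simplifying assumptions: the whole prefix is billed at the cached rate on every call, and cache storage fees are ignored. The second builds the thinking-level fragment of a request body; nesting thinkingLevel under generationConfig.thinkingConfig is an assumption based on how the Gemini REST API exposes thinking settings, so check it against the current API reference. The workload numbers are hypothetical.

```python
# Gemini 3.5 Flash input prices (USD per 1M tokens) from the article.
INPUT_PRICE = 1.50
CACHED_INPUT_PRICE = 0.15

THINKING_LEVELS = ("minimal", "low", "medium", "high")


def monthly_input_cost(requests: int, prefix_tokens: int, unique_tokens: int,
                       cache_prefix: bool) -> float:
    """Input cost in USD when every request repeats the same fixed prefix.

    Assumptions: with cache_prefix=True the prefix is billed at the cached rate
    on every call, and cache storage fees are ignored.
    """
    prefix_rate = CACHED_INPUT_PRICE if cache_prefix else INPUT_PRICE
    prefix_cost = requests * prefix_tokens * prefix_rate
    unique_cost = requests * unique_tokens * INPUT_PRICE
    return (prefix_cost + unique_cost) / 1_000_000


def generation_config(thinking_level: str) -> dict:
    """Build the request-body fragment that sets thinkingLevel.

    Assumption: thinkingLevel sits under generationConfig.thinkingConfig.
    """
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"thinking_level must be one of {THINKING_LEVELS}")
    return {"generationConfig": {"thinkingConfig": {"thinkingLevel": thinking_level}}}


if __name__ == "__main__":
    # Hypothetical workload: 100k requests/month, 8,000-token shared prefix, 500 unique tokens each.
    for cached in (False, True):
        cost = monthly_input_cost(100_000, 8_000, 500, cache_prefix=cached)
        print(f"cached={cached}: ${cost:,.2f}/month")
    print(generation_config("low"))
```

For that hypothetical workload, caching the prefix drops the input bill from about $1,275 to about $195 a month, before touching the thinking level at all.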

Two details worth knowing before you ship

  • The free tier is real but capped. All three have a rate-limited free tier, plus 5,000 free Google Search grounding prompts per month (then ~$14 per 1,000). Great for prototyping; watch the grounding cap.
  • Their knowledge cutoff is January 2025, about 16 months before the two I/O-era models launched. Every Flash card in AI Studio lists a Jan 2025 cutoff, which means these models know nothing after January 2025 out of the box, including I/O 2026 itself. For anything current, turn on Grounding with Google Search. A new model is not the same as an up-to-date one.
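The grounding allowance is simple to budget for. A minimal sketch, assuming the approximate $14 per 1,000 rate is charged pro rata on prompts beyond the 5,000 free ones each month (check Google's pricing page for how partial blocks are actually billed):

```python
FREE_GROUNDED_PROMPTS = 5_000   # free Google Search grounding prompts per month
PRICE_PER_1000_PROMPTS = 14.0   # approximate USD; assumption: billed pro rata beyond the free tier


def grounding_cost(prompts_per_month: int) -> float:
    """Estimated monthly USD cost of Grounding with Google Search."""
    billable = max(0, prompts_per_month - FREE_GROUNDED_PROMPTS)
    return billable / 1_000 * PRICE_PER_1000_PROMPTS


if __name__ == "__main__":
    for prompts in (3_000, 5_000, 25_000):
        print(f"{prompts:,} prompts -> ${grounding_cost(prompts):,.2f}")
```

Under that assumption, 25,000 grounded prompts a month would cost about $280, while anything up to 5,000 stays free.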

The takeaway

Google's I/O 2026 story was "Gemini Flash is fast, smart, and cheap." The truth in the model picker is more useful: there isn't one Flash, there are several, and the difference between them is a 6× cost swing hiding behind nearly identical names — before you even touch caching or the thinking dial.

That's not a complaint. Having a $0.25 workhorse and a frontier coding model in the same family is genuinely great. It just means the most important decision you'll make isn't "should I use Gemini" — it's "which Flash, with what thinking level, for this task." Get that right and you get frontier AI at workhorse prices. Get it wrong and you pay frontier prices for workhorse work.

Open AI Studio, put the three Flash cards side by side, and match each of your app's tasks to the cheapest model that can actually do it. It takes five minutes, and on token prices alone it can cut your AI bill by up to 6×.


Pricing, model details, and the thinking-level defaults are from Google's official Gemini API docs and AI Studio during the I/O 2026 window (Gemini 3.5 Flash GA'd May 19, 2026); verify current numbers before relying on them, as they change. Master announcement list: "100 things we announced at Google I/O 2026". I drafted this with AI assistance and verified every number against Google's docs and AI Studio myself — the analysis and screenshots are mine.