惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

GbyAI
GbyAI
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
P
Proofpoint News Feed
L
Lohrmann on Cybersecurity
S
Secure Thoughts
Attack and Defense Labs
Attack and Defense Labs
人人都是产品经理
人人都是产品经理
Stack Overflow Blog
Stack Overflow Blog
W
WeLiveSecurity
O
OpenAI News
SecWiki News
SecWiki News
博客园 - Franky
NISL@THU
NISL@THU
Microsoft Azure Blog
Microsoft Azure Blog
T
Tor Project blog
Microsoft Security Blog
Microsoft Security Blog
aimingoo的专栏
aimingoo的专栏
Security Latest
Security Latest
H
Hacker News: Front Page
Google Online Security Blog
Google Online Security Blog
P
Privacy & Cybersecurity Law Blog
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
D
Darknet – Hacking Tools, Hacker News & Cyber Security
月光博客
月光博客
李成银的技术随笔
Spread Privacy
Spread Privacy
F
Full Disclosure
F
Fortinet All Blogs
T
The Exploit Database - CXSecurity.com
Vercel News
Vercel News
AWS News Blog
AWS News Blog
WordPress大学
WordPress大学
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
V
Visual Studio Blog
J
Java Code Geeks
博客园 - 三生石上(FineUI控件)
G
Google Developers Blog
云风的 BLOG
云风的 BLOG
博客园 - 司徒正美
Engineering at Meta
Engineering at Meta
Last Week in AI
Last Week in AI
P
Palo Alto Networks Blog
宝玉的分享
宝玉的分享
T
True Tiger Recordings
N
News and Events Feed by Topic
酷 壳 – CoolShell
酷 壳 – CoolShell
Cisco Talos Blog
Cisco Talos Blog
N
News | PayPal Newsroom
S
SegmentFault 最新的问题
Jina AI
Jina AI

DEV Community

Pretty normal Both Camps in the 'Left Behind' Argument Are Right About Each Other Flutter MCP Toolkit v3 🔐 Working with Private Symfony Recipes Rate limiting in web apps: what to protect before picking a library Rate limiting en aplicaciones web: qué proteger antes de elegir una librería What Are Lakehouse Catalogs? The Role of Catalogs in Apache Iceberg What It Really Takes to Become a Senior Software Engineer Microservices Were Never About Technology JS Crime Scene: The Misleading Array Project-as-code for a Directus v9 backend When the API literally burned your database after a typo COOKIES DPRK Hacking Trends 2026: AI‑Powered Supply Chain and Developer Environment Attacks Phone control for AI coding sessions is not a tiny terminal PayPal and Crypto Are Not Equals: How I Built a Gumroad Alternative for Restricted Countries Exploring Tech as a Content Writer I Raised Gemma 4's Token Cap. The Dense Model Stopped Refusing. React Server Components Don't Make Your App Fast by Default Multi-Stage Builds for a Next.js App — Reduce Image Size by 70% I Built a Chrome Extension That Teaches Vocabulary While You Browse Why I Walked Back from Next.js and RSC to a Plain SPA and a Separate Backend NeuralPocket: Private On-Device AI with Gemma 4 — Android & Web Github Speckit: Revolucionando o Desenvolvimento com SDD Cloud Cost Elasticity I Built a Payment System for Bangladesh—Heres Why Stripe Failed Us Polyglot Persistence in Microservices: Choosing the Right Database for Each Service Centralized Authentication for a Multi-Brand Laravel Ecosystem How I made a perfect recording button. Simple yet complex thing. Mumbli – my personal Wispr Flow Getting Paid Should Not Be a Geopolitical Nightmare: My NOWPayments Integration Story Four Layers of Validation in Kubernetes with Claude Code Prompt Flow — a visual side project for flow design, trace, and integration steps (looking for feedback) AI Citation Registry: Temporal Gaps in Government Publishing Cycles ShowDev: I built a 100% local, zero-upload PDF editor using WebAssembly JavaC Written by an AI Pipeline, Verified by Three Models. Is It Slop? Part1 Vulkan: Drawing Triangle 1 Why I Stopped Using useEffect to Sync State — and What I Use Instead Por qué dejé de usar useEffect para sincronizar estado y qué uso ahora Migrating a Long-Running WordPress Site to Payload CMS (And All The Chaos That Came With It) Hidden Partitioning: How Iceberg Eliminates Accidental Full Table Scans Azure DevOps Structure Explained: Organizations, Projects, and Repos Without the Mess A Simple React Hook for localStorage State, Expiry, and Sync I sold you on /scratchpad. Then I migrated to /note. Fixing WSL Errors on Windows 11 Your app is not Netflix. Stop building like it is. Resolving inter-service communication issue I built an email cleaner. CSV parsing took longer than the actual validators. How I Would Learn Full-Stack Development in 2026 If I Started From Zero Partition Evolution: Change Your Partitioning Without Rewriting Data What Google Play's I/O 2026 Updates Look Like From a Solo Indie Puzzle Developer Forgetting the Myth of "Ease of Integration" When Selling Digital Products with Bitcoin My 4-Step Regex Debugging Workflow (That Actually Saves Time) Stop Scraping Betting Sites: How to Build a Real-Time Sports Tracker in Python Civic Identity and Responsibility in Modern Democracy OLTP vs OLAP Are binaries really executable code ? The lie of the 80%: why software progress charts don't work What a Datacenter in Space Actually Buys You: Three Server Racks Is AI Actually Citing Your Site? How to Measure What Google Rankings Can't Accessibility - This looks like a job for a developer advocate! I built a Mac app that turns web pages into live widgets How to Teach Source Evaluation When Your Students Use ChatGPT More Context Does Not Mean More Trust RAG Series (24): Code RAG — Teaching AI to Understand Your Codebase Past the JVM Design decisions behind my “Irregular German Verbs” iOS app WordPress 7.0 "Armstrong" Is Live — Post-Release Deep Dive 🎺 Performance and Apache Iceberg's Metadata I Shipped a Bug to Production That Cost Us 3 Hours of Downtime 程序人生:在代码与时间之间 The Wrong Way to Think About XRPL Event Infrastructure What I Learned About MND, Voice Banking, and Why Assistive Tech Is Personal $1.50/Month Email Infrastructure That Beats Your $20 SendGrid Plan Cloud Unit Economics: The Metrics DevOps and FinOps Teams Actually Need Bypassing Payment Platform Restrictions Was The Best Decision I Ever Made For My Digital Product Business The Hidden Life of a Container: A Complete Lifecycle When a port is already in use, there is no interactive way to find it — so I built `port-peek` Como Sumir com o Barulho do Teclado Mecânico no Ubuntu Usando o NoiseTorch Google I/O 2026 dropped a bomb on Android tooling, and nobody's talking about it (or maybe they are 😅) Mentoring Junior Developers: What Actually Works How I Prevented Claude Code from Breaking My Architecture with 18 Tests That Run in 0.4 Seconds I Controlled an ESP32 Drone Using Only My Voice vite HMR is silently the reason ur laptop fan wont stop AI Agents Security for Developers: Don't Let Your Agents Become a Liability Single List Keyboard Handling 9 SaaS development companies worth knowing (a technical look) Material Nova — The Best VS Code Theme of 2026 Inference Routing Is Becoming an Infrastructure Placement Problem I just build a League MBTI Analytics Why I Built My Own Site with Astro, Not WordPress when I use WordPress for a Living Hello! I'm a balloon artist who started 3D modeling 7 Next.js 16 Caching Bugs That Compile Fine and Break Silently in Production I got tired of writing READMEs so I built a tool that generates them from your GitHub URL FrontGate: a Lightweight Package Proxy for Supply Chain Security Why Your Expense Tracking Architecture Keeps Breaking Stop your AI trading agent from hallucinating technical analysis Breaking the Monorepo Barrier in a Crypto Store for Digital Products Imposter Syndrome Is Something We All Struggle With at Some Point in Our Careers
Google Just Shipped Gemini 3.5 Flash. Here's What Developers Actually Need to Know.
Om Shree · 2026-05-21 · via DEV Community

The Flash series has always been Google's answer to the speed-vs-intelligence tradeoff. With Gemini 3.5 Flash, Google is making a different argument: you shouldn't have to choose.

The Problem It's Solving

The history of "fast" AI models is a history of compromise. You got low latency, but you gave up reasoning depth. You got cheaper inference, but you got worse results on multi-step tasks. The whole Flash premise — intelligence at Flash-level speed and cost — has always been aspirational. With Gemini 3.5 Flash, the benchmarks suggest Google has actually closed a meaningful portion of that gap, particularly for the workload that matters most right now: agentic execution.

How Gemini 3.5 Flash Actually Works

Gemini 3.5 Flash is designed for sub-agent deployment, multi-step workflows, and long-horizon tasks at scale, with particular effectiveness in rapid agentic loops involving complex coding cycles and iterations. That's the framing Google leads with, and the architecture reflects it.

The model supports a 1M token context window, 65k max output tokens, and thinking — the same set of tools and platform features as Gemini 3 Flash. The key architectural addition is thought preservation: the model now maintains intermediate reasoning across multi-turn conversations automatically. When present in the conversation history, reasoning context carries forward, which improves performance on complex multi-step tasks like iterative debugging and code refactoring. No API changes are needed.

The thinking system itself has also changed. The default thinking effort level is now medium, changed from high in Gemini 3 Flash Preview. medium yields very good results across a wide range of tasks while being faster and more cost-efficient. For complex problems, high encourages the model to think more deeply. Google's explicit recommendation: start at medium, drop to low for speed-sensitive agentic loops, escalate to high only for hard reasoning or math. The old thinking_budget numeric parameter is gone — use the thinking_level string enum instead.

One important note for teams running computer-use workloads: Computer Use is not supported in Gemini 3.5 Flash at this moment. For Computer Use workloads, continue using Gemini 3 Flash Preview.

What Developers Are Actually Using It For

The benchmark most worth examining for this audience is MCP Atlas — a multi-step workflows benchmark using MCP. Gemini 3.5 Flash scores 83.6% on MCP Atlas, leading the comparison set that includes Gemini 3.1 Pro (78.2%), Claude Opus 4.7 (79.1%), and GPT-5.5 (75.3%). If you're building anything involving MCP tool chains, that number is directly relevant.

On Finance Agent v2 (financial analysis and decision-making), Gemini 3.5 Flash scores 57.9%, ahead of Claude Sonnet 4.6 (51.0%), Claude Opus 4.7 (51.5%), and GPT-5.5 (51.8%).

The coding story is also compelling in a specific way. JetBrains reports that Gemini 3.5 Flash delivers coding and reasoning quality close to Gemini Pro while preserving the speed and cost profile that makes Flash ideal for real-time developer workflows, with low-reasoning coding performance improved by 10–20% compared to the previous Flash generation.

Enterprise validation comes from Box: Gemini 3.5 Flash beat Gemini 3 Flash by 19.6% on Box's enterprise work evaluation set, which was designed to reflect the kinds of real-world multi-step tasks their customers perform daily. For Life Sciences customers, Gemini 3.5 Flash can extract data and make calculations with 96.4% greater accuracy, and for Financial Services firms, it can build financial reports from structured data with 46.7% greater accuracy.

Why This Is a Bigger Deal Than It Looks

The MCP Atlas score deserves more attention than it's getting. For anyone building agentic systems using the Model Context Protocol — and the infrastructure around it is growing fast — having a model that leads on multi-step MCP workflows at Flash pricing changes the economics of what you can deploy. MCP-native tooling like Glama.ai and other agentic middleware layers become more viable when your inference costs stay low without sacrificing orchestration quality.

The thought preservation feature is the other architectural shift worth watching. Most developers managing multi-turn agentic sessions today are manually engineering state — reconstructing context, summarizing prior steps, managing memory externally. With Gemini 3.5 Flash, the model uses reasoning context from all previous turns when thought signatures are present in the conversation history; the SDKs handle this automatically. That's less scaffolding code your team has to maintain.

There is one behavioral change that could silently degrade quality if you migrate without testing: the default thinking effort changed from high to medium. Teams should verify quality, speed, and cost after migration, and note that thought preservation is now on by default — reasoning context carries forward across turns, which improves performance but may increase token usage.

Availability and Access

Gemini 3.5 Flash is generally available (GA), stable, and ready for scaled production use. The model ID is gemini-3.5-flash, last updated May 2026.

The model is accessible via the Gemini App, Gemini API, Google AI Studio, Google Antigravity, Gemini Enterprise Agent Platform, and Android Studio. It supports function calling, structured output, search grounding, Google Maps grounding, URL context, file search, code execution, and thinking — all available in the same request via combined tool use.

On the paid tier, input pricing runs $1.50 per million tokens and output at $9.00 per million tokens (including thinking tokens). Context caching is $0.15 per million tokens, with storage at $1.00 per million tokens per hour. Batch inference halves those rates. A free tier is available for experimentation through Google AI Studio.

For teams migrating from Gemini 3 Flash Preview: update the model string from gemini-3-flash-preview to gemini-3.5-flash, replace thinking_budget with thinking_level, remove temperature/top_p/top_k from your config (no longer recommended), and add id and matching name to all FunctionResponse parts. The full migration checklist is worth reading before touching production.

The speed-vs-intelligence tradeoff that has defined the Flash tier since its inception is getting smaller with each generation. The MCP Atlas score, the thought preservation architecture, and the enterprise validation from Box all point at the same conclusion: Gemini 3.5 Flash is the most credible case yet that "fast and cheap" doesn't have to mean "less capable" for agentic workloads specifically.

Follow for more coverage on MCP, agentic AI, and AI infrastructure.