Gemini 3.5 Flash Has a 1M Token Context Window. Here's What You Can Actually Build With It.
pulkitgovran · 2026-05-23 · via DEV Community

This is a submission for the Google I/O Writing Challenge


"1 million token context window" sits in every I/O recap summary. Then people move on.

It sounds like a spec-sheet number — impressive in the abstract, like a car rated for 700 horsepower. Sure. But what road are you actually driving on?

I want to make it concrete. Gemini 3.5 Flash shipped GA at Google I/O 2026. Here's what 1M context actually unlocks, with working code and one real experiment I ran.


What Shipped

Gemini 3.5 Flash is the first generally available model in the 3.5 series. GA on day one — no preview suffix, stable, ready for production.

Feature          Value
Context window   1,000,000 tokens
Max output       65,000 tokens
Thinking         Built-in
Speed            ~4x faster than frontier models
Pricing          $1.50 / 1M input · $9 / 1M output

The benchmark story: 3.5 Flash outperforms Gemini 3.1 Pro across almost all benchmarks, at 4x the speed. That's the classic Flash bet — you trade some ceiling on niche hard tasks for speed and cost everywhere else.

In my testing: requests that took 8–10 seconds on 3.1 Pro land in 2–3 seconds on 3.5 Flash. At scale, that's the difference between an interactive tool and a batch job.
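
To put the pricing row in concrete terms, here is a rough per-request cost estimator. The two rates are the ones quoted in the table above; the function name and the example token counts are illustrative assumptions, and the sketch ignores caching, tiered pricing, or any separate billing for thinking tokens that may apply.

```python
# Rates quoted in the table above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 1.50
OUTPUT_USD_PER_MILLION = 9.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Rough cost of one request; ignores caching, tiers, and thinking-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )


if __name__ == "__main__":
    # Hypothetical request: a full 1M-token prompt and a 5,000-token answer.
    print(f"${estimate_cost_usd(1_000_000, 5_000):.2f}")
```

Under those rates, filling the whole window costs about $1.50 in input alone, so the prompt usually dominates the bill for long-context calls.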


Get Started in 3 Minutes

pip install google-genai

Grab a free API key from AI Studio — no billing required to test.

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What's the most underrated pattern in async Python?",
)

print(response.text)

That's the baseline. Now the part that matters.


What 1M Tokens Actually Lets You Do

One million tokens is roughly 750,000 words of English text. That's:

  • The entire source code of a medium-sized web app
  • Six months of Slack export from a busy engineering channel
  • A 300-page legal agreement plus all its referenced attachments
  • A full year of support tickets

Previously, reasoning over a full codebase meant chunking it, embedding it, retrieving relevant pieces, and hoping retrieval didn't miss the thing that mattered.

With 1M context, you just send it. One call. The model sees everything simultaneously.

Bold opinion: Most "RAG pipeline" complexity is a workaround for insufficient context window. 1M tokens doesn't eliminate RAG entirely, but it eliminates a huge class of retrieval problems for the applications most developers are actually building.
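
Before sending a large corpus, it can help to sanity-check whether it plausibly fits. The sketch below uses a characters-per-token heuristic. The ratio of 4 and the 80% headroom are assumptions, not figures from the Gemini docs, and source code often tokenizes differently from prose. For an exact number, the google-genai SDK exposes a `count_tokens` call on `client.models`, which costs a network round trip.

```python
import math

CONTEXT_WINDOW_TOKENS = 1_000_000  # context size quoted in this article
CHARS_PER_TOKEN = 4  # ASSUMPTION: rough average for English text


def estimate_tokens(text: str) -> int:
    """Very rough token estimate derived from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(
    text: str,
    limit: int = CONTEXT_WINDOW_TOKENS,
    headroom: float = 0.8,  # ASSUMPTION: keep 20% spare because the estimate is crude
) -> bool:
    """True if the estimated size stays within the chosen fraction of the window."""
    return estimate_tokens(text) <= limit * headroom
```

If the check fails, that is the point where chunking or retrieval comes back into the picture.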


Tutorial: Whole-Codebase Code Review

Here's a real use case: feed your entire project to Gemini 3.5 Flash and get a structured security review.

import os
from pathlib import Path
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

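def load_codebase(root: str, extensions: tuple[str, ...] = (".py", ".ts", ".js")) -> str:
    """Concatenate every matching source file under root, each preceded by a FILE header."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        # Keep only regular files with a wanted suffix, and skip anything inside .git.
        if path.is_file() and path.suffix in extensions and ".git" not in path.parts:
            parts.append(f"\n\n### FILE: {path}\n")
            parts.append(path.read_text(errors="ignore"))
    return "".join(parts)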

codebase = load_codebase("./src")

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=f"""You are a security-focused code reviewer.

Review this entire codebase for:
1. SQL injection vulnerabilities
2. Unvalidated user input in system calls
3. Hardcoded secrets or credentials
4. Insecure direct object references
5. Missing authentication checks

For each issue: file path, severity (critical/high/medium/low), what's wrong, suggested fix.

Codebase:
{codebase}""",
)

print(response.text)

One API call. No chunking, no retrieval pipeline, no missed cross-file context.

The model sees api/routes.py and middleware/auth.py simultaneously, so it can catch a vulnerability that is only exploitable because a check is missing in auth.py, which chunk-based retrieval would likely miss.
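
One practical wrinkle: the loader in the tutorial only skips .git, so pointing it at a project root rather than ./src would also sweep in node_modules, build output, and similar directories. Here is a possible variant with an exclusion list and a character cap. The directory names, the `max_chars` default, and the function name are all assumptions for illustration, not part of the original tutorial.

```python
from pathlib import Path

# ASSUMPTION: illustrative defaults; adjust for your own project layout.
EXCLUDED_DIRS = {".git", "node_modules", "dist", "build", ".next", "__pycache__", ".venv"}


def load_codebase_capped(
    root: str,
    extensions: tuple[str, ...] = (".py", ".ts", ".js"),
    max_chars: int = 3_000_000,
) -> tuple[str, list[str]]:
    """Return (prompt_text, skipped_files); files that would exceed max_chars are skipped."""
    root_path = Path(root)
    parts: list[str] = []
    skipped: list[str] = []
    used = 0
    for path in sorted(root_path.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        # Compare only the path components below root against the exclusion list.
        if EXCLUDED_DIRS.intersection(path.relative_to(root_path).parts):
            continue
        chunk = f"\n\n### FILE: {path}\n{path.read_text(errors='ignore')}"
        if used + len(chunk) > max_chars:
            skipped.append(str(path))  # report it rather than truncating silently
            continue
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts), skipped
```

Returning the skipped list makes it visible when the review did not cover the whole project, which matters if you are relying on cross-file context.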


I Tried It: Security Review on UXRay

I ran this on my own project — UXRay, a ~3,000-line Next.js + TypeScript app.

The whole codebase fit in a single call with room to spare. Gemini 3.5 Flash returned:

  • 2 high-severity issues: missing rate limiting on the Playwright screenshot endpoint; base64 image data not sanitized before passing to the subprocess
  • 1 medium: API key readable from client-side bundle under certain Next.js config
  • 3 informational: minor input validation gaps, non-exhaustive error handling

The rate-limiting issue was real and I hadn't caught it. The client-side key issue was a valid config warning specific to my setup.

Total time: 14 seconds. For a codebase security review I'd normally spend an hour on.


Thinking Mode

Gemini 3.5 Flash has built-in thinking — the model reasons through a problem before producing its answer.

import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Design a database schema for a multi-tenant SaaS with row-level security.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=8192
        )
    ),
)

print(response.text)

The Migration Gotcha Nobody Mentions

If you're coming from gemini-3-flash-preview, there's a silent behavior change.

The preview model's thinking defaulted to high. The GA model defaults to medium. Migrate without setting thinking_budget explicitly and the model quietly uses fewer thinking tokens — faster and cheaper, but less thorough on complex tasks.

Set it explicitly:

# Equivalent to old default (high)
thinking_config=types.ThinkingConfig(thinking_budget=16384)

# Faster/cheaper (new GA default)
thinking_config=types.ThinkingConfig(thinking_budget=4096)

Don't leave this implicit in production. You will notice the output quality difference on anything that requires multi-step reasoning.
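
One way to keep the budget explicit everywhere is to route every call through a single lookup. This is a minimal sketch: the two budget values are the ones from the snippet above, while the level names and the helper are hypothetical and exist only in this example.

```python
# Budgets taken from the snippet above; the level names are labels for this sketch only.
THINKING_BUDGETS = {
    "high": 16384,   # matches the old preview default, per this article
    "medium": 4096,  # the new GA default, per this article
}


def thinking_budget_for(level: str) -> int:
    """Look up an explicit budget so no call site relies on the model default."""
    try:
        return THINKING_BUDGETS[level]
    except KeyError:
        raise ValueError(f"unknown thinking level: {level!r}") from None
```

A call site would then pass `types.ThinkingConfig(thinking_budget=thinking_budget_for("high"))`, and a typo in the level name fails loudly instead of falling back to whatever the model default happens to be.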


Structured Output (Machine-Readable Results)

The API supports constrained JSON output via response schema. The model outputs valid JSON matching your spec — no parsing heuristics, no regex, no retries.

import json
from google import genai
from google.genai import types

schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "issues": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "file": {"type": "string"},
                    "severity": {
                        "type": "string",
                        "enum": ["critical", "high", "medium", "low"]
                    },
                    "description": {"type": "string"},
                    "fix": {"type": "string"},
                },
                "required": ["file", "severity", "description", "fix"]
            }
        },
        "risk_score": {"type": "integer", "minimum": 0, "maximum": 100}
    },
    "required": ["summary", "issues", "risk_score"]
}

# `client` and `codebase` are the objects created in the code review tutorial above.
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=f"Security review:\n\n{codebase}",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=schema,
    ),
)

result = json.loads(response.text)
print(f"Risk score: {result['risk_score']}/100")
for issue in result["issues"]:
    print(f"[{issue['severity'].upper()}] {issue['file']}: {issue['description']}")

Validate with Zod, Pydantic, or any schema library and you can render the output directly in a UI without post-processing.
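
As a rough illustration of that validation step, here is a standard-library-only sketch that mirrors the schema above. In a real project Pydantic would likely do this in fewer lines. The class names and `parse_review` are hypothetical names introduced for this example.

```python
import json
from dataclasses import dataclass

SEVERITIES = ("critical", "high", "medium", "low")


@dataclass(frozen=True)
class Issue:
    file: str
    severity: str
    description: str
    fix: str


@dataclass(frozen=True)
class Review:
    summary: str
    issues: list[Issue]
    risk_score: int


def parse_review(raw: str) -> Review:
    """Parse and validate a JSON review; raises ValueError on any schema mismatch."""
    try:
        data = json.loads(raw)
        # Issue(**item) raises TypeError on missing or unexpected keys.
        issues = [Issue(**item) for item in data["issues"]]
        review = Review(str(data["summary"]), issues, int(data["risk_score"]))
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        raise ValueError(f"malformed review payload: {exc}") from exc
    if not 0 <= review.risk_score <= 100:
        raise ValueError("risk_score out of range")
    for issue in review.issues:
        if issue.severity not in SEVERITIES:
            raise ValueError(f"unexpected severity: {issue.severity!r}")
    return review
```

Calling `parse_review(response.text)` in place of the bare `json.loads` gives typed objects, and a single exception type to catch if the payload ever drifts from the schema.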


What You Can Actually Build Now

The 1M context + structured output + thinking combination makes practical a category of applications that wasn't before:

Whole-codebase refactoring advisor. Ask for a prioritized list of refactors with cross-file impact analysis. No chunking.

Full contract analysis. A 300-page agreement fits easily. Ask for all clauses that limit liability, conflict with your agreements, or require notice periods — across the entire document at once.

Support ticket patterns. Six months of tickets in one prompt. "What are the top 5 root causes of customer friction?" across all of them.

End-to-end PR review. Send the full diff and the codebase it applies to. The model evaluates whether the change breaks invariants elsewhere, not just whether the diff is internally correct.

Bold opinion: The PR review use case alone justifies integrating Gemini 3.5 Flash into CI. A model that can see the full codebase context when reviewing a diff will catch things that diff-only review structurally cannot. At the 14 seconds I measured on a ~3,000-line codebase, it's fast enough to be a non-blocking CI step, though that timing comes from one small project.
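
For the PR review idea, the prompt needs to carry both the diff and the codebase it applies to. The sketch below shows one possible way to assemble it. The section headers, the instruction wording, and the function name are assumptions for illustration, not a format the API requires.

```python
def build_pr_review_prompt(diff: str, codebase: str) -> str:
    """Combine a unified diff and the full codebase into one review prompt."""
    if not diff.strip():
        raise ValueError("diff is empty; nothing to review")
    return (
        "You are reviewing a pull request.\n"
        "Check whether the change breaks invariants elsewhere in the codebase, "
        "not only whether the diff is internally correct.\n\n"
        "### DIFF\n"
        f"{diff}\n\n"
        "### CODEBASE (state before the change)\n"
        f"{codebase}"
    )
```

In CI, the diff could come from something like `git diff origin/main...HEAD` (the branch name is an assumption) and the codebase string from the `load_codebase` helper in the tutorial, with the result passed as `contents` to `generate_content`.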


Get the API Key

AI Studio → sign in → API Keys → Create. Free tier, no billing required to test.

Model ID: gemini-3.5-flash. No suffix, no preview. That's the GA signal.


Gemini 3.5 Flash docs at ai.google.dev. Quickstart at Google AI for Developers.

Tags: googleio gemini ai python tutorial