Gemini 3.5 Flash Has a 1M Token Context Window. Here's What You Can Actually Build With It.

This is a submission for the Google I/O Writing Challenge

"1 million token context window" sits in every I/O recap summary. Then people move on.

It sounds like a spec-sheet number — impressive in the abstract, like a car rated for 700 horsepower. Sure. But what road are you actually driving on?

I want to make it concrete. Gemini 3.5 Flash shipped GA at Google I/O 2026. Here's what 1M context actually unlocks, with working code and one real experiment I ran.

What Shipped

Gemini 3.5 Flash is the first generally available model in the 3.5 series. GA on day one — no preview suffix, stable, ready for production.

Feature	Value
Context window	1,000,000 tokens
Max output	65,000 tokens
Thinking	Built-in
Speed	~4x faster than frontier models
Pricing	$1.50 / 1M input · $9 / 1M output
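
To put those prices in per-request terms, here is a minimal cost sketch. It assumes the two rates in the table apply flat per token with no tiering, caching discounts, or separate thinking-token rate, which I have not verified; the function name is mine.

```python
# Prices quoted in the table above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 1.50
OUTPUT_USD_PER_MILLION = 9.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request in USD (flat-rate assumption)."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# A request that fills the whole context window and uses the maximum output.
print(f"${estimate_cost_usd(1_000_000, 65_000):.2f}")
```

Under those assumptions, the input side dominates a full-context call: $1.50 of input against roughly $0.59 of output.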

The benchmark story: 3.5 Flash outperforms Gemini 3.1 Pro across almost all benchmarks, at 4x the speed. That's the classic Flash bet — you trade some ceiling on niche hard tasks for speed and cost everywhere else.

In my testing: requests that took 8–10 seconds on 3.1 Pro land in 2–3 seconds on 3.5 Flash. At scale, that's the difference between an interactive tool and a batch job.

Get Started in 3 Minutes

pip install google-genai

Grab a free API key from AI Studio — no billing required to test.

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What's the most underrated pattern in async Python?",
)

print(response.text)

That's the baseline. Now the part that matters.

What 1M Tokens Actually Lets You Do

One million tokens is roughly 750,000 words. That's:

The entire source code of a medium-sized web app
Six months of Slack export from a busy engineering channel
A 300-page legal agreement plus all its referenced attachments
A full year of support tickets
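
The words-to-tokens ratio above can be turned into a quick pre-flight check. This is a rough sketch: the 0.75 words-per-token figure is just the rule of thumb implied by "1M tokens is roughly 750,000 words", source code usually tokenizes less efficiently than prose, and the function names and the headroom default are my own choices.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75  # rule of thumb: 1M tokens is roughly 750,000 words


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on a whitespace word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_context(text: str, headroom_tokens: int = 50_000) -> bool:
    """True if the estimate fits with some safety margin left over.

    headroom_tokens is an arbitrary buffer to absorb estimation error.
    """
    return estimate_tokens(text) + headroom_tokens <= CONTEXT_WINDOW_TOKENS
```

For an exact figure, the google-genai SDK has a client.models.count_tokens call that counts tokens server-side; the estimate here is only meant as a cheap local sanity check.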

Previously, reasoning over a full codebase meant chunking it, embedding it, retrieving relevant pieces, and hoping retrieval didn't miss the thing that mattered.

With 1M context, you just send it. One call. The model sees everything simultaneously.

Bold opinion: Most "RAG pipeline" complexity is a workaround for an insufficient context window. 1M tokens doesn't eliminate RAG entirely, but it eliminates a huge class of retrieval problems for the applications most developers are actually building.

Tutorial: Whole-Codebase Code Review

Here's a real use case: feed your entire project to Gemini 3.5 Flash and get a structured security review.

import os
from pathlib import Path
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

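def load_codebase(root: str, extensions: tuple[str, ...] = (".py", ".ts", ".js")) -> str:
    """Concatenate every matching source file under root into one labeled string."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        # Only read regular files with a wanted extension, and skip anything inside .git.
        if path.is_file() and path.suffix in extensions and ".git" not in path.parts:
            parts.append(f"\n\n### FILE: {path}\n")
            parts.append(path.read_text(errors="ignore"))
    return "".join(parts)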

codebase = load_codebase("./src")

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=f"""You are a security-focused code reviewer.

Review this entire codebase for:
1. SQL injection vulnerabilities
2. Unvalidated user input in system calls
3. Hardcoded secrets or credentials
4. Insecure direct object references
5. Missing authentication checks

For each issue: file path, severity (critical/high/medium/low), what's wrong, suggested fix.

Codebase:
{codebase}""",
)

print(response.text)

One API call. No chunking, no retrieval pipeline, no missed cross-file context.

The model sees api/routes.py and middleware/auth.py simultaneously, so it can catch a vulnerability that's only exploitable because a check is missing in auth.py, which chunk-based retrieval would likely miss.
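
"Just send it" still benefits from a guard on larger repos, where vendored dependencies and build output can blow past the window. Here is one way that could look, as a sketch: the skip list and the character budget are hypothetical defaults of mine, and characters are only a crude stand-in for tokens.

```python
from pathlib import Path

# Hypothetical defaults; adjust for your project layout.
SKIP_DIRS = {".git", "node_modules", "dist", "build", "__pycache__"}


def select_files(
    root: str, extensions: tuple[str, ...], max_chars: int
) -> tuple[list[Path], list[Path]]:
    """Split matching files into (included, skipped) under a character budget."""
    base = Path(root)
    included: list[Path] = []
    skipped: list[Path] = []
    used = 0
    for path in sorted(base.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        # Compare only the parts below root, so the location of root is irrelevant.
        if SKIP_DIRS.intersection(path.relative_to(base).parts):
            continue
        size = len(path.read_text(errors="ignore"))
        if used + size <= max_chars:
            included.append(path)
            used += size
        else:
            skipped.append(path)
    return included, skipped
```

Returning the skipped list matters: if something had to be left out, you want to know which files the review never saw rather than assume full coverage.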

I Tried It: Security Review on UXRay

I ran this on my own project — UXRay, a ~3,000-line Next.js + TypeScript app.

The whole codebase fit in a single call with room to spare. Gemini 3.5 Flash returned:

2 high-severity issues: missing rate limiting on the Playwright screenshot endpoint; base64 image data not sanitized before passing to the subprocess
1 medium: API key readable from client-side bundle under certain Next.js config
3 informational: minor input validation gaps, non-exhaustive error handling

The rate-limiting issue was real and I hadn't caught it. The client-side key issue was a valid config warning specific to my setup.

Total time: 14 seconds. For a codebase security review I'd normally spend an hour on.

Thinking Mode

Gemini 3.5 Flash has built-in thinking — the model reasons through a problem before producing its answer.

import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Design a database schema for a multi-tenant SaaS with row-level security.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=8192
        )
    ),
)

print(response.text)

The Migration Gotcha Nobody Mentions

If you're coming from gemini-3-flash-preview, there's a silent behavior change.

The preview model's thinking defaulted to high. The GA model defaults to medium. Migrate without setting thinking_budget explicitly and the model quietly uses fewer thinking tokens — faster and cheaper, but less thorough on complex tasks.

Set it explicitly:

# Equivalent to old default (high)
thinking_config=types.ThinkingConfig(thinking_budget=16384)

# Faster/cheaper (new GA default)
thinking_config=types.ThinkingConfig(thinking_budget=4096)

Don't leave this implicit in production. You will notice the output quality difference on anything that requires multi-step reasoning.
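
One way to keep the setting explicit is to route every call through a small helper, so no request can fall back to a default by accident. This is a sketch: the two budgets are the values used in this article, and the level names are labels for this example, not SDK constants.

```python
# Budgets taken from the snippets above; "high" and "medium" are labels
# chosen for this sketch, not values defined by the SDK.
THINKING_BUDGETS = {"high": 16384, "medium": 4096}


def thinking_kwargs(level: str) -> dict[str, int]:
    """Return explicit keyword arguments for ThinkingConfig, or fail loudly."""
    if level not in THINKING_BUDGETS:
        raise ValueError(
            f"unknown thinking level {level!r}; expected one of {sorted(THINKING_BUDGETS)}"
        )
    return {"thinking_budget": THINKING_BUDGETS[level]}
```

You would then build the config with types.ThinkingConfig(**thinking_kwargs("high")), and a typo in the level name raises immediately instead of silently changing how much the model thinks.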

Structured Output (Machine-Readable Results)

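The API supports constrained JSON output via a response schema. The model's output is constrained to JSON matching your spec, so you don't need parsing heuristics or regex to extract it. A response cut off at the output limit can still be incomplete, so keep a validation step.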

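import json
import os
from google import genai
from google.genai import types

# Reuses `codebase` from the code review tutorial above.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])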

schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "issues": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "file": {"type": "string"},
                    "severity": {
                        "type": "string",
                        "enum": ["critical", "high", "medium", "low"]
                    },
                    "description": {"type": "string"},
                    "fix": {"type": "string"},
                },
                "required": ["file", "severity", "description", "fix"]
            }
        },
        "risk_score": {"type": "integer", "minimum": 0, "maximum": 100}
    },
    "required": ["summary", "issues", "risk_score"]
}

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=f"Security review:\n\n{codebase}",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=schema,
    ),
)

result = json.loads(response.text)
print(f"Risk score: {result['risk_score']}/100")
for issue in result["issues"]:
    print(f"[{issue['severity'].upper()}] {issue['file']}: {issue['description']}")

Validate with Zod, Pydantic, or any schema library and you can render the output directly in a UI without post-processing.
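
If you'd rather not pull in a schema library, a hand-rolled check against the schema above is short. This is a minimal standard-library sketch that mirrors the fields of that schema; the function name and the error wording are mine.

```python
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low"}
ISSUE_FIELDS = ("file", "severity", "description", "fix")


def validate_review(result: dict) -> list[str]:
    """Return a list of problems in a parsed review; an empty list means valid."""
    problems: list[str] = []
    if not isinstance(result.get("summary"), str):
        problems.append("summary must be a string")
    score = result.get("risk_score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 0 <= score <= 100:
        problems.append("risk_score must be an integer from 0 to 100")
    issues = result.get("issues")
    if not isinstance(issues, list):
        problems.append("issues must be a list")
        return problems
    for i, issue in enumerate(issues):
        if not isinstance(issue, dict):
            problems.append(f"issues[{i}] must be an object")
            continue
        for field in ISSUE_FIELDS:
            if not isinstance(issue.get(field), str):
                problems.append(f"issues[{i}].{field} must be a string")
        severity = issue.get("severity")
        if isinstance(severity, str) and severity not in ALLOWED_SEVERITIES:
            problems.append(f"issues[{i}].severity is not an allowed value")
    return problems
```

Call it on the dict returned by json.loads and only render the result when the list comes back empty.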

What You Can Actually Build Now

The combination of 1M context, structured output, and thinking makes practical a category of applications that wasn't before:

Whole-codebase refactoring advisor. Ask for a prioritized list of refactors with cross-file impact analysis. No chunking.

Full contract analysis. A 300-page agreement fits easily. Ask for all clauses that limit liability, conflict with your agreements, or require notice periods — across the entire document at once.

Support ticket patterns. Six months of tickets in one prompt. "What are the top 5 root causes of customer friction?" across all of them.

End-to-end PR review. Send the full diff and the codebase it applies to. The model evaluates whether the change breaks invariants elsewhere, not just whether the diff is internally correct.

Bold opinion: The PR review use case alone justifies integrating Gemini 3.5 Flash into CI. A model that can see the full codebase context when reviewing a diff will catch things that diff-only review structurally cannot. And if latency stays near the 14 seconds my ~3,000-line UXRay review took, it's fast enough to run as a non-blocking CI step.
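
The PR review idea mostly comes down to how the prompt is assembled. Here is a sketch of one possible layout; the section markers and the instruction wording are my own choices, not a format the API requires.

```python
def build_pr_review_prompt(diff: str, codebase: str) -> str:
    """Combine a diff and the codebase it applies to into one review prompt."""
    return (
        "You are reviewing a pull request.\n"
        "Evaluate whether the diff breaks invariants elsewhere in the codebase, "
        "not just whether the diff is internally correct.\n\n"
        "### DIFF\n"
        f"{diff}\n\n"
        "### CODEBASE\n"
        f"{codebase}"
    )
```

The returned string can be passed as contents to client.models.generate_content, the same way as in the code review tutorial, with the codebase string built by load_codebase and the diff taken from your CI environment.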

Get the API Key

AI Studio → sign in → API Keys → Create. Free tier, no billing required to test.

Model ID: gemini-3.5-flash. No suffix, no preview. That's the GA signal.

Gemini 3.5 Flash docs at ai.google.dev. Quickstart at Google AI for Developers.

Tags: googleio gemini ai python tutorial
