Google I/O 2026: AI Built an OS in 12 Hours. I Spent Mine Sorting Screenshots. 🤦

This is a submission for the Google I/O Writing Challenge

I haven't watched a tech keynote in a really long time. They usually feel like 2 hours of "the future is here!" slides with product demos that never work on stage.

But Google I/O 2026 actually got me. Watched the whole thing and came away genuinely excited and slightly stressed about how fast things are moving.

They Built a Full OS in 12 Hours

Google's team built a full-blown OS using Gemini agents: 93 sub-agents, 15K model requests, 2.6 billion tokens, all for under $1,000 in credits. Twelve hours total.

Less than a thousand dollars. For 2.6 billion tokens. That's wild efficiency.
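A quick back-of-the-envelope check on those figures, as a sketch. It treats $1,000 as an upper bound (the run came in under it), so the real per-token cost would be lower than what this prints.

```python
# Figures as quoted in the keynote recap above.
total_tokens = 2_600_000_000
total_requests = 15_000
budget_usd = 1_000  # upper bound: the run cost less than this

# Cost per million tokens, and average tokens handled per request.
cost_per_million = budget_usd / (total_tokens / 1_000_000)
tokens_per_request = total_tokens / total_requests

print(f"At most ${cost_per_million:.3f} per million tokens")       # At most $0.385 per million tokens
print(f"About {tokens_per_request:,.0f} tokens per request")        # About 173,333 tokens per request
```

Roughly 173K tokens per request suggests most of those tokens were context being read by the agents rather than text being generated, though the keynote numbers alone don't confirm that split.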

Gemini 2.5 Flash clocks roughly 200 tokens/second of output. Claude Sonnet sits around 40-60 tokens/second. That gap is huge when you're running 93 agents in parallel, and it goes a long way toward explaining why this was even possible in that timeframe.
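To put the timeframe in perspective, here is a rough calculation of the aggregate throughput the run implies. It assumes (this is not stated in the keynote figures) that the 2.6 billion total counts input and output tokens together and that all 93 agents were busy for the full 12 hours, so the per-agent number is not directly comparable to an output-only speed.

```python
# Figures as quoted above; the even split across agents is an assumption.
total_tokens = 2_600_000_000
agents = 93
wall_clock_seconds = 12 * 60 * 60  # twelve hours

aggregate_rate = total_tokens / wall_clock_seconds  # tokens/second across the whole run
per_agent_rate = aggregate_rate / agents            # tokens/second if work were spread evenly

print(f"Aggregate: about {aggregate_rate:,.0f} tokens/second")  # Aggregate: about 60,185 tokens/second
print(f"Per agent: about {per_agent_rate:,.0f} tokens/second")  # Per agent: about 647 tokens/second
```

Whatever the exact input/output split was, sustaining tens of thousands of tokens per second for half a day is the kind of load where model speed and parallelism both matter.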

Gemini Spark Felt Like a MoltBot Alternative

Gemini Spark runs agents in a secure private cloud fully managed by Google. You don't worry about infrastructure, just give it tasks. The closest thing I could think of was MoltBot but living entirely inside Google's ecosystem.

Agentic AI you don't have to host, secure, or manage yourself sounds genuinely appealing. Haven't gotten deep access yet but it's on my list.

Search Is Getting Agentic

For years Google Search has been a box you type into. Now it's a box that talks back, remembers, and proactively reaches out.

They showed concert updates for your city. You can ask Search to notify you when shows for an artist you like get announced. I live in Mumbai and would love to test how well this actually works here because currently I find out about shows after tickets are sold out.

AI autocomplete that doesn't just finish your query but suggests better ones based on intent. Proactive updates that follow up on your interests without you asking again. Search is slowly becoming something closer to a smart assistant.

Yes, Google, take my data, please.

Agents Are Buying Coffee Now

This was the part that felt most science-fiction-turned-real.

Universal Commerce Protocol lets AI agents interact with e-commerce systems. Agent Payments Protocol lets them actually complete payments on your behalf.

The live demo had an agent order coffee through DoorDash, selecting the item, going through checkout, completing payment. No human in the loop at any point.

Picture this: you tell your AI "book me a flight to Bangalore next Friday under ₹8,000, window seat" and it just... does it. Hotels too. Payments too.

Is this a privacy nightmare waiting to happen? Probably worth thinking about. But as a demo, genuinely impressive.

Gemini Live Replied in Haryanvi and I Was Not Ready

Small moment but a meaningful one. During the Gemini Live demo, the model replied in Haryanvi, a regional dialect spoken mostly in Haryana, India. Not just Hindi. Haryanvi. That was cool to see.

Work Onward: An Inspiring Real-World Antigravity and Stitch Case Study

Holly Jooyoung Diamond built Work Onward using Antigravity and Stitch. The problem she was solving was real: how do restaurant owners post jobs easily?

Her answer: job postings via SMS, so owners don't need to sit at a laptop filling out forms, and automatically generated multilingual job descriptions, so the listings open up to more candidates immediately.

Really inspiring to see that the ability to build out an idea is within everyone's reach now.

Small thing, but I actually used Gemini myself while writing this article to fetch the timestamp of the Work Onward demo from the keynote video, because I had forgotten the details after watching. That worked surprisingly well.

WeatherNext Predicted a Category 5 Hurricane 3 Days Out

WeatherNext predicted a Category 5 hurricane in Jamaica three full days before it happened.

3-day advance warnings at category-level accuracy are the kind of thing that saves lives. Not a dev tool but a reminder that the same underlying models powering our autocomplete are doing genuinely important work elsewhere.

Fine-Tuning Gemma 4 with Antigravity

This is the one I keep coming back to.

Google showed fine-tuning Gemma 4 directly from the Antigravity CLI for custom use cases. I haven't worked much with local models yet but the cost of calling a large cloud LLM for every single query adds up fast, especially for repetitive domain-specific tasks.

If you're building a product that does the same type of classification or extraction thousands of times a day, running that against a fine-tuned Gemma locally is far cheaper than hitting a frontier API each time. That's the promise here.
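The trade-off can be sketched as a simple break-even calculation. Every price below is a hypothetical placeholder, not a real Gemini or Gemma figure; plug in your own API pricing and hardware costs.

```python
def break_even_calls(api_cost_per_call: float,
                     local_cost_per_call: float,
                     local_fixed_per_day: float) -> float:
    """Calls per day above which the local model is cheaper than the API.

    local_fixed_per_day covers whatever you pay regardless of traffic
    (hardware, hosting, amortized fine-tuning). All inputs are assumptions.
    """
    saving_per_call = api_cost_per_call - local_cost_per_call
    if saving_per_call <= 0:
        return float("inf")  # local never wins if each call costs as much or more
    return local_fixed_per_day / saving_per_call


# Hypothetical numbers purely for illustration.
threshold = break_even_calls(
    api_cost_per_call=0.002,
    local_cost_per_call=0.0002,
    local_fixed_per_day=5.0,
)
print(f"Local becomes cheaper above ~{threshold:,.0f} calls/day")
```

With these made-up inputs the threshold lands a little under 2,800 calls a day, which is why the argument only holds for high-volume, repetitive workloads.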

I want to try this. Will write about it when I do.

Playing With the Antigravity CLI

Enough keynote recap. Here's what I actually tried.

Installing it

curl -fsSL https://antigravity.google/cli/install.sh | bash

One thing I hit right away: after installation, typing agy or antigravity opened the Antigravity IDE instead of the CLI. Turned out I had an older IDE version installed and its PATH entry was winning.

Had to manually remove the old PATH entry from ~/.zshrc and re-source it. After that the CLI came up fine. Not sure if it's a bug or just my setup, but if you hit the same thing, check your .zshrc for conflicting PATH entries before assuming something is broken.
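If you want to confirm a conflict before editing anything, a small script like this lists every directory on your PATH that contains a given command, in the order the shell would search them. It is a generic sketch, not part of Antigravity; the command names are the two mentioned above.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_on_path(command: str, path_value: Optional[str] = None) -> List[str]:
    """Return every executable named `command` found on PATH, in lookup order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = Path(directory) / command
        if candidate.is_file() and os.access(candidate, os.X_OK):
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    for name in ("agy", "antigravity"):
        matches = find_on_path(name)
        print(f"{name}: {len(matches)} match(es)")
        for position, match in enumerate(matches, start=1):
            # The first match is the one the shell actually runs.
            print(f"  {position}. {match}")
```

More than one match for the same name means the first entry is shadowing the rest, which is the situation described above.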

Organizing 200+ Screenshots

I had a Desktop full of screenshots going back two years. Totally unorganized, no naming convention, nothing. I thought, let's see what Antigravity does with this.

My prompt was: organize my screenshots, categorize them into folders, and give them meaningful names.

Gemini came back with a solid plan: a Swift OCR tool using macOS's native Vision Framework to extract text from each screenshot, paired with a Python script that classifies them into folders (Coding, Meetings, AI-Assistants, Communication, Finance, Design, Media, General) and renames them with date and keyword info like 2024-12-15_Brave_GitHub_GoogleMeet.png.
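For a sense of what the Python half of that plan might look like, here is a minimal sketch of keyword-based classification and renaming. The folder names and the filename pattern come from the plan above; the keyword lists and function names are illustrative guesses, not what the generated script actually contained.

```python
from datetime import date

# Folder names from the plan; keyword lists are hypothetical examples.
CATEGORY_KEYWORDS = {
    "Coding": ["github", "pull request", "vs code", "terminal"],
    "Meetings": ["google meet", "zoom", "calendar"],
    "AI-Assistants": ["gemini", "chatgpt", "claude"],
    "Communication": ["slack", "whatsapp", "gmail"],
    "Finance": ["invoice", "upi", "bank"],
    "Design": ["figma", "canva"],
    "Media": ["youtube", "spotify", "netflix"],
}


def classify(ocr_text: str) -> str:
    """Pick the category whose keywords appear most often in the OCR text."""
    text = ocr_text.lower()
    scores = {
        category: sum(keyword in text for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "General"  # no keyword hit at all


def build_name(taken_on: date, keywords: list, ext: str = ".png") -> str:
    """Build a name like 2024-12-15_Brave_GitHub_GoogleMeet.png."""
    slug = "_".join(k.replace(" ", "") for k in keywords[:3])
    stem = f"{taken_on.isoformat()}_{slug}" if slug else taken_on.isoformat()
    return stem + ext


print(classify("Review requested on GitHub pull request"))
print(build_name(date(2024, 12, 15), ["Brave", "GitHub", "Google Meet"]))
```

A scheme like this only sees the words in the image, so two screenshots with similar text will always land in the same folder regardless of what they were taken for.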

Using macOS's native Vision framework instead of a third-party OCR library was a smart call: zero extra dependencies.

I gave the green flag and it started running. Midway through the dry run I got impatient and used the /btw command to check progress without interrupting the session. That's a genuinely useful feature, like tapping someone on the shoulder to ask "hey how far along are you" without stopping their work.

Files got organized in the end but honestly OCR alone isn't enough context to make great categorization decisions. A screenshot of a GitHub PR and a screenshot of VS Code might have similar text but very different purposes. Some files ended up in slightly wrong folders.

Not the model's fault, it's a genuinely hard problem. But it got me thinking: if the agent could actually see the screenshot using vision instead of just reading extracted text, the categories would be way more accurate.

Modern Web Guidance Plugin

agy plugin install https://github.com/GoogleChrome/modern-web-guidance

This gives Antigravity context about modern web best practices, similar to what the Chrome Modern Web Guidance docs cover for Claude Code.

Using the CLI feels noticeably faster than AI-powered IDEs. No Electron overhead, no waiting for a UI to re-render. Just your terminal, the model, and results.

Every time I use a native CLI tool I wonder why Teams and VS Code are built on Electron. I get the history. Still.

I tried redesigning a section of my portfolio to add a tech stack display with icons. Gave it a screenshot reference and it couldn't nail the visual. Gave it a full URL to a reference site and the result was still not great honestly. Not sure if my prompts were bad or if this is just a CLI vs IDE thing, because the IDE felt like it gave better results. I don't know, need to experiment more.

Gemini Omni and SynthID

Almost forgot: Gemini Omni now has an evolved sense of physics for generating stylized videos. The generated video actually behaves like the model understands how things move and interact.

Google also showed SynthID and C2PA credentials for detecting AI-generated content. As generated media gets better, tooling to authenticate what's real becomes critical infrastructure. Good to see it being built in at the platform level.

Final Thoughts

Google I/O 2026 didn't feel like a hype keynote. It felt like a company that had spent a year quietly building and was now ready to show it.

Fine-tuning Gemma 4 is the thing I most want to play with.

The agents-everywhere story is clearly where things are heading. The question is whether the underlying protocols stay open enough that indie devs can build on top of them. Hoping they do.

What was your favourite part of Google I/O 2026? Let me know in the comments, especially if you've played with Antigravity or Gemini Spark!
