Voilaa! — Turning Any YouTube Video into an Interactive Learning App with Google Gemini
This post is my submission for DEV Education Track: Build Apps with Google AI Studio.
What I Built
Voilaa! is a full-stack educational playground that transforms any YouTube video into a rich, interactive learning experience — think live quizzes, flashcard decks, formula simulators, and data visualizations — all generated on-the-fly by Google Gemini AI.
The idea is simple: paste a YouTube URL, choose your academic depth, and within seconds Gemini analyzes the video's content and synthesizes a fully functional, self-contained interactive HTML learning app tailored to that exact lesson.
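Users can paste many kinds of YouTube link, so it helps to normalize the URL before it reaches the model. Below is a minimal sketch of that step. The helper name, the accepted URL shapes, and the 11-character ID check are assumptions for illustration, not code from the Voilaa! repo.

```typescript
// Hypothetical helper: normalizes common YouTube link shapes to
// https://www.youtube.com/watch?v=<id>; returns null if the link isn't recognized.
// Assumption: video IDs are 11 characters drawn from [A-Za-z0-9_-].
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

function toCanonicalYouTubeUrl(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // not an absolute URL at all
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let id: string | null = null;

  if (host === "youtu.be") {
    // Short links: https://youtu.be/<id>?t=42
    id = url.pathname.slice(1).split("/")[0];
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      id = url.searchParams.get("v"); // https://www.youtube.com/watch?v=<id>
    } else {
      // https://www.youtube.com/shorts/<id> and /embed/<id>
      const match = url.pathname.match(/^\/(?:shorts|embed)\/([^/]+)/);
      id = match ? match[1] : null;
    }
  }

  return id !== null && VIDEO_ID.test(id) ? `https://www.youtube.com/watch?v=${id}` : null;
}

// Usage: toCanonicalYouTubeUrl("https://youtu.be/dQw4w9WgXcQ?t=42")
// returns "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```

Anything that returns `null` can be rejected in the UI before a Gemini call (and its quota) is spent.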
How It Works
The magic is a two-stage AI chain running entirely on the server side:
Stage 1 — Semantic Analyst (Gemini 1.5 Flash / Pro)
The first model acts as a pedagogist. It watches the video and produces a structured JSON payload containing:
- `spec`: A self-contained curriculum blueprint describing the app's core mechanics, interactions, and educational goals.
- `flashcards`: At least 5 key terms and definitions extracted from the video for active recall.
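Here is a minimal sketch of validating that Stage 1 payload on the server before it goes anywhere else. The `term`/`definition` field names and the `parseAnalystOutput` helper are hypothetical. The post only specifies a `spec` plus at least five term/definition pairs.

```typescript
// Assumed shape of the Semantic Analyst's JSON reply. Field names are illustrative.
interface Flashcard {
  term: string;
  definition: string;
}

interface AnalystOutput {
  spec: string;
  flashcards: Flashcard[];
}

const MIN_FLASHCARDS = 5; // "at least 5 key terms"

// Parses and validates raw model text. Throws a readable error so the server
// can retry or report the problem instead of rendering a broken lesson.
function parseAnalystOutput(raw: string): AnalystOutput {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (typeof data !== "object" || data === null) {
    throw new Error("Analyst output is not a JSON object");
  }
  const { spec, flashcards } = data as Record<string, unknown>;
  if (typeof spec !== "string" || spec.trim() === "") {
    throw new Error("Missing or empty 'spec'");
  }
  if (!Array.isArray(flashcards)) {
    throw new Error("'flashcards' must be an array");
  }
  // Keep only well-formed cards, then check the minimum count.
  const cards = flashcards.filter(
    (c): c is Flashcard =>
      typeof c === "object" &&
      c !== null &&
      typeof (c as Flashcard).term === "string" &&
      typeof (c as Flashcard).definition === "string",
  );
  if (cards.length < MIN_FLASHCARDS) {
    throw new Error(`Expected at least ${MIN_FLASHCARDS} flashcards, got ${cards.length}`);
  }
  return { spec, flashcards: cards };
}
```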
The prompt I crafted for this stage was the most important piece of the whole project:
You are a pedagogist and product designer with deep expertise in crafting
engaging learning experiences via interactive web apps.
Examine the contents of the attached video. Then, provide the following in JSON:
1. "spec": A detailed spec for an interactive web app designed to complement
the video and reinforce its key ideas. The spec must be thorough and
self-contained (must not mention it is based on a video).
2. "flashcards": A list of at least 5 key terms and concise definitions
extracted from the video.
The goal of the app is to enhance understanding through simple and playful
design. A junior web developer should be able to implement it in a single
HTML file (with all styles and scripts inline). The spec must clearly outline
the core mechanics, and those mechanics must be highly effective in reinforcing
the video's key ideas.
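One way to make that JSON reply reliably parseable is Gemini's JSON mode, which sets `responseMimeType: "application/json"` together with a `responseSchema`. The sketch below only builds a plain request object in the `{ model, contents, config }` shape that `ai.models.generateContent()` from `@google/genai` accepts, passing the YouTube link as a `fileData` part. The `buildAnalystRequest` name, the settings type, and the flashcard field names are assumptions. In real code you would import `Type` from `@google/genai` and write `Type.OBJECT` and so on instead of the bare strings.

```typescript
// Hypothetical request builder for Stage 1. Schema type strings mirror the
// values of the SDK's `Type` enum ("OBJECT", "STRING", "ARRAY").
interface AnalystSettings {
  model: string; // a Flash or Pro model id (assumption)
  temperature: number;
}

function buildAnalystRequest(videoUrl: string, prompt: string, settings: AnalystSettings) {
  return {
    model: settings.model,
    contents: [
      {
        role: "user",
        parts: [
          { fileData: { fileUri: videoUrl } }, // public YouTube URL as a video part
          { text: prompt }, // the pedagogist prompt above
        ],
      },
    ],
    config: {
      temperature: settings.temperature,
      responseMimeType: "application/json", // raw JSON, no surrounding prose
      responseSchema: {
        type: "OBJECT",
        properties: {
          spec: { type: "STRING" },
          flashcards: {
            type: "ARRAY",
            items: {
              type: "OBJECT",
              properties: {
                term: { type: "STRING" },
                definition: { type: "STRING" },
              },
              required: ["term", "definition"],
            },
          },
        },
        required: ["spec", "flashcards"],
      },
    },
  };
}
```

On the server, this object would be passed to `ai.models.generateContent(...)`, and `response.text` would be validated before Stage 2 ever sees it.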
Stage 2 — Software Architect (Gemini 1.5 Pro)
The second model receives the spec and generates a pristine, single-file HTML/CSS/JS application — no frameworks, no external dependencies — ready to run inside a sandboxed iframe.
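Because the Stage 2 output goes straight into an iframe, it's worth cleaning it up first. Code models can wrap an answer in a Markdown fence or put a sentence of prose around it. The sketch below uses hypothetical helper names and deliberately simple regexes. It pulls out the HTML document and flags any `<script src>` or `<link href>` that would break the "no external dependencies" rule.

```typescript
// Hypothetical post-processing for the Software Architect's reply.
// Assumption: the model may wrap the file in a Markdown code fence or add prose.
function extractHtml(reply: string): string {
  // Prefer the body of the first fenced block (three backticks, optional "html").
  const fenced = reply.match(/`{3}(?:html)?\s*\n([\s\S]*?)`{3}/i);
  let html = (fenced ? fenced[1] : reply).trim();

  // Keep only the span from <!DOCTYPE html> (or <html>) through the last </html>.
  const start = html.search(/<!doctype html|<html/i);
  const end = html.toLowerCase().lastIndexOf("</html>");
  if (start !== -1 && end > start) {
    html = html.slice(start, end + "</html>".length);
  }
  return html;
}

// Lists off-page URLs referenced by <script src> or <link href>, which would
// violate the single-file, no-external-dependencies rule.
function findExternalRefs(html: string): string[] {
  const pattern = /<(?:script|link)\b[^>]*\b(?:src|href)\s*=\s*["']((?:https?:)?\/\/[^"']+)["']/gi;
  const refs: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    refs.push(m[1]);
  }
  return refs;
}
```

A non-empty result from `findExternalRefs` could trigger a retry with a stricter instruction. Regexes won't catch every case (for example, dynamically injected scripts), so they complement the iframe sandbox rather than replace it.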
Key prompt controls exposed to the user:
- Temperature — Controls AI creativity (0.0 → 1.0)
- Academic Intensity — Concise, Balanced, or Detailed lesson depth
- Model Selection — Swap between Flash and Pro at each stage
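Here is one way those three controls could become per-stage settings on the server. The model ids, the intensity wording, and the fallback temperature of 0.75 (the ~0.75 sweet spot mentioned under "What I learned") are all assumptions.

```typescript
type Intensity = "Concise" | "Balanced" | "Detailed";
type ModelChoice = "flash" | "pro";

interface UserControls {
  temperature: number; // UI slider, 0.0 to 1.0
  intensity: Intensity;
  analystModel: ModelChoice; // Stage 1
  architectModel: ModelChoice; // Stage 2
}

// Assumed model ids for the Flash and Pro options.
const MODEL_IDS: Record<ModelChoice, string> = {
  flash: "gemini-1.5-flash",
  pro: "gemini-1.5-pro",
};

// Assumed instruction appended to the Stage 1 prompt for each depth level.
const INTENSITY_HINTS: Record<Intensity, string> = {
  Concise: "Keep the app focused on the two or three most important ideas.",
  Balanced: "Cover the main ideas with a moderate amount of detail.",
  Detailed: "Cover the topic in depth, including edge cases and worked examples.",
};

const DEFAULT_TEMPERATURE = 0.75; // fallback for non-numeric input

function toStageSettings(c: UserControls) {
  // Clamp on the server so a tampered request can't leave the UI's range.
  const raw = isFinite(c.temperature) ? c.temperature : DEFAULT_TEMPERATURE;
  const temperature = Math.min(1, Math.max(0, raw));
  return {
    analyst: {
      model: MODEL_IDS[c.analystModel],
      temperature,
      promptSuffix: INTENSITY_HINTS[c.intensity],
    },
    architect: { model: MODEL_IDS[c.architectModel], temperature },
  };
}
```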
The Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 18 + Vite (SPA) |
| Styling | Tailwind CSS + motion animations |
| Code Editor | Monaco Editor (same engine as VS Code) |
| Charts | Recharts |
| Icons | Lucide React |
| AI | `@google/genai` TypeScript SDK |
| Backend | Node.js + Express 5 |
| Runtime | tsx (direct TypeScript execution) |
The Gemini API key lives exclusively on the server — never exposed to the client bundle.
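Below is a framework-agnostic sketch of that proxy. It assumes the key comes from a `GEMINI_API_KEY` environment variable and that the actual Gemini call is wrapped in an injected `callGemini` function. Both names are hypothetical. The handler sends back only generated content, so the key never travels to the browser.

```typescript
// Hypothetical proxy handler. `callGemini` is injected so this stays testable;
// in a real app it would wrap @google/genai.
type GeminiCall = (apiKey: string, videoUrl: string) => Promise<string>;

interface ApiResponse {
  status: number;
  body: { html?: string; error?: string };
}

async function handleGenerate(
  reqBody: unknown,
  env: Record<string, string | undefined>,
  callGemini: GeminiCall,
): Promise<ApiResponse> {
  const apiKey = env.GEMINI_API_KEY; // read on the server only
  if (!apiKey) {
    return { status: 500, body: { error: "Server is missing its API key" } };
  }
  const videoUrl = (reqBody as { videoUrl?: unknown } | null)?.videoUrl;
  if (typeof videoUrl !== "string" || videoUrl === "") {
    return { status: 400, body: { error: "videoUrl is required" } };
  }
  try {
    const html = await callGemini(apiKey, videoUrl);
    return { status: 200, body: { html } }; // generated content only, never the key
  } catch {
    // Don't echo upstream errors verbatim; they might include request details.
    return { status: 502, body: { error: "Generation failed" } };
  }
}
```

With Express 5, this could be wired up behind `app.use(express.json())` as `app.post("/api/generate", async (req, res) => { const r = await handleGenerate(req.body, process.env, callGemini); res.status(r.status).json(r.body); })`. The route path here is hypothetical.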
The Workspace
Once a learning app is generated, users get a three-tab workspace:
🖥️ Render Tab
A live, sandboxed `<iframe>` running the generated app: fully interactive, no page reload required.
📝 Source Tab
A full Monaco Editor showing (and letting you edit) the raw generated HTML/JS/CSS. Any saved changes hot-reload the preview instantly.
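A common way to get both the sandbox and the instant refresh is to pass the generated document through the iframe's `srcdoc` attribute and replace it on every save. The post doesn't say whether Voilaa! uses `srcdoc` or a blob URL, so treat this as one option. The sketch builds the markup as a string with hypothetical helper names. In React, `<iframe sandbox="allow-scripts" srcDoc={html} />` handles the escaping for you.

```typescript
// Escapes a value for use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities aren't double-escaped
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: markup for the Render tab. Omitting allow-same-origin
// gives the frame an opaque origin, isolating it from the host page.
function previewFrame(html: string, title = "Generated lesson"): string {
  return `<iframe sandbox="allow-scripts" title="${escapeAttr(title)}" srcdoc="${escapeAttr(html)}"></iframe>`;
}
```

Because the browser re-parses the document whenever `srcdoc` changes, swapping in the edited source is enough to refresh the preview.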
📋 Spec Tab
Inspect or edit the curriculum blueprint produced by the Semantic Analyst — great for prompting a regeneration with tweaks.
There's also a Zen Mode (fades surrounding UI to focus on the lesson) and Fullscreen Mode for distraction-free study.
Demo
🔗 Live App: https://voilaa-498153626537.us-west1.run.app/
Example: Paste a YouTube tutorial on music theory
→ Gemini analyzes chord progressions, tension, and resolution
→ Generates an interactive piano simulator with chord-click feedback
→ Flashcard deck covers: Tonic, Dominant, Leading Tone, Cadence, Voice Leading
Example: Paste a YouTube lecture on sorting algorithms
→ Gemini generates a step-by-step animated bubble sort / merge sort visualizer
→ Flashcards cover time complexity, in-place sorting, stability, etc.
My Experience
What surprised me most
I expected the hardest part to be the frontend sandbox mechanics. It wasn't. The hardest part was prompt engineering the Semantic Analyst.
Early versions of the spec prompt produced specs that were either too vague ("make an interactive quiz") or too ambitious ("build a multi-page React app with a backend"). The breakthrough was adding the constraint:
"A junior web developer should be able to implement it in a single HTML file."
This single sentence dramatically improved output quality — Gemini started producing specs with clearly scoped, concrete mechanics instead of wishful thinking.
What I learned
Two-model chains unlock quality you can't get from one prompt. Separating "think about what to build" from "write the code" produced dramatically better results. The planning model could focus entirely on pedagogy; the coding model could focus entirely on implementation.
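The planning/coding split can be expressed as a small orchestrator. In this sketch, the two stages are injected as plain async functions with hypothetical `Analyst` and `Architect` types, and only the spec passes from one to the other. That narrow interface is what lets each model focus on its own job.

```typescript
interface Flashcard {
  term: string;
  definition: string;
}

interface Plan {
  spec: string;
  flashcards: Flashcard[];
}

interface Lesson extends Plan {
  html: string;
}

type Analyst = (videoUrl: string) => Promise<Plan>; // "think about what to build"
type Architect = (spec: string) => Promise<string>; // "write the code"

// Hypothetical two-stage chain: Stage 1 plans, Stage 2 implements.
async function runChain(videoUrl: string, analyst: Analyst, architect: Architect): Promise<Lesson> {
  const plan = await analyst(videoUrl);
  // Only the spec crosses the boundary; the coding model never sees the video.
  const html = await architect(plan.spec);
  return { ...plan, html };
}
```

Because the Architect needs only a spec, the same function could also back a "regenerate from an edited spec" action in the Spec tab without re-running Stage 1.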
Temperature matters more than model choice for creative educational content. A temperature of ~0.75 produced the most varied and playful learning apps while keeping them coherent.
Keeping the API key server-side is non-negotiable. Even for a hackathon demo, having Express proxy all Gemini calls protects your quota and prevents key leakage.
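Sandboxed iframes are underrated. Running user-generated HTML inside `<iframe sandbox="allow-scripts">` meant I could ship AI-generated code directly to the browser without worrying about XSS or DOM pollution. Because `allow-same-origin` is left out, the frame gets an opaque origin, so its scripts can't reach the host page's DOM, cookies, or storage.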
What I'd build next
- YouTube transcript API integration — Right now Gemini infers video content from the URL + title. Native transcript ingestion would let the Semantic Analyst work with the full verbatim script.
- Lesson history — Save and revisit previously generated apps per video.
- Share links — Let users publish their generated learning apps with a short URL.
- Collaborative editing — Let study groups co-edit the spec and regenerate together.
Voilaa! was a genuinely fun project to build. The combination of Gemini's multimodal understanding and the flexibility of the @google/genai SDK made what could have been a complex AI integration feel surprisingly clean. If you've got a YouTube rabbit hole you're currently lost in — try turning it into an interactive lesson instead. 🎬✨