Voilaa! — Turning Any YouTube Video into an Interactive Learning App with Google Gemini

This post is my submission for DEV Education Track: Build Apps with Google AI Studio.

What I Built

Voilaa! is a full-stack educational playground that transforms any YouTube video into an interactive learning experience (live quizzes, flashcard decks, formula simulators, and data visualizations), all generated on the fly by Google Gemini.

The idea is simple: paste a YouTube URL, choose your academic depth, and within seconds Gemini analyzes the video's content and synthesizes a fully functional, self-contained interactive HTML learning app tailored to that exact lesson.
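Accepting a pasted URL is the first step in that flow. Here is a minimal sketch of how the input could be validated before anything is sent to Gemini. The function name `extractVideoId`, the set of accepted URL shapes, and the 11-character ID check are my assumptions for illustration, not necessarily what Voilaa! does.

```typescript
// Hypothetical helper: pulls the video ID out of common YouTube URL shapes.
// Returns null for anything that does not look like a YouTube video link.
function extractVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // not a parseable URL at all
  }

  // Treat www.youtube.com and m.youtube.com like youtube.com.
  const host = url.hostname.replace(/^(www|m)\./, "");
  let id: string | null = null;

  if (host === "youtu.be") {
    // Short links carry the ID as the first path segment.
    id = url.pathname.slice(1).split("/")[0] ?? null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      id = url.searchParams.get("v");
    } else {
      const match = url.pathname.match(/^\/(?:embed|shorts|live)\/([^/]+)/);
      id = match?.[1] ?? null;
    }
  }

  // Assumption: video IDs are 11 URL-safe characters, which holds today
  // but is not a documented guarantee.
  return id !== null && /^[A-Za-z0-9_-]{11}$/.test(id) ? id : null;
}
```

Rejecting non-YouTube hosts early keeps obviously bad input from consuming a model call.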

How It Works

The magic is a two-stage AI chain running entirely on the server side:

Stage 1 — Semantic Analyst (Gemini 1.5 Flash / Pro)

The first model acts as a pedagogist. It watches the video and produces a structured JSON payload containing:

spec — A self-contained curriculum blueprint describing the app's core mechanics, interactions, and educational goals.
flashcards — At least 5 key terms + definitions extracted from the video for active recall.
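Model output is not guaranteed to match that contract, so it is worth checking before passing it on. The sketch below validates the two fields described above. The field names `term` and `definition` and the function name `parseAnalystPayload` are my assumptions; the post only says the payload contains a spec and a list of terms with definitions.

```typescript
// Hypothetical types mirroring the two fields of the Stage 1 payload.
interface Flashcard {
  term: string;
  definition: string;
}

interface AnalystPayload {
  spec: string;
  flashcards: Flashcard[];
}

const MIN_FLASHCARDS = 5; // "at least 5" per the Stage 1 contract

function isFlashcard(value: unknown): value is Flashcard {
  if (typeof value !== "object" || value === null) return false;
  const card = value as Record<string, unknown>;
  return (
    typeof card.term === "string" &&
    card.term.trim() !== "" &&
    typeof card.definition === "string" &&
    card.definition.trim() !== ""
  );
}

// Parses raw model output and throws a descriptive error when the shape
// does not match what Stage 2 and the flashcard UI expect.
function parseAnalystPayload(raw: string): AnalystPayload {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Payload is not a JSON object");
  }
  const { spec, flashcards } = data as Record<string, unknown>;
  if (typeof spec !== "string" || spec.trim() === "") {
    throw new Error('"spec" must be a non-empty string');
  }
  if (!Array.isArray(flashcards) || !flashcards.every(isFlashcard)) {
    throw new Error('"flashcards" must be a list of {term, definition}');
  }
  if (flashcards.length < MIN_FLASHCARDS) {
    throw new Error(`Expected at least ${MIN_FLASHCARDS} flashcards`);
  }
  return { spec, flashcards };
}
```

Failing loudly here means a malformed Stage 1 response can trigger a retry instead of producing a broken app downstream.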

The prompt I crafted for this stage was the most important piece of the whole project:

You are a pedagogist and product designer with deep expertise in crafting 
engaging learning experiences via interactive web apps.

Examine the contents of the attached video. Then, provide the following in JSON:
1. "spec": A detailed spec for an interactive web app designed to complement 
   the video and reinforce its key ideas. The spec must be thorough and 
   self-contained (must not mention it is based on a video).
2. "flashcards": A list of at least 5 key terms and concise definitions 
   extracted from the video.

The goal of the app is to enhance understanding through simple and playful 
design. A junior web developer should be able to implement it in a single 
HTML file (with all styles and scripts inline). The spec must clearly outline 
the core mechanics, and those mechanics must be highly effective in reinforcing 
the video's key ideas.

Stage 2 — Software Architect (Gemini 1.5 Pro)

The second model receives the spec and generates a pristine, single-file HTML/CSS/JS application — no frameworks, no external dependencies — ready to run inside a sandboxed iframe.
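The hand-off between the two stages can be sketched as follows. To keep the example independent of any SDK signature, the model calls are injected as plain async functions. The names `Analyst`, `Architect`, and `generateLesson` are hypothetical, and the final HTML check is an assumption of mine rather than something the project is known to do.

```typescript
// Each stage is injected as an async function so this sketch does not
// depend on a particular SDK. In a real server these would wrap Gemini calls.
type Analyst = (videoUrl: string) => Promise<string>; // resolves to JSON text
type Architect = (spec: string) => Promise<string>; // resolves to HTML text

interface Card {
  term: string;
  definition: string;
}

interface Lesson {
  spec: string;
  flashcards: Card[];
  html: string;
}

async function generateLesson(
  videoUrl: string,
  analyst: Analyst,
  architect: Architect,
): Promise<Lesson> {
  // Stage 1: turn the video into a spec plus flashcards.
  const parsed: unknown = JSON.parse(await analyst(videoUrl));
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Stage 1 did not return a JSON object");
  }
  const { spec, flashcards } = parsed as Record<string, unknown>;
  if (typeof spec !== "string") {
    throw new Error("Stage 1 returned no spec");
  }

  // Stage 2: only the spec crosses over; the architect never sees the video.
  const html = await architect(spec);
  if (!/<html[\s>]/i.test(html)) {
    throw new Error("Stage 2 did not return an HTML document");
  }

  return {
    spec,
    flashcards: Array.isArray(flashcards) ? (flashcards as Card[]) : [],
    html,
  };
}
```

Because the stages are plain functions, each one can be swapped for a stub in tests or pointed at a different model without touching the chain itself.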

Key prompt controls exposed to the user:

Temperature — Controls AI creativity (0.0 → 1.0)
Academic Intensity — Concise, Balanced, or Detailed lesson depth
Model Selection — Swap between Flash and Pro at each stage
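Since these controls arrive from the client, the server should normalize them before building a request. The sketch below is one way to do that. The type names, the default values, and the mapping from "Flash" and "Pro" to the `gemini-1.5-flash` and `gemini-1.5-pro` model IDs are my assumptions.

```typescript
type Intensity = "concise" | "balanced" | "detailed";
type ModelTier = "flash" | "pro";

interface GenerationControls {
  temperature: number;
  intensity: Intensity;
  analystModel: ModelTier;
  architectModel: ModelTier;
}

// Assumed mapping from the UI choice to a concrete model ID.
const MODEL_IDS: Record<ModelTier, string> = {
  flash: "gemini-1.5-flash",
  pro: "gemini-1.5-pro",
};

// Assumed defaults, used whenever a value is missing or invalid.
const DEFAULT_CONTROLS: GenerationControls = {
  temperature: 0.75,
  intensity: "balanced",
  analystModel: "flash",
  architectModel: "pro",
};

// Clamps finite numbers into [0, 1]; anything else falls back to the default.
function clampTemperature(value: unknown): number {
  if (typeof value !== "number" || !Number.isFinite(value)) {
    return DEFAULT_CONTROLS.temperature;
  }
  return Math.min(1, Math.max(0, value));
}

// Returns the value if it is one of the allowed options, else the fallback.
function pick<T extends string>(value: unknown, allowed: readonly T[], fallback: T): T {
  return allowed.includes(value as T) ? (value as T) : fallback;
}

function normalizeControls(input: Record<string, unknown>): GenerationControls {
  const tiers = ["flash", "pro"] as const;
  return {
    temperature: clampTemperature(input.temperature),
    intensity: pick(
      input.intensity,
      ["concise", "balanced", "detailed"] as const,
      DEFAULT_CONTROLS.intensity,
    ),
    analystModel: pick(input.analystModel, tiers, DEFAULT_CONTROLS.analystModel),
    architectModel: pick(input.architectModel, tiers, DEFAULT_CONTROLS.architectModel),
  };
}
```

For example, `MODEL_IDS[normalizeControls(body).architectModel]` would give the model ID for Stage 2, where `body` stands for the parsed request body.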

The Tech Stack

Layer	Technology
Frontend	React 18 + Vite (SPA)
Styling	Tailwind CSS + motion animations
Code Editor	Monaco Editor (same engine as VS Code)
Charts	Recharts
Icons	Lucide React
AI	`@google/genai` TypeScript SDK
Backend	Node.js + Express 5
Runtime	`tsx` (direct TypeScript execution)

The Gemini API key lives exclusively on the server — never exposed to the client bundle.

The Workspace

Once a learning app is generated, users get a three-tab workspace:

🖥️ Render Tab

A live sandboxed <iframe> running the generated app — fully interactive, no page reload required.

📝 Source Tab

A full Monaco Editor showing (and letting you edit) the raw generated HTML/JS/CSS. Any saved changes hot-reload the preview instantly.

📋 Spec Tab

Inspect or edit the curriculum blueprint produced by the Semantic Analyst — great for prompting a regeneration with tweaks.

There's also a Zen Mode (fades surrounding UI to focus on the lesson) and Fullscreen Mode for distraction-free study.

Demo

🔗 Live App → https://voilaa-498153626537.us-west1.run.app/

Example: Paste a YouTube tutorial on music theory

→ Gemini analyzes chord progressions, tension, and resolution

→ Generates an interactive piano simulator with chord-click feedback

→ Flashcard deck covers: Tonic, Dominant, Leading Tone, Cadence, Voice Leading

Example: Paste a YouTube lecture on sorting algorithms

→ Gemini generates a step-by-step animated bubble sort / merge sort visualizer

→ Flashcards cover time complexity, in-place sorting, stability, etc.

My Experience

What surprised me most

I expected the hardest part to be the frontend sandbox mechanics. It wasn't. The hardest part was prompt engineering the Semantic Analyst.

Early versions of the spec prompt produced specs that were either too vague ("make an interactive quiz") or too ambitious ("build a multi-page React app with a backend"). The breakthrough was adding the constraint:

"A junior web developer should be able to implement it in a single HTML file."

This single sentence dramatically improved output quality — Gemini started producing specs with clearly scoped, concrete mechanics instead of wishful thinking.

What I learned

Two-model chains unlock quality you can't get from one prompt. Separating "think about what to build" from "write the code" produced dramatically better results. The planning model could focus entirely on pedagogy; the coding model could focus entirely on implementation.
Temperature matters more than model choice for creative educational content. A temperature of ~0.75 produced the most varied and playful learning apps while keeping them coherent.
Keeping the API key server-side is non-negotiable. Even for a hackathon demo, having Express proxy all Gemini calls protects your quota and prevents key leakage.
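Sandboxed iframes are underrated. Running AI-generated HTML inside <iframe sandbox="allow-scripts"> meant I could ship generated code directly to the browser with far less risk of XSS or DOM pollution: without allow-same-origin, the frame runs in an opaque origin and cannot reach the parent page's DOM, cookies, or storage.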
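To make the sandbox point concrete, here is a minimal sketch of embedding a generated document through the `srcdoc` attribute. Using `srcdoc` and the helper names `escapeAttribute` and `buildSandboxedFrame` are my assumptions; the post only states that the app runs inside a sandboxed iframe.

```typescript
// Escapes the two characters that matter inside a double-quoted
// HTML attribute value: ampersands first, then double quotes.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: wraps generated HTML in a sandboxed iframe.
// Only allow-scripts is granted, so the document can run its own
// JavaScript but is treated as a separate, opaque origin.
function buildSandboxedFrame(generatedHtml: string): string {
  const srcdoc = escapeAttribute(generatedHtml);
  return `<iframe sandbox="allow-scripts" srcdoc="${srcdoc}"></iframe>`;
}
```

The key design choice is what gets left out: adding allow-same-origin next to allow-scripts for same-origin content would let the framed script reach the parent page and remove the sandbox attribute itself.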

What I'd build next

YouTube transcript API integration — Right now Gemini infers video content from the URL + title. Native transcript ingestion would let the Semantic Analyst work with the full verbatim script.
Lesson history — Save and revisit previously generated apps per video.
Share links — Let users publish their generated learning apps with a short URL.
Collaborative editing — Let study groups co-edit the spec and regenerate together.

Voilaa! was a genuinely fun project to build. The combination of Gemini's multimodal understanding and the flexibility of the @google/genai SDK made what could have been a complex AI integration feel surprisingly clean. If you've got a YouTube rabbit hole you're currently lost in — try turning it into an interactive lesson instead. 🎬✨
