Six months ago I started Keyvello (keyvello.com) — an AI video generator that turns a prompt into a complete short-form video in 2–5 minutes. Here's the technical breakdown for fellow builders.
The problem
Faceless creators on TikTok / YouTube Shorts / Reels spend 2–4 hours per video on scripting, voiceovers, B-roll, captions, and editing. Most burn out before they post 10 videos.
The stack
- Frontend: Next.js 16, React 19, TypeScript, Tailwind CSS 4, Radix UI
- Backend: Next.js API Routes (App Router)
- DB: Supabase (Postgres + Auth + RLS)
- AI: GPT-5.5 for scripts, Fal.ai for images, ElevenLabs for voices
- Video: FFmpeg via fluent-ffmpeg, Sharp for image processing
- Storage: Cloudflare R2 (S3-compatible)
- Payments: Dodo Payments
- Compute: Vercel for the app, Modal for the video pipelines
- State: Zustand
The pipeline
prompt → GPT-4o script → scene splitter → parallel(Flux images + ElevenLabs audio) → FFmpeg composition (Modal) → R2 upload → status update
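The scene-splitting and parallel-generation steps in that pipeline can be sketched roughly as below. This is a minimal illustration, not Keyvello's actual code: the type names, the paragraph-based splitter, and the injected `Providers` interface are all assumptions made so the example runs without any network calls.

```typescript
// Hypothetical types and function names; none of these come from the real codebase.
interface Scene {
  index: number;
  narration: string;
  imagePrompt: string;
}

interface SceneAssets {
  index: number;
  imageUrl: string;
  audioUrl: string;
}

// Providers are injected so the orchestration can be exercised with fakes.
interface Providers {
  generateImage(prompt: string): Promise<string>; // stand-in for an image model call
  synthesizeSpeech(text: string): Promise<string>; // stand-in for a text-to-speech call
}

// Naive scene splitter (assumed): one scene per non-empty paragraph of the script.
function splitScenes(script: string): Scene[] {
  return script
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((narration, index) => ({
      index,
      narration,
      imagePrompt: `Illustration for: ${narration}`,
    }));
}

// Image and audio for each scene are requested concurrently,
// and all scenes are processed concurrently as well.
async function generateAssets(
  scenes: Scene[],
  providers: Providers,
): Promise<SceneAssets[]> {
  return Promise.all(
    scenes.map(async (scene) => {
      const [imageUrl, audioUrl] = await Promise.all([
        providers.generateImage(scene.imagePrompt),
        providers.synthesizeSpeech(scene.narration),
      ]);
      return { index: scene.index, imageUrl, audioUrl };
    }),
  );
}
```

Promise.all preserves input order, so the resulting assets line up with the scene order that the composition step needs. A real version would likely cap concurrency and retry failed scenes rather than failing the whole batch.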
What surprised me
- Modal beats running FFmpeg on Vercel. Cold starts on Vercel functions made 60s+ videos impossible. Moving the render to Modal and having it report back via webhooks solved it.
- RLS is non-negotiable from day one. Retrofitting row-level security at 1K users is painful.
- Credit refunds need their own RPC. I hit a silent failure with increment_user_credits getting blocked by a trigger. Use add_credits instead.
- Users want templates, not raw control. I shipped a "blank canvas" mode early. Nobody used it. The 11 named templates (AI Stories, Fake Texts, Stick Animation, etc.) do 95% of generations.
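The webhook and refund points above fit together naturally: the render worker calls back when a job finishes, and a failed job triggers a refund. Here is a hedged sketch of such a handler. The payload shape, the `Db` interface, and the RPC argument names (`p_user_id`, `p_amount`) are assumptions for illustration; only the `add_credits` name comes from the post.

```typescript
// Hypothetical payload sent by the render worker when a job finishes.
interface RenderCallback {
  jobId: string;
  userId: string;
  status: "completed" | "failed";
  videoUrl?: string;
  creditsCharged: number;
}

// Minimal stand-in for the database operations this handler needs.
// With supabase-js, callRpc could wrap supabase.rpc(name, args) and map its error to a string.
interface Db {
  updateJob(
    jobId: string,
    fields: { status: string; videoUrl: string | null },
  ): Promise<void>;
  callRpc(
    name: string,
    args: Record<string, unknown>,
  ): Promise<{ error: string | null }>;
}

async function handleRenderCallback(
  cb: RenderCallback,
  db: Db,
): Promise<"ok" | "refunded"> {
  if (cb.status === "completed") {
    await db.updateJob(cb.jobId, {
      status: "completed",
      videoUrl: cb.videoUrl ?? null,
    });
    return "ok";
  }

  await db.updateJob(cb.jobId, { status: "failed", videoUrl: null });

  // Argument names are assumed; adjust them to the actual RPC signature.
  const { error } = await db.callRpc("add_credits", {
    p_user_id: cb.userId,
    p_amount: cb.creditsCharged,
  });

  // Throw on RPC errors so a blocked refund cannot fail silently.
  if (error) {
    throw new Error(`Credit refund failed for job ${cb.jobId}: ${error}`);
  }
  return "refunded";
}
```

Checking the returned error explicitly is the main point here: the failure described above was silent, so the sketch turns any RPC error into a thrown exception that logging or alerting can catch. A production handler would also want to verify the webhook's authenticity and make the refund idempotent in case the callback is retried.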
What's next
Better lipsync for the talking-avatar templates. Tighter cost controls per template tier. Affiliate program.
If you're building something in AI video, would love to compare notes — drop a comment.