Drop in a long video — podcast, interview, talk, stream — and VibeClip cuts it into vertical, captioned, ready-to-post shorts. Then you refine every clip by chatting: “make clip 2 punchier,” “bigger captions,” “add a zoom at 0:05,” “undo.”
Quick start · Features · How it works · Bring your own key · Configuration · Contributing
Left: the raw clip. Right: after one sentence — “make it mrbeast style and add gameplay underneath” — captioned, reframed to 9:16, and split-screened. Real pipeline output, not a mockup.
Footage: Andy Dickinson (CC-BY) · gameplay: Orbital - No Copyright Gameplay (CC-BY) · Minecraft © Mojang.
⚡ Quick start
Spin up a private instance in three commands. All you add is one LLM key.
git clone https://github.com/oktaydbk54/vibeclip.git cd vibeclip cp .env.example .env # add ONE line: OPENAI_API_KEY=sk-... docker compose up -d --build # → open http://localhost:8765
With the defaults (EMAIL_MODE=console, REQUIRE_EMAIL_VERIFICATION=false) sign-up logs
you straight in — no email provider needed. Bring an OpenAI or DeepSeek key
(DeepSeek is the cheap one), or point LLM_BASE_URL at any OpenAI-compatible server
(Ollama, LM Studio, OpenRouter…). Prefer no Docker? See local install.
✨ What it does
| 🎬 Long → shorts, automatically | Transcribes on-device, scores the strongest moments (hook / flow / value — not a dumb keyword scan), reframes to 9:16 around the speaker, and burns word-synced captions. |
| 💬 Edit by chatting | A tool-calling agent turns plain language into real edits — trims, filler-word removal (“uhh”/“ee”), zooms, styles, music, b-roll, brand overlays. One undo reverts a whole multi-step plan. |
| 🎨 Styles in one shot | hormozi, mrbeast, podcast_minimal, kinetic — captions, pace, zoom, music and SFX applied together. Drop in your own preset as a JSON file. |
| 🖥️ A real studio UI | Web app with a live 9:16 preview, clip cards, a CapCut-style timeline, and the chat copilot right beside it. |
| 🔑 Your key, your data | Bring your own LLM key (OpenAI · Gemini · Claude · DeepSeek · any compatible endpoint). Nothing is proxied through us — there is no “us.” |
| 🏠 Self-host first | One Docker command. Speech-to-text and every render run locally via faster-whisper + ffmpeg. AGPL-3.0, no SaaS lock-in. |
🛠 How it works
upload
│
┌──────▼───────┐ faster-whisper (local, no API key)
│ transcribe │
└──────┬───────┘
┌──────▼────────────┐ LLM "brain" (your key) — structure + scored moments
│ analyze structure │
│ find highlights │
└──────┬────────────┘
┌──────▼───────┐ per clip, replayed from cached intermediates (~2–4s/edit)
│ auto edit │ jumpcut → 9:16 reframe → captions → music+ambience (ducked)
│ │ → SFX → fades · then your chat commands layer on top
└──────┬───────┘
export → vertical MP4, publish-ready
Only two things ever hit the network: your chosen LLM (to understand intent and score moments) and, optionally, Pexels (stock b-roll). Speech-to-text and all rendering stay on your machine.
🔑 Bring your own key (BYOK)
VibeClip never ships with a key and never proxies your prompts anywhere except the provider you choose. Two ways to supply one:
- Per instance — set
OPENAI_API_KEY(orDEEPSEEK_API_KEY, or any OpenAI-compatible endpoint viaLLM_BASE_URL) in.env. - Per user — each account pastes its own key on the in-app Settings page, with a live test-connection. Keys are encrypted at rest and never sent back to the browser.
| Provider | Routed via | Notes |
|---|---|---|
| OpenAI | native | Default, best-supported. |
| DeepSeek | native | The budget pick — a typical short costs a few cents. |
| Google Gemini | OpenAI-compat endpoint | gemini-2.5-flash / pro. |
| Anthropic Claude | OpenAI-compat endpoint | claude-haiku / sonnet. |
| Anything else | LLM_BASE_URL |
Ollama, LM Studio, OpenRouter, your own proxy… |
Speech-to-text runs locally and needs no key.
⚙️ Configuration
Everything is driven by .env (see .env.example for the full, commented list). The ones
that matter most:
| Variable | Default | Purpose |
|---|---|---|
OPENAI_API_KEY |
— | Your LLM key (preferred). |
DEEPSEEK_API_KEY |
— | Cheaper fallback, used if no OpenAI key. |
LLM_BASE_URL |
— | Any OpenAI-compatible endpoint (local models, proxies). |
EMAIL_MODE |
console |
console prints OTP to the log; resend sends real email. |
REQUIRE_EMAIL_VERIFICATION |
false |
true enforces email confirmation (public instances). |
HOSTED_STUDIO |
true |
true = the landing offers login/signup (use your own instance). false = a public marketing site that points everyone to GitHub to self-host (no login). |
GA_MEASUREMENT_ID |
— | Empty = no analytics injected (self-host default). |
SITE_URL |
http://localhost:8765 |
Public base URL for blog canonical/OG/sitemap. |
VIDEO_ENCODER |
libx264 |
Use h264_videotoolbox on Apple Silicon. |
VIBECLIP_BIND |
127.0.0.1 |
docker-compose publish address (0.0.0.0 to expose). |
MAX_UPLOAD_SECONDS |
0 |
Longest uploadable video, seconds. 0 = no limit (self-host). |
MAX_PROJECTS_PER_USER |
0 |
Projects per account. 0 = unlimited; cap it on a public instance. |
Run without Docker
Requirements: Python 3.12+, ffmpeg, and the DejaVu fonts (for caption rendering).
cp .env.example .env # add your LLM key uv sync # or: pip install -e . python -m chat.app # → http://127.0.0.1:8765
First run downloads the Whisper model. Prefer the terminal? python -m chat.cli <video.mp4>.
📦 Bundled assets & licensing
The repo bundles a small library of royalty-free media (music, ambience, SFX, demo
footage) for the built-in styles. Some tracks are CC-BY (Kevin MacLeod) and require
crediting in your video description — see the CREDITS files under assets/. VibeClip
never bundles or uses copyrighted/branded game footage.
🤝 Contributing
Issues and PRs welcome — start with CONTRIBUTING.md. Security reports:
see SECURITY.md. Be excellent to each other (code of conduct).
📄 License
GNU AGPL-3.0 — see LICENSE. You can self-host and modify VibeClip freely;
if you run a modified version as a network service, you must offer that modified source to
its users. Copyright © 2026 the VibeClip authors.
Built for people who'd rather talk to their editor than fight it.




























