OpenBrief Review: Local-First Video AI Summarizer 2026

Originally published on andrew.ooo — visit the original for any updates, code snippets that aged out, or follow-up posts.

TL;DR

OpenBrief is an open-source desktop app that does what most people actually want from a "video AI tool": paste a link, get a transcript, get a grounded summary, and chat with the content — without uploading anything to a SaaS. It hit 88 points on Show HN on May 26, 2026, and it's basically a polished GUI for yt-dlp + Whisper with an LLM layer wired in for summaries and Q&A.

Key facts:

Tauri v2 desktop app — macOS, Windows, and Linux from a single Rust/React codebase
Bundled yt-dlp — paste a YouTube/Vimeo/arbitrary URL, it downloads locally
On-device transcription — Whisper (via whisper.cpp and transcribe-rs) runs against the audio on your machine
Bring-your-own LLM — OpenAI GPT, Anthropic Claude, Google Gemini, or OpenRouter (DeepSeek) for summaries and chat
Grounded summaries — markdown briefs with timestamped takeaways tied back to transcript spans
Chat with media — Q&A against the transcript using the LLM of your choice
Text-to-speech — listen to the generated brief (Supertonic 3 / Qwen3-TTS on the roadmap)
AGPL-3.0 licensed, monorepo organized as a pnpm/Turborepo workspace
Honest limitation: LLM summaries are still cloud-by-default — local Gemma 4 / Qwen3-ASR support is on the roadmap but not shipped yet

If you've ever wanted a private alternative to NotebookLM or Read.ai for your own video backlog, OpenBrief is the most complete open-source attempt to date.

Why "Local-First" Video AI Suddenly Matters

For three years, the default flow for "summarize this video" has been: upload to a SaaS, get a summary, hope they don't train on it. Read.ai, Fireflies, Otter, NotebookLM — they all do good work, but every minute of audio you feed them is a minute on someone else's GPU.

For a lot of use cases that's fine. For others — legal depositions, internal all-hands recordings, anything under NDA, anything you don't want to count toward a per-minute pricing tier — it isn't. The market quietly noticed:

yt-dlp keeps gaining stars (currently 100K+) because people want their videos as files, not stream-only assets
whisper.cpp made on-device transcription genuinely usable on a MacBook
Local LLMs (Llama 3, Qwen 3, Gemma 3, DeepSeek V4) crossed the "good enough for summaries" line in 2025

What was missing was a polished desktop app that stitched these together with a sane UX. OpenBrief is the first Show HN this year that does it without feeling like a research demo.

What Makes OpenBrief Different

OpenBrief isn't doing any one thing nobody else has done. The interesting part is how it composes them.

1. Tauri v2 instead of Electron

Most "local AI" desktop apps reach for Electron because the team already knows React. OpenBrief uses Tauri v2, which means:

Rust backend, system webview frontend — installer is ~10MB instead of ~150MB
Native filesystem access for the helper sidecar (yt-dlp, ffmpeg) without IPC gymnastics
Lower memory footprint while a 90-minute Whisper transcription is running

The src-tauri/ directory exposes Rust commands that the React renderer calls. The helper sidecar — a separate binary that wraps yt-dlp and ffmpeg — is bundled at build time via pnpm setup:dev-sidecars. This is the right shape for a desktop tool that needs to shell out to native binaries; it avoids the "user has to install yt-dlp themselves" trap that kills adoption of a lot of similar projects.

2. Grounded summaries, not free-floating ones

A "grounded summary" in OpenBrief means each bullet in the generated brief is tied to a timestamp span in the original transcript. Click a takeaway → jump to the spot in the audio/video. This is the same idea NotebookLM popularized with its citation-numbered answers, and it matters for one reason: summaries hallucinate, transcripts don't. When the summary says "the speaker proposed a 40% price cut at 12:34," you can verify that's actually what they said.

The implementation is straightforward — the prompt template forces the LLM to emit JSON with { text, start_ts, end_ts } for each point, and the UI renders them as clickable chips. If you've built RAG over transcripts before, you've written approximately this code; OpenBrief just ships it as the default.

3. Pluggable model layer

The README's "Model Support" table is honest about what's shipped vs. roadmap:

Model type	Supported today	Roadmap
Speech-to-text	Whisper	Parakeet, Qwen3-ASR
Text-to-speech	(placeholder)	Supertonic 3, Qwen3-TTS
LLM	OpenAI, Anthropic, Gemini, OpenRouter (DeepSeek)	Local Gemma 4
Video embeddings	—	Frame/clip semantic search

The architectural win is that all four slots are service abstractions, not hardcoded. You wire your key into settings, the rest of the app doesn't care. Adding a local LLM later is "implement the LLM service interface against Ollama" — not "rewrite the summary pipeline."

Install and First Run

OpenBrief is currently distributed as a source build — the Tauri team hasn't pushed signed installers yet. You need:

# Prerequisites
# - Node.js ^22.21.0
# - pnpm 11.0.9
# - Rust + Cargo
# - Tauri v2 platform prerequisites for your OS
#   (Xcode on macOS, MSVC on Windows, webkit2gtk on Linux)

git clone https://github.com/tantara/openbrief
cd openbrief/client
pnpm install

# If pnpm flags ignored native build scripts on first install:
pnpm approve-builds  # approve the listed packages
pnpm install         # rerun

# Build and run the desktop app
cd apps/tauri
pnpm setup:dev-sidecars     # builds the yt-dlp helper sidecar
pnpm prepare:media-assets   # downloads Whisper model + ffmpeg
pnpm dev                    # launches the desktop window

First-run experience:

App opens to an empty library.
Paste a YouTube URL into the import bar.
The helper sidecar shells out to yt-dlp, downloads the audio (or video), and stores it in your local app data directory.
Whisper kicks off in a background worker. On an M2 MacBook Air, a 30-minute video transcribes in ~3 minutes against whisper-base. The bundled model size and the underlying whisper.cpp build determine real-world speed.
Once the transcript exists, the Summarize button calls your configured LLM with the grounded-summary prompt.
The right-hand pane shows the brief with clickable timestamps; the chat box at the bottom lets you ask follow-up questions against the transcript.

Configuring API keys

Settings → Models. Each provider gets its own key field:

OpenAI: sk-... — used for gpt-4o, gpt-4o-mini, or whatever model id you set
Anthropic: sk-ant-... — Claude models for summary + chat
Google: Gemini API key
OpenRouter: sk-or-... — routes to DeepSeek and 100+ others, cheap for bulk summarization

Keys are stored in the platform secure store (Keychain on macOS, Credential Manager on Windows, libsecret on Linux), not in plaintext config.

Real Use Cases

A few patterns that justify the install:

Conference talk backlog. I have ~40 saved YouTube talks I'll "watch later." OpenBrief into a library, summarize each, scan the briefs, watch the three that earn it. This is the use case the Show HN comments kept coming back to.

Podcast research. Drop a 90-minute episode in, ask: "Did they mention any specific tools or products?" The chat-with-transcript flow surfaces names you'd never catch on a normal listen.

Internal recordings. All-hands, customer interviews, sales calls — anything you can't or shouldn't upload to a SaaS. The transcript stays on disk, the summary is generated by your own API key (which you can route to a self-hosted LLM via OpenRouter or an OpenAI-compatible proxy).

Course / lecture notes. Long-form educational content compresses well into timestamped briefs. The grounded format makes it easy to re-watch the parts you actually need.

What the Show HN Thread Said

Sentiment in the HN thread was warm but pointed:

"Basically a GUI for yt-dlp with AI on top" — the maintainer's own characterization, and the thread mostly agreed that's a feature, not a bug. Wrapping a powerful CLI tool in a clean desktop UX has product value.
Local LLM gap — the most common request was support for Ollama / llama.cpp for the summary step. Right now BYO-key works fine but the "local-first" claim has an asterisk while the LLM call still goes to a cloud API by default.
Comparisons to NotebookLM — several commenters framed it as "the open-source NotebookLM I've been waiting for," which is roughly accurate for video/audio specifically. NotebookLM still has the edge on multi-source synthesis and audio overviews.
Why Tauri over Electron — a few people appreciated the small installer size; Tauri v2 is becoming the default pick for new local AI tools (Jan, LM Studio's recent versions, Bolt, OpenBrief).

Who Should Use This (and Who Shouldn't)

Use OpenBrief if:

You watch / produce a lot of long-form video or audio content
You want transcripts and summaries to stay on your machine
You already pay for at least one LLM API and want to extract more value from that key
You're comfortable with a source build for now (signed installers coming)

Skip it (or wait) if:

You want fully local end-to-end — local Gemma 4 / Qwen3-ASR are on the roadmap but not shipped
You need team collaboration features — this is a single-user desktop app today
You need real-time meeting transcription — OpenBrief is post-hoc, not live (use Whisper-Live or Fireflies for that)
You're on a low-spec machine — Whisper transcription is CPU/GPU-heavy; a 90-minute video takes meaningful time on older hardware

Comparison with Alternatives

Tool	Local-first	Open source	Video download	Grounded summary	Chat	Cost
OpenBrief	✅ (transcript)	✅ AGPL-3.0	✅ yt-dlp	✅	✅	Free + your LLM API
NotebookLM	❌	❌	❌	✅	✅	Free (Google account)
Read.ai	❌	❌	❌	✅	✅	$0.04/min uploads
Otter.ai	❌	❌	❌	✅	✅	$8.33/mo+
MacWhisper	✅	❌	❌	Limited	❌	$20–60 one-time
Whisper.cpp + custom	✅	✅	Manual	DIY	DIY	Free + dev time

The honest read: NotebookLM is still better for multi-source research with audio overviews. OpenBrief is better when privacy or per-minute cost is the constraint, or when you want one place for your personal video library.

Architecture: How a Single Import Actually Flows

For developers considering forking or contributing, here's the request lifecycle when you paste a URL:

[React UI: paste URL]
       │
       ▼  Tauri command: import_url
[Rust core] ──► [Helper sidecar: yt-dlp]
       │              │
       │              └─► writes .mp4/.m4a to app data dir
       │
       ▼  Tauri command: start_transcription
[Whisper worker (whisper.cpp via transcribe-rs)]
       │
       └─► writes transcript.json with { segments: [{start, end, text}] }
              │
              ▼  React reads transcript, renders timeline
              │
              ▼  User clicks "Summarize"
       [LLM service: OpenAI/Anthropic/Gemini/OpenRouter]
              │  prompt enforces grounded JSON schema
              ▼
       [Summary stored with timestamp links → rendered as clickable chips]

Every step is replaceable. The LLMService interface in packages/api/ is a few dozen lines; bolting on Ollama is straightforward, and the GitHub Issues already have a PR draft from a contributor doing exactly that.

Limitations Worth Knowing About

"Local-first" still routes summaries to cloud LLMs by default. Until local Gemma 4 ships, the transcript stays local but the summary prompt (which includes the transcript) goes to whichever API key you configured. If that's a dealbreaker, run a local OpenAI-compatible endpoint like LiteLLM or Ollama's OpenAI-compatible mode and point OpenBrief at http://localhost:11434.
No signed installers yet. Source build only on May 2026. On macOS that means xcode-select --install is a prerequisite.
Whisper model is fixed at install time. Switching from base to large-v3 for higher accuracy requires re-running pnpm prepare:media-assets.
AGPL-3.0 — fine for personal use and internal tools; if you want to fork it into a SaaS, the AGPL terms apply to network use. Read them before commercial deployment.
TTS is a placeholder. The "listen back" feature in the README requires Supertonic 3 / Qwen3-TTS support that isn't merged yet.

FAQ

Is OpenBrief free?

Yes — the app itself is open source under AGPL-3.0 and there's no subscription. You pay for the LLM API calls (your OpenAI / Anthropic / Gemini / OpenRouter key), which run a few cents per long video at current gpt-4o-mini or DeepSeek pricing.

Does OpenBrief work fully offline?

Transcription does — Whisper runs on your machine. Summaries and chat don't today, because they call your configured cloud LLM. Local LLM support (Gemma 4) is on the roadmap. As a workaround, point the OpenAI provider field at an Ollama or LiteLLM endpoint running on localhost.

How does OpenBrief compare to NotebookLM?

NotebookLM is stronger for multi-source synthesis (PDF + video + notes in one notebook) and ships an audio-overview generator that OpenBrief doesn't have yet. OpenBrief is stronger for privacy (everything stays on disk except the LLM call, which goes through your key), format flexibility (any video URL yt-dlp supports), and library management (it's built around a long-running personal collection rather than per-notebook sessions).

Can I use OpenBrief with a self-hosted LLM?

Yes — any OpenAI-compatible endpoint works. Configure the OpenAI provider with your local URL (e.g., http://localhost:11434/v1 for Ollama or LiteLLM) and a placeholder API key. End-to-end local pipelines work today with this trick; the official "local LLM" UI option is still on the roadmap.

What videos can it download?

Anything yt-dlp supports — that's YouTube, Vimeo, Twitch VODs, X/Twitter videos, podcasts hosted on most major platforms, and ~1,800 sites total. You can also import local .mp4, .m4a, .mp3, .wav files directly without downloading.

How fast is Whisper transcription?

Depends on the model and hardware. On an M2 MacBook Air with whisper-base, expect ~10x realtime (a 30-minute video transcribes in ~3 minutes). Larger models (large-v3) are slower but more accurate; M-series with Metal acceleration is the sweet spot. CPU-only on older Intel hardware can be 1–2x realtime.

Is there a Linux or Windows build?

The repo says all three platforms are supported, but since signed installers aren't published yet, Linux and Windows users build from source the same way macOS users do. The Tauri v2 prerequisites differ per platform — check the Tauri docs for your distro.

Bottom Line

OpenBrief is the first respectable open-source attempt at a personal video AI library — the kind of tool that sits between yt-dlp (too low-level) and NotebookLM (too cloud). It's not finished — local LLM support and signed installers are the two big gaps — but the architecture is right, the maintainer is active in the Show HN thread, and the AGPL license keeps it honest.

If you want to try it: clone the repo, run pnpm dev:tauri, and feed it one of your "watch later" videos. If the grounded-summary UX clicks, you'll probably end up importing your whole backlog within a day.

Repo: github.com/tantara/openbrief
Show HN: news.ycombinator.com/item?id=48272393

推荐订阅源

DEV Community