I built Voice2Sub because many subtitle and transcription workflows still start by uploading a media file to a browser tool.
That works for short public videos, but it becomes awkward when the file is long, private, stored locally, or part of a repeated editing workflow.
Voice2Sub focuses on a desktop workflow:
- Import a local video or audio file
- Generate subtitles or transcript text with Whisper speech recognition
- Review the result
- Export SRT, VTT, TXT, LRC or CSV
Why I built it as a desktop app
A lot of creators, educators, podcasters and journalists work with media that they do not always want to upload to a browser tool.
Examples:
- private interviews
- long lectures
- course recordings
- podcasts
- internal meetings
- YouTube or TikTok editing workflows
- archived audio/video files
A local-first desktop app gives users more control over the file, the model, the output format and the processing workflow.
What Voice2Sub does
Voice2Sub is an AI subtitle generator and speech-to-text desktop app for video/audio files.
It currently focuses on:
- generating subtitles from local video/audio
- creating transcript text from speech
- exporting SRT, VTT, TXT, LRC and CSV
- running on Windows, macOS Apple Silicon and Linux
- supporting CUDA acceleration on Windows/Linux systems with a compatible NVIDIA GPU
- supporting Metal acceleration on Apple Silicon Macs
- giving users more control over model selection and transcription settings
Why not just use an online subtitle generator?
Online tools are convenient, but a desktop workflow is useful when:
- the media file is large
- the content is private
- the user wants repeat processing
- the user wants local model control
- the user wants common subtitle export formats
- the user works across Windows, macOS or Linux
Voice2Sub is not trying to replace every online video editor. It is focused on a local subtitle and transcript workflow.
What I learned while building it
The AI part is only one piece of the product.
A desktop AI tool also needs:
- reliable model downloads
- offline and interrupted download handling
- safe retry/resume behavior
- cross-platform packaging
- clear error messages
- GPU acceleration setup
- update reliability
- localization
- clean export formats
- a first-run experience that does not confuse users
One thing I underestimated was how important the model download experience is. If the user cannot download or select an AI model, the whole product feels broken, even if the transcription engine itself works.
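The download requirements listed above (interrupted downloads, safe retry and resume) can be sketched with Python's standard library. This is a minimal illustration, not Voice2Sub's actual code: it assumes the model host honours HTTP `Range` requests and publishes a SHA-256 checksum per model file, and the names `download_model` and `sha256_of` are hypothetical. Partial data goes to a `.part` file so an interrupted download can resume, and the destination is only replaced once the checksum matches.

```python
import hashlib
import os
import urllib.request


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def download_model(url: str, dest: str, expected_sha256: str,
                   chunk_size: int = 1 << 20) -> str:
    """Hypothetical resumable download with checksum verification."""
    part = dest + ".part"
    offset = os.path.getsize(part) if os.path.exists(part) else 0

    req = urllib.request.Request(url)
    if offset:
        # Ask the server to send only the bytes we do not have yet.
        req.add_header("Range", f"bytes={offset}-")

    with urllib.request.urlopen(req, timeout=30) as resp:
        # 206 Partial Content means the Range header was honoured.
        # Any other success status means the full file is coming,
        # so the stale partial file must be overwritten, not appended to.
        resumed = offset > 0 and resp.getcode() == 206
        with open(part, "ab" if resumed else "wb") as out:
            for chunk in iter(lambda: resp.read(chunk_size), b""):
                out.write(chunk)

    if sha256_of(part) != expected_sha256:
        os.remove(part)  # corrupt data: force a clean retry next time
        raise ValueError(f"checksum mismatch for {url}")

    os.replace(part, dest)  # atomic rename on the same filesystem
    return dest
```

If the server ignores `Range` and sends the whole file, the sketch restarts from zero rather than appending a second copy to the partial file. A production version would still need retries with backoff, handling for `416 Range Not Satisfiable` (which `urllib` raises as an `HTTPError` here), and progress reporting for the UI.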
Current platforms
Voice2Sub currently supports:
- Windows x64
- macOS Apple Silicon
- Linux x64
The app also supports hardware acceleration when available:
- CUDA on compatible NVIDIA systems
- Metal on Apple Silicon Macs
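One way to turn that support matrix into a backend choice at startup, as a hedged sketch: the function name `pick_backend` and its return labels are hypothetical, and looking for `nvidia-smi` on `PATH` is only a heuristic for a "compatible NVIDIA system" (it does not prove the installed driver matches the CUDA build the app ships with).

```python
import platform
import shutil
from typing import Optional


def pick_backend(system: Optional[str] = None,
                 machine: Optional[str] = None,
                 has_nvidia: Optional[bool] = None) -> str:
    """Return "metal", "cuda" or "cpu" (hypothetical labels)."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if has_nvidia is None:
        # Heuristic only: the NVIDIA driver normally installs nvidia-smi.
        has_nvidia = shutil.which("nvidia-smi") is not None

    if system == "Darwin" and machine == "arm64":
        return "metal"  # Apple Silicon Mac
    if system in ("Windows", "Linux") and has_nvidia:
        return "cuda"
    return "cpu"  # always-available fallback
```

Calling `pick_backend()` with no arguments inspects the current machine; the explicit parameters let the decision be tested without the hardware. Keeping a CPU fallback means a missing or unsupported GPU costs speed instead of breaking transcription. Under Rosetta, `platform.machine()` reports `x86_64`, so this sketch would fall back to CPU there.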
Current export formats
Voice2Sub can export:
- SRT
- VTT
- TXT
- LRC
- CSV
These formats cover common subtitle, transcript, lyric and editing workflows.
What I want to improve next
I am considering:
- batch subtitle generation
- better subtitle preview/editing
- translation workflow
- speaker detection
- better presets for YouTube, courses, podcasts and interviews
- more polish around the first-run onboarding experience
Links
If you work with subtitles, transcripts, video editing, podcasts or course content, I would love feedback on the workflow.