












I recently tried Gemini Omni Flash, which Google released at I/O 2026. Here are my thoughts.
The biggest difference with this model is that you can edit videos through conversation. After generating a clip, you can simply say "change the background to a beach," "slow down the footage," or "add a person on the right," and it modifies only the part you specified while keeping the rest intact. You don't have to regenerate the entire clip each time, as you do with Sora.
Key points:
- Supports multimodal input: text + images + audio + video can be fed in together
- Outputs 10-second clips with synchronized audio
- Free to use in YouTube Shorts; the Gemini app requires an AI Plus subscription ($7.99/month)
- The developer API isn't open yet; release is expected "within a few weeks"
- All outputs carry a mandatory SynthID watermark
Compared to Sora 2: Sora has better character consistency and can generate 25-second clips; Omni Flash is stronger at multimodal input and conversational editing, with much lower iteration costs.
It also has limitations: a 10-second cap, no audio editing (to prevent deepfakes), inaccurate text rendering, and complex motion scenes that occasionally fall apart.
If you want to try video generation quickly, you can check out [gemini omni](https://www.veol.ai?utm_source=v2ex), which supports up to 4K output and charges per use, starting at $0.15.
Have any fellow V2EX users tried it? I think conversational editing is the right direction, but the 10-second limit is a bit short.