


























Google has started rolling out Gemini Omni Flash, its new multimodal AI model that can generate and edit videos using text, images, audio and video inputs. The rollout follows the model’s announcement at Google I/O 2026 and makes the system available to users inside the Gemini app, Google Flow and YouTube Shorts.
The company says the model is designed to combine reasoning and creative generation in a single system, allowing users to build and modify video content through natural conversation.
With Gemini Omni Flash, users can prompt the model to create videos from scratch or modify existing clips step by step. Each instruction builds on the previous one, allowing continuous refinement of scenes without breaking continuity. Google says this helps maintain consistency in characters, objects and environments across edits, even as the video changes through multiple iterations.
The model also supports multi-input workflows, where users can combine different types of inputs such as text prompts, images, video clips and audio references. This allows a single output video to be shaped using multiple reference points instead of relying on a single prompt. Google says the system is built to understand how these inputs relate to each other and produce a coherent final scene.
The rollout is part of Google’s broader push to integrate generative AI into its consumer ecosystem, especially platforms focused on short-form video creation. YouTube Shorts and the YouTube Create app are among the first platforms where Omni Flash capabilities are being introduced, signalling a tighter connection between AI generation tools and content creation pipelines.
The company also says all outputs generated through the system will include SynthID watermarking for identification of AI-generated content.
Gemini Omni Flash allows users to edit videos using natural language commands instead of traditional editing tools. Users can describe changes such as altering environments, adding objects or changing actions within a scene, and the model updates the video accordingly while preserving overall structure.
The system is designed to maintain visual continuity across edits, ensuring that characters and objects remain consistent as changes are made over multiple steps. Google says this makes the editing process more iterative and flexible compared to conventional video production tools.
The model also draws on Gemini’s broader world knowledge to improve realism in generated content. It uses this understanding to simulate physical interactions such as motion, lighting and environmental effects more accurately, according to Google.
Google has positioned Gemini Omni Flash as part of a wider shift toward multimodal AI systems that can handle creation and reasoning together. The model is designed to process multiple input formats and generate output video that reflects combined instructions rather than isolated prompts.
The company says the goal is to reduce the gap between idea and execution, allowing users to move from concept to finished video using a single conversational interface. Google also plans to expand output formats beyond video, with support for images and audio expected in future updates.
The rollout of Gemini Omni Flash is currently limited to select subscription tiers in the Gemini app, with broader access expected as the deployment expands.
With over a decade-long career in journalism, Neetika Walter has worked with The Economic Times, ANI, and Hindustan Times, covering politics, business, technology, and the clean energy sector. Passionate about contemporary culture, books, poetry, and storytelling, she brings depth and insight to her writing. When she isn’t chasing stories, she’s likely lost in a book or enjoying the company of her dogs.