



























Gemini Omni announced at the 2026 Google I/O technology developer conference in Mountain View, California, on May 19, 2026. (Photo by Karl Mondon / AFP via Getty Images)
For many creative teams, video has always been the most difficult format to keep alive. While text can be rewritten in seconds, and images can be reworked, resized, retouched and pushed across channels with relative ease, video carries more weight: a shoot, an edit, a review, an export, another round of approvals, then a set of cutdowns and localised versions that often feel like an entirely separate production cycle. Google’s Gemini Omni points towards a different kind of creative workflow.
The company describes Omni as a model that can “create anything from any input,” beginning with video, by combining images, audio, video and text into high-quality video outputs grounded in Gemini’s knowledge of the world. Google also says the model allows users to edit video through conversation, with each instruction building on the last whilst keeping characters, physics and the thread of the scene consistent.
The language already forming around the launch tells an interesting story. On LinkedIn, people are describing “liquid content”, “video as a conversation” and “AI video as a small creative pipeline.” Some of that is launch-week enthusiasm, but it does capture the part of Omni that is most interesting for brands, agencies and studios. Video is transitioning towards something less fixed, becoming something more akin to a “working surface”.
This raises the question of what happens when video can be generated, revised, restyled, remixed, verified and redistributed almost as easily as text.
Most AI video coverage still circles the same checklist: how realistic is the motion, how clean are the hands, how convincing is the camera movement, how long can the clip run? These questions still matter, but Omni’s more significant contribution is controllability.
Google says Gemini Omni Flash is the first model in the Omni family, rolling out to the Gemini app, Google Flow and YouTube Shorts. The Gemini product page describes it as a system that can blend text, images and video, create 10-second videos with native audio, turn up to five photos into video, and support multi-turn editing.
This plethora of input options potentially changes the flow of the creative process. Users can begin with a concept, then ask for the background to change, the wardrobe to shift, the lighting to soften, the movement to become more precise or the camera to move to a different angle. Google’s own wording is strikingly close to the language of a director working through a sequence: “Just tell Gemini what to fix.”
For creative teams, this moves the focus from a one-shot generation tool towards more of a conversational editing environment. The first prompt becomes the rough cut, while the following prompts become the revision cycle. The model becomes part of the process between idea and asset, as opposed to only being the machine that produces the first clip.
This does not make human qualities like taste, direction or production judgement any less important, but it does make the distance between idea, draft and iteration much shorter.
The most important shift begins at the point after the first video is created. In traditional campaign production, a video moves towards being locked. Once the master asset is signed off, teams create shorter edits, vertical versions, localised cuts, social versions, paid media versions and platform-specific formats, and each new variation adds time, cost and coordination.
With Omni, however, a brand film can swiftly become a whole set of social variants. A creator clip can be restyled without starting from scratch, product videos can be reworked and optimised for different markets, formats or audiences, and campaign ideas can move through multiple visual routes before anyone commits to a full production.
This kind of tool is particularly useful in an era when brands are being asked to produce more content across more surfaces, whilst maintaining consistency and control. WPP’s Production Studio, built with NVIDIA Omniverse, was launched as an AI-enabled end-to-end production application designed to streamline the creation of text, images and video for advertisers and marketers. WPP also says it directly addresses the challenge of producing brand-compliant and product-accurate content at scale, with human oversight at every stage of the workflow.
Adobe is moving in a similar direction from the enterprise creative workflow side. Firefly Creative Production is built to turn repetitive production work into reusable, governed workflows, helping teams scale on-brand images and videos across channels and regions. Adobe also describes it as a way to make creative production operational rather than experimental, with reviews, handoffs, asset systems and approvals tied into the same content stack.
Omni approaches from a slightly different angle, because it is woven into Gemini, Flow and YouTube, rather than presented as an enterprise content supply chain product. Even so, it points in the same broader direction: video assets that remain editable, adaptable and responsive for longer. This is where agencies and studios should pay attention.
The “agency-killer” question is too blunt to be very useful. A more practical question is which parts of the agency model become harder to charge for when video production becomes conversational.
Simple versioning, background swaps, format changes, first-pass concept films, social cutdowns, basic localisation and light reworks all become more exposed. These tasks still have value, especially when done well, but clients may increasingly expect them to happen faster, closer to the first idea and with fewer handoffs.
The more durable value moves towards creative judgement, brand systems, cultural intelligence, legal clearance, performance interpretation and campaign architecture. A model can produce more routes, but a strong agency still needs to understand which route matters, which one fits the brand, which one creates risk, and which one deserves budget.
This is where Gemini Omni could raise uncomfortable issues for parts of the production economy. If a client can ask for a shot to feel more premium, change the background to Tokyo at night, keep the same character, create three vertical edits and shift the tone of one version, then the coordination work around content production begins to compress. The premium moves towards taste, and repetition becomes harder to defend as the centre of the overall fee.
Google also has an advantage many AI video companies do not have: the places where people already create, search, watch and share. Gemini Omni is being placed across the Gemini app, Google Flow and YouTube Shorts. Whilst a creative model inside a standalone interface may impress early adopters, a creative model inside YouTube can begin to reshape everyday remix culture from a standing start.
The Verge reports that YouTube Shorts users will see a “reimagine” option within Shorts Remix, allowing them to transform clips into styles such as anime or pixel art, alter what appears in a video, add background actors or costumes, and even insert themselves into clips. Creators will be able to enable or disable this capability, and remixed Shorts will include a digital watermark and a link back to the original video.
This could be a significant cultural development, as every video starts to become potential source material, and it could be particularly powerful for brands. Campaign films can become remix templates, product moments can shift towards a more participatory format, and entertainment assets can travel through fan edits at a scale that would previously have required a large media plan.
On the flip side, this same opportunity also creates obvious tension. If every clip is increasingly editable, who actually controls context? And who ultimately benefits from the remix? Also, what happens when a creator’s identity, voice, gesture or product placement moves in a direction that they never intended?
Omni therefore becomes more than a creative tool: it also sparks a wider argument about participation, permission and control in the AI video era.
Gemini Omni’s new avatar angle makes this subject even more sensitive. Google’s own Gemini page says users can add an AI avatar to create content that looks and sounds like them, without uploading an image every time. The support documentation says users record their face and voice to create the avatar, and that avatars currently require a personal Google Account with a Google AI plan, with some geographic restrictions and support currently limited to English.
For creators and marketers, the appeal is obvious: founders can appear in a product explainer without filming again, and creators can generate multiple versions of a post for different regions or demographics. A brand ambassador can also be placed into different contexts with far less friction than a conventional shoot.
Alongside the upside, there is an obvious sensitivity to consider. A video that looks like someone, sounds like someone and behaves coherently inside a realistic scene is powerful creative material, but it also crosses into contentious areas such as identity, consent, artificiality and audience trust.
Google is clearly aware of the risk, with DeepMind saying that content created or edited with Omni in the Gemini app, Google Flow or YouTube includes SynthID digital watermarking and C2PA Content Credentials. It also says that content can be verified through the Gemini app, with verification coming soon to Chrome and Search.
This additional provenance layer will become increasingly essential as AI video becomes more realistic and more editable, since it can help people to understand whether a piece of content was created or altered by Google AI. However, it is also important to note that it does not settle every question around consent, compensation or context. Although a watermark can identify how something was made, it cannot decide whether the use of a likeness feels legitimate to a creator, a brand or an audience.
The phrase “living video asset” could easily sound like another piece of technology language in search of a market. In this case, it is more of a description of where the medium appears to be moving.
CEO of Google DeepMind, Demis Hassabis, speaks at the 2026 Google I/O conference
AFP via Getty Images
A living video asset can be revised, localised, restyled, extended, checked, remixed and redistributed across different contexts. It can carry a character, a location, a campaign idea, a style or a brand code through multiple versions. It keeps moving and evolving after the first output.
For agencies, this means more pressure on parts of production that rely on repetition, and more emphasis on creative direction, strategy and governance. For brands, it means speed and variation, with a larger responsibility around approvals, provenance and consistency. For creators, it opens new forms of expression, whilst raising new questions about remixing and likeness. For Google, it gives Gemini Omni the potential to become a creative layer running across Gemini, Flow and YouTube.
Gemini Omni points towards a future where video is no longer treated as a locked asset, but as something that can keep moving through a campaign, a platform and a culture. This should excite agencies and brands, but it should also make them more careful. The ability to revise endlessly is only useful if someone still knows what the work is meant to “say” in the first place, who it is meant to serve, and where the boundaries should sit. In the next phase of AI video, the most in-demand skill will not be video generation at all, but knowing when to stop.