The AI Video Race Is Moving Beyond Pretty Clips
Ron Schmelze · 2026-05-23 · via Forbes - Innovation
Google CEO Sundar Pichai speaks during the 2026 Google I/O technology developer conference in Mountain View, California, on May 19, 2026. (Photo by Karl Mondon / AFP via Getty Images)

Google used its I/O event this week to introduce Gemini Omni Flash, a new AI model that takes text, photos, video, and audio as inputs and produces short video clips with audio. It is launching through the Gemini app, Google Flow, and YouTube Shorts, with current clips up to 10 seconds and longer formats planned. The announcements show that the industry is focusing on more than just another text-to-video demo: AI is working its way deeper into the process of video creation.

Early AI video tools worked like most other prompt-to-output generators: type a prompt, get a clip, and try again if you don’t like it. Gemini Omni Flash moves closer to a video assistant. You can give it existing media, ask it to change that media, and use conversation to guide the result. Google says the Omni family is designed around creating “anything from any input,” with video as the first major format. Reports from Google I/O 2026 confirm the launch through the Gemini app, Google Flow, and YouTube Shorts, and The Verge reported that current clips run up to ten seconds, with longer formats planned.

A Broad Range of Google AI Video Options

Google is adding to its already somewhat overwhelming line of video-oriented AI models. It already has Veo, its dedicated AI video model. Veo 3.1 is built for high-fidelity video generation, with native audio, stronger prompt following, cinematic controls, and output options that include 720p, 1080p, and 4K through the Gemini API. Veo 3.1 Lite is the lower-cost version for scaled developer and enterprise use. Flow is Google’s AI filmmaking workspace; Google Vids is the Workspace tool for business videos; Gemini offers a consumer entry point for casual video creation; and Vertex AI and the Gemini API give developers programmatic access to Veo.

Gemini Omni Flash is different in scope. It is the broader multimodal model for creating and editing video from text, images, audio, and video through conversation. Because it is part of Gemini, it is less a standalone video engine and more a tool for multimodal creation. Omni Flash also benefits from broader Gemini training and world knowledge, which could make it better at interpreting context than a video model that only reacts to prompts.

Together, these options give Google a wide video AI stack for consumers, creators, businesses, and developers, though the number of overlapping names and entry points could make the product story harder for users to follow. The combination is powerful nonetheless. A basic AI video tool might turn one prompt into a clip. A more useful system could take text, photos, video, and audio together, create several short videos, revise them through chat, and format them for Shorts.

Given the potential for malicious and harmful use of generated video, Google is also putting safety markers around the output. Google’s AI-generated video content will carry SynthID watermarks and be supported by content verification tools.

Shifting From Generation to Production

Creators, marketers, agencies, studios, and software platforms now know that AI can produce a good-quality video clip. The harder question is whether AI can take a product image, a brand guide, a voice memo, three customer reviews, a half-finished storyboard, and yesterday’s top-performing ad, and turn all of that into usable video assets that can be revised, tested, localized, approved, and shipped.

Google says Gemini Omni Flash can create video from text, images, audio, and video, with the larger Gemini Omni family built around the idea of creating “anything from anything.” Google also says the model brings Gemini’s reasoning together with media generation and editing.

While Veo remains Google’s dedicated video model, with Veo 3.1 focused on video quality, native audio, realism, and creative control, Gemini Omni Flash points toward broader use. It can use different media inputs, generate video with audio, and support conversational editing. That makes it less of an output-oriented tool and more of a visual editor with memory. The shift is similar to how agentic coding tools like Claude Code and OpenAI’s Codex have moved away from one-time outputs toward managing the whole process.

Using Gemini Omni, a marketer could ask for three YouTube Shorts based on a product photo and a customer quote. A founder could feed in a rough iPhone clip and ask for a cleaner version that keeps the same energy. A retailer could request twenty variants of the same seasonal promotion, each tuned for a different buyer segment. The model goes beyond generating pixels and helps manage the production complexity that sits between the idea and the publish button.

An Increasingly Crowded Field for Video Production Tools

Other vendors are building the workspace around AI video, increasing competition in the market and the potential for customer confusion. Higgsfield offers an AI video generator and studio where users can access several major models in one place, including Kling 3.0, Veo 3.1, Sora 2, Seedance 2.0, Wan 2.7, and others, then compare outputs, control camera moves, manage motion, and shape style without leaving the platform.

Magnific, formerly Freepik, is taking a related route from the creative asset side. The renamed platform now combines AI image and video generation, 4K video with audio, upscaling, enhancement tools, collaboration, 3D and virtual scene tools, an AI assistant, training, and a library of more than 250 million creative assets. That makes Magnific less like a pure video model company and more like a full creative production suite. Its advantage comes from starting with a huge base of stock imagery, design assets, and creative users, then layering AI generation and editing on top.

Runway, Luma, and similar tools are also focusing on process and workflow, offering model choice, repeatable styles, character consistency, camera control, brand assets, collaboration, templates, and approvals alongside output quality. Chinese models from ByteDance and Kuaishou add more pressure, with Seedance and Kling pushing features such as multimodal inputs, multi-shot generation, native audio, lip sync, and faster short-form video creation.

The broader market is splitting into two camps. Workspace vendors such as Higgsfield, Magnific, and Runway build production environments around the models. Google and OpenAI have focused on frontier models and direct product surfaces. OpenAI pushed Sora 2 as a flagship video and audio generation model with synchronized dialogue and sound effects. However, OpenAI’s own page now says the Sora product is no longer available, and its developer documentation says the Sora 2 video models and Videos API are deprecated and will shut down on September 24, 2026.

Impacts on the Creative Economy

As AI moves further upstream into production and the creative process, people in the creative economy are increasingly concerned. Many wonder whether AI video will replace filmmakers and production crews.

AI video has now reached the feature-film proof-of-concept stage. Hell Grind, a 95-minute AI-generated sci-fi action film from Higgsfield AI, screened around Cannes in May 2026. The Wall Street Journal reported that the film was made in roughly two weeks for about $500,000, with $400,000 spent on AI compute. The production still required a 15-person team and heavy human direction, with the first 25 minutes alone reportedly taking more than 16,000 initial generations, later cut down to 253 final shots. While Hell Grind does not prove that a studio can type “make an AI movie” and receive a finished feature, it does illustrate that AI video can now support longer-form production when people supply detailed prompts, creative judgment, editing discipline, and enough computing power.

The concern is real because much of the work in video production is expensive and repetitive. It often lives across multiple applications, conversation threads, editing timelines, asset folders, and approval queues. As studios and production houses use AI more, they will likely be drawn toward production-focused tools.

Human creativity, of course, still matters. Creative style, judgment, timing, narrative instinct, risk sense, legal caution, and audience knowledge become the scarce goods. While a machine can generate ten options, someone still has to know which one feels cheap, which one feels uncanny, which one violates the brief, and which one might actually sell.

On the risk side, the more precise these tools become, the easier it gets to create convincing synthetic people, fake endorsements, unauthorized likenesses, and brand-unsafe media. Hard questions remain. What data trained the model? What likeness rights are protected? Can outputs be traced? Can a company prove what was generated, who approved it, and what assets went into the final cut? Can the workflow stop a rogue campaign before it reaches customers?

So while pretty clips drew everyone’s attention to AI-generated video, the real future lies in controlled production.