Genie 3: A new frontier for world models
Jack Parker-Holder, Shlomi Fruchter · 2025-08-05 · via Google DeepMind News

Today we are announcing Genie 3, a general purpose world model that can generate an unprecedented diversity of interactive environments.

Given a text prompt, Genie 3 can generate dynamic worlds that you can navigate in real time at 24 frames per second, retaining consistency for a few minutes at a resolution of 720p.
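The stated operating point (24 frames per second at 720p for a few minutes) implies some concrete numbers. The sketch below works them out. The 3-minute session length is an assumption standing in for "a few minutes", and 1280×720 is the standard 720p frame size; neither figure describes Genie 3's internals.

```python
# Back-of-the-envelope figures for real-time generation at 24 fps and 720p.
# The 3-minute session is an illustrative assumption for "a few minutes".

FPS = 24
WIDTH, HEIGHT = 1280, 720  # standard 720p frame size


def frame_budget_ms(fps: int) -> float:
    """Wall-clock time available to produce one frame in real time."""
    return 1000.0 / fps


def frames_in_session(minutes: float, fps: int) -> int:
    """Number of frames generated over a session of the given length."""
    return round(minutes * 60 * fps)


if __name__ == "__main__":
    print(f"Per-frame budget: {frame_budget_ms(FPS):.1f} ms")
    print(f"Frames in 3 minutes: {frames_in_session(3, FPS)}")
    print(f"Pixels per frame: {WIDTH * HEIGHT}")
```

Every frame, roughly 41.7 ms of compute, must also account for all frames generated before it, which is what makes real-time interaction hard.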

Towards world simulation

At Google DeepMind, we have been pioneering research in simulated environments for over a decade, from training agents to master real-time strategy games to developing simulated environments for open-ended learning and robotics. This work motivated our development of world models, which are AI systems that can use their understanding of the world to simulate aspects of it, enabling agents to predict both how an environment will evolve and how their actions will affect it.

World models are also a key stepping stone on the path to AGI, since they make it possible to train AI agents in an unlimited curriculum of rich simulation environments. Last year we introduced the first foundation world models with Genie 1 and Genie 2, which could generate new environments for agents. We have also continued to push the state of the art in video generation with our models Veo 2 and Veo 3, which exhibit a deep understanding of intuitive physics.

Each of these models marks progress along different capabilities of world simulation. Genie 3 is our first world model to allow interaction in real-time, while also improving consistency and realism compared to Genie 2.


Genie 3’s capabilities include:

  • Modelling physical properties of the world. Experience natural phenomena like water and lighting, and complex environmental interactions.
  • Simulating the natural world. Generate vibrant ecosystems, from animal behaviors to intricate plant life.
  • Modelling animation and fiction. Tap into imagination, creating fantastical scenarios and expressive animated characters.
  • Exploring locations and historical settings. Transcend geographical and temporal boundaries to explore places and past eras.

Pushing the frontier of real-time capabilities

Achieving a high degree of controllability and real-time interactivity in Genie 3 required significant technical breakthroughs. During the auto-regressive generation of each frame, the model has to take into account the previously generated trajectory that grows with time. For example, if the user is revisiting a location after a minute, the model has to refer back to the relevant information from a minute ago. To achieve real-time interactivity, this computation must happen multiple times per second in response to new user inputs as they arrive.
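The loop described above can be sketched as follows: each new frame is produced from the entire trajectory so far, and a revisit "a minute ago" corresponds to looking 24 × 60 = 1,440 frames back. This is a minimal illustration under stated assumptions. `generate_frame` is a hypothetical stub, not Genie 3's model, and storing raw frames in a list is an assumption made for clarity; the post does not say how history is represented.

```python
import time
from dataclasses import dataclass, field

FPS = 24
FRAME_BUDGET_S = 1.0 / FPS  # about 41.7 ms per frame for real-time play


@dataclass
class Trajectory:
    """Everything generated so far; it grows by one frame per step."""
    frames: list = field(default_factory=list)
    actions: list = field(default_factory=list)

    def frame_seconds_ago(self, seconds: float):
        """Return the frame generated `seconds` before the latest one, if any."""
        offset = round(seconds * FPS)  # e.g. 60 s -> 1440 frames back
        if offset < 0 or offset >= len(self.frames):
            return None
        return self.frames[-1 - offset]


def generate_frame(trajectory: Trajectory, action: str) -> str:
    """Hypothetical stand-in for the model.

    A real world model would condition on the whole trajectory (frames and
    actions); this stub just labels the frame with its index and action.
    """
    return f"frame{len(trajectory.frames)}:{action}"


def run(actions: list) -> Trajectory:
    """Generate one frame per incoming action, checking the real-time budget."""
    traj = Trajectory()
    for action in actions:
        start = time.perf_counter()
        frame = generate_frame(traj, action)  # sees the full history so far
        traj.frames.append(frame)
        traj.actions.append(action)
        if time.perf_counter() - start > FRAME_BUDGET_S:
            print(f"frame {len(traj.frames) - 1} missed the {FRAME_BUDGET_S * 1000:.1f} ms budget")
    return traj
```

For example, after `run(["forward"] * 1500)`, `frame_seconds_ago(60)` returns the frame produced 1,440 steps before the latest one. The point of the sketch is the shape of the problem: the history the model must consult keeps growing while the per-frame deadline stays fixed.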

Environmental consistency over a long horizon

In order for AI-generated worlds to be immersive, they have to stay physically consistent over long horizons. However, generating an environment auto-regressively is generally a harder technical problem than generating an entire video, since inaccuracies tend to accumulate over time. Despite the challenge, Genie 3 environments remain largely consistent for several minutes, with visual memory extending as far back as one minute.
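Why inaccuracies accumulate in auto-regressive generation can be shown with a deliberately simple toy, which is an illustration of compounding error in general and not a model of Genie 3. A one-dimensional "world" advances by one unit per step, and a hypothetical learned model has a small constant bias. When every prediction starts from the true previous state, the error stays at the bias; when the model feeds its own outputs back in, as auto-regressive generation does, the bias compounds.

```python
def true_step(x: float) -> float:
    """Ground-truth dynamics: the world advances by exactly one unit per step."""
    return x + 1.0


def model_step(x: float, bias: float = 0.01) -> float:
    """A hypothetical imperfect model with a small, constant per-step error."""
    return x + 1.0 + bias


def one_step_errors(steps: int, bias: float = 0.01) -> list:
    """Each prediction starts from the true previous state, so errors do not carry over."""
    errors, x = [], 0.0
    for _ in range(steps):
        errors.append(abs(model_step(x, bias) - true_step(x)))
        x = true_step(x)  # reset to ground truth before the next prediction
    return errors


def autoregressive_errors(steps: int, bias: float = 0.01) -> list:
    """Each prediction starts from the model's own previous output, so errors compound."""
    errors, x_true, x_model = [], 0.0, 0.0
    for _ in range(steps):
        x_true = true_step(x_true)
        x_model = model_step(x_model, bias)  # feed the prediction back in
        errors.append(abs(x_model - x_true))
    return errors


if __name__ == "__main__":
    steps = 24 * 60  # one minute of frames at 24 fps
    print(f"one-step error after {steps} steps: {one_step_errors(steps)[-1]:.3f}")
    print(f"autoregressive error after {steps} steps: {autoregressive_errors(steps)[-1]:.3f}")
```

Over one minute of frames the one-step error stays near 0.01, while the auto-regressive error grows linearly to roughly 14.4. Real models drift in far less predictable ways, but the mechanism is the same, which is why minutes-long consistency is notable.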

The trees to the left of the building remain consistent throughout the interaction, even as they go in and out of view.

Genie 3’s consistency is an emergent capability. Other methods such as NeRFs and Gaussian Splatting also allow consistent navigable 3D environments, but depend on the provision of an explicit 3D representation. By contrast, worlds generated by Genie 3 are far more dynamic and rich because they’re created frame by frame based on the world description and actions by the user.

Prompt: First-person view drone video. High speed flight into and along a narrow canyon in Iceland with a river at the bottom and moss on the rocks, golden hour, realworld

Promptable world events

In addition to navigational inputs, Genie 3 also enables a more expressive form of text-based interaction, which we refer to as promptable world events.

Promptable world events make it possible to change the generated world, such as altering weather conditions or introducing new objects and characters, going beyond what navigation controls alone allow.

This ability also increases the breadth of counterfactual, or “what if”, scenarios that agents learning from experience can use to handle unexpected situations.
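The distinction between navigation inputs and promptable world events can be sketched as two kinds of input flowing into the same world state. This is a hypothetical illustration: the class names, the move vocabulary, and the `apply` function are assumptions made for this example, not Genie 3's interface, and a real model would condition subsequent frames on the event text rather than merely logging it.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Navigate:
    """Low-level control input, e.g. from a keyboard or an agent (hypothetical)."""
    move: str  # one of the keys in MOVES below


@dataclass
class WorldEvent:
    """Free-form text that changes the world itself (hypothetical)."""
    prompt: str


UserInput = Union[Navigate, WorldEvent]

MOVES = {"forward": (0, 1), "back": (0, -1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class WorldState:
    description: str
    position: tuple = (0, 0)
    events: list = field(default_factory=list)


def apply(state: WorldState, user_input: UserInput) -> WorldState:
    """Update the world in response to either kind of input."""
    if isinstance(user_input, Navigate):
        dx, dy = MOVES[user_input.move]
        x, y = state.position
        state.position = (x + dx, y + dy)  # navigation moves the viewpoint
    elif isinstance(user_input, WorldEvent):
        state.events.append(user_input.prompt)  # events change the world's content
    else:
        raise TypeError(f"unsupported input: {user_input!r}")
    return state
```

Feeding `Navigate("forward")`, `WorldEvent("a storm rolls in")`, and `Navigate("left")` in sequence leaves the viewpoint at `(-1, 1)` with one recorded event. The navigation path is the same with or without the event, which is what makes such events useful for "what if" variations.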


Fueling embodied agent research

To test whether worlds created by Genie 3 are suitable for future agent training, we generated worlds for a recent version of SIMA, our generalist agent for 3D virtual settings. In each world we instructed the agent to pursue a set of distinct goals, which it aims to achieve by sending navigation actions to Genie 3. Like any other environment, Genie 3 is not aware of the agent’s goal; instead, it simulates the future based on the agent's actions.
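The separation described above, where the agent holds the goal and the environment sees only actions, is the standard agent-environment loop. The sketch below shows it with a one-dimensional toy world. `GeneratedWorld` and `GoalAgent` are hypothetical stand-ins for Genie 3 and SIMA; note that the world's `step` method has no access to the goal.

```python
from dataclasses import dataclass


class GeneratedWorld:
    """Hypothetical environment: it receives actions only, never the agent's goal."""

    def __init__(self) -> None:
        self.position = 0

    def step(self, action: int) -> int:
        """Simulate the next moment in response to an action; return an observation."""
        self.position += action
        return self.position


@dataclass
class GoalAgent:
    """Hypothetical agent: the goal lives only here."""
    target: int

    def act(self, observation: int) -> int:
        if observation < self.target:
            return 1
        if observation > self.target:
            return -1
        return 0


def episode(agent: GoalAgent, world: GeneratedWorld, max_steps: int = 100) -> int:
    """Run until the agent reaches its goal; return steps taken, or -1 on timeout."""
    obs = world.step(0)  # initial observation
    for t in range(max_steps):
        if obs == agent.target:
            return t
        obs = world.step(agent.act(obs))
    return -1
```

With `GoalAgent(target=5)` the episode finishes in 5 steps, and a target beyond `max_steps` returns -1. Because the goal never crosses the interface, the same world can serve many different goals, which is what makes a general world model useful as a training environment.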


Since Genie 3 is able to maintain consistency, it is now possible to execute a longer sequence of actions, achieving more complex goals. We expect this technology to play a critical role as we push toward AGI, and agents play a greater role in the world.

Limitations

While Genie 3 pushes the boundaries of what world models can accomplish, it's important to acknowledge its current limitations:

  • Limited action space. Although promptable world events allow for a wide range of environmental interventions, they are not necessarily performed by the agent itself. The range of actions agents can perform directly is currently constrained.
  • Interaction and simulation of other agents. Accurately modeling complex interactions between multiple independent agents in shared environments is still an ongoing research challenge.
  • Accurate representation of real-world locations. Genie 3 is currently unable to simulate real-world locations with perfect geographic accuracy.
  • Text rendering. Clear and legible text is often only generated when provided in the input world description.
  • Limited interaction duration. The model can currently support a few minutes of continuous interaction, rather than extended hours.

Responsibility

We believe foundational technologies require a deep commitment to responsibility from the very beginning. The technical innovations in Genie 3, particularly its open-ended and real-time capabilities, introduce new challenges for safety and responsibility. To address these unique risks while aiming to maximize the benefits, we have worked closely with our Responsible Development & Innovation Team.

At Google DeepMind, we're dedicated to developing our best-in-class models in a way that amplifies human creativity, while limiting unintended impacts. As we continue to explore the potential applications for Genie, we are announcing Genie 3 as a limited research preview, providing early access to a small cohort of academics and creators. This approach allows us to gather crucial feedback and interdisciplinary perspectives as we explore this new frontier and continue to build our understanding of risks and their appropriate mitigations. We look forward to working further with the community to develop this technology in a responsible way.

Next steps

We believe Genie 3 is a significant moment for world models, where they will begin to have an impact on many areas of both AI research and generative media. To that end, we're exploring how we can make Genie 3 available to additional testers in the future.

Genie 3 could create new opportunities for education and training, helping students learn and experts gain experience. Not only can it provide a vast space to train agents like robots and autonomous systems, Genie 3 can also make it possible to evaluate agents’ performance, and explore their weaknesses.

At every step, we’re exploring the implications of our work and developing it for the benefit of humanity, safely and responsibly.

Acknowledgments

Genie 3 was made possible due to key research and engineering contributions from Phil Ball, Jakob Bauer, Frank Belletti, Bethanie Brownfield, Ariel Ephrat, Shlomi Fruchter, Agrim Gupta, Kristian Holsheimer, Aleks Holynski, Jiri Hron, Christos Kaplanis, Marjorie Limont, Matt McGill, Yanko Oliveira, Jack Parker-Holder, Frank Perbet, Guy Scully, Jeremy Shar, Stephen Spencer, Omer Tov, Ruben Villegas, Emma Wang and Jessica Yung.

We thank Andrew Audibert, Cip Baetu, Jordi Berbel, David Bridson, Jake Bruce, Gavin Buttimore, Sarah Chakera, Bilva Chandra, Paul Collins, Alex Cullum, Bogdan Damoc, Vibha Dasagi, Maxime Gazeau, Charles Gbadamosi, Shan Han, Woohyun Han, Ed Hirst, Ashyana Kachra, Lucie Kerley, Kristian Kjems, Eva Knoepfel, Vika Koriakin, Jessica Lo, Cong Lu, Zeb Mehring, Alexandre Moufarek, Henna Nandwani, Valeria Oliveira, Fabio Pardo, Jane Park, Andrew Pierson, Ben Poole, Helen Ran, Nilesh Ray, Tim Salimans, Manuel Sanchez, Igor Saprykin, Amy Shen, Sailesh Sidhwani, Duncan Smith, Joe Stanton, Hamish Tomlinson, Dimple Vijaykumar, Luyu Wang, Piers Wingfield, Nat Wong, Keyang Xu, Christopher Yew, Nick Young and Vadim Zubov for their invaluable partnership in developing and refining key components of this project.

Thanks to Tim Rocktäschel, Satinder Singh, Adrian Bolton, Inbar Mosseri, Aäron van den Oord, Douglas Eck, Dumitru Erhan, Raia Hadsell, Zoubin Ghahramani, Koray Kavukcuoglu and Demis Hassabis for their insightful guidance and support throughout the research process.

The feature video was produced by Suz Chambers, Matthew Carey, Alex Chen, Andrew Rhee, JR Schmidt, Scotch Johnson, Heysu Oh, Kaloyan Kolev, Arden Schager, Sam Lawton, Hana Tanimura, Zach Velasco, Ben Wiley, and Dev Valladares. It includes samples generated by Signe Nørly, Eleni Shaw, Andeep Toor, Gregory Shaw, and Irina Blok.

We thank Frederic Besse, Tim Harley and the rest of the SIMA team for access to a recent version of their agent.

Finally, we extend our gratitude to Mohammad Babaeizadeh, Gabe Barth-Maron, Parker Beak, Jenny Brennan, Tim Brooks, Max Cant, Harris Chan, Jeff Clune, Kaspar Daugaard, Dumitru Erhan, Ashley Feden, Simon Green, Nik Hemmings, Michael Huber, Jony Hudson, Dirichi Ike-Njoku, Hernan Moraldo, Bonnie Li, Simon Osindero, Georg Ostrovski, Ryan Poplin, Alex Rizkowsky, Giles Ruscoe, Ana Salazar, Guy Simmons, Jeff Stanway, Metin Toksoz-Exley, Xinchen Yan, Petko Yotov, Mingda Zhang and Martin Zlocha for their insights and support.