Gemini 3.1 Flash TTS: the next generation of expressive AI speech
Vilobh Meshram · 2026-04-15 · via Google DeepMind News

Our newest audio model introduces granular audio tags that give you precise control to direct AI speech for expressive audio generation.

Max Gubin

Principal Research Engineer on behalf of the Gemini team

General summary

Gemini 3.1 Flash TTS is here, giving you improved AI speech quality and control. You can now use audio tags to adjust vocal style and pacing in over 70 languages. Test it out in Google AI Studio, Vertex AI, and Google Vids, and know that all audio is watermarked with SynthID to prevent misinformation.

Summaries were generated by Google AI. Generative AI is experimental.

Bullet points

  • "Gemini 3.1 Flash TTS" is a new AI speech model with better control, expressiveness, and quality.
  • This model has improved speech quality, making it sound more natural than previous versions.
  • Audio tags let you control vocal style, pace, and delivery using natural language commands.
  • Developers can use Google AI Studio to fine-tune voices and export settings for consistent use.
  • Gemini 3.1 Flash TTS supports 70+ languages and uses SynthID watermarking to identify AI-generated audio.

Basic explainer

Gemini 3.1 Flash TTS is a new AI that makes computer speech sound more real. It lets people change how the AI talks by using special commands in the text. This AI can speak in over 70 languages and adds a hidden watermark to the audio. This helps people know it's AI-generated and not a real person.

Today, we’re introducing Gemini 3.1 Flash TTS, our latest text-to-speech model. It delivers improved controllability, expressivity and quality, so that developers, enterprises and everyday users can build the next generation of AI-speech applications.

Starting today, 3.1 Flash TTS is rolling out in Google AI Studio, Vertex AI and Google Vids.

Improved speech quality and controllability

We’ve improved the overall speech quality of Gemini 3.1 Flash TTS, making it our most natural and expressive model to date. On the Artificial Analysis TTS leaderboard, a benchmark that captures thousands of blind human preferences, 3.1 Flash TTS achieved an impressive Elo score of 1,211.

Figure: Artificial Analysis Text to Speech Arena, quality Elo.

Artificial Analysis has also positioned Gemini 3.1 Flash TTS within its “most attractive quadrant” for its ideal blend of high-quality speech generation and low cost. The model stands out further with native multi-speaker dialogue, support for 70+ languages, and granular creative control via natural language.
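
To give the Elo figure above some intuition, here is a minimal sketch that converts a rating gap into an expected preference rate. It assumes the standard Elo logistic formula on a 400-point scale; the post does not describe the leaderboard's exact method, and the rival rating of 1,111 is made up purely for illustration.

```python
def elo_expected_preference(rating_a: float, rating_b: float) -> float:
    """Chance that A is preferred over B in one blind comparison,
    under the standard Elo model (assumed here, not confirmed by the post)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


FLASH_TTS_ELO = 1211            # score quoted in the post
HYPOTHETICAL_RIVAL_ELO = 1111   # invented comparison point, 100 points lower

p = elo_expected_preference(FLASH_TTS_ELO, HYPOTHETICAL_RIVAL_ELO)
print(f"{p:.3f}")  # prints 0.640
```

Under these assumptions, a 100-point lead corresponds to being preferred in roughly 64% of head-to-head comparisons, and equal ratings give exactly 50%.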

New audio tags for more expressive speech generation

3.1 Flash TTS also introduces audio tags, an intuitive way to control vocal style, pace and delivery. By embedding natural language commands directly into the text input, you can steer AI-speech output with finer granularity.
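
As a rough illustration of embedding commands in the text input, the sketch below assembles a tagged script as a plain string. The square-bracket tag syntax, the helper name `with_tag` and the example directions are all assumptions for illustration; the post does not specify the tag format, and no API call is made here.

```python
def with_tag(direction: str, text: str) -> str:
    """Prefix a line of text with an inline audio tag.

    The "[direction] text" layout is a hypothetical format, not one
    confirmed by the post.
    """
    direction = direction.strip()
    if not direction or "[" in direction or "]" in direction:
        raise ValueError("direction must be non-empty and contain no brackets")
    return f"[{direction}] {text.strip()}"


# Each segment pairs a natural-language direction with the words to speak.
segments = [
    ("warmly", "Welcome back to the show."),
    ("whispering, slower", "I have a secret to share."),
]

prompt = " ".join(with_tag(direction, text) for direction, text in segments)
print(prompt)
```

The resulting string would then be passed as the text input to the model, so that the delivery changes mid-passage without splitting the request.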

You can start experimenting with these audio tags in Google AI Studio, where other updates to the developer experience add configurable controls that place the developer in the “director’s chair”:

  • Scene direction: Set the stage by defining the environment and providing specific dialogue instructions. This world-building context helps characters remain “in-character” and react to one another naturally across multiple turns.
  • Speaker-level specificity: Cast characters using unique Audio Profiles, then specify Director’s Notes to toggle pace, tone and accent. Using inline tags, speakers can pivot from these high-level settings to change expression mid-sentence.
  • Seamless export: Once the performance is perfected, these exact parameters can be exported as Gemini API code to ensure consistent, recognizable voices across various projects and platforms.
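
The three controls above can be pictured as one structured prompt: a scene, a cast with per-speaker notes, and a transcript. The sketch below is only a guess at such a layout. The `Speaker` class, the `build_prompt` helper, the field labels and the bracketed inline tag are hypothetical, and the code that Google AI Studio actually exports is not shown in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    name: str
    audio_profile: str    # who the character is (hypothetical field)
    directors_notes: str  # pace, tone and accent (hypothetical field)


def build_prompt(scene: str, cast: list[Speaker],
                 lines: list[tuple[str, str]]) -> str:
    """Combine scene direction, cast and transcript into one text prompt."""
    known = {speaker.name for speaker in cast}
    for name, _ in lines:
        if name not in known:
            raise ValueError(f"line spoken by unknown speaker: {name}")
    parts = [f"Scene: {scene}", "Cast:"]
    parts += [
        f"- {s.name} | profile: {s.audio_profile} | notes: {s.directors_notes}"
        for s in cast
    ]
    parts.append("Transcript:")
    parts += [f"{name}: {text}" for name, text in lines]
    return "\n".join(parts)


cast = [
    Speaker("Ana", "veteran radio host", "brisk pace, warm tone"),
    Speaker("Ben", "nervous first-time guest", "slow pace, quiet"),
]
prompt = build_prompt(
    "A late-night radio studio, minutes before going live.",
    cast,
    [("Ana", "Ready when you are."), ("Ben", "[nervous laugh] I think so.")],
)
print(prompt)
```

Keeping the scene and cast in reusable objects is one way to get the consistency the export step aims for: the same definitions can be reused across scripts so a character sounds the same each time.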

With these new configurations, developers can enhance precision for specific scenarios, creating memorable characters and immersive audio experiences.

Built for global scale

Gemini 3.1 Flash TTS delivers high-fidelity speech and more precise control across more than 70 languages. These core optimizations bring advanced style, pacing and accent control to major markets — helping developers create localized, expressive speech experiences for users at global scale.

Early developer and enterprise testers are already seeing the impact of 3.1 Flash TTS, highlighting its impressive controllability and expressivity. They’ve told us how audio tags provide a new level of creative precision, transforming simple text into a high-fidelity vocal performance.

Watermarked with SynthID

All audio generated by Gemini 3.1 Flash TTS is watermarked with SynthID. This imperceptible watermark is interwoven directly into the audio output, allowing the reliable detection of AI-generated content to help prevent misinformation. For more information on our approach to safety and responsibility, you can review the model card.
