Inworld TTS Paralinguistic Tags Don't Work — Here's What Does
sm1ck · 2026-05-31 · via DEV Community

If you've worked with expressive TTS in the last year you've probably seen the pattern:

She paused. [sigh] "Fine, you can come in."


Inline paralinguistic tags. Half the model demos use them. So when we wired up Inworld TTS-1.5 Max for HoneyChat, a Telegram-native AI companion where voice messages are a first-class output, we sprinkled [laugh], [sigh], [breathe] through the prompts and shipped.

The audio sounded fine. Just… exactly the same as before. No laugh. No sigh. The tags were rendered as silence at best, and read aloud as the literal text "sigh" at worst, depending on the voice.

We tested all the variants we could find. None of them moved the needle.

HoneyChat voice stack at a glance:

  • Engine: Inworld TTS-1.5 Max — $10 per 1M characters, currently #1 on the TTS Arena ELO board at 1259 ELO, 15 languages with native pronunciation: en, ru, ja, zh, ko, es, fr, de, it, pt, pl, hi, ar, he, nl.
  • Voice catalog: 312 designed voices (26 character archetypes × 12 languages), stored as voiceId strings in config/archetype_voice_ids.json. Generated via the Voice Design API and managed with core/voice_design.py.
  • Custom voices: Voice Clone Manager (core/voice_clone_manager.py) — persistent voiceId minted from a WAV/MP3 sample.
  • Cache: voice previews + test samples are lazy-loaded from Storj S3 via core/voice_cache.py.
  • Fallback: gTTS (Google) — free, no API key, used if Inworld returns 5xx or budget is exhausted.
  • What we removed to get here: Kokoro (CPU Docker, latency too high) and Chatterbox (GPU on Vast.ai, ops cost too high). Inworld replaced both for a flat per-char cost and dramatically better expressivity.
  • One API gotcha: gender enum is VOICE_GENDER_MALE/VOICE_GENDER_FEMALE, not "male"/"female" strings. Passing the strings 400s silently.
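The fallback rule and the gender-enum gotcha in the list above can be sketched as two small helpers. This is a minimal illustration, not HoneyChat's code: only the two enum strings come from the article, while the function names and the budget parameter are hypothetical.

```python
# Hypothetical helpers; only the two enum strings come from the article.
_GENDER_ENUM = {
    "male": "VOICE_GENDER_MALE",
    "female": "VOICE_GENDER_FEMALE",
}


def to_gender_enum(value: str) -> str:
    """Map a loose 'male'/'female' string to the enum name the API expects."""
    key = value.strip().lower()
    if key in _GENDER_ENUM:
        return _GENDER_ENUM[key]
    if value in _GENDER_ENUM.values():
        return value  # already an enum name, pass it through
    raise ValueError(f"unsupported gender value: {value!r}")


def should_fall_back(status_code: int, budget_left_usd: float) -> bool:
    """True when the fallback engine should take over: a 5xx or no budget left."""
    return 500 <= status_code <= 599 or budget_left_usd <= 0
```

Normalising the value before the request is built means a stray "male" string fails loudly in your own code instead of surfacing as a 400 from the API.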

What actually doesn't work

Tried on the same sentence, same voice, side-by-side audio comparison:

Pattern                     What it did
--------------------------  -----------------------------------
[laugh] [sigh]              Silence in output
(laughs) (sighs)            Sometimes read literally
*laughs* *sighs*            Silence (asterisks get stripped)
<laugh/> <sigh/>            Silence (not valid SSML on Inworld)
<emotion>laugh</emotion>    Silence

The Inworld API does not document support for any of these. We had assumed (because every other TTS post on the internet uses them) that they were a universal convention. They are not.
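A comparison like the one above is easier to repeat if the test sentences are generated rather than typed by hand. The sketch below builds one sentence per tag style plus an untagged baseline, ready to be synthesised with the same voice and compared by ear. The tag spellings come from the table; the function and dictionary names are illustrative assumptions.

```python
# Tag spellings from the comparison table; the harness itself is a sketch.
TAG_STYLES = {
    "brackets": "[{tag}]",
    "parens": "({tag}s)",
    "asterisks": "*{tag}s*",
    "self_closing_xml": "<{tag}/>",
    "emotion_element": "<emotion>{tag}</emotion>",
}


def build_probe_variants(before: str, after: str, tag: str) -> dict[str, str]:
    """Return one test sentence per tag style, plus an untagged baseline."""
    variants = {"baseline": f"{before} {after}"}
    for name, template in TAG_STYLES.items():
        variants[name] = f"{before} {template.format(tag=tag)} {after}"
    return variants


if __name__ == "__main__":
    probes = build_probe_variants("She paused.", '"Fine, you can come in."', "sigh")
    for name, sentence in probes.items():
        print(f"{name:>18}: {sentence}")
```

Keeping the baseline in the same dictionary makes it harder to forget the control sample when listening.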

What Inworld does expose is temperature and speakingRate as request parameters, plus a small subset of SSML. The expressivity has to come from those plus how you shape the text itself.

What actually does work

After enough A/B-ing across 26 archetypes × 15 languages, four patterns reliably change the audio output.

1. Asterisks for emphasis

"You did *what?*"


The asterisks get stripped from the spoken text but the emphasised word lands with audible stress. Works in every voice we tried. The cheapest, highest-hit-rate marker.

2. Ellipsis for pause-with-mood

"Fine... you can come in."


Three dots produce a real pause with a tonal drop, the voice equivalent of a sigh, without trying to fake [sigh]. Five dots give a longer pause. The model interprets them as prosodic cues.

3. SSML <break> for hard pauses

<speak>
  She paused. <break time="0.4s"/> "Fine, you can come in."
</speak>


Inworld accepts a useful subset of SSML, and <break> is the one that matters most for expressive speech. 0.2s for a beat, 0.4s for a sigh-pause, 0.8s for a beat-before-a-line-delivery moment. Wrap the whole text in <speak> and the parser handles it.
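If the text is assembled from segments, the <speak> wrapper and the <break> elements can be generated in one place. The following is a minimal sketch assuming the three pause lengths quoted above; the helper name and the pause labels are made up for illustration. It escapes the text so that a stray & or < cannot break the markup.

```python
from xml.sax.saxutils import escape

# Pause lengths quoted in the text; the label names are illustrative.
PAUSES = {"beat": "0.2s", "sigh": "0.4s", "dramatic": "0.8s"}


def ssml_with_breaks(segments: list[str], pause: str = "sigh") -> str:
    """Join text segments with <break> elements and wrap them in <speak>."""
    tag = f'<break time="{PAUSES[pause]}"/>'
    # Escape &, < and > so stray characters in the text cannot break the markup.
    body = f" {tag} ".join(escape(s.strip()) for s in segments)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(ssml_with_breaks(["She paused.", '"Fine, you can come in."']))
```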

4. Onomatopoeia for laughs, moans, breath

"Mmm... ha-ha, you're right."
"ahh... I needed that."


The model will render ha-ha, mmm, ahh, oh, nnn as the actual sound, because they're spellings of sounds rather than meta-tags. They sound far more natural than a synthesised [laugh] even when one exists.

For emotional/intimate scenes, rhythmic repeats (ah... ah... ah) carry actual prosody. We use this for breath patterns where another TTS would want a [breathe] marker.
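One way to migrate existing prompts is to swap bracketed tags for a spelled-out sound instead of deleting them. The sketch below is an assumption for illustration, not the approach described in this post (which strips the tags): the sound spellings come from the examples above, but pairing each one with a specific tag is a guess you would want to tune by ear.

```python
import re

# Illustrative mapping only: the spellings come from the examples above,
# the pairing with specific tags is an assumption.
SOUND_FOR_TAG = {
    "laugh": "ha-ha",
    "sigh": "ahh...",
    "breathe": "ah... ah... ah",
}

_BRACKET_TAG = re.compile(r"\[(\w+)\]")


def tags_to_onomatopoeia(text: str) -> str:
    """Replace known [tag] markers with a spelled-out sound; drop unknown ones."""
    replaced = _BRACKET_TAG.sub(
        lambda m: SOUND_FOR_TAG.get(m.group(1).lower(), ""), text
    )
    # Collapse the double spaces left behind by dropped tags.
    return re.sub(r"\s{2,}", " ", replaced).strip()
```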

The wrapper that ties it together

In core/voice.py we run every chunk through enrich_for_tts() (line ~772) before handing it to Inworld. Regex-based, language-aware, idempotent:

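def enrich_for_tts(text: str, lang: str = "en") -> tuple[str, dict]:
    """Return (preprocessed_text, request_params).

    Strips fake paralinguistic tags, adds SSML breaks where appropriate,
    and bumps temperature/speakingRate for high-emotion scenes.
    """
    # Drop [laugh]-style tags the model would ignore or read aloud.
    text = _STRIP_FAKE_TAGS.sub("", text)
    # Turn ellipses into explicit SSML pauses.
    text = _ELLIPSIS_TO_BREAK.sub(r'<break time="0.3s"/>', text)
    # SSML needs a <speak> root; skip the wrap if it is already there,
    # so running the function twice does not nest <speak> elements.
    if "<break" in text and not text.startswith("<speak>"):
        text = f"<speak>{text}</speak>"
    params = _detect_mood_params(text, lang)
    return text, params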


The mood detector looks for emotional cues (intensity words, repeated punctuation, onomatopoeia density) and bumps temperature and speakingRate for the more expressive scenes. Same model, same voice, much more dynamic output, all without any inline tag that the model would have ignored.
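The post does not show the mood detector itself, so here is a rough sketch of how such a scorer could work. Everything in it is an assumption: the word list, the threshold, and the parameter values are placeholders to be tuned by listening, and the language-specific handling mentioned above is left out. Only the two parameter names, temperature and speakingRate, come from the article.

```python
import re

# All word lists, thresholds, and parameter values below are placeholders.
_INTENSITY_WORDS = {"never", "always", "please", "hate", "love", "what"}
_REPEATED_PUNCT = re.compile(r"[!?]{2,}")
_ONOMATOPOEIA = re.compile(r"\b(?:ha-ha|mmm+|ahh+|ohh*|nnn+)\b", re.IGNORECASE)


def detect_mood_params(text: str) -> dict[str, float]:
    """Score emotional cues and map the score to request parameters."""
    words = re.findall(r"[\w'-]+", text.lower())
    score = sum(w in _INTENSITY_WORDS for w in words)   # intensity words
    score += len(_REPEATED_PUNCT.findall(text))          # "!!", "?!", "???"
    score += len(_ONOMATOPOEIA.findall(text))            # spelled-out sounds
    if score >= 2:
        return {"temperature": 1.3, "speakingRate": 1.1}
    return {"temperature": 1.0, "speakingRate": 1.0}
```

A two-level output keeps the behaviour easy to audit; a graded mapping from score to parameters is a natural next step once the cues are trusted.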

Lessons

  1. Don't assume [laugh]/[sigh] is universal. It isn't. Check the provider's docs and probe.
  2. Probe with side-by-side audio, not just visual diffs. A [sigh] that emits silence looks identical to one that emits a sigh in any log.
  3. Use what the API actually exposes. For Inworld that's temperature, speakingRate, and a useful subset of SSML — not inline tags.
  4. Onomatopoeia beats meta-tags for emotional sounds. "ahh..." is a thing the model can read; [sigh] is a meta-instruction it can't.
  5. Strip the fake tags out of your prompt before sending. Otherwise they leak as text on some voices.
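
Lesson 5 can be sketched as a single compiled pattern. This is a guess at what a tag-stripping regex might look like, not the actual pattern from core/voice.py. The tag vocabulary is hypothetical, and it is deliberately restricted to known tag words so that emphasis asterisks such as *what?* survive.

```python
import re

# Hypothetical vocabulary; extend it with whatever tags appear in your prompts.
_TAG_WORDS = r"(?:laugh|sigh|breathe)s?"

_STRIP_FAKE_TAGS = re.compile(
    rf"\[{_TAG_WORDS}\]"                      # [laugh]
    rf"|\({_TAG_WORDS}\)"                     # (laughs)
    rf"|\*{_TAG_WORDS}\*"                     # *laughs*, but not *what?*
    rf"|<{_TAG_WORDS}\s*/>"                   # <laugh/>
    rf"|<emotion>{_TAG_WORDS}</emotion>",     # <emotion>laugh</emotion>
    re.IGNORECASE,
)


def strip_fake_tags(text: str) -> str:
    """Remove known paralinguistic tags and tidy the leftover whitespace."""
    cleaned = _STRIP_FAKE_TAGS.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()
```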

The audio quality jump from these four patterns is meaningful — users notice. The cost is a 30-line preprocessor and the courage to delete every [laugh] your team has been sprinkling for months.


This is from production work at HoneyChat, a Telegram-native AI companion where voice messages are a first-class output. Canonical version: honeychat.bot/en/blog/inworld-tts-paralinguistic-tags-alternatives.

HoneyChat Engineering
