Why dialogue placement is the hardest part of AI comic generation
qcrao · 2026-05-16 · via DEV Community

If you ask people what is hard about AI comics, most will say character consistency. That is a real problem, but it is not the worst one in practice.

The harder problem, in my experience building Comicory, is dialogue placement.

Putting words inside a comic panel sounds trivial. It is one of the parts that breaks down most often, and when it breaks down the whole page reads wrong, even if every individual panel looks beautiful.

Speech bubbles are constraint puzzles

A speech bubble is not just a graphic. It is a constraint that ties together text length, panel composition, character position, reading order, and the actual story logic.

A bubble needs to:

  • Sit somewhere that does not cover the speaker's face.
  • Not cover other important visual information in the panel.
  • Point clearly to the speaker.
  • Be readable in size and contrast.
  • Fit the text without overflowing or shrinking the font.
  • Come before the next bubble in reading order.
  • Match the tone of the line (a whisper bubble looks different from a shout).

Each constraint sounds simple. Together, they collide. A model that generates a perfect-looking panel often leaves no room for the bubble, or covers the character's mouth, or breaks the reading order with the next panel.
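
A few of these constraints can be checked mechanically. The sketch below is a minimal illustration, not Comicory's actual implementation. It assumes bubbles, panels, and protected regions (such as a detected face) are axis-aligned rectangles; `Rect`, `bubble_violations`, and the region names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in panel pixel coordinates (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other):
        # Two rectangles overlap only if they overlap on both axes.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)

    def contains(self, other):
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


def bubble_violations(bubble, panel, protected):
    """Return the names of the placement constraints that `bubble` breaks.

    `protected` maps an assumed region name (e.g. "speaker_face") to a Rect.
    """
    violations = []
    if not panel.contains(bubble):
        violations.append("outside_panel")
    for name, region in protected.items():
        if bubble.intersects(region):
            violations.append(f"covers_{name}")
    return violations


panel = Rect(0, 0, 800, 600)
protected = {"speaker_face": Rect(300, 150, 200, 200)}
print(bubble_violations(Rect(20, 20, 250, 120), panel, protected))    # []
print(bubble_violations(Rect(250, 100, 250, 120), panel, protected))  # ['covers_speaker_face']
```

Checks like these only cover the geometric constraints. Tone, tail direction, and reading order need their own rules, which is where the constraints start to collide.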

Text rendering is still unreliable inside images

A second problem is that most image models still cannot reliably render text inside the generated image.

Even when the model places a bubble shape correctly, the letters inside it may be unreadable. Half a word missing. A weird artifact. A misspelling.

For a comic, this is not cosmetic. The dialogue is half the storytelling. A misread bubble flips the meaning of a panel.

That is why most working systems, including Comicory, do not let the image model write the text at all. The model produces an empty bubble or a marked region. The actual text is composited on top by a typography layer that knows about font, kerning, and bubble fit. That gives clean, predictable letters.

But it also moves the hard problem somewhere new. Now the bubble shape and position have to match a text length that was decided separately.
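
To make the fit problem concrete, here is a rough sketch of the decision a typography layer has to make. It is an illustration under stated assumptions: `fit_text` and its parameters are hypothetical, and glyph widths are estimated from the font size rather than measured from a real font.

```python
import textwrap


def fit_text(text, box_w, box_h, max_size=28, min_size=14,
             char_ratio=0.55, line_ratio=1.2):
    """Return (font_size, lines) for the largest size that fits, or None.

    char_ratio and line_ratio are assumed averages: glyph width and line
    height as multiples of the font size. A real typography layer would
    measure the actual font instead.
    """
    for size in range(max_size, min_size - 1, -1):
        chars_per_line = int(box_w // (size * char_ratio))
        if chars_per_line < 1:
            continue
        lines = textwrap.wrap(text, width=chars_per_line, break_long_words=False)
        # A single long word may still exceed the width, so check it explicitly.
        widest = max((len(line) for line in lines), default=0)
        if widest <= chars_per_line and len(lines) * size * line_ratio <= box_h:
            return size, lines
    return None  # no readable size fits: the bubble must grow, move, or split


short_line = "Get out of here!"
long_line = ("I never said you could open that door, "
             "and now we are all going to pay for it")
print(fit_text(short_line, 160, 60))  # a (size, lines) tuple
print(fit_text(long_line, 160, 60))   # None
```

Returning `None` instead of shrinking below `min_size` is the design choice that matters here: it turns an unreadable bubble into an explicit layout failure that the rest of the system can react to.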

Length mismatch breaks everything

The most common failure is the mismatch between the planned dialogue length and the visual space the model leaves for it.

Imagine the storyboard says character A has a four-word line. The model generates a panel with a small empty corner for the bubble. Fine. Then, during iteration, the user rewrites the line to eighteen words. Suddenly there is no room. The compositor either shrinks the text until it is unreadable or lets it overflow onto the panel art.

Solving this is not a one-shot problem. The system has to negotiate between text length and panel composition at every revision step. It needs to know:

  • How big can this bubble grow before it hits important art?
  • How much of the panel is safe to cover?
  • Should the bubble break across multiple shapes if the line is long?
  • Should the line be split into two bubbles for the same speaker?

That is a layout engine, not a prompt.
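
One of those decisions, splitting a long line into several bubbles for the same speaker, can be sketched as follows. This is a simplified illustration: `split_for_bubbles` is a hypothetical helper, and `max_chars` stands in for a per-bubble capacity that a real layout engine would derive from the free space in the panel.

```python
import re
import textwrap


def split_for_bubbles(line, max_chars):
    """Split a line into chunks of at most max_chars, one chunk per bubble.

    Prefers sentence boundaries and falls back to word boundaries.
    A single word longer than max_chars is kept whole rather than broken.
    """
    sentences = re.split(r"(?<=[.!?])\s+", line.strip())
    chunks = []
    for sentence in sentences:
        if chunks and len(chunks[-1]) + 1 + len(sentence) <= max_chars:
            chunks[-1] += " " + sentence   # still fits in the current bubble
        elif len(sentence) <= max_chars:
            chunks.append(sentence)        # start a new bubble
        else:
            # The sentence alone is too long: wrap it on word boundaries.
            chunks.extend(textwrap.wrap(sentence, max_chars, break_long_words=False))
    return chunks


line = "I told you not to open that door. Now look what you have done!"
print(split_for_bubbles(line, 40))
# ['I told you not to open that door.', 'Now look what you have done!']
```

Splitting on sentence boundaries first keeps each bubble a complete thought, which usually reads better than cutting mid-sentence just to balance the lengths.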

Reading order is a separate layer

Even if every single bubble fits, the bubbles in a multi-panel page must follow a reading order. Western comics read left to right, top to bottom. Within a panel, bubbles read roughly the same way.

A model that generates panels in isolation has no incentive to keep this order consistent. Panel 1 might have its bubble in the top right, while panel 2 has it in the bottom left, and the eye has to ricochet around the page.

Reading order is cheap to fix if the system models it explicitly. It is expensive to fix if you only discover after rendering that the page is unreadable.
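
Modeling reading order explicitly can be as small as the sketch below. It assumes a left-to-right, top-to-bottom convention and that each bubble is reduced to its centre point; `Bubble`, `visual_order`, `reading_order_ok`, and the `row_tolerance` value are hypothetical. A full page would order panels first and then apply the same check inside each panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bubble:
    line_index: int  # position of the line in the script
    x: float         # bubble centre, origin at the top left
    y: float


def visual_order(bubbles, row_tolerance=40.0):
    """Order bubbles top to bottom, then left to right within a row.

    A bubble joins the current row if its centre is within row_tolerance
    (an assumed value) of the first bubble in that row.
    """
    rows = []
    for b in sorted(bubbles, key=lambda b: b.y):
        if rows and b.y - rows[-1][0].y <= row_tolerance:
            rows[-1].append(b)
        else:
            rows.append([b])
    return [b for row in rows for b in sorted(row, key=lambda b: b.x)]


def reading_order_ok(bubbles):
    """True if the eye meets the lines in the same order as the script."""
    order = [b.line_index for b in visual_order(bubbles)]
    return order == sorted(order)


good = [Bubble(0, 100, 50), Bubble(1, 400, 70), Bubble(2, 150, 300)]
bad = [Bubble(0, 400, 60), Bubble(1, 100, 50), Bubble(2, 150, 300)]
print(reading_order_ok(good))  # True
print(reading_order_ok(bad))   # False
```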

What the product actually has to do

So the working pipeline for dialogue in an AI comic ends up doing more layout work than image generation work:

  1. Decide dialogue lines per panel during the storyboard stage, so the rendering stage knows how much room each bubble needs.
  2. Tell the image model to leave clear, neutral space in a specific region of each panel.
  3. Composite the real text in a typography layer whose font, size, and bubble shape the user can tweak.
  4. Validate that no bubble covers an important face or hand.
  5. Validate reading order across the page.
  6. Provide a small editor so the user can drag a bubble or split a long line into two.

None of these steps is heroic. All of them are essential. Skip any one and the page reads worse than an amateur's hand-drawn comic.
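
The first two steps can be sketched as a small hand-off from storyboard to image prompt. Everything here is hypothetical: `PanelPlan`, `reserved_fraction`, `image_prompt`, the one-percent-per-character heuristic, and the prompt wording are illustrative assumptions, not how Comicory or any particular image model works.

```python
from dataclasses import dataclass, field


@dataclass
class PanelPlan:
    description: str
    dialogue: list = field(default_factory=list)  # lines fixed at storyboard time
    bubble_corner: str = "top left"               # where the art should stay empty


def reserved_fraction(plan, percent_per_char=1.0, cap=0.4):
    """Estimate the share of the panel to keep clear for dialogue.

    percent_per_char and cap are assumed tuning values, not measured ones.
    """
    total_chars = sum(len(line) for line in plan.dialogue)
    return min(cap, total_chars * percent_per_char / 100)


def image_prompt(plan):
    """Build a prompt that asks for empty space sized to the planned dialogue."""
    pct = round(reserved_fraction(plan) * 100)
    if pct == 0:
        return plan.description
    return (f"{plan.description}. Leave the {plan.bubble_corner} "
            f"{pct}% of the frame as plain, uncluttered background.")


plan = PanelPlan("A detective opens a creaking door", ["Who's there?"])
print(image_prompt(plan))
# A detective opens a creaking door. Leave the top left 12% of the frame as plain, uncluttered background.
```

The point of the sketch is the direction of the data flow: the dialogue length is known before the image is requested, so the request can carry a space budget instead of hoping the model leaves room.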

The product lesson

The flashy demos of AI comics all show character art. The unglamorous reality is that dialogue placement determines whether the page is actually readable.

For Comicory, treating bubbles as a real layout problem, not a rendering hint, was one of the larger architectural decisions. The model produces art and space. The typography layer produces text and bubble shape. The product glues them together with constraints the user can override.

Image quality matters. Character consistency matters. But until the words sit cleanly in the right place at the right time, no amount of pretty art makes the page feel like a comic.