Abstract: Multi-frame story illustration requires long-horizon coherence beyond single-image text-to-image generation, including narrative decomposition and persistent character identity, layout, and affect across frames. We propose Story-to-Executable Descriptions (S2ED), a training-free, model-agnostic, prompt-layer framework that converts a full story into a sequence of explicit, editable executable descriptions for more consistent rendering. S2ED coordinates three agents to segment the narrative, ground canonical character attributes, and enrich spatial and affective cues, enabling interpretable prompt-carried state propagation and local edits to repair drift without retraining the generator. Experiments on Flintstones and Shakoo Maku show that S2ED improves sequence-level consistency and character fidelity over strong prompting baselines, large-model planning, and a reference training-based method, under both automatic metrics and human judgments. We also deploy S2ED in an end-to-end story-to-storybook system for children's illustrated stories and provide a supplementary video.
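The abstract does not specify the format of an executable description or how the three agents are implemented. As a rough illustration only, the Python sketch below assumes a hypothetical per-frame schema (`FrameDescription`) and replaces each agent with a trivial rule-based stand-in (`segment_story`, `ground_characters`, `enrich_cues`); none of these names, the keyword table `LOCATION_CUES`, or the character attributes come from the paper. It shows the two properties the abstract emphasizes: character and layout state carried explicitly in each frame's prompt text, and a local edit (`edit_frame`) that repairs one frame without touching the others or the generator.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Character:
    name: str
    attributes: str  # canonical appearance reused verbatim in every frame


@dataclass(frozen=True)
class FrameDescription:
    """Hypothetical schema for one editable per-frame description."""
    index: int
    text: str
    characters: Tuple[Character, ...] = ()
    layout: str = ""
    affect: str = ""

    def to_prompt(self) -> str:
        # All state lives in the prompt string; the generator is untouched.
        parts = [self.text]
        parts += [f"{c.name}: {c.attributes}" for c in self.characters]
        if self.layout:
            parts.append(f"layout: {self.layout}")
        if self.affect:
            parts.append(f"mood: {self.affect}")
        return "; ".join(parts)


# Hypothetical keyword -> layout table used by the enrichment stand-in.
LOCATION_CUES = {"yard": "exterior, backyard", "cave": "interior, dim cave"}


def segment_story(story: str) -> List[str]:
    """Stand-in for the segmentation agent: one frame per sentence."""
    return [s.strip() for s in story.split(".") if s.strip()]


def ground_characters(segments: List[str],
                      registry: Dict[str, Character]) -> List[FrameDescription]:
    """Stand-in for the grounding agent: attach the canonical record of every
    registered character whose name occurs (as a substring) in the segment."""
    return [
        FrameDescription(i, seg, tuple(c for name, c in registry.items() if name in seg))
        for i, seg in enumerate(segments)
    ]


def enrich_cues(frames: List[FrameDescription],
                default_layout: str = "wide shot") -> List[FrameDescription]:
    """Stand-in for the enrichment agent: the layout is carried forward from
    the previous frame until a new location keyword appears, and the mood
    comes from a single keyword rule."""
    enriched, layout = [], default_layout
    for f in frames:
        for keyword, cue in LOCATION_CUES.items():
            if keyword in f.text:
                layout = cue
        mood = "joyful" if "laugh" in f.text else "calm"
        enriched.append(replace(f, layout=layout, affect=mood))
    return enriched


def edit_frame(frames: List[FrameDescription], index: int,
               **changes) -> List[FrameDescription]:
    """Local repair: rewrite one frame's description and reuse all others."""
    return [replace(f, **changes) if f.index == index else f for f in frames]


# Usage with illustrative (not dataset-derived) character attributes.
registry = {
    "Fred": Character("Fred", "orange tunic with black spots, blue tie"),
    "Barney": Character("Barney", "brown tunic, blond hair"),
}
story = "Fred walks into the yard. Barney laughs at Fred. They rest in the cave."
frames = enrich_cues(ground_characters(segment_story(story), registry))
for f in frames:
    print(f.to_prompt())

# Suppose frame 2 drifted toward the wrong mood: fix only that description.
repaired = edit_frame(frames, 2, affect="sleepy")
```

Because each frame is an immutable record rendered to a plain prompt string, a drifted frame can be corrected by editing only its own record and re-rendering it; the image generator itself is outside this sketch.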
| Comments: | 6 pages, 5 figures. Accepted by IEEE ICME 2026 |
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.22448 [cs.AI] |
| | (or arXiv:2605.22448v1 [cs.AI] for this version) |
| | https://doi.org/10.48550/arXiv.2605.22448 (arXiv-issued DOI via DataCite, pending registration) |
From: Sijing Yin
[v1] Thu, 21 May 2026 13:16:32 UTC (4,992 KB)