TaleDiffusion: Multi-Character Story Generation with Dialogue Rendering
By: Ayan Banerjee, Josep Lladós, Umapada Pal, and others
Potential Business Impact:
Generates illustrated stories in which characters stay visually consistent across frames and are matched with the correct dialogue.
Text-to-story visualization is challenging because multiple characters must interact consistently across frames. Existing methods struggle with character consistency, producing artifacts and inaccurate dialogue rendering that result in disjointed storytelling. In response, we introduce TaleDiffusion, a framework for generating multi-character stories through an iterative process that maintains character consistency and assigns dialogue accurately via postprocessing. Given a story, we use a pre-trained LLM with in-context learning to generate per-frame descriptions, character details, and dialogues, followed by a bounded attention-based per-box mask technique to control character interactions and minimize artifacts. We then apply an identity-consistent self-attention mechanism to keep characters consistent across frames and region-aware cross-attention for precise object placement. Dialogues are rendered as speech bubbles and assigned to characters via CLIPSeg. Experimental results demonstrate that TaleDiffusion outperforms existing methods in consistency, noise reduction, and dialogue rendering.
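The dialogue-assignment step can be illustrated with off-the-shelf CLIPSeg: prompt it with a character's description, take the peak of the resulting relevance map as that character's location in the frame, and anchor the speech bubble there. The snippet below is a minimal sketch under assumptions, not the paper's implementation; the `CIDAS/clipseg-rd64-refined` checkpoint, the `character_anchor` helper, the argmax heuristic, and the `draw_speech_bubble` call are all illustrative choices not taken from the source.

```python
# Minimal sketch (assumed, not the paper's code): use CLIPSeg to find where a
# character appears in a generated frame, so a dialogue bubble can be anchored
# next to the right speaker.
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

# Publicly available CLIPSeg checkpoint (assumed choice; the abstract does not name one).
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

def character_anchor(frame: Image.Image, character_prompt: str) -> tuple[int, int]:
    """Return an (x, y) point in the frame where the described character is most likely located."""
    inputs = processor(text=[character_prompt], images=[frame], return_tensors="pt")
    with torch.no_grad():
        heat = torch.sigmoid(model(**inputs).logits).squeeze()  # 352x352 relevance map
    # Crude heuristic: take the hottest pixel and rescale it to the frame's resolution.
    idx = int(torch.argmax(heat))
    y, x = divmod(idx, heat.shape[-1])
    w, h = frame.size
    return int(x / heat.shape[-1] * w), int(y / heat.shape[-2] * h)

# Usage (illustrative): attach the LLM-attributed line to its speaker.
# frame = Image.open("frame_03.png")
# x, y = character_anchor(frame, "the red-haired girl in a yellow raincoat")
# draw_speech_bubble(frame, text="Hold on tight!", tail_at=(x, y))  # hypothetical bubble renderer
```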
Similar Papers
Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback
CV and Pattern Recognition
Animates multiple talking characters from audio, without additional training.
Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation
CV and Pattern Recognition
Keeps characters consistent across frames in layout-controllable story generation.
TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model
CV and Pattern Recognition
Creates long, smooth face-and-gesture talking animations from still images.