
Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning

Published: November 18, 2025 | arXiv ID: 2511.14249v1

By: Rui Liu, Yuan Zhao, Zhenqi Jia

Potential Business Impact:

Makes movie voices sound like real actors.

Business Areas:

Speech Recognition Data and Analytics, Software

Automatic movie dubbing models generate vivid speech from a given script, replicating a speaker's timbre from a brief timbre prompt while keeping the speech lip-synced with the silent video. Existing approaches simulate a simplified workflow in which actors dub directly without preparation, overlooking the critical director-actor interaction. In contrast, authentic workflows involve dynamic collaboration: directors actively engage with actors, guiding them to internalize context cues, specifically emotion, before the performance. To address this gap, we propose a new Retrieve-Augmented Director-Actor Interaction Learning scheme for authentic movie dubbing, termed Authentic-Dubber, which contains three novel mechanisms: (1) We construct a multimodal Reference Footage library to simulate the learning footage provided by directors, integrating Large Language Models (LLMs) to achieve deep comprehension of emotional representations across multimodal signals. (2) To emulate how actors efficiently and comprehensively internalize director-provided footage during dubbing, we propose an Emotion-Similarity-based Retrieval-Augmentation strategy, which retrieves the multimodal information most relevant to the target silent video. (3) We develop a Progressive Graph-based speech generation approach that incrementally incorporates the retrieved multimodal emotional knowledge, thereby simulating the actor's final dubbing process. Together, these mechanisms enable Authentic-Dubber to faithfully replicate the authentic dubbing workflow, achieving comprehensive improvements in emotional expressiveness. Both subjective and objective evaluations on the V2C Animation benchmark dataset validate the effectiveness of the approach. The code and demos are available at https://github.com/AI-S2-Lab/Authentic-Dubber.
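The emotion-similarity retrieval idea in mechanism (2) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each reference clip and the target video have already been reduced to a fixed-length emotion embedding, and it uses plain cosine similarity with top-k selection as a stand-in for whatever similarity measure the paper actually uses. The names `cosine_similarity`, `retrieve_top_k`, the clip IDs, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float],
    library: Sequence[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k library entries whose emotion embeddings best match the query."""
    scored = [(clip_id, cosine_similarity(query, emb)) for clip_id, emb in library]
    # Highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical reference footage library: (clip id, toy emotion embedding).
library = [
    ("clip_happy", [0.9, 0.1, 0.0]),
    ("clip_sad", [0.0, 0.2, 0.9]),
    ("clip_excited", [0.8, 0.3, 0.1]),
]

# Hypothetical emotion embedding of the target silent video.
query = [1.0, 0.2, 0.0]

top = retrieve_top_k(query, library, k=2)
for clip_id, score in top:
    print(f"{clip_id}: {score:.3f}")
```

In this toy setup the two clips with embeddings closest to the query are returned first. In a real system, the retrieved entries would then be passed on to the speech generation stage.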

FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing

Multimedia

Makes movie voices match the actor's mouth.

2 May 2025 0

88%

SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model

Audio and Speech Processing

Makes videos speak in any language, perfectly synced.

23 Nov 2025 0

88%

Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks

Machine Learning (CS)

Makes movie dubbing sound more natural for actors.

30 Apr 2025 2


Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

9 pages
