MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control
By: Fatemeh Nazarieh, Zhenhua Feng, Diptesh Kanojia, and more
Potential Business Impact:
Makes videos of people talking from one picture.
Audio-driven talking face generation has gained significant attention for applications in digital media and virtual avatars. While recent methods improve audio-lip synchronization, they often struggle with temporal consistency, identity preservation, and customization, especially in long video generation. To address these issues, we propose MAGIC-Talk, a one-shot diffusion-based framework for customizable and temporally stable talking face generation. MAGIC-Talk consists of two modules: ReferenceNet, which preserves identity and enables fine-grained facial editing via text prompts, and AnimateNet, which enhances motion coherence using structured motion priors. Unlike previous methods, which require multiple reference images or fine-tuning, MAGIC-Talk maintains identity from a single image while ensuring smooth transitions across frames. Additionally, a progressive latent fusion strategy is introduced to improve long-form video quality by reducing motion inconsistencies and flickering. Extensive experiments demonstrate that MAGIC-Talk outperforms state-of-the-art methods in visual quality, identity preservation, and synchronization accuracy, offering a robust solution for talking face generation.
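The abstract does not spell out how progressive latent fusion works. A common way to stabilize long diffusion-generated videos is to denoise overlapping windows of frame latents and blend the overlaps with ramped weights, so each window hands off smoothly to the next. The sketch below illustrates that general idea only; the function name, window sizes, and linear ramp weighting are illustrative assumptions, not details taken from the paper.

```python
import torch

def progressive_latent_fusion(windows, window_len, stride, num_frames, latent_dim):
    """Blend overlapping windows of per-frame latents with linear ramp weights.

    `windows` is a list of tensors, each of shape (window_len, latent_dim),
    produced independently for overlapping frame ranges. Overlapping frames
    are averaged with weights that ramp up toward each window's center, so
    transitions between windows are gradual rather than abrupt (one plausible
    reading of progressive latent fusion; all names here are hypothetical).
    """
    fused = torch.zeros(num_frames, latent_dim)
    weight_sum = torch.zeros(num_frames, 1)
    # Linear ramp: low weight at window edges, full weight in the middle.
    ramp = torch.minimum(
        torch.arange(1, window_len + 1, dtype=torch.float32),
        torch.arange(window_len, 0, -1, dtype=torch.float32),
    ).clamp(max=float(stride)).unsqueeze(1)
    for i, w in enumerate(windows):
        start = i * stride
        fused[start:start + window_len] += ramp * w
        weight_sum[start:start + window_len] += ramp
    return fused / weight_sum

# Example: a 64-frame video assembled from overlapping 16-frame windows (stride 8).
num_frames, window_len, stride, latent_dim = 64, 16, 8, 4
n_windows = (num_frames - window_len) // stride + 1
windows = [torch.randn(window_len, latent_dim) for _ in range(n_windows)]
video_latents = progressive_latent_fusion(windows, window_len, stride, num_frames, latent_dim)
print(video_latents.shape)  # torch.Size([64, 4])
```

Because each frame's latent is a weighted average over every window that covers it, abrupt jumps at window boundaries (a typical source of flicker in long videos) are smoothed out, which matches the stated goal of the paper's fusion strategy even if its exact weighting differs.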
Similar Papers
IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer
CV and Pattern Recognition
Makes faces talk realistically from pictures.
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice
CV and Pattern Recognition
Makes cartoon characters talk and move realistically.
Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation
CV and Pattern Recognition
Makes faces talk realistically from sound.