RealTalk: Realistic Emotion-Aware Lifelike Talking-Head Synthesis
By: Wenqing Wang, Yun Fu
Potential Business Impact:
Lets digital avatars show accurate, controllable emotions driven by a speaker's voice.
Emotion is a critical component of artificial social intelligence. However, while current methods excel in lip synchronization and image quality, they often fail to generate accurate and controllable emotional expressions while preserving the subject's identity. To address this challenge, we introduce RealTalk, a novel framework for synthesizing emotional talking heads with high emotion accuracy, enhanced emotion controllability, and robust identity preservation. RealTalk employs a variational autoencoder (VAE) to generate 3D facial landmarks from driving audio; these landmarks are concatenated with emotion-label embeddings and passed through a ResNet-based landmark deformation model (LDM) to produce emotional landmarks. The emotional landmarks, together with facial blendshape coefficients, jointly condition a novel tri-plane attention Neural Radiance Field (NeRF) that synthesizes highly realistic emotional talking heads. Extensive experiments demonstrate that RealTalk outperforms existing methods in emotion accuracy, controllability, and identity preservation, advancing the development of socially intelligent AI systems.
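The abstract outlines a staged pipeline: an audio-driven VAE predicts 3D facial landmarks, a ResNet-based LDM deforms them under an emotion-label embedding, and the result (with blendshape coefficients) conditions a tri-plane attention NeRF. Below is a minimal PyTorch sketch of the first two stages only; every module name, layer size, the 68-landmark layout, and the eight-way emotion vocabulary are illustrative assumptions rather than the authors' released code, and the NeRF renderer is omitted.

```python
# Minimal sketch of the audio-to-emotional-landmark stages described in the abstract.
# All names and dimensions are assumptions for illustration, not RealTalk's actual code.
import torch
import torch.nn as nn

class AudioToLandmarkVAE(nn.Module):
    """Hypothetical VAE mapping an audio feature window to 3D facial landmarks."""
    def __init__(self, audio_dim=80, latent_dim=64, n_landmarks=68):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, n_landmarks * 3)
        )
        self.n_landmarks = n_landmarks

    def forward(self, audio_feat):
        h = self.encoder(audio_feat)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        landmarks = self.decoder(z).view(-1, self.n_landmarks, 3)
        return landmarks, mu, logvar

class ResidualBlock(nn.Module):
    """Simple residual MLP block standing in for the ResNet-style LDM backbone."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.net(x)

class LandmarkDeformationModel(nn.Module):
    """Deforms neutral audio-driven landmarks toward a target emotion label."""
    def __init__(self, n_landmarks=68, n_emotions=8, emo_dim=32, hidden=256):
        super().__init__()
        self.emo_embed = nn.Embedding(n_emotions, emo_dim)
        self.proj = nn.Linear(n_landmarks * 3 + emo_dim, hidden)
        self.blocks = nn.Sequential(*[ResidualBlock(hidden) for _ in range(4)])
        self.out = nn.Linear(hidden, n_landmarks * 3)
        self.n_landmarks = n_landmarks

    def forward(self, landmarks, emotion_id):
        emo = self.emo_embed(emotion_id)                      # (B, emo_dim)
        x = torch.cat([landmarks.flatten(1), emo], dim=-1)    # concat landmarks + emotion embedding
        delta = self.out(self.blocks(self.proj(x)))
        return landmarks + delta.view(-1, self.n_landmarks, 3)  # emotional landmarks

# Toy forward pass: the emotional landmarks (plus blendshape coefficients) would then
# condition the tri-plane attention NeRF renderer, which is not sketched here.
if __name__ == "__main__":
    vae = AudioToLandmarkVAE()
    ldm = LandmarkDeformationModel()
    audio_feat = torch.randn(2, 80)      # e.g. one mel-spectrogram frame per sample
    emotion_id = torch.tensor([0, 3])    # e.g. neutral, happy (assumed label set)
    lmk, _, _ = vae(audio_feat)
    emo_lmk = ldm(lmk, emotion_id)
    print(emo_lmk.shape)                 # torch.Size([2, 68, 3])
```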
Similar Papers
EAI-Avatar: Emotion-Aware Interactive Talking Head Generation
Audio and Speech Processing
Makes interactive talking heads show real feelings in conversation.
Taming Transformer for Emotion-Controllable Talking Face Generation
CV and Pattern Recognition
Makes videos of people talking with emotions.
MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding
CV and Pattern Recognition
Makes talking faces show real feelings from sound.