GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting
By: Madhav Agarwal, Mingtian Zhang, Laura Sevilla-Lara, and more
Potential Business Impact:
Creates real-time talking avatars from sound.
Speech-driven talking heads have recently emerged and enable interactive avatars. However, real-world applications are limited, as current methods are either visually faithful but slow, or fast but temporally unstable. Diffusion methods provide realistic image generation, yet struggle in one-shot settings. Gaussian Splatting approaches run in real time, yet inaccuracies in facial tracking, or inconsistent Gaussian mappings, lead to unstable outputs and video artifacts that are detrimental to realistic use cases. We address this problem by mapping Gaussian Splatting onto 3D Morphable Models to generate person-specific avatars. We introduce transformer-based prediction of model parameters, directly from audio, to drive temporal consistency. From a monocular video and an independent speech audio input, our method generates real-time talking-head videos with competitive quantitative and qualitative performance.
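The core idea of the pipeline, a transformer that predicts 3D Morphable Model (3DMM) parameters directly from audio features, can be sketched as follows. This is a minimal illustration under assumed dimensions, not the authors' implementation: the class name, 768-d audio features (e.g. wav2vec2-style), 64 output coefficients, and layer counts are all placeholders.

```python
import torch
import torch.nn as nn

class AudioToMorphableParams(nn.Module):
    """Hypothetical sketch: map per-frame audio features to per-frame
    3DMM parameters (expression/pose coefficients). Dimensions and
    layer counts are illustrative, not the paper's configuration."""

    def __init__(self, audio_dim=768, model_dim=256, n_params=64,
                 n_layers=4, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(audio_dim, model_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=model_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, n_layers)
        self.head = nn.Linear(model_dim, n_params)

    def forward(self, audio_feats):
        # audio_feats: (batch, frames, audio_dim). Self-attention over the
        # whole window lets each frame's prediction depend on its
        # neighbours, which is what drives temporal consistency.
        x = self.proj(audio_feats)
        x = self.encoder(x)
        return self.head(x)

# Example: a 2-second clip at 25 fps -> 50 frames of 3DMM coefficients,
# which would then deform the person-specific Gaussian avatar.
feats = torch.randn(1, 50, 768)
params = AudioToMorphableParams()(feats)
print(params.shape)  # torch.Size([1, 50, 64])
```

Predicting low-dimensional 3DMM parameters, rather than Gaussians directly, is what ties the audio signal to a stable geometric representation; the Gaussians stay anchored to the morphable mesh, so frame-to-frame jitter in the mapping is avoided.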
Similar Papers
PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control
Sound
Makes computer faces talk realistically with sound.
Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis
Graphics
Makes computer faces talk and show feelings.
AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting
CV and Pattern Recognition
Makes animated people look real in 3D videos.