JOLT3D: Joint Learning of Talking Heads and 3DMM Parameters with Application to Lip-Sync
By: Sungjoon Park, Minsik Park, Haneol Lee and more
Potential Business Impact:
Makes talking-head videos look more realistic.
In this work, we revisit the effectiveness of 3DMM for talking head synthesis by jointly learning a 3D face reconstruction model and a talking head synthesis model. This enables us to obtain a FACS-based blendshape representation of facial expressions that is optimized for talking head synthesis, in contrast to previous methods that either fit 3DMM parameters to 2D landmarks or rely on pretrained face reconstruction models. Our approach not only improves the quality of the generated face, but also lets us exploit the blendshape representation to modify just the mouth region for audio-based lip-sync. To this end, we propose a novel lip-sync pipeline that, unlike previous methods, decouples the original chin contour from the lip-synced chin contour and reduces flickering near the mouth.
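To give a rough sense of why a FACS-based blendshape representation makes mouth-only editing straightforward, the sketch below swaps audio-predicted coefficients into the mouth-related slots of a per-frame blendshape vector while leaving the rest of the expression untouched, followed by a simple temporal smoothing pass. The coefficient count, the mouth index range, and the helper names (`lip_sync_blendshapes`, `temporal_smooth`, `MOUTH_INDICES`) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical FACS-style blendshape layout: 52 coefficients per frame,
# of which a contiguous subset drives the jaw/lip region. The indices
# below are placeholders, not the paper's actual layout.
NUM_BLENDSHAPES = 52
MOUTH_INDICES = np.arange(20, 40)  # assumed jaw/lip blendshape slots

def lip_sync_blendshapes(original_coeffs: np.ndarray,
                         audio_driven_coeffs: np.ndarray) -> np.ndarray:
    """Replace only the mouth-region coefficients of a per-frame blendshape
    vector with audio-predicted values, keeping brows, eyes, etc. unchanged."""
    assert original_coeffs.shape == (NUM_BLENDSHAPES,)
    assert audio_driven_coeffs.shape == (NUM_BLENDSHAPES,)
    edited = original_coeffs.copy()
    edited[MOUTH_INDICES] = audio_driven_coeffs[MOUTH_INDICES]
    return edited

def temporal_smooth(coeff_seq: np.ndarray, alpha: float = 0.8) -> np.ndarray:
    """Exponentially smooth a (T, NUM_BLENDSHAPES) coefficient sequence to
    reduce frame-to-frame jitter (e.g., flicker near the mouth)."""
    smoothed = coeff_seq.copy()
    for t in range(1, len(smoothed)):
        smoothed[t] = alpha * smoothed[t] + (1.0 - alpha) * smoothed[t - 1]
    return smoothed
```

In this sketch, the per-frame edited coefficients would then be fed to whatever renderer or synthesis network consumes the blendshape parameters; the key point is that the edit is confined to a known subset of semantically meaningful coefficients rather than to raw image pixels.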
Similar Papers
D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis
CV and Pattern Recognition
Creates talking-head videos from less training data.
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
Graphics
Makes talking avatars' mouths move in sync with speech.
3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation
Graphics
Makes animated 3D talking faces move realistically.