FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint
By: Jiapeng Tang, Kai Li, Chengxiang Yin, and more
We introduce FactorPortrait, a video diffusion method for controllable portrait animation that enables lifelike synthesis from disentangled control signals for facial expressions, head movements, and camera viewpoints. Given a single portrait image, a driving video, and camera trajectories, our method animates the portrait by transferring facial expressions and head movements from the driving video while simultaneously enabling novel view synthesis from arbitrary viewpoints. We use a pre-trained image encoder to extract facial expression latents from the driving video as control signals for animation generation. These latents implicitly capture nuanced facial expression dynamics with identity and pose information disentangled, and they are efficiently injected into the video diffusion transformer through our proposed expression controller. For camera and head pose control, we employ Plücker ray maps and normal maps rendered from 3D body mesh tracking. To train our model, we curate a large-scale synthetic dataset containing diverse combinations of camera viewpoints, head poses, and facial expression dynamics. Extensive experiments demonstrate that our method outperforms existing approaches in realism, expressiveness, control accuracy, and view consistency.
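The abstract does not spell out how the Plücker ray maps used for camera control are constructed, but the standard formulation encodes each pixel's viewing ray as a 6-channel embedding: the ray direction d and its moment o × d, where o is the camera center. Below is a minimal NumPy sketch under that assumption; the function name and the pinhole intrinsics K / camera-to-world pose c2w arguments are ours, not from the paper.

```python
import numpy as np

def plucker_ray_map(K, c2w, H, W):
    """Per-pixel Plucker ray embedding for one camera (a common
    conditioning signal for camera-controlled video diffusion).

    K   : (3, 3) pinhole camera intrinsics.
    c2w : (4, 4) camera-to-world extrinsics.
    Returns an (H, W, 6) map: 3 channels of ray direction d,
    3 channels of moment o x d.
    """
    # Pixel grid sampled at pixel centers.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)        # (H, W, 3)

    # Back-project pixels to camera-frame directions, rotate to world,
    # and normalize to unit length.
    dirs = pix @ np.linalg.inv(K).T                         # (H, W, 3)
    dirs = dirs @ c2w[:3, :3].T
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # All rays share the camera center as origin; moment m = o x d.
    origin = c2w[:3, 3]
    moments = np.cross(np.broadcast_to(origin, dirs.shape), dirs)

    return np.concatenate([dirs, moments], axis=-1)         # (H, W, 6)
```

Because the moment term depends on the camera position and the direction term on its orientation, the resulting 6-channel map uniquely identifies each ray in world space, which is what makes it a convenient dense conditioning signal for viewpoint control.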