
MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation

Published: March 25, 2025 | arXiv ID: 2503.19383v1

By: Yukang Lin, Hokit Fung, Jianjin Xu, and more

Potential Business Impact:

Makes talking portraits move and change views.

Business Areas:

Motion Capture, Media and Entertainment, Video

Recent portrait animation methods have made significant strides in generating realistic lip synchronization. However, they often lack explicit control over head movements and facial expressions and cannot produce videos from multiple viewpoints, which makes the resulting animations less controllable and expressive. Moreover, text-guided portrait animation remains underexplored, despite its user-friendly nature. We present a novel two-stage text-guided framework, MVPortrait (Multi-view Vivid Portrait), to generate expressive multi-view portrait animations that faithfully capture the described motion and emotion. MVPortrait is the first to introduce FLAME as an intermediate representation, effectively embedding facial movements, expressions, and view transformations within its parameter space. In the first stage, we separately train text-conditioned FLAME motion and emotion diffusion models. In the second stage, we train a multi-view video generation model conditioned on a reference portrait image and the multi-view FLAME rendering sequences from the first stage. Experimental results show that MVPortrait outperforms existing methods in motion and emotion control, as well as in view consistency. Furthermore, by using FLAME as a bridge, MVPortrait becomes the first controllable portrait animation framework that is compatible with text, speech, and video as driving signals.
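The abstract describes FLAME parameters as the hand-off between the two stages: text-driven motion and emotion outputs become per-frame FLAME parameters, and the second stage is conditioned on renderings of those parameters from several viewpoints. The Python sketch below illustrates only that data flow, under stated assumptions. The diffusion models, the FLAME mesh, and the renderer are omitted. The names `FlameFrame`, `merge_streams`, and `multi_view`, the parameter sizes `EXPR_DIM` and `JAW_DIM`, and the choice to express a view change as a yaw rotation of the global head orientation are hypothetical illustrations, not details taken from the paper.

```python
import math
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence, Tuple

Matrix3 = List[List[float]]

# Hypothetical sizes: FLAME exposes shape, expression and pose parameters,
# but these counts are illustrative assumptions, not MVPortrait's settings.
EXPR_DIM = 50   # expression coefficients (assumed)
JAW_DIM = 3     # axis-angle jaw rotation


@dataclass(frozen=True)
class FlameFrame:
    expression: List[float]  # from the (hypothetical) emotion model
    jaw_pose: List[float]    # from the (hypothetical) motion model
    global_rot: Matrix3      # head orientation, from the motion model


def yaw_matrix(deg: float) -> Matrix3:
    """Rotation matrix about the vertical (y) axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a: Matrix3, b: Matrix3) -> Matrix3:
    """Plain 3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def merge_streams(motion: Sequence[Tuple[Matrix3, List[float]]],
                  emotion: Sequence[List[float]]) -> List[FlameFrame]:
    """Pair per-frame motion (global_rot, jaw_pose) with emotion (expression)."""
    if len(motion) != len(emotion):
        raise ValueError("motion and emotion sequences must have the same length")
    frames = []
    for (rot, jaw), expr in zip(motion, emotion):
        if len(expr) != EXPR_DIM or len(jaw) != JAW_DIM:
            raise ValueError("unexpected FLAME parameter size")
        frames.append(FlameFrame(list(expr), list(jaw), rot))
    return frames


def multi_view(frames: List[FlameFrame],
               yaw_degrees: Sequence[float]) -> Dict[float, List[FlameFrame]]:
    """One FLAME sequence per viewpoint: only the global rotation changes,
    so expression and jaw motion stay identical across views."""
    views = {}
    for deg in yaw_degrees:
        r = yaw_matrix(deg)
        views[deg] = [replace(f, global_rot=matmul(r, f.global_rot)) for f in frames]
    return views


# Usage with dummy stage-one outputs (4 frames, neutral face, frontal head).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
motion = [(identity, [0.0] * JAW_DIM) for _ in range(4)]
emotion = [[0.0] * EXPR_DIM for _ in range(4)]
frames = merge_streams(motion, emotion)
views = multi_view(frames, [-30.0, 0.0, 30.0])
print(len(views), len(views[30.0]))  # 3 4
```

Changing only the global rotation per view is one simple way to keep the animated content identical across viewpoints; the abstract does not say whether MVPortrait applies view changes through FLAME's global pose or through separate camera parameters at render time.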

Stable Video-Driven Portraits

CV and Pattern Recognition

Makes still pictures talk and move like real people.

22 Sep 2025

FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint

CV and Pattern Recognition

Makes still pictures move like real people.

12 Dec 2025

Controllable Expressive 3D Facial Animation via Diffusion in a Unified Multimodal Space

Multimedia

Makes cartoon faces show real feelings from sound.

14 Apr 2025

Country of Origin

🇨🇳 🇭🇰 🇺🇸 China, Hong Kong, United States

Page Count

16 pages
