Score: 2

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Published: April 2, 2025 | arXiv ID: 2504.01724v3

By: Yuxuan Luo, Zhengkun Rong, Lizhen Wang and more

BigTech Affiliations: ByteDance

Potential Business Impact:

Animates people in still images with realistic body and facial motion.

Business Areas:

Motion Capture, Media and Entertainment, Video

While recent image-based human animation methods achieve realistic body and facial motion synthesis, critical gaps remain in fine-grained holistic controllability, multi-scale adaptability, and long-term temporal coherence, which leads to their lower expressiveness and robustness. We propose a diffusion transformer (DiT) based framework, DreamActor-M1, with hybrid guidance to overcome these limitations. For motion guidance, our hybrid control signals that integrate implicit facial representations, 3D head spheres, and 3D body skeletons achieve robust control of facial expressions and body movements, while producing expressive and identity-preserving animations. For scale adaptation, to handle various body poses and image scales ranging from portraits to full-body views, we employ a progressive training strategy using data with varying resolutions and scales. For appearance guidance, we integrate motion patterns from sequential frames with complementary visual references, ensuring long-term temporal coherence for unseen regions during complex movements. Experiments demonstrate that our method outperforms the state-of-the-art works, delivering expressive results for portraits, upper-body, and full-body generation with robust long-term consistency. Project Page: https://grisoon.github.io/DreamActor-M1/.

DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers

CV and Pattern Recognition

Makes realistic product videos with people.

12 Jun 2025 2

90%

High-Fidelity and Long-Duration Human Image Animation with Diffusion Transformer

CV and Pattern Recognition

Makes people move realistically in long videos.

26 Dec 2025 1

88%

HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation

CV and Pattern Recognition

Makes videos of people move realistically.

7 Feb 2025 1


Country of Origin

🇨🇳 China

Page Count

11 pages
