Toward Rich Video Human-Motion2D Generation
By: Ruihao Xi, Xuekuan Wang, Yongcheng Li, and more
Potential Business Impact:
Makes computer characters move and interact realistically.
Generating realistic and controllable human motions, particularly those involving rich multi-character interactions, remains a significant challenge due to data scarcity and the complexities of modeling inter-personal dynamics. To address these limitations, we first introduce a new large-scale rich video human motion2D dataset (Motion2D-Video-150K) comprising 150,000 video sequences. Motion2D-Video-150K features a balanced distribution of diverse single-character and, crucially, double-character interactive actions, each paired with detailed textual descriptions. Building upon this dataset, we propose a novel diffusion-based rich video human motion2D generation (RVHM2D) model. RVHM2D incorporates an enhanced textual conditioning mechanism utilizing either dual text encoders (CLIP-L/B) or T5-XXL with both global and local features. We devise a two-stage training strategy: the model is first trained with a standard diffusion objective, and then fine-tuned using reinforcement learning with an FID-based reward to further enhance motion realism and text alignment. Extensive experiments demonstrate that RVHM2D achieves leading performance on the Motion2D-Video-150K benchmark in generating both single-character and interactive double-character scenarios.
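To make the abstract's recipe concrete, here is a minimal PyTorch sketch of the two-stage training idea: a diffusion (noise-prediction) objective on text-conditioned 2D motion, followed by reward fine-tuning. Everything below is a hypothetical illustration, not the authors' code: TextConditioner is a toy stand-in for the paper's dual CLIP-L/B or T5-XXL encoders, the linear noising schedule and single denoising step are simplifications, and the differentiable toy reward stands in for the paper's FID-based reinforcement-learning reward.

```python
# Hypothetical sketch of RVHM2D-style two-stage training; names and schedules
# are illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn

class TextConditioner(nn.Module):
    """Toy stand-in for dual text encoders: returns global + local features."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)

    def forward(self, tokens):
        local = self.tok(tokens)      # (B, T, D): per-token "local" features
        glob = local.mean(dim=1)      # (B, D): pooled "global" feature
        return glob, local

class MotionDenoiser(nn.Module):
    """Toy noise predictor over 2D joint sequences, conditioned on text."""
    def __init__(self, joints=17, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joints * 2 + dim + 1, 128), nn.SiLU(),
            nn.Linear(128, joints * 2),
        )

    def forward(self, noisy, t, text_global):
        frames = noisy.size(1)
        cond = text_global.unsqueeze(1).expand(-1, frames, -1)
        tt = t.expand(-1, frames, 1)  # broadcast the noise level over time
        return self.net(torch.cat([noisy, cond, tt], dim=-1))

def diffusion_step(denoiser, conditioner, motion, tokens):
    """Stage 1: standard denoising objective (predict the injected noise)."""
    t = torch.rand(motion.size(0), 1, 1)       # noise level in [0, 1)
    noise = torch.randn_like(motion)
    noisy = (1 - t) * motion + t * noise       # simple linear noising schedule
    glob, _ = conditioner(tokens)
    pred = denoiser(noisy, t, glob)
    return ((pred - noise) ** 2).mean()

def rl_step(denoiser, conditioner, tokens, reward_fn, frames=8, joints=17):
    """Stage 2: reward fine-tuning (the paper uses an FID-based reward)."""
    glob, _ = conditioner(tokens)
    b = tokens.size(0)
    x = torch.randn(b, frames, joints * 2)     # start from pure noise
    t = torch.ones(b, 1, 1)
    x = x - denoiser(x, t, glob)               # one crude denoising step
    reward = reward_fn(x)                      # (B,); higher = better sample
    return -reward.mean()                      # minimize negative reward

if __name__ == "__main__":
    cond, den = TextConditioner(), MotionDenoiser()
    opt = torch.optim.Adam(list(cond.parameters()) + list(den.parameters()),
                           lr=1e-4)
    tokens = torch.randint(0, 1000, (4, 16))   # fake caption token ids
    motion = torch.randn(4, 8, 34)             # (batch, frames, 17 joints x 2)

    loss = diffusion_step(den, cond, motion, tokens)   # stage 1
    loss.backward(); opt.step(); opt.zero_grad()

    toy_reward = lambda m: -m.pow(2).mean(dim=(1, 2))  # placeholder reward
    loss = rl_step(den, cond, tokens, toy_reward)      # stage 2
    loss.backward(); opt.step()
```

Note that differentiating a reward through a single sampling step is a simplification: FID is computed over a set of samples and the paper frames stage 2 as reinforcement learning, which in practice would use a policy-gradient-style update rather than this direct backpropagation.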
Similar Papers
MotionDuet: Dual-Conditioned 3D Human Motion Generation with Video-Regularized Text Learning
Graphics
Makes computer characters move like real people.
Learning to Control Physically-simulated 3D Characters via Generating and Mimicking 2D Motions
Graphics
Makes computer characters move like real people from videos.
Video Motion Graphs
CV and Pattern Recognition
Creates new dancing videos from music.