OpenDance: Multimodal Controllable 3D Dance Generation Using Large-scale Internet Data
By: Jinlu Zhang, Zixi Kang, Yizhou Wang
Potential Business Impact:
Makes digital characters dance to music like real people.
Music-driven dance generation offers significant creative potential yet faces considerable challenges. The absence of fine-grained multimodal data and the difficulty of flexible multi-conditional generation limit the controllability and diversity of previous methods in practice. In this paper, we build OpenDance5D, an extensive human dance dataset comprising over 101 hours of dance across 14 distinct genres. Each sample has five modalities to facilitate robust cross-modal learning: RGB video, audio, 2D keypoints, 3D motion, and fine-grained textual descriptions from human artists. Furthermore, we propose OpenDanceNet, a unified masked modeling framework for controllable dance generation conditioned on music together with arbitrary combinations of text prompts, keypoints, or character positioning. Comprehensive experiments demonstrate that OpenDanceNet achieves high-fidelity generation and flexible controllability.
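To make the "masked modeling with arbitrary condition combinations" idea concrete, here is a minimal sketch of that general recipe: motion is represented as discrete tokens, some tokens are masked, and a transformer predicts the masked tokens from music features plus whichever optional conditions (text, keypoints, position) happen to be available. This is not the authors' implementation; all module names, feature dimensions, and sizes below are illustrative assumptions.

```python
# Hypothetical sketch of masked motion-token modeling with optional conditions.
# Dimensions (35-d music features, 768-d text embedding, 17 2D keypoints, 3-d
# position) are assumptions for illustration, not values from the paper.
import torch
import torch.nn as nn

class MaskedDanceModel(nn.Module):
    def __init__(self, vocab_size=1024, dim=512, n_layers=6, n_heads=8):
        super().__init__()
        self.mask_id = vocab_size                # extra id reserved for masked slots
        self.token_emb = nn.Embedding(vocab_size + 1, dim)
        self.music_proj = nn.Linear(35, dim)     # per-frame audio features
        self.text_proj = nn.Linear(768, dim)     # sentence-level text embedding
        self.kpt_proj = nn.Linear(2 * 17, dim)   # per-frame 2D keypoints
        self.pos_proj = nn.Linear(3, dim)        # target character position
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, motion_tokens, music, text=None, keypoints=None, position=None):
        # motion_tokens: (B, T) discrete motion codes, some entries set to mask_id
        # music:         (B, T, 35) per-frame audio features
        x = self.token_emb(motion_tokens) + self.music_proj(music)
        if keypoints is not None:                # (B, T, 34) optional per-frame cue
            x = x + self.kpt_proj(keypoints)
        prefix = []                              # global conditions become prefix tokens
        if text is not None:                     # (B, 768)
            prefix.append(self.text_proj(text).unsqueeze(1))
        if position is not None:                 # (B, 3)
            prefix.append(self.pos_proj(position).unsqueeze(1))
        if prefix:
            x = torch.cat(prefix + [x], dim=1)
        h = self.backbone(x)
        h = h[:, -motion_tokens.size(1):]        # keep only the motion positions
        return self.head(h)                      # logits over the motion codebook

# Training-step sketch: mask random motion tokens, reconstruct them from the rest.
B, T = 2, 120
tokens = torch.randint(0, 1024, (B, T))
mask = torch.rand(B, T) < 0.5
inputs = tokens.masked_fill(mask, 1024)          # replace masked slots with mask_id
model = MaskedDanceModel()
logits = model(inputs, music=torch.randn(B, T, 35), text=torch.randn(B, 768))
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
```

Randomly dropping the optional conditions during training is what would let a model of this shape accept any subset of them at inference time, which is the flexibility the abstract describes.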
Similar Papers
DanceMosaic: High-Fidelity Dance Generation with Multimodal Editability
Graphics
Creates realistic, editable 3D dances from music and text.
MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation
Graphics
Creates two-person dances from music and words.
DanceChat: Large Language Model-Guided Music-to-Dance Generation
Computer Vision and Pattern Recognition
Turns music into dance moves with guidance from a large language model.