OpenDance: Multimodal Controllable 3D Dance Generation Using Large-scale Internet Data

Published: June 9, 2025 | arXiv ID: 2506.07565v1

By: Jinlu Zhang, Zixi Kang, Yizhou Wang

Potential Business Impact:

Automatically generates realistic, music-synchronized 3D dance motion for animated and virtual characters.

Business Areas:
Motion Capture, Media and Entertainment, Video

Music-driven dance generation offers significant creative potential yet faces considerable challenges. The absence of fine-grained multimodal data and the difficulty of flexible multi-conditional generation have limited the controllability and diversity of prior work in practice. In this paper, we build OpenDance5D, an extensive human dance dataset comprising over 101 hours of footage across 14 distinct genres. Each sample carries five modalities to facilitate robust cross-modal learning: RGB video, audio, 2D keypoints, 3D motion, and fine-grained textual descriptions from human artists. Furthermore, we propose OpenDanceNet, a unified masked-modeling framework for controllable dance generation conditioned on music together with arbitrary combinations of text prompts, keypoints, or character positioning. Comprehensive experiments demonstrate that OpenDanceNet achieves high-fidelity generation and flexible controllability.
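To make the masked-modeling idea concrete, here is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the class name OpenDanceNetSketch, all dimensions, the per-stream encoders, and the learned null-embedding trick for dropping optional conditions are assumptions made here purely to illustrate how a single model can be conditioned on music plus any subset of text, keypoints, and position.

```python
# Hypothetical sketch of masked modeling over discretized motion tokens,
# conditioned on music plus any subset of {text, keypoints, position}.
# All names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class OpenDanceNetSketch(nn.Module):
    def __init__(self, vocab_size=1024, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.motion_emb = nn.Embedding(vocab_size + 1, d_model)  # +1 for [MASK]
        self.mask_id = vocab_size
        # One small encoder per conditioning stream (stand-ins for real encoders).
        self.music_proj = nn.Linear(128, d_model)  # e.g. per-frame audio features
        self.text_proj = nn.Linear(512, d_model)   # e.g. a pooled text embedding
        self.kpt_proj = nn.Linear(34, d_model)     # e.g. 17 flattened 2D keypoints
        self.pos_proj = nn.Linear(3, d_model)      # e.g. a 3D root position
        # Learned "null" tokens let any optional condition be dropped at will.
        self.null_text = nn.Parameter(torch.zeros(1, 1, d_model))
        self.null_kpt = nn.Parameter(torch.zeros(1, 1, d_model))
        self.null_pos = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, motion_ids, music, text=None, keypoints=None, position=None):
        B, T = motion_ids.shape
        # Fuse the always-present music condition into each motion-token frame.
        x = self.motion_emb(motion_ids) + self.music_proj(music)
        # Optional conditions become prefix tokens; absent ones use null embeddings.
        t = self.text_proj(text).unsqueeze(1) if text is not None else self.null_text.expand(B, 1, -1)
        k = self.kpt_proj(keypoints).mean(1, keepdim=True) if keypoints is not None else self.null_kpt.expand(B, 1, -1)
        p = self.pos_proj(position).unsqueeze(1) if position is not None else self.null_pos.expand(B, 1, -1)
        h = self.backbone(torch.cat([t, k, p, x], dim=1))
        return self.head(h[:, 3:])  # logits for each motion-token position

# Training step: randomly mask motion tokens and predict them (BERT-style).
model = OpenDanceNetSketch()
ids = torch.randint(0, 1024, (2, 60))      # 2 clips, 60 motion tokens each
music = torch.randn(2, 60, 128)            # per-frame music features
mask = torch.rand(2, 60) < 0.5             # random masking ratio
inp = ids.masked_fill(mask, model.mask_id)
logits = model(inp, music, text=torch.randn(2, 512))  # text given, rest dropped
loss = nn.functional.cross_entropy(logits[mask], ids[mask])
```

The null embeddings are one common way to let a single network handle arbitrary condition combinations without retraining; the paper may use a different fusion or masking scheme.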

Page Count
15 pages

Category
Computer Science: Computer Vision and Pattern Recognition