Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
By: Xiaojie Li, Ronghui Li, Shukai Fang and more
Potential Business Impact:
Makes computers create realistic, music-matching 3D dances.
Well-coordinated, music-aligned holistic dance enhances emotional expressiveness and audience engagement. However, generating such dances remains challenging due to the scarcity of holistic 3D dance datasets, the difficulty of achieving cross-modal alignment between music and dance, and the complexity of modeling interdependent motion across the body, hands, and face. To address these challenges, we introduce SoulDance, a high-precision music-dance paired dataset captured via professional motion capture systems, featuring meticulously annotated holistic dance movements. Building on this dataset, we propose SoulNet, a framework designed to generate music-aligned, kinematically coordinated holistic dance sequences. SoulNet consists of three principal components: (1) Hierarchical Residual Vector Quantization, which models complex, fine-grained motion dependencies across the body, hands, and face; (2) Music-Aligned Generative Model, which composes these hierarchical motion units into expressive and coordinated holistic dance; (3) Music-Motion Retrieval Module, a pre-trained cross-modal model that functions as a music-dance alignment prior, ensuring temporal synchronization and semantic coherence between generated dance and input music throughout the generation process. Extensive experiments demonstrate that SoulNet significantly surpasses existing approaches in generating high-quality, music-coordinated, and well-aligned holistic 3D dance sequences.
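As a rough illustration of the residual vector quantization idea named in component (1), the sketch below quantizes a single feature vector with a stack of codebooks, where each stage encodes whatever error the previous stage left behind. This is plain residual VQ under simplifying assumptions, not SoulNet's hierarchical variant: the abstract does not specify how the body, hand, and face levels are coupled, and the function names `rvq_encode` and `rvq_decode` and the fixed (unlearned) codebooks are hypothetical.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Quantize x with a stack of codebooks (illustrative, not the paper's code).

    Each codebook is an array of shape (num_codes, dim). Stage k picks the
    code nearest to the residual left over after stages 0..k-1.
    Returns the chosen indices and the final unexplained residual.
    """
    residual = np.asarray(x, dtype=float).copy()
    indices = []
    for codebook in codebooks:
        # Squared Euclidean distance from the residual to every code.
        dists = np.sum((codebook - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        # Remove what this stage explained; the next stage refines the rest.
        residual = residual - codebook[idx]
    return indices, residual


def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the selected code from each stage."""
    return sum(codebook[i] for codebook, i in zip(codebooks, indices))
```

Calling `rvq_encode` on a motion feature vector returns one code index per stage, and `rvq_decode` sums the selected codes back into an approximation. Adding stages can only keep or reduce the reconstruction error when each codebook contains a zero vector, which is one reason residual schemes suit fine-grained detail such as finger and facial motion.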
Similar Papers
DanceMeld: Unraveling Dance Phrases with Hierarchical Latent Codes for Music-to-Dance Synthesis
Other Computer Science
Makes computers create realistic dance moves from music.
DanceMosaic: High-Fidelity Dance Generation with Multimodal Editability
Graphics
Creates realistic, editable 3D dances from music and text.