MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation
By: Prerit Gupta, Jason Alexander Fotso-Puepi, Zhengyuan Li, and more
Potential Business Impact:
Generates 3D duet dance animations from music and text descriptions.
We introduce Multimodal DuetDance (MDD), a diverse multimodal benchmark dataset designed for text-controlled and music-conditioned 3D duet dance motion generation. Our dataset comprises 620 minutes of high-quality motion capture data performed by professional dancers, synchronized with music and annotated with over 10K fine-grained natural language descriptions. The annotations capture a rich movement vocabulary, detailing spatial relationships, body movements, and rhythm, making MDD the first dataset to seamlessly integrate human motions, music, and text for duet dance generation. We introduce two novel tasks supported by our dataset: (1) Text-to-Duet, where, given music and a textual prompt, both the leader's and follower's dance motions are generated; and (2) Text-to-Dance Accompaniment, where, given music, a textual prompt, and the leader's motion, the follower's motion is generated in a cohesive, text-aligned manner. We include baseline evaluations on both tasks to support future research.
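To make the two task definitions concrete, here is a minimal Python sketch of how an MDD-style sample and the task interfaces could be represented. All field names, array shapes, and function signatures are assumptions for illustration; they are not the dataset's actual schema or the authors' API.

```python
# Hypothetical representation of one MDD clip and the two benchmark tasks.
# Field names, shapes (e.g. 24 joints), and signatures are illustrative assumptions.
from dataclasses import dataclass
import numpy as np


@dataclass
class DuetSample:
    """One assumed MDD clip: paired duet motion, music features, and a text description."""
    leader_motion: np.ndarray    # (T, J, 3) leader joint positions over T frames
    follower_motion: np.ndarray  # (T, J, 3) follower joint positions over T frames
    music_features: np.ndarray   # (T, D) per-frame audio features
    text: str                    # fine-grained natural-language description


def text_to_duet(music: np.ndarray, text: str) -> tuple[np.ndarray, np.ndarray]:
    """Task 1 (Text-to-Duet): music + text prompt -> (leader motion, follower motion).
    Placeholder body; a real model would condition generation on both inputs."""
    frames = music.shape[0]
    return np.zeros((frames, 24, 3)), np.zeros((frames, 24, 3))


def text_to_accompaniment(music: np.ndarray, text: str,
                          leader_motion: np.ndarray) -> np.ndarray:
    """Task 2 (Text-to-Dance Accompaniment): music + text + leader motion -> follower motion.
    Placeholder body returning a zero motion of the same shape as the leader's."""
    return np.zeros_like(leader_motion)
```

The sketch only fixes the conditioning inputs and output shapes of each task; the paper's baselines would replace the placeholder bodies with learned generative models.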
Similar Papers
MotionDuet: Dual-Conditioned 3D Human Motion Generation with Video-Regularized Text Learning
Graphics
Makes computer characters move like real people.
Every Image Listens, Every Image Dances: Music-Driven Image Animation
CV and Pattern Recognition
Makes pictures dance to music and text.
OpenDance: Multimodal Controllable 3D Dance Generation Using Large-scale Internet Data
CV and Pattern Recognition
Makes computers dance like real people to music.