MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization
By: Chenghong Li , Hongjie Liao , Yihao Zhi and more
Potential Business Impact:
Helps computers understand and create 3D people.
In this era, the success of large language models and text-to-image models can be attributed to the driving force of large-scale datasets. However, in the realm of 3D vision, while significant progress has been achieved in object-centric tasks through large-scale datasets like Objaverse and MVImgNet, human-centric tasks have seen limited advancement, largely due to the absence of a comparable large-scale human dataset. To bridge this gap, we present MVHumanNet++, a dataset that comprises multi-view human action sequences of 4,500 human identities. The primary focus of our work is on collecting human data that features a large number of diverse identities and everyday clothing using multi-view human capture systems, which facilitates easily scalable data collection. Our dataset contains 9,000 daily outfits, 60,000 motion sequences and 645 million frames with extensive annotations, including human masks, camera parameters, 2D and 3D keypoints, SMPL/SMPLX parameters, and corresponding textual descriptions. Additionally, the proposed MVHumanNet++ dataset is enhanced with newly processed normal maps and depth maps, significantly expanding its applicability and utility for advanced human-centric research. To explore the potential of our proposed MVHumanNet++ dataset in various 2D and 3D visual tasks, we conducted several pilot studies to demonstrate the performance improvements and effective applications enabled by the scale provided by MVHumanNet++. As the current largest-scale 3D human dataset, we hope that the release of MVHumanNet++ dataset with annotations will foster further innovations in the domain of 3D human-centric tasks at scale. MVHumanNet++ is publicly available at https://kevinlee09.github.io/research/MVHumanNet++/.
Similar Papers
Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset
CV and Pattern Recognition
Creates realistic human movements from text and sound.
PairHuman: A High-Fidelity Photographic Dataset for Customized Dual-Person Generation
CV and Pattern Recognition
Creates custom pictures of two people together.
MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions
CV and Pattern Recognition
Helps computers understand how people use things.