MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans
By: Huangyue Yu, Baoxiong Jia, Yixin Chen, and more
Potential Business Impact:
Creates realistic 3D worlds for robots to learn in.
Embodied AI (EAI) research requires high-quality, diverse 3D scenes to effectively support skill acquisition, sim-to-real transfer, and generalization. Achieving these quality standards, however, necessitates the precise replication of real-world object diversity. Existing datasets demonstrate that this process heavily relies on artist-driven designs, which demand substantial human effort and present significant scalability challenges. To scalably produce realistic and interactive 3D scenes, we first present MetaScenes, a large-scale, simulatable 3D scene dataset constructed from real-world scans, which includes 15,366 objects spanning 831 fine-grained categories. We then introduce Scan2Sim, a robust multi-modal alignment model that enables automated, high-quality asset replacement, eliminating the reliance on artist-driven designs for scaling 3D scenes. We further propose two benchmarks to evaluate MetaScenes: a detailed scene-synthesis task focused on small-item layouts for robotic manipulation, and a domain-transfer task in vision-and-language navigation (VLN) to validate cross-domain transfer. Results confirm MetaScenes' potential to enhance EAI by supporting more generalizable agent learning and sim-to-real applications, introducing new possibilities for EAI research. Project website: https://meta-scenes.github.io/.
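The abstract describes asset replacement via multi-modal alignment: a scanned object is matched against a library of simulatable assets. The paper does not specify Scan2Sim's architecture here, so the following is only a minimal hedged sketch of the general retrieval idea, assuming objects and candidate assets have already been embedded into a shared feature space; the function names and toy vectors are hypothetical.

```python
# Hypothetical sketch: match a scanned object to the nearest simulatable
# asset by cosine similarity in a shared embedding space. This is NOT
# Scan2Sim's actual method, only an illustration of embedding-based
# asset retrieval; all names and vectors below are made up.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_asset(scan_embedding: np.ndarray, asset_embeddings: list) -> int:
    """Return the index of the candidate asset closest to the scan."""
    scores = [cosine_similarity(scan_embedding, e) for e in asset_embeddings]
    return int(np.argmax(scores))

# Toy example: three candidate assets, the second points in nearly the
# same direction as the scan embedding, so it is selected.
scan = np.array([0.9, 0.1, 0.0])
assets = [np.array([0.0, 1.0, 0.0]),
          np.array([1.0, 0.2, 0.0]),   # closest to the scan
          np.array([0.0, 0.0, 1.0])]
print(best_asset(scan, assets))  # prints 1
```

In a real pipeline the embeddings would come from a learned multi-modal encoder (e.g. over geometry and appearance), and the retrieved asset would then be posed and scaled to fit the scan.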
Similar Papers
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
CV and Pattern Recognition
Teaches robots to navigate messy rooms.
HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video
CV and Pattern Recognition
Creates realistic 3D worlds for games and robots.
SPATIALGEN: Layout-guided 3D Indoor Scene Generation
CV and Pattern Recognition
Builds realistic 3D rooms from pictures.