Language-guided 3D scene synthesis for fine-grained functionality understanding
By: Jaime Corsetti, Francesco Giuliari, Davide Boscaini and more
Potential Business Impact:
Creates synthetic 3D rooms that robots can learn from.
Functionality understanding in 3D, which aims to identify the functional element in a 3D scene to complete an action (e.g., the correct handle to "Open the second drawer of the cabinet near the bed"), is hindered by the scarcity of real-world data due to the substantial effort needed for its collection and annotation. To address this, we introduce SynthFun3D, the first method for task-based 3D scene synthesis. Given the action description, SynthFun3D generates a 3D indoor environment using a furniture asset database with part-level annotation, ensuring the action can be accomplished. It reasons about the action to automatically identify and retrieve the 3D mask of the correct functional element, enabling the inexpensive and large-scale generation of high-quality annotated data. We validate SynthFun3D through user studies, which demonstrate improved scene-prompt coherence compared to other approaches. Our quantitative results further show that the generated data can either replace real data with minor performance loss or supplement real data for improved performance, thereby providing an inexpensive and scalable solution for data-hungry 3D applications. Project page: github.com/tev-fbk/synthfun3d.
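To make the idea of retrieving a functional element from part-annotated assets more concrete, here is a minimal toy sketch. It is not the SynthFun3D implementation: the `Part` and `Asset` classes, the `find_functional_mask` helper, and the structured query (category, part label, parent part) are all hypothetical names introduced for illustration. It assumes the action description has already been reduced to such a query by some earlier reasoning step, and it represents a 3D mask simply as a list of point indices.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Part:
    """Hypothetical part-level annotation on a furniture asset."""
    label: str            # part type, e.g. "handle"
    parent: str           # part it belongs to, e.g. "drawer_2"
    point_ids: list[int]  # indices into the asset's point cloud (the "mask")


@dataclass
class Asset:
    """Hypothetical furniture asset with annotated parts."""
    category: str
    parts: list[Part] = field(default_factory=list)


def find_functional_mask(assets, category, part_label, parent):
    """Return the mask of the first part matching the query, or None."""
    for asset in assets:
        if asset.category != category:
            continue
        for part in asset.parts:
            if part.label == part_label and part.parent == parent:
                return part.point_ids
    return None


# Toy scene for "Open the second drawer of the cabinet near the bed".
cabinet = Asset("cabinet", [
    Part("handle", "drawer_1", [0, 1, 2]),
    Part("handle", "drawer_2", [3, 4, 5]),
])
bed = Asset("bed")

# Assumed structured query derived from the action description.
mask = find_functional_mask([bed, cabinet], "cabinet", "handle", "drawer_2")
print(mask)
```

Because the annotation comes with the asset, the mask is obtained by lookup rather than by manual labeling, which is the property the abstract credits for inexpensive data generation. Spatial constraints such as "near the bed" are deliberately left out of this sketch.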
Similar Papers
SPATIALGEN: Layout-guided 3D Indoor Scene Generation
CV and Pattern Recognition
Builds realistic 3D rooms from pictures.
FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction
CV and Pattern Recognition
Robots learn to use objects by seeing their parts.
GEN3D: Generating Domain-Free 3D Scenes from a Single Image
CV and Pattern Recognition
Creates realistic 3D worlds from one picture.