Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models
By: Sangwon Baik, Hyeonwoo Kim, Hanbyul Joo
Potential Business Impact:
Teaches computers how objects are plausibly arranged relative to each other in 3D.
We present a method for learning 3D spatial relationships between object pairs, referred to as object-object spatial relationships (OOR), by leveraging synthetically generated 3D samples from pre-trained 2D diffusion models. We hypothesize that images synthesized by 2D diffusion models inherently capture realistic OOR cues, enabling efficient collection of a 3D dataset that covers an open-ended set of object categories. Our approach synthesizes diverse images capturing plausible OOR cues and then uplifts them into 3D samples. Leveraging this diverse collection of 3D samples for the object pairs, we train a score-based OOR diffusion model to learn the distribution of their relative spatial relationships. Additionally, we extend our pairwise OOR to multi-object OOR by enforcing consistency across pairwise relations and preventing object collisions. Extensive experiments demonstrate the robustness of our method across diverse object-object spatial relationships, as well as its applicability to 3D scene arrangement and human motion synthesis using our OOR diffusion model.
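The idea of a score-based model over relative object placements can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the authors' implementation: the 5-D pairwise OOR state (relative translation, relative yaw, relative scale), the Gaussian stand-in for the learned distribution, and the annealed-Langevin sampling schedule.

```python
import numpy as np

# Toy sketch of score-based sampling over a pairwise OOR state.
# The 5-D parameterization (tx, ty, tz, relative yaw, relative scale)
# and the Gaussian "learned" distribution are illustrative assumptions,
# not the paper's actual model or training setup.
rng = np.random.default_rng(0)

target_mean = np.array([0.6, 0.0, 0.0, np.pi / 2, 1.0])
target_std = 0.05  # stand-in for a tight learned OOR mode

def score(x, sigma):
    # Score grad_x log p_sigma(x) of the target Gaussian perturbed by
    # noise level sigma; a trained network would replace this function.
    return -(x - target_mean) / (target_std**2 + sigma**2)

def sample_oor(n_steps=200, step=1e-4):
    # Annealed Langevin dynamics: start from pure noise and follow the
    # score while the noise level sigma decays geometrically.
    sigmas = np.geomspace(1.0, 0.01, n_steps)
    x = rng.normal(size=5)
    for sigma in sigmas:
        eps = step * (sigma / sigmas[-1]) ** 2  # NCSN-style step scaling
        x = x + eps * score(x, sigma) + np.sqrt(2 * eps) * rng.normal(size=5)
    return x

print(np.round(sample_oor(), 3))
```

In the paper's setting, the score function would be a trained network conditioned on the object pair, so sampling yields a distribution of plausible relative placements rather than a single fixed offset.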
Similar Papers
Video Spatial Reasoning with Object-Centric 3D Rollout
CV and Pattern Recognition
Teaches computers to understand 3D object locations in videos.
Generalized Visual Relation Detection with Diffusion Models
CV and Pattern Recognition
Helps computers see relationships beyond labels.
Orientation Matters: Making 3D Generative Models Orientation-Aligned
CV and Pattern Recognition
Makes 3D models stand up straight from pictures.