Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
By: Leonhard Sommer, Olaf Dünkel, Christian Theobalt and more
Potential Business Impact:
Teaches computers to see objects in 3D from videos.
3D morphable models (3DMMs) are a powerful tool for representing the possible shapes and appearances of an object category. Given a single test image, a 3DMM can be used to solve various tasks, such as predicting the 3D shape, pose, semantic correspondence, and instance segmentation of an object. Unfortunately, 3DMMs are available only for a few object categories of particular interest, such as faces or human bodies, because they require demanding 3D data acquisition and a category-specific training process. In contrast, we introduce a new method, Common3D, that learns 3DMMs of common objects in a fully self-supervised manner from a collection of object-centric videos. To this end, our model represents objects as a learned 3D template mesh and a deformation field that is parameterized as an image-conditioned neural network. Unlike prior work, Common3D represents object appearance with neural features instead of RGB colors, which enables the learning of more generalizable representations by abstracting away from pixel intensities. Importantly, we train the appearance features with a contrastive objective that exploits the correspondences defined through the deformable template mesh. This leads to higher-quality correspondence features than related works and to significantly improved performance at estimating 3D object pose and semantic correspondence. Common3D is the first completely self-supervised method that can solve various vision tasks in a zero-shot manner.
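The abstract does not spell out the form of the contrastive objective. As a rough illustration only, the sketch below assumes an InfoNCE-style loss in which each mesh vertex feature should match the image feature sampled at the pixel where that vertex projects, and mismatch the features sampled for all other vertices. The function name `info_nce_loss`, the temperature value, and the array shapes are hypothetical and are not taken from the paper.

```python
import numpy as np

def info_nce_loss(vertex_feats, pixel_feats, temperature=0.07):
    """Assumed InfoNCE-style loss (not the paper's confirmed objective).

    vertex_feats: (N, D) features attached to N template-mesh vertices.
    pixel_feats:  (N, D) image features; row i is sampled where vertex i
                  projects into the image, so (i, i) pairs are positives.
    """
    # L2-normalize so that dot products are cosine similarities.
    v = vertex_feats / np.linalg.norm(vertex_feats, axis=1, keepdims=True)
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=1, keepdims=True)

    # Similarity of every pixel feature to every vertex feature.
    logits = (p @ v.T) / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for pixel i is vertex i, i.e. the diagonal.
    return float(-np.mean(np.diag(log_prob)))

# Toy check with synthetic features: correctly paired features should
# give a lower loss than deliberately mismatched ones.
rng = np.random.default_rng(0)
verts = rng.normal(size=(8, 16))
pixels = verts + 0.01 * rng.normal(size=(8, 16))
aligned = info_nce_loss(verts, pixels)
mismatched = info_nce_loss(verts, pixels[::-1])
```

In this toy setup the correspondences are given directly by row index; in the method described above they would instead come from projecting the deformed template mesh into each video frame.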
Similar Papers
Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
CV and Pattern Recognition
Makes 2D photos look like 3D faces.
DINeMo: Learning Neural Mesh Models with no 3D Annotations
CV and Pattern Recognition
Teaches robots to see objects in 3D without special labels.
3DFroMLLM: 3D Prototype Generation only from Pretrained Multimodal LLMs
CV and Pattern Recognition
Makes computers build 3D shapes from words.