Zero-shot Inexact CAD Model Alignment from a Single Image
By: Pattaramanee Arsomngern, Sasikarn Khwanmuang, Matthias Nießner and more
Potential Business Impact:
Lets computers guess 3D shapes from pictures.
One practical approach to inferring 3D scene structure from a single image is to retrieve a closely matching 3D model from a database and align it with the object in the image. Existing methods rely on supervised training with images and pose annotations, which limits them to a narrow set of object categories. To address this, we propose a weakly supervised 9-DoF alignment method for inexact 3D models that requires no pose annotations and generalizes to unseen categories. Our approach derives a novel feature space from foundation features that ensures multi-view consistency and, using a self-supervised triplet loss, overcomes the symmetry ambiguities inherent in those features. Additionally, we introduce a texture-invariant pose refinement technique that performs dense alignment in normalized object coordinates, estimated through the enhanced feature space. We conduct extensive evaluations on the real-world ScanNet25k dataset, where our method outperforms SOTA weakly supervised baselines by +4.3% mean alignment accuracy and is the only weakly supervised approach to surpass the supervised ROCA by +2.7%. To assess generalization, we introduce SUN2CAD, a real-world test set with 20 novel object categories, where our method achieves SOTA results without prior training on them.
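For readers unfamiliar with the triplet loss mentioned in the abstract, the following is a minimal sketch of the standard triplet margin formulation. The abstract does not state the exact loss form, distance metric, margin, or how triplets are selected, so the Euclidean distance, the `margin` default, and the function names `euclidean` and `triplet_loss` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is an assumed default, not taken from the paper.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D features: the anchor and positive stand for the same
# surface point seen from two views; the negative stands for a different
# point, such as its symmetric counterpart.
anchor = [1.0, 0.0]
positive = [1.0, 0.0]
negative = [0.0, 0.0]
loss = triplet_loss(anchor, positive, negative)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, which is how such a loss can push apart features that would otherwise be confused.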
Similar Papers
Universal Features Guided Zero-Shot Category-Level Object Pose Estimation
CV and Pattern Recognition
Teaches robots to grab new things they've never seen.
Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding
CV and Pattern Recognition
Teaches computers to understand 3D objects better.
CA-W3D: Leveraging Context-Aware Knowledge for Weakly Supervised Monocular 3D Detection
CV and Pattern Recognition
Helps cars see in 3D with less training.