Emergent Outlier View Rejection in Visual Geometry Grounded Transformers
By: Jisang Han, Sunghwan Hong, Jaewoo Jung, and others
Potential Business Impact:
Makes 3D models from photos by ignoring bad ones.
Reliable 3D reconstruction from in-the-wild image collections is often hindered by "noisy" images: irrelevant inputs with little or no view overlap with the rest of the collection. While traditional Structure-from-Motion pipelines handle such cases through geometric verification and outlier rejection, feed-forward 3D reconstruction models lack these explicit mechanisms, leading to degraded performance under in-the-wild conditions. In this paper, we discover that an existing feed-forward reconstruction model, VGGT, despite lacking explicit outlier-rejection mechanisms or noise-aware training, can inherently distinguish distractor images. Through an in-depth analysis under varying proportions of synthetic distractors, we identify a specific layer that naturally exhibits outlier-suppressing behavior. Further probing reveals that this layer encodes discriminative internal representations that enable an effective noise-filtering capability, which we leverage to perform outlier-view rejection in feed-forward 3D reconstruction without any additional fine-tuning or supervision. Extensive experiments on both controlled and in-the-wild datasets demonstrate that this implicit filtering mechanism is consistent and generalizes well across diverse scenarios.
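The filtering idea described in the abstract (scoring each view from a chosen layer's internal representations and rejecting low-scoring views) can be sketched as follows. This is a minimal illustration with synthetic feature tensors, not the authors' implementation: the pooling step, the scoring rule (mean cosine similarity of each view's descriptor to the other views'), and the threshold value are all assumptions made here for demonstration.

```python
import numpy as np

def reject_outlier_views(features, threshold=0.5):
    """Score each view by the mean cosine similarity of its pooled
    descriptor to the descriptors of all other views, and keep views
    scoring above `threshold`.

    features: array of shape (num_views, num_tokens, dim), standing in
    for the per-view token features taken from one transformer layer.
    Returns (keep_mask, scores).
    """
    # Pool tokens into one descriptor per view and L2-normalize.
    pooled = features.mean(axis=1)
    pooled /= np.linalg.norm(pooled, axis=1, keepdims=True)

    # Pairwise cosine similarities between view descriptors.
    sim = pooled @ pooled.T
    n = len(pooled)
    # Mean similarity to the *other* views (exclude self-similarity).
    scores = (sim.sum(axis=1) - np.diag(sim)) / (n - 1)
    return scores > threshold, scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three overlapping views share a common signal; one distractor does not.
    base = rng.normal(size=(1, 1, 64))
    inliers = base + 0.1 * rng.normal(size=(3, 16, 64))
    distractor = rng.normal(size=(1, 16, 64))
    feats = np.concatenate([inliers, distractor], axis=0)

    keep, scores = reject_outlier_views(feats, threshold=0.5)
    print(keep)  # the distractor view gets a low score and is dropped
```

In this toy setup the overlapping views produce highly similar pooled descriptors while the distractor's descriptor is nearly orthogonal to them, so a simple similarity threshold separates the two groups; the paper's contribution is showing that a specific VGGT layer already yields representations with this separability, no retraining required.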
Similar Papers
DriveVGGT: Visual Geometry Transformer for Autonomous Driving
CV and Pattern Recognition
Helps self-driving cars see better in 3D.
C3G: Learning Compact 3D Representations with 2K Gaussians
CV and Pattern Recognition
Builds detailed 3D worlds from few pictures.
VGD: Visual Geometry Gaussian Splatting for Feed-Forward Surround-view Driving Reconstruction
CV and Pattern Recognition
Makes self-driving cars see better from all sides.