Gaussian Alignment for Relative Camera Pose Estimation via Single-View Reconstruction
By: Yumin Li, Dylan Campbell
Potential Business Impact:
Helps computers understand 3D space from pictures.
Estimating metric relative camera pose from a pair of images is of great importance for 3D reconstruction and localisation. However, conventional two-view pose estimation methods are not metric, with camera translation known only up to a scale, and struggle with wide baselines and textureless or reflective surfaces. This paper introduces GARPS, a training-free framework that casts this problem as the direct alignment of two independently reconstructed 3D scenes. GARPS leverages a metric monocular depth estimator and a Gaussian scene reconstructor to obtain a metric 3D Gaussian Mixture Model (GMM) for each image. It then refines an initial pose from a feed-forward two-view pose estimator by optimising a differentiable GMM alignment objective. This objective jointly considers geometric structure, view-independent colour, anisotropic covariance, and semantic feature consistency, and is robust to occlusions and texture-poor regions without requiring explicit 2D correspondences. Extensive experiments on the RealEstate10K dataset demonstrate that GARPS outperforms both classical and state-of-the-art learning-based methods, including MASt3R. These results highlight the potential of bridging single-view perception with multi-view geometry to achieve robust and metric relative pose estimation.
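To make the core idea concrete, here is a minimal, hypothetical PyTorch sketch of the kind of differentiable GMM alignment the abstract describes: Gaussians from one view are transformed by a candidate rigid pose and scored against the Gaussians from the other view using a cross-term that combines geometry (means and anisotropic covariances) with a colour/feature similarity weight. The function names, the feature-weighting scheme, and the exact objective are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of differentiable GMM alignment for relative pose refinement.
# Assumed inputs: per-view Gaussian means (N, 3), covariances (N, 3, 3), and
# view-independent colour/semantic features (N, F).
import torch

def hat(w):
    """Skew-symmetric matrix of a 3-vector (built differentiably)."""
    zero = w.new_zeros(())
    return torch.stack([
        torch.stack([zero, -w[2],  w[1]]),
        torch.stack([ w[2], zero, -w[0]]),
        torch.stack([-w[1],  w[0], zero]),
    ])

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = w.norm()
    W = hat(w)
    a = torch.where(theta > 1e-6, torch.sin(theta) / theta.clamp(min=1e-12),
                    torch.ones_like(theta))
    b = torch.where(theta > 1e-6, (1 - torch.cos(theta)) / theta.clamp(min=1e-12) ** 2,
                    0.5 * torch.ones_like(theta))
    return torch.eye(3) + a * W + b * (W @ W)

def gmm_alignment_loss(mu_a, cov_a, feat_a, mu_b, cov_b, feat_b, R, t, feat_sigma=0.5):
    """Negative feature-weighted cross-correlation between two GMMs under pose (R, t)."""
    mu_bt = mu_b @ R.T + t                       # (M, 3) transformed means
    cov_bt = R @ cov_b @ R.T                     # (M, 3, 3) rotated covariances
    d = mu_a[:, None, :] - mu_bt[None, :, :]     # (N, M, 3) pairwise differences
    S = cov_a[:, None] + cov_bt[None, :]         # (N, M, 3, 3) summed covariances
    maha = torch.einsum('nmi,nmij,nmj->nm', d, torch.linalg.inv(S), d)
    norm = ((2 * torch.pi) ** 3 * torch.linalg.det(S)).clamp(min=1e-12).sqrt()
    density = torch.exp(-0.5 * maha) / norm      # Gaussian-Gaussian cross terms
    # Colour/semantic consistency weighting (view-independent features assumed).
    fdist = (feat_a[:, None, :] - feat_b[None, :, :]).pow(2).sum(-1)
    weight = torch.exp(-fdist / (2 * feat_sigma ** 2))
    return -(weight * density).sum()

def refine_pose(mu_a, cov_a, feat_a, mu_b, cov_b, feat_b, iters=200, lr=1e-2):
    """Gradient-based refinement of a residual rigid pose (here initialised at identity)."""
    w = torch.zeros(3, requires_grad=True)       # axis-angle rotation update
    t = torch.zeros(3, requires_grad=True)       # translation update (metric)
    opt = torch.optim.Adam([w, t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = gmm_alignment_loss(mu_a, cov_a, feat_a, mu_b, cov_b, feat_b, so3_exp(w), t)
        loss.backward()
        opt.step()
    return so3_exp(w).detach(), t.detach()
```

In a real pipeline of this kind, the refinement would be seeded with the feed-forward two-view pose rather than the identity, and the per-Gaussian features would come from a pretrained image backbone; both are outside the scope of this sketch.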
Similar Papers
Camera Pose Refinement via 3D Gaussian Splatting
CV and Pattern Recognition
Makes 3D pictures more accurate without retraining.
Unposed 3DGS Reconstruction with Probabilistic Procrustes Mapping
CV and Pattern Recognition
Creates detailed 3D worlds from many photos.
PFGS: Pose-Fused 3D Gaussian Splatting for Complete Multi-Pose Object Reconstruction
CV and Pattern Recognition
Makes 3D models of objects from many pictures.