VGGT-SLAM 2.0: Real-Time Dense Feed-Forward Scene Reconstruction
By: Dominic Maggio, Luca Carlone
Potential Business Impact:
Helps robots map places better and faster.
We present VGGT-SLAM 2.0, a real-time RGB feed-forward SLAM system that substantially improves upon VGGT-SLAM for incrementally aligning submaps created from VGGT. First, we remove the high-dimensional 15-degree-of-freedom drift and planar degeneracy of VGGT-SLAM with a new factor graph design, while still addressing the reconstruction ambiguity of VGGT under unknown camera intrinsics. Second, by studying the attention layers of VGGT, we show that one of the layers is well suited to assist in image retrieval verification without additional training, which enables both rejecting false-positive matches and completing more loop closures. Finally, we conduct a suite of experiments showing that VGGT-SLAM 2.0 can easily be adapted for open-set object detection and demonstrating real-time performance while running online onboard a ground robot using a Jetson Thor. We test in environments ranging from cluttered indoor apartments and office scenes to a 4,200-square-foot barn, and we demonstrate that VGGT-SLAM 2.0 achieves the highest accuracy on the TUM dataset, with about 23 percent less pose error than VGGT-SLAM. Code will be released upon publication.
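The abstract does not detail the new factor graph design, so as a rough illustration of the underlying idea of aligning submaps in a pose graph with loop-closure constraints, here is a minimal sketch using GTSAM. The keys, relative poses, and noise values are hypothetical, the graph lives on SE(3) for simplicity, and the paper's actual factors (which also handle VGGT's reconstruction ambiguity under unknown intrinsics) are not shown.

# Minimal pose-graph sketch with GTSAM (pip install gtsam).
# Illustrative only: VGGT-SLAM 2.0's actual factor design is not
# specified in the abstract; this shows generic submap alignment
# with between-submap and loop-closure factors on SE(3).
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

# Noise models: rotation sigmas (rad) first, then translation (m).
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([1e-6] * 6))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.02] * 3 + [0.05] * 3))
loop_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 3 + [0.03] * 3))

# Anchor the first submap frame at the origin.
graph.add(gtsam.PriorFactorPose3(0, gtsam.Pose3(), prior_noise))
initial.insert(0, gtsam.Pose3())

# Hypothetical relative poses between consecutive submaps, e.g.
# estimated from frames shared by overlapping submaps.
odometry = gtsam.Pose3(gtsam.Rot3.Yaw(0.1), np.array([1.0, 0.0, 0.0]))
for i in range(3):
    graph.add(gtsam.BetweenFactorPose3(i, i + 1, odometry, odom_noise))
    initial.insert(i + 1, gtsam.Pose3(gtsam.Rot3.Yaw(0.1 * (i + 1)),
                                      np.array([float(i + 1), 0.0, 0.0])))

# A verified loop closure between submap 3 and submap 0 pulls
# accumulated drift out of the chain (measurement is hypothetical).
loop_measured = gtsam.Pose3(gtsam.Rot3.Yaw(-0.3), np.array([-3.0, 0.0, 0.0]))
graph.add(gtsam.BetweenFactorPose3(3, 0, loop_measured, loop_noise))

result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
for i in range(4):
    print(i, result.atPose3(i).translation())

Representing each submap by a single node keeps the graph small, which is one reason pose-graph formulations scale to real-time use; the paper's design additionally avoids the 15-degree-of-freedom drift that came with optimizing on SL(4).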
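The abstract also states that one of VGGT's attention layers provides features usable for retrieval verification with no extra training. Below is a minimal sketch of how such a check could look, assuming a hypothetical model with indexable transformer blocks; LAYER_IDX, the module path model.blocks[LAYER_IDX].attn, and the threshold are all assumptions, since the abstract does not name the specific layer.

# Hedged sketch: verify a loop-closure candidate by comparing pooled
# descriptors tapped from an intermediate attention layer via a
# forward hook. Assumes the hooked output is a (B, N, D) token tensor.
import torch
import torch.nn.functional as F

LAYER_IDX = 16          # hypothetical choice of attention layer
captured = {}

def _hook(module, inputs, output):
    # Mean-pool token features into one descriptor per image.
    captured["desc"] = output.mean(dim=1).detach()

def attention_descriptor(model, image_batch):
    """Run the model and return an L2-normalized per-image descriptor."""
    handle = model.blocks[LAYER_IDX].attn.register_forward_hook(_hook)
    with torch.no_grad():
        model(image_batch)
    handle.remove()
    return F.normalize(captured["desc"], dim=-1)

def verify_loop_candidate(query_desc, candidate_desc, threshold=0.8):
    """Accept a retrieval match only if cosine similarity is high,
    rejecting false positives before attempting a loop closure."""
    similarity = (query_desc * candidate_desc).sum(dim=-1)
    return bool((similarity >= threshold).all())

A cheap verification gate like this serves both goals named in the abstract: false-positive candidates fall below the threshold and are rejected, while confident matches that a stricter geometric check might discard can still be accepted as loop closures.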
Similar Papers
Building temporally coherent 3D maps with VGGT for memory-efficient Semantic SLAM
CV and Pattern Recognition
Helps robots see and understand moving things.
VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold
CV and Pattern Recognition
Helps robots map rooms using just one camera.
Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline
CV and Pattern Recognition
Makes 3D models from videos much faster.