Score: 1

VGGT-SLAM 2.0: Real time Dense Feed-forward Scene Reconstruction

Published: January 27, 2026 | arXiv ID: 2601.19887v1

By: Dominic Maggio, Luca Carlone

BigTech Affiliations: Massachusetts Institute of Technology

Potential Business Impact:

Helps robots map places better and faster.

Business Areas:
Image Recognition Data and Analytics, Software

We present VGGT-SLAM 2.0, a real-time RGB feed-forward SLAM system that substantially improves upon VGGT-SLAM for incrementally aligning submaps created from VGGT. First, we remove the high-dimensional 15-degree-of-freedom drift and planar degeneracy of VGGT-SLAM with a new factor graph design that still addresses the reconstruction ambiguity of VGGT given unknown camera intrinsics. Second, by studying the attention layers of VGGT, we show that one of the layers is well suited to assist in image retrieval verification for free, without additional training, which enables both rejecting false positive matches and completing more loop closures. Finally, we conduct a suite of experiments, which includes showing that VGGT-SLAM 2.0 can easily be adapted for open-set object detection and demonstrating real-time performance while running online onboard a ground robot using a Jetson Thor. We also test in environments ranging from cluttered indoor apartments and office scenes to a 4,200 square foot barn, and we demonstrate that VGGT-SLAM 2.0 achieves the highest accuracy on the TUM dataset, with about 23 percent less pose error than VGGT-SLAM. Code will be released upon publication.

Country of Origin
🇺🇸 United States

Page Count
10 pages

Category
Computer Science:
CV and Pattern Recognition