Vision Foundation Models for Domain Generalisable Cross-View Localisation in Planetary Ground-Aerial Robotic Teams
By: Lachlan Holden, Feras Dayoub, Alberto Candela, and more
Accurate localisation in planetary robotics enables the advanced autonomy required to support the increased scale and scope of future missions. The successes of the Ingenuity helicopter and multiple planetary orbiters lay the groundwork for future missions that use ground-aerial robotic teams. In this paper, we consider rovers using machine learning to localise themselves in a local aerial map using limited field-of-view monocular ground-view RGB images as input. A key consideration for machine learning methods is that real space data with ground-truth position labels suitable for training is scarce. In this work, we propose a novel method of localising rovers in an aerial map using cross-view-localising dual-encoder deep neural networks. We leverage semantic segmentation with vision foundation models and high-volume synthetic data to bridge the domain gap to real images. We also contribute a new cross-view dataset of real-world rover trajectories with corresponding ground-truth localisation data captured in a planetary analogue facility, plus a high-volume dataset of analogous synthetic image pairs. Using particle filters for state estimation with the cross-view networks allows accurate position estimation over simple and complex trajectories based on sequences of ground-view images.
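As a rough illustration of how the pieces described above could fit together, the sketch below pairs a dual-encoder cross-view matcher with a particle filter: a ground-view encoder and an aerial-patch encoder map (segmented) images into a shared embedding space, and the similarity between the rover's ground-view embedding and the aerial patch under each particle hypothesis serves as the measurement likelihood. The network architecture, noise levels, similarity-to-weight mapping, and the crop_patch helper are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: a dual-encoder cross-view matcher feeding a
# particle filter. Architectures, noise parameters, and the similarity-to-
# likelihood mapping are assumptions, not the paper's method.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvEncoder(nn.Module):
    """Tiny CNN that maps a (segmented) image to a unit-norm embedding."""

    def __init__(self, in_channels: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


# Dual encoders: one for ground-view images, one for aerial map patches.
# In practice these would be trained (e.g. contrastively) before use.
ground_encoder = ConvEncoder(in_channels=3)
aerial_encoder = ConvEncoder(in_channels=3)


def measurement_likelihood(ground_img, aerial_patches, temperature=0.1):
    """Cosine similarity between the ground-view embedding and each
    candidate aerial-patch embedding, mapped to a positive weight."""
    with torch.no_grad():
        g = ground_encoder(ground_img.unsqueeze(0))   # (1, D)
        a = aerial_encoder(aerial_patches)             # (N, D)
        sim = (a @ g.T).squeeze(-1)                    # (N,)
    return torch.exp(sim / temperature).numpy()


def particle_filter_step(particles, weights, odometry, ground_img, crop_patch):
    """One predict/update/resample cycle over (x, y, heading) particles.
    `crop_patch` is an assumed helper returning the aerial-map crop
    (a 3xHxW tensor) centred on a given particle pose."""
    # Predict: propagate particles with noisy odometry (dx, dy, dtheta).
    noise = np.random.normal(scale=[0.05, 0.05, 0.02], size=particles.shape)
    particles = particles + np.asarray(odometry) + noise

    # Update: weight each particle by cross-view similarity at its pose.
    patches = torch.stack([crop_patch(p) for p in particles])  # (N, 3, H, W)
    weights = weights * measurement_likelihood(ground_img, patches)
    weights = weights / weights.sum()

    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = np.random.choice(len(particles), len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```

In the paper's setting, the encoders would be trained on the high-volume synthetic ground-aerial pairs, with foundation-model semantic segmentation applied to both views to bridge the domain gap before the filter is run on real rover imagery; the specific training objective and filter tuning above are not drawn from the paper.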