GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization
By: Zixuan Song, Jing Zhang, Di Wang and more
Potential Business Impact:
Find places using different kinds of pictures.
Cross-view geo-localization infers a location by retrieving geo-tagged reference images that visually correspond to a query image. However, the traditional satellite-centric paradigm limits robustness when high-resolution or up-to-date satellite imagery is unavailable. It also underexploits complementary cues across views (e.g., drone, satellite, and street) and modalities (e.g., language and image). To address these challenges, we propose GeoBridge, a foundation model that performs bidirectional matching across views and supports language-to-image retrieval. Going beyond traditional satellite-centric formulations, GeoBridge builds on a novel semantic-anchor mechanism that bridges multi-view features through textual descriptions for robust, flexible localization. To support this task, we construct GeoLoc, the first large-scale, cross-modal, and multi-view aligned dataset, comprising over 50,000 pairs of drone, street-view panorama, and satellite images together with their textual descriptions, collected from 36 countries and ensuring both geographic and semantic alignment. We perform extensive evaluations across multiple tasks. Experiments confirm that GeoLoc pre-training markedly improves geo-localization accuracy for GeoBridge while promoting cross-domain generalization and cross-modal knowledge transfer. The dataset, source code, and pretrained models are released at https://github.com/MiliLab/GeoBridge.
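The retrieval step that the abstract describes (matching a query against geo-tagged references) can be illustrated with a minimal sketch. This is not GeoBridge's implementation: it assumes that some encoder has already mapped the query (an image from any view, or a text description) and the reference images into a shared embedding space, and it ranks references by plain cosine similarity. The function names `l2_normalize` and `retrieve_top_k`, the toy 2-D embeddings, and the coordinates are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length (eps avoids divide-by-zero)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_top_k(query_emb, ref_embs, ref_coords, k=3):
    """Rank geo-tagged references by cosine similarity to the query.

    query_emb  : (d,) embedding of the query (assumed precomputed)
    ref_embs   : (n, d) embeddings of the reference images (assumed precomputed)
    ref_coords : n (lat, lon) tuples, one per reference
    Returns a list of (reference index, similarity, (lat, lon)), best match first.
    """
    q = l2_normalize(np.asarray(query_emb, dtype=np.float64))
    r = l2_normalize(np.asarray(ref_embs, dtype=np.float64))
    sims = r @ q                      # cosine similarity, since both are unit length
    k = min(k, len(sims))
    order = np.argsort(-sims)[:k]     # indices sorted by descending similarity
    return [(int(i), float(sims[i]), tuple(ref_coords[i])) for i in order]


# Toy data: three references with made-up embeddings and coordinates.
ref_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ref_coords = [(48.85, 2.35), (35.68, 139.69), (40.71, -74.00)]
query_emb = [0.9, 0.1]

for idx, sim, coord in retrieve_top_k(query_emb, ref_embs, ref_coords, k=2):
    print(idx, round(sim, 3), coord)
```

The predicted location is taken from the geo-tag of the top-ranked reference. Because the ranking depends only on embeddings, the same routine would serve drone-to-satellite, street-to-satellite, or text-to-image queries, provided the encoders really do share one embedding space.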
Similar Papers
SMGeo: Cross-View Object Geo-Localization with Grid-Level Mixture-of-Experts
CV and Pattern Recognition
Find objects in satellite photos from drone pictures.
GeoViS: Geospatially Rewarded Visual Search for Remote Sensing Visual Grounding
CV and Pattern Recognition
Finds tiny things in big satellite pictures.
Unsupervised Multimodal Graph-based Model for Geo-social Analysis
Social and Information Networks
Finds important news in social media posts.