From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance
By: Jeongho Min, Dongyoung Kim, Jaehyup Lee
Potential Business Impact:
Finds your location on a map from a photo.
Cross-view image retrieval, particularly street-to-satellite matching, is a critical task for applications such as autonomous navigation, urban planning, and localization in GPS-denied environments. However, existing approaches often require supervised training on curated datasets and rely on panoramic or UAV-based images, which limits real-world deployment. In this paper, we present a simple yet effective cross-view image retrieval framework that leverages a pretrained vision encoder and a large language model (LLM), requiring no additional training. Given a monocular street-view image, our method extracts geographic cues through web-based image search and LLM-based location inference, generates a satellite query via a geocoding API, and retrieves matching tiles using a pretrained vision encoder (e.g., DINOv2) with PCA-based whitening feature refinement. Despite using no ground-truth supervision or finetuning, our proposed method outperforms prior learning-based approaches on the benchmark dataset under zero-shot settings. Moreover, our pipeline enables automatic construction of semantically aligned street-to-satellite datasets, offering a scalable and cost-efficient alternative to manual annotation. All source code will be made publicly available at https://jeonghomin.github.io/street2orbit.github.io/.
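To make the retrieval stage concrete, the following is a minimal sketch of the final step described in the abstract: extracting DINOv2 features, applying PCA-based whitening, and ranking satellite tiles by cosine similarity against the street-view query. It assumes the web search, LLM location inference, and geocoding steps have already produced a pool of candidate satellite tiles; the file paths, the 128-component PCA size, and the choice of the dinov2_vitb14 backbone are illustrative assumptions, not the authors' released implementation.

# Sketch of the retrieval stage: DINOv2 embeddings + PCA whitening + cosine ranking.
# The query tile pool is assumed to come from the earlier web-search / LLM / geocoding
# steps; paths and hyperparameters below are placeholders.
import torch
import numpy as np
from PIL import Image
from sklearn.decomposition import PCA
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
# Pretrained DINOv2 backbone, used frozen (no finetuning), as in the abstract.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    """Return L2-normalized DINOv2 global embeddings for a list of image paths."""
    feats = []
    for p in paths:
        x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0).to(device)
        f = model(x)  # (1, D) class-token embedding
        feats.append(f.squeeze(0).cpu().numpy())
    feats = np.stack(feats)
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

# Hypothetical inputs: one street-view query and a pool of candidate satellite tiles.
query_feat = embed(["street_query.jpg"])
tile_paths = [f"tiles/tile_{i:03d}.png" for i in range(500)]  # placeholder tile pool
tile_feats = embed(tile_paths)

# PCA whitening fitted on the satellite-tile features, then applied to both views.
pca = PCA(n_components=128, whiten=True)
tile_white = pca.fit_transform(tile_feats)
query_white = pca.transform(query_feat)

# Re-normalize and rank tiles by cosine similarity to the street-view query.
tile_white /= np.linalg.norm(tile_white, axis=1, keepdims=True)
query_white /= np.linalg.norm(query_white, axis=1, keepdims=True)
scores = tile_white @ query_white.T
ranking = np.argsort(-scores.squeeze())
print([tile_paths[i] for i in ranking[:5]])  # top-5 retrieved satellite tiles

Whitening here serves to decorrelate and rescale the frozen encoder's feature dimensions so that cosine similarity is not dominated by a few high-variance components; the exact refinement used in the paper may differ in detail.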
Similar Papers
SkyLink: Unifying Street-Satellite Geo-Localization via UAV-Mediated 3D Scene Alignment
CV and Pattern Recognition
Find places from different pictures.
AddressVLM: Cross-view Alignment Tuning for Image Address Localization using Large Vision-Language Models
CV and Pattern Recognition
Helps phones find exact street addresses from pictures.
Street-Level Geolocalization Using Multimodal Large Language Models and Retrieval-Augmented Generation
CV and Pattern Recognition
Find exact locations from photos.