VLG-Loc: Vision-Language Global Localization from Labeled Footprint Maps
By: Mizuho Aoki, Kohei Honda, Yasuhiro Yoshimura, and more
This paper presents Vision-Language Global Localization (VLG-Loc), a novel global localization method that uses human-readable labeled footprint maps containing only the names and areas of distinctive visual landmarks in an environment. While humans naturally localize themselves using such maps, translating this capability to robotic systems remains highly challenging due to the difficulty of establishing correspondences between observed landmarks and those in the map without geometric or appearance details. To address this challenge, VLG-Loc leverages a vision-language model (VLM) to search the robot's multi-directional image observations for the landmarks noted in the map. The method then estimates the robot's pose within a Monte Carlo localization framework, using the detected landmarks to evaluate the likelihood of each pose hypothesis. Experimental validation in simulated and real-world retail environments demonstrates superior robustness compared to existing scan-based methods, particularly under environmental changes. Further improvements are achieved through the probabilistic fusion of visual and scan-based localization.
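The abstract describes the pipeline only at a high level: VLM-detected landmarks weight pose hypotheses inside a Monte Carlo localization loop. The sketch below is a minimal illustration of that idea, not the authors' implementation; the map format, the bearing-based likelihood, and the names `FOOTPRINT_MAP`, `landmark_bearing_likelihood`, and `mcl_update` are all assumptions introduced here for clarity, and the paper may use a different likelihood model entirely.

```python
import numpy as np

# Hypothetical labeled footprint map: each landmark is only a name plus a 2D area
# (axis-aligned boxes here for simplicity; no appearance or geometry details).
FOOTPRINT_MAP = {
    "bakery":  {"xmin": 0.0, "xmax": 3.0, "ymin": 5.0, "ymax": 8.0},
    "produce": {"xmin": 6.0, "xmax": 9.0, "ymin": 1.0, "ymax": 4.0},
}

def landmark_bearing_likelihood(particle, detections, sigma=0.5):
    """Score one pose hypothesis (x, y, theta) against VLM detections.

    `detections` maps a landmark name found by the VLM in the multi-directional
    images to the bearing (rad, robot frame) at which it was seen.
    """
    x, y, theta = particle
    log_lik = 0.0
    for name, observed_bearing in detections.items():
        area = FOOTPRINT_MAP.get(name)
        if area is None:
            continue  # landmark not in the map; ignore the detection
        # Expected bearing toward the centroid of the landmark area from this pose.
        cx = 0.5 * (area["xmin"] + area["xmax"])
        cy = 0.5 * (area["ymin"] + area["ymax"])
        expected = np.arctan2(cy - y, cx - x) - theta
        # Wrapped angular error, penalized with a Gaussian model.
        err = np.arctan2(np.sin(observed_bearing - expected),
                         np.cos(observed_bearing - expected))
        log_lik += -0.5 * (err / sigma) ** 2
    return np.exp(log_lik)

def mcl_update(particles, detections):
    """One measurement update: reweight particles, then resample."""
    weights = np.array([landmark_bearing_likelihood(p, detections) for p in particles])
    weights /= weights.sum() + 1e-12
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

# Example: uniform pose hypotheses, and the VLM reports "bakery" slightly ahead-left.
particles = np.random.uniform([0, 0, -np.pi], [10, 10, np.pi], size=(500, 3))
particles = mcl_update(particles, {"bakery": 0.4})
```

In this toy form, the VLM acts purely as a detector that turns images into named landmark observations; the probabilistic fusion with scan-based localization mentioned in the abstract would enter as an additional factor in the particle weights.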
Similar Papers
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models
CV and Pattern Recognition
Helps computers find places from any picture.
Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models
CV and Pattern Recognition
AI can guess where photos are taken.
GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
Robotics
Robots understand 3D space to do tasks better.