AutoTour: Automatic Photo Tour Guide with Smartphones and LLMs
By: Huatao Xu, Zihe Liu, Zilin Zeng, and more
We present AutoTour, a system that enhances user exploration by automatically generating fine-grained landmark annotations and descriptive narratives for photos captured by users. The key idea of AutoTour is to fuse visual features extracted from photos with nearby geospatial features queried from open mapping databases. Unlike existing tour applications that rely on pre-defined content or proprietary datasets, AutoTour leverages open and extensible data sources to provide scalable, context-aware, photo-based guidance. To achieve this, we design a training-free pipeline that first extracts and filters relevant geospatial features around the user's GPS location. It then detects major landmarks in user photos through VLM-based feature detection and projects them onto the horizontal spatial plane. A geometric matching algorithm aligns photo features with corresponding geospatial entities based on their estimated distance and direction. The matched features are then grounded and annotated directly on the original photo, accompanied by textual and audio descriptions generated by a large language model to provide an informative, tour-like experience. We demonstrate that AutoTour delivers rich, interpretable annotations for both iconic and lesser-known landmarks, enabling a new form of interactive, context-aware exploration that bridges visual perception and geospatial understanding.
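The abstract does not spell out the geometric matching step, so the sketch below is only an illustrative guess at how photo features might be aligned with geospatial entities by distance and direction: bearings and distances to nearby entities are computed from the camera's GPS fix, and each photo-detected feature is greedily paired with the entity minimizing a weighted bearing-plus-distance discrepancy. The entity names, cost weights, and greedy strategy are all assumptions, not the paper's actual algorithm.

```python
import math

EARTH_RADIUS_M = 6371000

def bearing_and_distance(cam, poi):
    """Bearing (deg, clockwise from north) and distance (m) from the
    camera's (lat, lon) to an entity's (lat, lon), using an
    equirectangular approximation that is adequate at sub-km ranges."""
    lat1, lon1 = map(math.radians, cam)
    lat2, lon2 = map(math.radians, poi)
    dx = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)  # east-west offset
    dy = lat2 - lat1                                   # north-south offset
    distance_m = EARTH_RADIUS_M * math.hypot(dx, dy)
    bearing_deg = math.degrees(math.atan2(dx, dy)) % 360
    return bearing_deg, distance_m

def match_features(photo_feats, geo_entities, cam_gps,
                   w_bearing=1.0, w_dist=0.01):
    """Greedily pair each photo feature (estimated bearing, distance)
    with the unused geospatial entity of lowest combined cost.
    Weights are illustrative assumptions."""
    matches, used = {}, set()
    for name, (est_bearing, est_dist) in photo_feats.items():
        best, best_cost = None, float("inf")
        for ent, latlon in geo_entities.items():
            if ent in used:
                continue
            b, d = bearing_and_distance(cam_gps, latlon)
            # wrap-around angular difference in [0, 180]
            dbear = min(abs(b - est_bearing), 360 - abs(b - est_bearing))
            cost = w_bearing * dbear + w_dist * abs(d - est_dist)
            if cost < best_cost:
                best, best_cost = ent, cost
        if best is not None:
            matches[name] = best
            used.add(best)
    return matches
```

For example, with the camera at (1.2800, 103.8500), an entity ~111 m due north and another ~111 m due east, a photo feature with an estimated bearing near 0 degrees would pair with the northern entity and one near 90 degrees with the eastern one.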
Similar Papers
AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning
Robotics
Helps robots understand where things are and move.
Automated Label Placement on Maps via Large Language Models
Human-Computer Interaction
AI helps put map labels in the best spots.
StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery
Human-Computer Interaction
Helps study neighborhoods faster with smart AI.