Vision-Based Localization and LLM-based Navigation for Indoor Environments
By: Keyan Rahimi, Md. Wasiul Haque, Sagar Dasgupta, et al.
Potential Business Impact:
Guides you through buildings using a phone camera and AI.
Indoor navigation remains a complex challenge because GPS signals are unreliable inside large, architecturally intricate buildings. This study presents an approach that pairs vision-based localization with large language model (LLM)-based navigation. The localization module uses a ResNet-50 convolutional neural network, fine-tuned in a two-stage process, to identify the user's position from smartphone camera input. The navigation module employs an LLM, guided by a carefully crafted system prompt, to interpret preprocessed floor plan images and generate step-by-step directions. Localization robustness was evaluated in a realistic office corridor with repetitive features and limited visibility; the model achieved high confidence and 96% accuracy across all tested waypoints, even under constrained viewing conditions and short-duration queries. Navigation tests using ChatGPT on real building floor maps yielded an average instruction accuracy of 75%, with observed limitations in zero-shot reasoning and inference time. These results demonstrate the potential for scalable, infrastructure-free indoor navigation using off-the-shelf cameras and publicly available floor plans, particularly in resource-constrained settings such as hospitals, airports, and educational institutions.
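The abstract describes the localization model only at a high level (a ResNet-50 fine-tuned in two stages on waypoint images). The sketch below is one plausible reading of that setup in PyTorch, not the authors' code: NUM_WAYPOINTS, the dummy data loader, the learning rates, and the epoch counts are all illustrative assumptions.

```python
# A minimal sketch of two-stage fine-tuning of ResNet-50 for waypoint
# classification. Hyperparameters and data are placeholders, not the
# paper's actual settings.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

NUM_WAYPOINTS = 10  # assumed number of waypoint classes

# Start from an ImageNet-pretrained ResNet-50 and swap in a waypoint head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_WAYPOINTS)
criterion = nn.CrossEntropyLoss()

# Placeholder data; in practice this would be labeled smartphone frames.
train_loader = DataLoader(
    TensorDataset(torch.randn(8, 3, 224, 224),
                  torch.randint(0, NUM_WAYPOINTS, (8,))),
    batch_size=4,
)

def run_epochs(optimizer, loader, epochs):
    """One training phase; loader yields (image batch, waypoint labels)."""
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

# Stage 1: freeze the pretrained backbone and train only the new head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
run_epochs(torch.optim.Adam(model.fc.parameters(), lr=1e-3), train_loader, epochs=5)

# Stage 2: unfreeze everything and fine-tune end to end at a lower rate.
for p in model.parameters():
    p.requires_grad = True
run_epochs(torch.optim.Adam(model.parameters(), lr=1e-5), train_loader, epochs=5)
```

Freezing the backbone first lets the randomly initialized head settle before low-rate updates touch the pretrained features, which is the usual motivation for staging fine-tuning this way.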
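For the navigation side, the paper reports prompting ChatGPT with a crafted system prompt over preprocessed floor plan images, but the exact prompt and pipeline are not reproduced here. The sketch below shows the general pattern using the OpenAI Python client; the prompt wording, model name, and get_directions helper are illustrative assumptions, not the authors' setup.

```python
# Sketch of LLM-based navigation: an illustrative system prompt plus a
# preprocessed floor-plan image sent to a vision-capable chat model.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are an indoor navigation assistant. Given a floor plan image, a "
    "start waypoint, and a destination, reply with numbered step-by-step "
    "walking directions that reference corridors and visible landmarks."
)

def get_directions(floorplan_path: str, start: str, destination: str) -> str:
    """Hypothetical helper: ask the model for directions on one floor plan."""
    with open(floorplan_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"Start: {start}. Destination: {destination}."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            },
        ],
    )
    return response.choices[0].message.content

# Example call: get_directions("floor2.png", "Waypoint A", "Room 214")
```

Because the model answers zero-shot from the image, instruction quality depends heavily on the floor plan preprocessing and prompt design, which is consistent with the 75% instruction accuracy and inference-time limitations the paper reports.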
Similar Papers
LLM-Guided Indoor Navigation with Multimodal Map Understanding
Artificial Intelligence
Lets phones give directions inside buildings.
Research on Navigation Methods Based on LLMs
Robotics
Lets robots find their way around buildings.
Fine-Tuning Vision-Language Models for Visual Navigation Assistance
Computer Vision and Pattern Recognition
Helps blind people navigate indoors with voice.