Floor Plan-Guided Visual Navigation Incorporating Depth and Directional Cues
By: Wei Huang, Jiaxin Li, Zang Wan, and more
Potential Business Impact:
Helps robots find their way using pictures and maps.
Guiding an agent to a specific target in indoor environments based solely on RGB inputs and a floor plan is a promising yet challenging problem. Although existing methods have made significant progress, two challenges remain unresolved. First, the modality gap between egocentric RGB observations and the floor plan hinders the integration of visual and spatial information for both local obstacle avoidance and global planning. Second, accurate localization is critical for navigation performance, but remains challenging when deployed in unseen environments due to the lack of explicit geometric alignment between RGB inputs and floor plans. We propose a novel diffusion-based policy, denoted as GlocDiff, which integrates global path planning from the floor plan with local depth-aware features derived from RGB observations. The floor plan offers explicit global guidance, while the depth features provide implicit geometric cues, collectively enabling precise prediction of optimal navigation directions and robust obstacle avoidance. Moreover, GlocDiff introduces noise perturbation during training to enhance robustness against pose estimation errors, and we find that combining this with a relatively stable visual odometry (VO) module during inference leads to significantly improved navigation performance. Extensive experiments on the FloNa benchmark demonstrate GlocDiff's efficiency and effectiveness in achieving superior navigation performance, and successful real-world deployments further highlight its potential for practical applications.
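To make the two ideas in the abstract concrete, the sketch below shows how a conditional diffusion policy of this kind is commonly trained: a denoiser learns to predict the noise added to a ground-truth action, conditioned on fused floor-plan path features, depth features, and a pose estimate that is perturbed with Gaussian noise during training. This is a minimal illustration, not the authors' implementation; the module names, feature dimensions, DDPM noise schedule, and the `training_step` helper are all assumptions for exposition.

```python
# Minimal, illustrative sketch (not the GlocDiff code) of:
# (1) a diffusion policy denoising a 2-D navigation action, conditioned on
#     fused global (floor-plan path) and local (depth) features, and
# (2) Gaussian perturbation of the estimated pose during training, to build
#     robustness to the localization drift a VO module produces at test time.
import torch
import torch.nn as nn


class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to an action, given timestep and condition."""

    def __init__(self, action_dim=2, cond_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + 1 + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, noisy_action, t, cond):
        # t is a normalized scalar timestep feature appended to the input.
        return self.net(torch.cat([noisy_action, t, cond], dim=-1))


def training_step(denoiser, plan_feat, depth_feat, pose, action, alphas_bar,
                  pose_noise_std=0.1):
    """One DDPM-style training step with pose noise injection (illustrative)."""
    batch = action.shape[0]

    # (2) Perturb the pose estimate so the policy tolerates localization error.
    noisy_pose = pose + pose_noise_std * torch.randn_like(pose)

    # (1) Fuse explicit global guidance (floor-plan path features) with
    #     implicit geometric cues (depth features) and the perturbed pose.
    cond = torch.cat([plan_feat, depth_feat, noisy_pose], dim=-1)

    # Standard DDPM forward process: corrupt the ground-truth action.
    t = torch.randint(0, alphas_bar.shape[0], (batch, 1))
    a_bar = alphas_bar[t]                               # cumulative alpha at t
    eps = torch.randn_like(action)
    noisy_action = a_bar.sqrt() * action + (1 - a_bar).sqrt() * eps

    # Train the denoiser to recover the injected action noise.
    pred_eps = denoiser(noisy_action, t.float() / alphas_bar.shape[0], cond)
    return nn.functional.mse_loss(pred_eps, eps)


# Hypothetical usage with toy tensors and a simple linear beta schedule.
denoiser = ConditionalDenoiser(cond_dim=128 + 125 + 3)
alphas_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 100), dim=0)
loss = training_step(denoiser,
                     plan_feat=torch.randn(8, 128),
                     depth_feat=torch.randn(8, 125),
                     pose=torch.randn(8, 3),
                     action=torch.randn(8, 2),
                     alphas_bar=alphas_bar)
loss.backward()
```

At inference, a policy trained this way would presumably run the denoiser iteratively to sample a navigation direction, with the pose supplied by the VO module rather than a perturbed ground-truth pose, matching the deployment setup the abstract describes.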
Similar Papers
Graph-based Robot Localization Using a Graph Neural Network with a Floor Camera and a Feature Rich Industrial Floor
CV and Pattern Recognition
Helps robots find their way using floor patterns.
Vision-Based Localization and LLM-based Navigation for Indoor Environments
Machine Learning (CS)
Guides you indoors using phone camera and AI.
NaviDiffusor: Cost-Guided Diffusion Model for Visual Navigation
Robotics
Robots learn to walk anywhere without crashing.