RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction
By: Yufeng Zhong, Chengjian Feng, Feng Yan, and more
Potential Business Impact:
Helps robots find things using words and memory.
In language-guided visual navigation, agents locate target objects in unseen environments using natural language instructions. For reliable navigation in unfamiliar scenes, agents should possess strong perception, planning, and prediction capabilities. Additionally, when agents revisit previously explored areas during long-term navigation, they may retain irrelevant and redundant historical perceptions, leading to suboptimal results. In this work, we propose RoboTron-Nav, a unified framework that integrates perception, planning, and prediction capabilities through multitask collaboration on navigation and embodied question answering tasks, thereby enhancing navigation performance. Furthermore, RoboTron-Nav employs an adaptive 3D-aware history sampling strategy to utilize historical observations effectively and efficiently. By leveraging a large language model, RoboTron-Nav comprehends diverse commands and complex visual scenes, producing appropriate navigation actions. RoboTron-Nav achieves an 81.1% success rate in object goal navigation on the CHORES-S benchmark, setting a new state-of-the-art. Project page: https://yvfengzhong.github.io/RoboTron-Nav
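The abstract does not spell out how the adaptive 3D-aware history sampling works, but one plausible reading is that redundant past observations from revisited areas are filtered out based on their 3D positions. The sketch below is a minimal illustration under that assumption; the names `Observation`, `sample_history`, and the voxel-deduplication heuristic are illustrative, not the paper's actual method.

```python
# Minimal sketch (assumption): discard historical observations whose 3D agent
# positions fall into an already-covered voxel, so revisited areas do not
# accumulate redundant perceptions in the history buffer.
from dataclasses import dataclass

import numpy as np


@dataclass
class Observation:
    """One historical observation: a visual feature and the 3D pose it was taken from."""
    feature: np.ndarray   # e.g. an image embedding
    position: np.ndarray  # (x, y, z) agent/camera position in the world frame


def sample_history(history: list[Observation],
                   voxel_size: float = 0.5,
                   max_keep: int = 32) -> list[Observation]:
    """Keep at most `max_keep` observations, preferring spatially distinct ones.

    Observations are scanned from newest to oldest; an older observation is
    kept only if no newer observation already occupies its voxel.
    """
    kept: list[Observation] = []
    occupied: set[tuple[int, int, int]] = set()
    for obs in reversed(history):  # newest first
        voxel = tuple(np.floor(obs.position / voxel_size).astype(int))
        if voxel in occupied:
            continue  # redundant revisit of an explored cell, skip it
        occupied.add(voxel)
        kept.append(obs)
        if len(kept) >= max_keep:
            break
    return list(reversed(kept))  # restore temporal order
```

In this reading, the sampled subset (rather than the full trajectory history) would be what the language model conditions on when predicting the next navigation action.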
Similar Papers
Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation
Artificial Intelligence
Helps robots see and move without getting lost.
UrbanNav: Learning Language-Guided Urban Navigation from Web-Scale Human Trajectories
Robotics
Helps robots follow spoken directions in cities.
DreamNav: A Trajectory-Based Imaginative Framework for Zero-Shot Vision-and-Language Navigation
Robotics
Robot learns to follow directions by imagining paths.