AirNav: A Large-Scale Real-World UAV Vision-and-Language Navigation Dataset with Natural and Diverse Instructions
By: Hengxing Cai, Yijie Rao, Ligang Huang, and more
Potential Business Impact:
Drones follow real-world directions to fly anywhere.
Existing Unmanned Aerial Vehicle (UAV) Vision-Language Navigation (VLN) datasets face issues such as dependence on virtual environments, unnatural instructions, and limited scale. To address these challenges, we propose AirNav, a large-scale UAV VLN benchmark constructed from real urban aerial data rather than synthetic environments, with natural and diverse instructions. We also introduce AirVLN-R1, a model that combines Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT) to improve performance and generalization. The model's feasibility is preliminarily validated through real-world tests. Our dataset and code are publicly available.
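The two-stage recipe mentioned above (supervised fine-tuning followed by reinforcement fine-tuning) can be illustrated with a minimal toy sketch. This is not the paper's implementation: the discrete action set, the softmax policy, and the reward function are all illustrative assumptions, standing in for a full vision-language model and a real navigation environment.

```python
import math
import random

# Hypothetical discrete UAV action space (illustrative, not from the paper).
ACTIONS = ["forward", "left", "right", "stop"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

class ToyPolicy:
    """A toy softmax policy over discrete actions."""

    def __init__(self):
        self.logits = [0.0] * len(ACTIONS)

    def probs(self):
        return softmax(self.logits)

    def sft_step(self, expert_action, lr=0.5):
        # Stage 1 (SFT): cross-entropy gradient step toward the
        # expert-labeled action: dL/dz_i = p_i - y_i.
        p = self.probs()
        target = ACTIONS.index(expert_action)
        for i in range(len(self.logits)):
            grad = p[i] - (1.0 if i == target else 0.0)
            self.logits[i] -= lr * grad

    def rft_step(self, sampled_action, reward, lr=0.5):
        # Stage 2 (RFT): REINFORCE-style reward-weighted update
        # on the log-probability of the sampled action.
        p = self.probs()
        a = ACTIONS.index(sampled_action)
        for i in range(len(self.logits)):
            grad = (1.0 if i == a else 0.0) - p[i]
            self.logits[i] += lr * reward * grad

random.seed(0)
policy = ToyPolicy()

# Stage 1: SFT on expert demonstrations (here, always "forward").
for _ in range(20):
    policy.sft_step("forward")

# Stage 2: RFT with an environment reward (+1 when the sampled
# action matches the desired behavior, 0 otherwise).
for _ in range(20):
    action = random.choices(ACTIONS, weights=policy.probs())[0]
    policy.rft_step(action, reward=1.0 if action == "forward" else 0.0)

best = ACTIONS[policy.probs().index(max(policy.probs()))]
print("preferred action:", best)
```

In the real system the SFT stage would train on human instruction-trajectory pairs from the dataset, and the RFT stage would optimize a navigation reward; the sketch only shows how the two gradient signals compose.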
Similar Papers
IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments
Robotics
Drones follow voice commands to fly indoors.
OpenVLN: Open-world aerial Vision-Language Navigation
Robotics
Drones fly themselves using words and pictures.
Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
CV and Pattern Recognition
Drones fly themselves using only cameras and words.