Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System
By: Haokun Liu, Zhaoqi Ma, Yunong Li, and more
Potential Business Impact:
Aerial and ground robots use AI to work together to move and arrange objects.
Heterogeneous multi-robot systems show great potential in complex tasks requiring hybrid cooperation. However, traditional approaches relying on static models often struggle with task diversity and dynamic environments. This highlights the need for generalizable intelligence that can bridge high-level reasoning with low-level execution across heterogeneous agents. To address this, we propose a hierarchical framework integrating a prompted Large Language Model (LLM) and a GridMask-enhanced fine-tuned Vision Language Model (VLM). The LLM decomposes tasks and constructs a global semantic map, while the VLM extracts task-specified semantic labels and 2D spatial information from aerial images to support local planning. Within this framework, the aerial robot follows an optimized global semantic path and continuously provides bird's-eye-view images, guiding the ground robot's local semantic navigation and manipulation, including target-absent scenarios where implicit alignment is maintained. Experiments on real-world cube and object arrangement tasks demonstrate the framework's adaptability and robustness in dynamic environments. To the best of our knowledge, this is the first demonstration of an aerial-ground heterogeneous system integrating VLM-based perception with LLM-driven task reasoning and motion planning.
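To make the hierarchy concrete, here is a minimal sketch of how the two levels could be wired together: an LLM stage that decomposes the instruction and maintains a global semantic map, and a VLM stage that returns task-specified labels with 2D positions from each aerial image. All names (TaskPlanner-style helpers such as llm_decompose, vlm_detect, SemanticMap) are hypothetical stand-ins, not the authors' implementation, and the LLM/VLM calls are stubbed so the skeleton runs on its own.

from dataclasses import dataclass, field


@dataclass
class Detection:
    """A task-specified semantic label with its 2D position in the aerial image."""
    label: str
    xy: tuple  # (x, y) coordinates in the bird's-eye-view frame


@dataclass
class SemanticMap:
    """Global semantic map that accumulates detections over time."""
    landmarks: dict = field(default_factory=dict)  # label -> 2D position

    def update(self, detections, aerial_pose=None):
        # A real system would project image-frame detections into the world
        # frame using the aerial robot's pose; here we store them directly.
        for d in detections:
            self.landmarks[d.label] = d.xy


def llm_decompose(instruction: str) -> list:
    """Stub for the prompted LLM: break a task into ordered subtasks."""
    # A real system would prompt an LLM; this fixed decomposition is illustrative.
    return ["locate red cube", "navigate to red cube",
            "grasp red cube", "place red cube at goal"]


def vlm_detect(aerial_image, target_labels: list) -> list:
    """Stub for the GridMask-enhanced VLM: return labels with 2D positions."""
    # A real system would run the fine-tuned VLM on the aerial image.
    return [Detection(label=l, xy=(0.5, 0.5)) for l in target_labels]


def run_episode(instruction: str, aerial_image=None):
    subtasks = llm_decompose(instruction)      # high-level reasoning (LLM)
    semantic_map = SemanticMap()
    for subtask in subtasks:
        detections = vlm_detect(aerial_image, ["red cube"])  # local perception (VLM)
        semantic_map.update(detections)
        # The ground robot would consume the map and detections for local
        # navigation and manipulation; the aerial robot would replan its
        # global semantic path at this point.
        print(subtask, "->", semantic_map.landmarks)


if __name__ == "__main__":
    run_episode("arrange the red cube at the goal position")

In this sketch the ground robot never queries the LLM directly; it only acts on the VLM detections and the global map, which mirrors the paper's split between global reasoning and local execution.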
Similar Papers
AERMANI-VLM: Structured Prompting and Reasoning for Aerial Manipulation with Vision Language Models
Robotics
Lets drones perform manipulation tasks safely using language instructions.
Air-Ground Collaboration for Language-Specified Missions in Unknown Environments
Robotics
Air and ground robots follow language-specified missions together in unknown environments.
General-Purpose Aerial Intelligent Agents Empowered by Large Language Models
Robotics
Drones can now figure out new jobs on their own.