MOSU: Autonomous Long-range Robot Navigation with Multi-modal Scene Understanding
By: Jing Liang, Kasun Weerakoon, Daeun Song, and more
Potential Business Impact:
Helps robots drive safely on roads.
We present MOSU, a novel autonomous long-range navigation system that enhances global navigation for mobile robots through multi-modal perception and on-road scene understanding. MOSU addresses the outdoor robot navigation challenge by integrating geometric, semantic, and contextual information for comprehensive scene understanding. The system combines GPS- and QGIS-map-based routing for high-level global path planning with multi-modal trajectory generation for local navigation refinement. For trajectory generation, MOSU leverages multiple modalities: LiDAR-based geometric data for precise obstacle avoidance, image-based semantic segmentation for traversability assessment, and Vision-Language Models (VLMs) to capture social context and enable the robot to adhere to social norms in complex environments. This multi-modal integration improves scene understanding and traversability, allowing the robot to adapt to diverse outdoor conditions. We evaluate the system in real-world on-road environments and benchmark it on the GND dataset, achieving a 10% improvement in traversability on navigable terrains while maintaining a navigation distance comparable to existing global navigation methods.
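To make the trajectory-generation step concrete, here is a minimal sketch (not the authors' implementation) of how the three modalities could be fused into a single score per candidate trajectory: a LiDAR-based geometric cost, a segmentation-derived traversability cost, and a VLM-derived social-context penalty, weighted against progress toward a waypoint supplied by the global planner. All function names, weights, the grid layout, and the `keep_right` stub standing in for a real VLM query are illustrative assumptions.

```python
# Minimal sketch (not MOSU's actual code) of multi-modal trajectory scoring.
# All names, weights, and data layouts below are illustrative assumptions.
import numpy as np


def geometric_cost(traj, obstacles, safety_radius=0.5):
    """Penalty for passing close to LiDAR obstacle points (both (*, 2) arrays)."""
    if len(obstacles) == 0:
        return 0.0
    dists = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)
    return float(np.sum(np.maximum(0.0, safety_radius - dists.min(axis=1))))


def semantic_cost(traj, traversability, resolution=0.1):
    """Penalty for waypoints on low-traversability cells of a [0, 1] grid."""
    h, w = traversability.shape
    cols = np.clip((traj[:, 0] / resolution + w // 2).astype(int), 0, w - 1)
    rows = np.clip((traj[:, 1] / resolution + h // 2).astype(int), 0, h - 1)
    return float(np.sum(1.0 - traversability[rows, cols]))


def social_cost(traj, vlm_context):
    """Stand-in for a VLM-derived social penalty (e.g. 'keep to the right')."""
    if vlm_context.get("keep_right", False):
        return float(np.sum(np.maximum(0.0, traj[:, 1])))  # y > 0 assumed 'left'
    return 0.0


def select_trajectory(candidates, goal, obstacles, traversability, vlm_context,
                      weights=(1.0, 1.0, 0.5, 0.2)):
    """Return the candidate trajectory with the lowest weighted total cost."""
    w_geo, w_sem, w_soc, w_goal = weights
    best, best_cost = None, np.inf
    for traj in candidates:
        cost = (w_geo * geometric_cost(traj, obstacles)
                + w_sem * semantic_cost(traj, traversability)
                + w_soc * social_cost(traj, vlm_context)
                + w_goal * np.linalg.norm(traj[-1] - goal))  # progress term
        if cost < best_cost:
            best, best_cost = traj, cost
    return best, best_cost


if __name__ == "__main__":
    # Toy example: three straight candidates toward a global-planner waypoint.
    rng = np.random.default_rng(0)
    candidates = [np.stack([np.linspace(0, 5, 20), np.full(20, y)], axis=1)
                  for y in (-1.0, 0.0, 1.0)]
    goal = np.array([5.0, 0.0])                             # from GPS/QGIS route
    obstacles = rng.uniform([2, -0.4], [3, 0.4], (15, 2))   # fake LiDAR returns
    traversability = np.ones((100, 100))                    # fake "all clear" map
    best, cost = select_trajectory(candidates, goal, obstacles,
                                   traversability, {"keep_right": True})
    print("chosen lateral offset:", best[0, 1], "total cost:", round(cost, 2))
```

In a full system the candidate trajectories would come from a local sampler or planner, the traversability grid from projecting per-pixel segmentation onto the ground plane, and the social-context term from querying a VLM about the scene; the weighted-sum selection shown here is just one straightforward way to combine the three signals.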
Similar Papers
MUSON: A Reasoning-oriented Multimodal Dataset for Socially Compliant Navigation in Urban Environments
CV and Pattern Recognition
Helps robots safely walk through crowds.
Enhancing Situational Awareness in Underwater Robotics with Multi-modal Spatial Perception
Robotics
Helps robots see and map underwater clearly.
MSGNav: Unleashing the Power of Multi-modal 3D Scene Graph for Zero-Shot Embodied Navigation
CV and Pattern Recognition
Robots learn to explore new places without practice.