OmniUnet: A Multimodal Network for Unstructured Terrain Segmentation on Planetary Rovers Using RGB, Depth, and Thermal Imagery
By: Raul Castilla-Arquillo, Carlos Perez-del-Pulgar, Levin Gerdes, and more
Potential Business Impact:
Helps robots drive safely on Mars.
Robot navigation in unstructured environments requires multimodal perception systems that can support safe navigation. Multimodality enables the integration of complementary information collected by different sensors. However, this information must be processed by machine learning algorithms specifically designed to leverage heterogeneous data. Furthermore, it is necessary to identify which sensor modalities are most informative for navigation in the target environment. In Martian exploration, thermal imagery has proven valuable for assessing terrain safety due to differences in thermal behavior between soil types. This work presents OmniUnet, a transformer-based neural network architecture for semantic segmentation using RGB, depth, and thermal (RGB-D-T) imagery. A custom multimodal sensor housing was developed using 3D printing and mounted on the Martian Rover Testbed for Autonomy (MaRTA) to collect a multimodal dataset in the Bardenas semi-desert in northern Spain. This location serves as a representative analogue of the Martian surface, featuring terrain types such as sand, bedrock, and compact soil. A subset of this dataset was manually labeled to support supervised training of the network. The model was evaluated both quantitatively and qualitatively, achieving a pixel accuracy of 80.37% and demonstrating strong performance in segmenting complex unstructured terrain. Inference tests yielded an average prediction time of 673 ms on a resource-constrained computer (an NVIDIA Jetson Orin Nano), confirming its suitability for on-robot deployment. The software implementation of the network and the labeled dataset have been made publicly available to support future research in multimodal terrain perception for planetary robotics.
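To make the RGB-D-T pipeline concrete, below is a minimal illustrative sketch in PyTorch. It is not the authors' OmniUnet (which is transformer-based); it only shows the general data flow the abstract describes: three modalities fused on the channel axis, a per-pixel classifier, and the pixel-accuracy metric used for evaluation. The model class, class count, and image sizes are hypothetical placeholders.

import torch
import torch.nn as nn

NUM_CLASSES = 4  # hypothetical: e.g. sand, bedrock, compact soil, other

class ToyRGBDTSegmenter(nn.Module):
    """Stand-in model: concatenates RGB (3ch), depth (1ch), and thermal (1ch)
    inputs and predicts per-pixel class logits. Illustrative only."""
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(5, 16, kernel_size=3, padding=1),  # 3 + 1 + 1 input channels
            nn.ReLU(inplace=True),
            nn.Conv2d(16, num_classes, kernel_size=1),
        )

    def forward(self, rgb, depth, thermal):
        x = torch.cat([rgb, depth, thermal], dim=1)  # fuse modalities channel-wise
        return self.backbone(x)

def pixel_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of pixels whose predicted class matches the ground-truth label."""
    preds = logits.argmax(dim=1)
    return (preds == labels).float().mean().item()

if __name__ == "__main__":
    model = ToyRGBDTSegmenter()
    rgb = torch.rand(1, 3, 120, 160)       # dummy RGB frame
    depth = torch.rand(1, 1, 120, 160)     # dummy depth map
    thermal = torch.rand(1, 1, 120, 160)   # dummy thermal image
    labels = torch.randint(0, NUM_CLASSES, (1, 120, 160))
    logits = model(rgb, depth, thermal)
    print(f"pixel accuracy: {pixel_accuracy(logits, labels):.2%}")

A real deployment would replace the toy backbone with the published OmniUnet weights and feed synchronized frames from the rover's RGB, depth, and thermal cameras, but the fusion-then-segment structure and the accuracy metric remain the same.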
Similar Papers
OmniVLA: Physically-Grounded Multimodal VLA with Unified Multi-Sensor Perception for Robotic Manipulation
CV and Pattern Recognition
Robots see and hear better to do more tasks.
Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation
Machine Learning (CS)
Helps robots see and move better in tricky places.
OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation
CV and Pattern Recognition
Teaches computers to see and understand many things.