Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation
By: Johannes Spoecklberger, Wei Lin, Pedro Hermosilla, and more
Potential Business Impact:
Helps self-driving cars see better in 3D.
Vision Foundation Models (VFMs) have become a de facto choice for many downstream vision tasks, such as image classification, image segmentation, and object localization. However, they can also provide significant utility for downstream 3D tasks that can leverage cross-modal information (e.g., from paired image data). In our work, we further explore the utility of VFMs for adapting from labeled source data to unlabeled target data in LiDAR-based 3D semantic segmentation. Our method consumes paired 2D-3D (image and point cloud) data and relies on the robust (cross-domain) features from a VFM to train a 3D backbone on a mix of labeled source and unlabeled target data. At the heart of our method lies a fusion network that is guided by both the image and point cloud streams, with their relative contributions adjusted based on the target domain. We extensively compare our proposed methodology with different state-of-the-art methods in several settings and achieve strong performance gains. For example, we achieve an average improvement of 6.5 mIoU (over all tasks) compared with the previous state-of-the-art.
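To make the idea of modality-guided fusion more concrete, below is a minimal sketch of how per-point 3D backbone features and projected 2D VFM features could be combined with learned, per-point modality weights. This is an illustration under our own assumptions, not the authors' actual architecture; the module name `ModalityGuidedFusion`, the feature dimensions, and the gating design are all hypothetical.

```python
# Hypothetical sketch of modality-guided 2D-3D feature fusion (not the paper's code).
import torch
import torch.nn as nn


class ModalityGuidedFusion(nn.Module):
    def __init__(self, dim_3d: int, dim_2d: int, dim_out: int):
        super().__init__()
        self.proj_3d = nn.Linear(dim_3d, dim_out)  # project point-cloud features
        self.proj_2d = nn.Linear(dim_2d, dim_out)  # project image (VFM) features
        # Gate predicts a per-point weight for each modality from both streams.
        self.gate = nn.Sequential(
            nn.Linear(2 * dim_out, dim_out),
            nn.ReLU(inplace=True),
            nn.Linear(dim_out, 2),
        )

    def forward(self, feats_3d: torch.Tensor, feats_2d: torch.Tensor) -> torch.Tensor:
        # feats_3d: (N, dim_3d) per-point features from the 3D backbone
        # feats_2d: (N, dim_2d) VFM image features sampled at each point's projection
        f3 = self.proj_3d(feats_3d)
        f2 = self.proj_2d(feats_2d)
        # Softmax over the two modalities yields their relative contributions;
        # in the paper this balance is adjusted based on the target domain.
        w = torch.softmax(self.gate(torch.cat([f3, f2], dim=-1)), dim=-1)
        return w[:, 0:1] * f3 + w[:, 1:2] * f2


if __name__ == "__main__":
    fusion = ModalityGuidedFusion(dim_3d=96, dim_2d=384, dim_out=128)
    fused = fusion(torch.randn(1024, 96), torch.randn(1024, 384))
    print(fused.shape)  # torch.Size([1024, 128])
```

The per-point softmax gate is just one plausible way to let the network trade off the image and point cloud streams; how the paper actually adjusts the modality balance for the target domain is described in the full text.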
Similar Papers
VFM-UDA++: Improving Network Architectures and Data Strategies for Unsupervised Domain Adaptive Semantic Segmentation
CV and Pattern Recognition
Helps computers learn from pictures better with less data.
Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift
CV and Pattern Recognition
Helps self-driving cars see better in different weather.
Learning A Zero-shot Occupancy Network from Vision Foundation Models via Self-supervised Adaptation
CV and Pattern Recognition
Lets computers build 3D worlds from flat pictures.