Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding
By: Toshihiko Nishimura, Hirofumi Abe, Kazuhiko Murasaki and more
Potential Business Impact:
Labels 3D point-cloud scenes using 2D image models, with no 3D training data.
This paper presents a novel 3D semantic segmentation method for large-scale point cloud data that requires neither annotated 3D training data nor paired RGB images. The proposed approach projects 3D point clouds onto 2D images using virtual cameras and performs semantic segmentation via a 2D foundation model guided by natural-language prompts. 3D segmentation is then achieved by aggregating the per-view predictions from multiple viewpoints through weighted voting. The method outperforms existing training-free approaches and achieves segmentation accuracy comparable to supervised methods. Moreover, it supports open-vocabulary recognition, enabling users to detect objects with arbitrary text queries, thus overcoming the fixed-label-set limitation of traditional supervised approaches.
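The pipeline described in the abstract has three mechanical steps: project the point cloud into virtual cameras, label each rendered view with a prompt-driven 2D segmenter, and fuse the per-view labels by weighted voting. The sketch below illustrates those steps with NumPy. The pinhole camera model, the inverse-depth weights, and the random label maps standing in for the 2D foundation model's output are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of multi-view label voting for a point cloud.
# The 2D segmenter is stubbed with random label maps; in the paper it is a
# language-prompted 2D foundation model.
import numpy as np

def project_points(points, K, R, t):
    """Project Nx3 world points into a virtual pinhole camera.

    K: 3x3 intrinsics, R: 3x3 rotation, t: (3,) translation (world -> camera).
    Returns integer pixel coordinates (N, 2) and camera-space depths (N,).
    """
    cam = points @ R.T + t                       # world -> camera coordinates
    depth = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / np.clip(depth[:, None], 1e-6, None)
    return uv.astype(int), depth

def segment_view(label_map, uv, depth):
    """Look up a per-point 2D class label for one viewpoint.

    label_map: HxW integer class map (output of the 2D segmenter stub).
    Points outside the image or behind the camera get label -1 (unobserved).
    """
    h, w = label_map.shape
    labels = np.full(uv.shape[0], -1, dtype=int)
    valid = (depth > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
            & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    labels[valid] = label_map[uv[valid, 1], uv[valid, 0]]
    return labels

def weighted_vote(per_view_labels, per_view_weights, num_classes):
    """Aggregate per-view point labels by weighted voting.

    per_view_labels: list of (N,) label arrays, -1 = unobserved.
    per_view_weights: list of (N,) per-point weights (here: inverse depth).
    """
    n = per_view_labels[0].shape[0]
    votes = np.zeros((n, num_classes))
    for labels, weights in zip(per_view_labels, per_view_weights):
        seen = labels >= 0
        votes[np.arange(n)[seen], labels[seen]] += weights[seen]
    out = votes.argmax(axis=1)
    out[votes.sum(axis=1) == 0] = -1             # never observed from any view
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-1, 1, (100, 3)) + np.array([0.0, 0.0, 5.0])
    K = np.array([[100.0, 0, 64], [0, 100.0, 64], [0, 0, 1]])
    views, weights = [], []
    for _ in range(4):                           # four virtual viewpoints
        ang = rng.uniform(-0.2, 0.2)             # small yaw per camera
        R = np.array([[np.cos(ang), 0, np.sin(ang)],
                      [0, 1, 0],
                      [-np.sin(ang), 0, np.cos(ang)]])
        uv, depth = project_points(pts, K, R, np.zeros(3))
        fake_map = rng.integers(0, 3, (128, 128))  # stand-in for 2D-VLM output
        views.append(segment_view(fake_map, uv, depth))
        weights.append(1.0 / np.clip(depth, 1e-6, None))
    print(weighted_vote(views, weights, num_classes=3)[:10])
```

Weighting votes (here by inverse depth) lets views that observe a point more closely, and hence more reliably, dominate the fused label; any per-view confidence score could be substituted.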
Similar Papers
VLM-3D: End-to-End Vision-Language Models for Open-World 3D Perception
CV and Pattern Recognition
Helps self-driving cars see new things safely.
3D Can Be Explored In 2D: Pseudo-Label Generation for LiDAR Point Clouds Using Sensor-Intensity-Based 2D Semantic Segmentation
CV and Pattern Recognition
Teaches self-driving cars to see without 3D maps.
OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation
CV and Pattern Recognition
Lets robots understand and find any object.