Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding
By: Toshihiko Nishimura, Hirofumi Abe, Kazuhiko Murasaki and more
Potential Business Impact:
Labels 3D point-cloud scenes using 2D image models, with no 3D training data.
This paper presents a novel 3D semantic segmentation method for large-scale point cloud data that requires neither annotated 3D training data nor paired RGB images. The proposed approach projects 3D point clouds onto 2D images using virtual cameras and performs semantic segmentation via a 2D foundation model guided by natural-language prompts. 3D segmentation is then achieved by aggregating the per-view predictions from multiple viewpoints through weighted voting. The method outperforms existing training-free approaches and achieves segmentation accuracy comparable to supervised methods. Moreover, it supports open-vocabulary recognition, enabling users to detect objects with arbitrary text queries, thus overcoming the fixed-label-set limitation of traditional supervised approaches.
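The pipeline described in the abstract has three mechanical steps: project the point cloud into virtual cameras, label each rendered view with a prompt-driven 2D segmenter, and fuse the per-view labels by weighted voting. The sketch below illustrates those steps with NumPy. The pinhole camera model, the inverse-depth weights, and the random label maps standing in for the 2D foundation model's output are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of multi-view label voting for a point cloud.
# The 2D segmenter is stubbed with random label maps; in the paper it is a
# language-prompted 2D foundation model.
import numpy as np

def project_points(points, K, R, t):
    """Project Nx3 world points into a virtual pinhole camera.

    K: 3x3 intrinsics, R: 3x3 rotation, t: (3,) translation (world -> camera).
    Returns integer pixel coordinates (N, 2) and camera-space depths (N,).
    """
    cam = points @ R.T + t                       # world -> camera coordinates
    depth = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / np.clip(depth[:, None], 1e-6, None)
    return uv.astype(int), depth

def segment_view(label_map, uv, depth):
    """Look up a per-point 2D class label for one viewpoint.

    label_map: HxW integer class map (output of the 2D segmenter stub).
    Points outside the image or behind the camera get label -1 (unobserved).
    """
    h, w = label_map.shape
    labels = np.full(uv.shape[0], -1, dtype=int)
    valid = (depth > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
            & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    labels[valid] = label_map[uv[valid, 1], uv[valid, 0]]
    return labels

def weighted_vote(per_view_labels, per_view_weights, num_classes):
    """Aggregate per-view point labels by weighted voting.

    per_view_labels: list of (N,) label arrays, -1 = unobserved.
    per_view_weights: list of (N,) per-point weights (here: inverse depth).
    """
    n = per_view_labels[0].shape[0]
    votes = np.zeros((n, num_classes))
    for labels, weights in zip(per_view_labels, per_view_weights):
        seen = labels >= 0
        votes[np.arange(n)[seen], labels[seen]] += weights[seen]
    out = votes.argmax(axis=1)
    out[votes.sum(axis=1) == 0] = -1             # never observed from any view
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-1, 1, (100, 3)) + np.array([0.0, 0.0, 5.0])
    K = np.array([[100.0, 0, 64], [0, 100.0, 64], [0, 0, 1]])
    views, weights = [], []
    for _ in range(4):                           # four virtual viewpoints
        ang = rng.uniform(-0.2, 0.2)             # small yaw per camera
        R = np.array([[np.cos(ang), 0, np.sin(ang)],
                      [0, 1, 0],
                      [-np.sin(ang), 0, np.cos(ang)]])
        uv, depth = project_points(pts, K, R, np.zeros(3))
        fake_map = rng.integers(0, 3, (128, 128))  # stand-in for 2D-VLM output
        views.append(segment_view(fake_map, uv, depth))
        weights.append(1.0 / np.clip(depth, 1e-6, None))
    print(weighted_vote(views, weights, num_classes=3)[:10])
```

Weighting votes (here by inverse depth) lets views that observe a point more closely, and hence more reliably, dominate the fused label; any per-view confidence score could be substituted.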
Similar Papers
VLM-3D: End-to-End Vision-Language Models for Open-World 3D Perception
CV and Pattern Recognition
Helps self-driving cars see new things safely.
3D Can Be Explored In 2D: Pseudo-Label Generation for LiDAR Point Clouds Using Sensor-Intensity-Based 2D Semantic Segmentation
CV and Pattern Recognition
Teaches self-driving cars to see without 3D maps.
OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation
CV and Pattern Recognition
Lets robots understand and find any object.