FOLK: Fast Open-Vocabulary 3D Instance Segmentation via Label-guided Knowledge Distillation
By: Hongrui Wu, Zhicheng Gao, Jin Cao, and more
Potential Business Impact:
Lets computers understand 3D objects by name.
Open-vocabulary 3D instance segmentation seeks to segment and classify instances beyond the annotated label space. Existing methods typically map 3D instances to 2D RGB-D images and then employ vision-language models (VLMs) for classification. However, this mapping strategy introduces noise from 2D occlusions and incurs substantial computational and memory costs at inference time, slowing down inference. To address these problems, we propose FOLK, a Fast Open-vocabulary 3D instance segmentation method based on Label-guided Knowledge distillation. Our core idea is to design a teacher model that extracts high-quality instance embeddings and to distill its open-vocabulary knowledge into a 3D student model. During inference, the distilled 3D model can then classify instances directly from the 3D point cloud, avoiding noise caused by occlusions and significantly accelerating inference. Specifically, we first design a teacher model that generates a 2D CLIP embedding for each 3D instance, accounting for both visibility and viewpoint diversity, which serves as the learning target for distillation. We then develop a 3D student model that directly produces a 3D embedding for each instance. During training, a label-guided distillation algorithm transfers open-vocabulary knowledge from label-consistent 2D embeddings into the student model. We evaluate FOLK on the ScanNet200 and Replica datasets; it achieves state-of-the-art performance on ScanNet200 with an AP50 score of 35.7 while running approximately 6.0x to 152.2x faster than previous methods. All code will be released after the paper is accepted.
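The label-guided distillation step described above can be illustrated with a minimal sketch. The code below assumes per-instance 2D CLIP embeddings from the teacher, per-instance embeddings from a 3D student backbone, CLIP text embeddings for the candidate labels, and a per-instance pseudo-label; the consistency rule, function name, and cosine distillation loss are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of label-guided distillation (assumed form, not FOLK's exact loss).
import torch
import torch.nn.functional as F


def label_guided_distillation_loss(student_emb, teacher_emb, text_emb, pseudo_labels):
    """Distill open-vocabulary knowledge into the 3D student.

    student_emb:   (N, D) embeddings from the 3D student, one per instance.
    teacher_emb:   (N, D) 2D CLIP embeddings aggregated by the teacher.
    text_emb:      (C, D) CLIP text embeddings of the candidate labels.
    pseudo_labels: (N,)   label index assigned to each instance (assumed to come
                          from the teacher); used to keep only label-consistent
                          teacher embeddings as distillation targets.
    """
    # Normalize so cosine similarity reduces to a dot product.
    student_emb = F.normalize(student_emb, dim=-1)
    teacher_emb = F.normalize(teacher_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Label-consistency check: keep instances whose teacher embedding, when
    # classified against the CLIP text embeddings, agrees with the pseudo-label.
    teacher_pred = (teacher_emb @ text_emb.t()).argmax(dim=-1)  # (N,)
    consistent = teacher_pred == pseudo_labels                  # (N,) bool mask

    if consistent.sum() == 0:
        # No label-consistent targets in this batch: return a zero loss with grad.
        return student_emb.sum() * 0.0

    # Cosine-distance distillation on the label-consistent instances only.
    cos_sim = (student_emb[consistent] * teacher_emb[consistent]).sum(dim=-1)
    return (1.0 - cos_sim).mean()


if __name__ == "__main__":
    N, D, C = 8, 512, 200  # instances, embedding dim, number of candidate labels
    loss = label_guided_distillation_loss(
        torch.randn(N, D, requires_grad=True),
        torch.randn(N, D),
        torch.randn(C, D),
        torch.randint(0, C, (N,)),
    )
    print(loss.item())
```

At inference time the student embedding alone would be matched against CLIP text embeddings, so the 2D rendering and VLM calls that make prior pipelines slow are no longer needed.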
Similar Papers
OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation
CV and Pattern Recognition
Lets robots understand and find any object.
COS3D: Collaborative Open-Vocabulary 3D Segmentation
CV and Pattern Recognition
Helps robots understand and grab any object.
Domain Adaptation-Based Crossmodal Knowledge Distillation for 3D Semantic Segmentation
CV and Pattern Recognition
Teaches self-driving cars without needing 3D maps.