Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Published: March 20, 2025 | arXiv ID: 2503.16282v2

By: Zhaochong An, Guolei Sun, Yun Liu and more

Potential Business Impact:

Teaches computers to understand 3D shapes with few examples.

Business Areas:

Image Recognition Data and Analytics, Software

Generalized few-shot 3D point cloud segmentation (GFS-PCS) adapts models to new classes with few support samples while retaining base class segmentation. Existing GFS-PCS methods enhance prototypes by interacting with support or query features but remain limited by the sparse knowledge available from few-shot samples. Meanwhile, 3D vision-language models (3D VLMs), which generalize across open-world novel classes, contain rich but noisy novel class knowledge. In this work, we introduce GFS-VL, a GFS-PCS framework that combines dense but noisy pseudo-labels from 3D VLMs with precise yet sparse few-shot samples to exploit the strengths of both. Specifically, we present a prototype-guided pseudo-label selection step that filters out low-quality regions, followed by an adaptive infilling strategy that combines knowledge from pseudo-label contexts and few-shot samples to label the filtered, unlabeled areas. Additionally, we design a novel-base mix strategy that embeds few-shot samples into training scenes, preserving essential context for improved novel class learning. Moreover, recognizing the limited diversity in current GFS-PCS benchmarks, we introduce two challenging benchmarks with diverse novel classes for comprehensive generalization evaluation. Experiments validate the effectiveness of our framework across models and datasets. Our approach and benchmarks provide a solid foundation for advancing GFS-PCS in the real world. The code is available at https://github.com/ZhaochongAn/GFS-VL
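
To make the idea of prototype-guided pseudo-label selection more concrete, here is a minimal sketch of one plausible reading of it. This is not the authors' implementation (see the linked repository for that). The function name `select_pseudo_labels`, the cosine-similarity test, the `threshold` value, and the choice to treat all points sharing a label as one region are all assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_pseudo_labels(features, pseudo_labels, prototypes, threshold=0.8):
    """Hypothetical filter: keep a pseudo-labeled region only if its mean
    feature is close to the few-shot prototype of the same class.

    Assumption: a "region" is simply all points that share a pseudo-label.
    Rejected points are set to None, i.e. left unlabeled for a later
    infilling step.
    """
    # Group point indices by their pseudo-label.
    regions = {}
    for idx, label in enumerate(pseudo_labels):
        regions.setdefault(label, []).append(idx)

    filtered = list(pseudo_labels)
    for label, idxs in regions.items():
        proto = prototypes.get(label)
        region_feat = mean_vector([features[i] for i in idxs])
        # Assumption: a class without a prototype cannot be verified,
        # so its region is dropped as well.
        if proto is None or cosine(region_feat, proto) < threshold:
            for i in idxs:
                filtered[i] = None
    return filtered


# Toy data: the last two points look like "chair" but were pseudo-labeled "lamp".
features = [[1.0, 0.0], [0.9, 0.1], [0.9, 0.1], [1.0, 0.0]]
pseudo_labels = ["chair", "chair", "lamp", "lamp"]
prototypes = {"chair": [1.0, 0.0], "lamp": [0.0, 1.0]}

print(select_pseudo_labels(features, pseudo_labels, prototypes))
# → ['chair', 'chair', None, None]
```

In this toy case the "lamp" region is rejected because its mean feature points away from the lamp prototype, while the "chair" region is kept. The points set to `None` correspond to the filtered, unlabeled areas that the abstract says are handled by the adaptive infilling strategy.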

From Dataset to Real-world: General 3D Object Detection via Generalized Cross-domain Few-shot Learning

CV and Pattern Recognition

Helps self-driving cars recognize new objects.

8 Mar 2025 0

88%

DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation

CV and Pattern Recognition

Teaches computers to recognize new things with few examples.

6 Mar 2025 1

88%

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

CV and Pattern Recognition

Helps computers understand 3D spaces from pictures.

2 Jan 2025 1

Page Count

16 pages
