VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models
By: Yuetong Su, Baoguo Wei, Xinyu Wang and more
Potential Business Impact:
Finds new things in pictures using words.
Novel Class Discovery (NCD) aims to utilise prior knowledge of known classes to classify and discover unknown classes from unlabelled data. Existing NCD methods for images rely primarily on visual features, which suffer from limitations such as insufficient feature discriminability and the long-tail distribution of data. We propose LLM-NCD, a multimodal framework that breaks this bottleneck by fusing visual-textual semantics with prototype-guided clustering. Our key innovation lies in modelling cluster centres and semantic prototypes of known classes by jointly optimising known-class image and text features, together with a dual-phase discovery mechanism that dynamically separates known and novel samples via semantic affinity thresholds and adaptive clustering. Experiments on the CIFAR-100 dataset show that, compared with current methods, our method achieves up to a 25.3% improvement in accuracy on unknown classes. Notably, our method shows unique resilience to long-tail distributions, a first in the NCD literature.
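The dual-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cosine similarity as the "semantic affinity", a fixed hypothetical threshold of 0.8, and plain k-means with a known number of novel clusters standing in for the paper's adaptive clustering. All function names (`split_known_novel`, `kmeans`, `discover`) are invented for illustration, and the prototypes are assumed to be precomputed embeddings of the known classes.

```python
import numpy as np

def l2_normalise(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def split_known_novel(features, prototypes, threshold=0.8):
    # Phase 1 (assumed form): a sample is "known" if its best cosine
    # affinity to any known-class prototype reaches the threshold.
    sims = l2_normalise(features) @ l2_normalise(prototypes).T
    best = sims.argmax(axis=1)
    is_known = sims.max(axis=1) >= threshold
    labels = np.where(is_known, best, -1)  # -1 marks candidate novel samples
    return labels, is_known

def kmeans(x, k, iters=20, seed=0):
    # Plain k-means, used here as a stand-in for adaptive clustering.
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centres[j] = x[assign == j].mean(axis=0)
    return assign

def discover(features, prototypes, n_novel, threshold=0.8):
    # Phase 2 (assumed form): cluster the samples rejected in phase 1 and
    # give them label ids that follow the known-class ids.
    labels, is_known = split_known_novel(features, prototypes, threshold)
    novel = l2_normalise(features[~is_known])
    if len(novel) >= n_novel:
        labels[~is_known] = len(prototypes) + kmeans(novel, n_novel)
    return labels

# Toy usage: two known-class prototypes and four sample embeddings.
prototypes = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
features = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0],
                     [0.0, 0.1, 1.0], [0.1, 0.0, 1.0]])
print(discover(features, prototypes, n_novel=1))
```

In this toy setup the first two samples sit close to a known prototype and keep that class id, while the last two fall below the threshold and are grouped into a new cluster. In practice the threshold and the number of novel clusters would need to be estimated rather than fixed by hand.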
Similar Papers
Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model
CV and Pattern Recognition
Teaches computers to remember old and new pictures.
Novel Class Discovery for Point Cloud Segmentation via Joint Learning of Causal Representation and Reasoning
CV and Pattern Recognition
Teaches computers to identify new things in 3D scans.
NeurNCD: Novel Class Discovery via Implicit Neural Representation
Machine Learning (CS)
Helps computers find new things in pictures.