Hierarchical Semantic Alignment for Image Clustering
By: Xingyu Zhu, Beier Zhu, Yunfan Li and more
Potential Business Impact:
Groups pictures better using words and descriptions.
Image clustering is a classic problem in computer vision, which categorizes images into different groups. Recent studies utilize nouns as external semantic knowledge to improve clustering performance. However, these methods often overlook the inherent ambiguity of nouns, which can distort semantic representations and degrade clustering quality. To address this issue, we propose a hierarChical semAntic alignmEnt method for image clustering, dubbed CAE, which improves clustering performance in a training-free manner. In our approach, we incorporate two complementary types of textual semantics: caption-level descriptions, which convey fine-grained attributes of image content, and noun-level concepts, which represent high-level object categories. We first select relevant nouns from WordNet and descriptions from caption datasets to construct a semantic space aligned with image features. Then, we align image features with selected nouns and captions via optimal transport to obtain a more discriminative semantic space. Finally, we combine the enhanced semantic and image features to perform clustering. Extensive experiments across 8 datasets demonstrate the effectiveness of our method, notably surpassing the state-of-the-art training-free approach with a 4.2% improvement in accuracy and a 2.9% improvement in adjusted Rand index (ARI) on the ImageNet-1K dataset.
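The alignment step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the cost function, the marginals, or how features are combined, so the cosine-distance cost, uniform marginals, entropic (Sinkhorn) solver, and concatenation used here are all assumptions. The function names (`l2_normalize`, `sinkhorn`, `semantic_features`) are hypothetical, and random arrays stand in for image, noun, and caption embeddings from a shared embedding space.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropic optimal transport with uniform marginals (an assumption).

    Returns a non-negative transport plan of shape (n_images, n_texts).
    Column sums match the target marginal after the final update;
    row sums match only approximately, depending on convergence.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # mass on each image
    b = np.full(m, 1.0 / m)          # mass on each text item
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def semantic_features(img, txt, reg=0.1):
    """Map each image to a transport-weighted average of text embeddings."""
    img = l2_normalize(img)
    txt = l2_normalize(txt)
    cost = 1.0 - img @ txt.T         # cosine distance, assumed cost
    plan = sinkhorn(cost, reg)
    weights = plan / plan.sum(axis=1, keepdims=True)  # per-image weights
    return l2_normalize(weights @ txt)

# Placeholder embeddings (random stand-ins, not real model outputs).
rng = np.random.default_rng(0)
img = rng.normal(size=(6, 8))        # 6 images
nouns = rng.normal(size=(4, 8))      # 4 selected nouns
captions = rng.normal(size=(5, 8))   # 5 selected captions

sem_nouns = semantic_features(img, nouns)        # noun-level semantics
sem_captions = semantic_features(img, captions)  # caption-level semantics

# Assumed fusion: concatenate image and semantic features before clustering.
fused = np.concatenate([l2_normalize(img), sem_nouns, sem_captions], axis=1)
```

Any standard clustering algorithm, such as k-means, could then be run on `fused`. The regularization strength `reg` trades off sharpness against smoothness of the plan: smaller values concentrate each image on fewer text items but need more iterations to converge.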
Similar Papers
Unsupervised Image Classification with Adaptive Nearest Neighbor Selection and Cluster Ensembles
CV and Pattern Recognition
Groups pictures automatically, making computers smarter.
Free-Grained Hierarchical Recognition
CV and Pattern Recognition
Teaches computers to sort pictures better.
Hierarchical Semantic Tree Anchoring for CLIP-Based Class-Incremental Learning
CV and Pattern Recognition
Teaches computers to remember old and new things.