Free-Grained Hierarchical Recognition
By: Seulki Park, Zilin Wang, Stella X. Yu
Potential Business Impact:
Teaches computers to sort pictures better.
Hierarchical image classification predicts labels across a semantic taxonomy, but existing methods typically assume complete, fine-grained annotations, an assumption rarely met in practice. Real-world supervision varies in granularity, influenced by image quality, annotator expertise, and task demands; a distant bird may be labeled Bird, while a close-up reveals Bald eagle. We introduce ImageNet-F, a large-scale benchmark curated from ImageNet and structured into cognitively inspired basic, subordinate, and fine-grained levels. Using CLIP as a proxy for semantic ambiguity, we simulate realistic, mixed-granularity labels reflecting human annotation behavior. We propose free-grain learning, a setting in which the granularity of supervision varies across instances. We develop methods that enhance semantic guidance via pseudo-attributes from vision-language models and visual guidance via semi-supervised learning. These methods, along with strong baselines, substantially improve performance under mixed supervision. Together, our benchmark and methods advance hierarchical classification under real-world constraints.
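To make the free-grain setting concrete, here is a minimal sketch of one standard way a classifier can learn from labels given at any level of a taxonomy. The model scores only fine-grained classes, and the likelihood of a coarser label is the total probability of its fine-grained descendants. The sketch also includes a toy rule for coarsening a fine label when an ambiguity score is low, loosely mirroring the CLIP-based label simulation described above. The taxonomy, the `confidence` input, the thresholds, and all function names are hypothetical illustrations. This is not the authors' implementation or the actual ImageNet-F hierarchy.

```python
import math

# Hypothetical three-level taxonomy (fine-grained -> subordinate -> basic).
# Illustrative only; NOT the real ImageNet-F hierarchy.
PARENT = {
    "bald eagle": "eagle",
    "golden eagle": "eagle",
    "great horned owl": "owl",
    "eagle": "bird",
    "owl": "bird",
}
# The classifier outputs one logit per fine-grained class, in this order.
FINE_CLASSES = ["bald eagle", "golden eagle", "great horned owl"]


def ancestors(label):
    """Return the label followed by all of its ancestors up to the root."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def free_grain_nll(logits, label):
    """Negative log-likelihood of a label given at ANY taxonomy level.

    The probability of a coarse label is the summed probability of the
    fine classes that descend from it (marginalization over the subtree).
    """
    probs = softmax(logits)
    mass = sum(p for p, c in zip(probs, FINE_CLASSES) if label in ancestors(c))
    return -math.log(mass)


def simulate_label(fine_label, confidence, thresholds=(0.7, 0.4)):
    """Hypothetical coarsening rule: keep the fine label when a confidence
    score (e.g. from CLIP) is high, otherwise back off toward the root."""
    chain = ancestors(fine_label)  # [fine, subordinate, basic]
    if confidence >= thresholds[0]:
        return chain[0]
    if confidence >= thresholds[1]:
        return chain[1]
    return chain[-1]


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]  # scores for FINE_CLASSES
    for conf in (0.9, 0.5, 0.1):
        label = simulate_label("bald eagle", conf)
        print(conf, label, round(free_grain_nll(logits, label), 3))
```

Marginalizing over descendants means a coarse label such as `bird` never penalizes the model for how it splits probability among bird species, while a fine label such as `bald eagle` does; images labeled at different levels can therefore share one fine-grained output head.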
Similar Papers
EnGraf-Net: Multiple Granularity Branch Network with Fine-Coarse Graft Grained for Classification Task
CV and Pattern Recognition
Teaches computers to tell very similar things apart.
Hierarchical Semantic Alignment for Image Clustering
CV and Pattern Recognition
Groups pictures better using words and descriptions.
The Finer the Better: Towards Granular-aware Open-set Domain Generalization
CV and Pattern Recognition
Teaches AI to spot new things it hasn't seen.