H3Former: Hypergraph-based Semantic-Aware Aggregation via Hyperbolic Hierarchical Contrastive Loss for Fine-Grained Visual Classification

Published: November 13, 2025 | arXiv ID: 2511.10260v1

By: Yongji Zhang, Siqi Li, Kuiyang Huang, and more

Potential Business Impact:

Helps computers tell apart very similar things.

Business Areas:

Image Recognition, Data and Analytics, Software

Fine-Grained Visual Classification (FGVC) remains a challenging task due to subtle inter-class differences and large intra-class variations. Existing approaches typically rely on feature-selection mechanisms or region-proposal strategies to localize discriminative regions for semantic analysis. However, these methods often fail to capture discriminative cues comprehensively while introducing substantial category-agnostic redundancy. To address these limitations, we propose H3Former, a novel token-to-region framework that leverages high-order semantic relations to aggregate local fine-grained representations with structured region-level modeling. Specifically, we propose the Semantic-Aware Aggregation Module (SAAM), which exploits multi-scale contextual cues to dynamically construct a weighted hypergraph among tokens. By applying hypergraph convolution, SAAM captures high-order semantic dependencies and progressively aggregates token features into compact region-level representations. Furthermore, we introduce the Hyperbolic Hierarchical Contrastive Loss (HHCL), which enforces hierarchical semantic constraints in a non-Euclidean embedding space. The HHCL enhances inter-class separability and intra-class consistency while preserving the intrinsic hierarchical relationships among fine-grained categories. Comprehensive experiments conducted on four standard FGVC benchmarks validate the superiority of our H3Former framework.
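The abstract does not give SAAM's or HHCL's exact equations, so the following is only a minimal sketch under stated assumptions, not the authors' method. It uses the standard HGNN hypergraph convolution, X' = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Θ. It builds one hyperedge per token from that token's k most cosine-similar tokens, a hypothetical stand-in for SAAM's multi-scale construction. It then pools tokens into a fixed number of regions with a softmax assignment (also hypothetical) and maps the region vectors onto the unit Poincaré ball, the kind of non-Euclidean space in which a hyperbolic contrastive loss such as HHCL could measure distances. The names `knn_hyperedges`, `hypergraph_conv`, `soft_region_pool`, `expmap0`, and `poincare_distance` are illustrative and do not come from the paper. The hierarchical loss itself is not shown.

```python
import numpy as np


def knn_hyperedges(X, k):
    """Hypothetical hyperedge construction (not from the paper): hyperedge i joins
    token i and its k-1 most cosine-similar tokens. Returns the incidence matrix
    H (tokens x hyperedges) and positive hyperedge weights w."""
    n = X.shape[0]
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    sim = Xn @ Xn.T
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)            # exclude self from the neighbour search
    nbrs = np.argsort(-masked, axis=1)[:, : k - 1]
    members = np.concatenate([np.arange(n)[:, None], nbrs], axis=1)  # (n, k)
    H = np.zeros((n, n))
    H[members, np.arange(n)[:, None]] = 1.0      # token members[i, j] belongs to hyperedge i
    # Assumed weighting: exp of the mean similarity between hyperedge i's centre and its members.
    w = np.exp(np.take_along_axis(sim, members, axis=1).mean(axis=1))
    return H, w


def hypergraph_conv(X, H, w, Theta):
    """HGNN-style convolution: D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Theta."""
    dv = H @ w                                    # weighted vertex degrees
    de = H.sum(axis=0)                            # hyperedge degrees (member counts)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    A = Dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ Dv_inv_sqrt
    return A @ X @ Theta


def soft_region_pool(X, P):
    """Hypothetical token-to-region pooling: softmax-assign tokens to P.shape[1]
    regions and return each region as the weighted mean of its tokens."""
    logits = X @ P
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    S = np.exp(logits)
    S /= S.sum(axis=1, keepdims=True)             # (tokens, regions), rows sum to 1
    return (S.T @ X) / S.sum(axis=0)[:, None]     # (regions, features)


def expmap0(v, eps=1e-9):
    """Exponential map at the origin of the unit Poincare ball (curvature -1)."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.tanh(norm) * v / np.maximum(norm, eps)


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points inside the unit Poincare ball."""
    sq = np.sum((u - v) ** 2, axis=-1)
    denom = (1 - np.sum(u ** 2, axis=-1)) * (1 - np.sum(v ** 2, axis=-1))
    return np.arccosh(1 + 2 * sq / np.maximum(denom, eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(16, 8))             # 16 hypothetical patch tokens, 8-dim
    H, w = knn_hyperedges(tokens, k=4)
    feats = hypergraph_conv(tokens, H, w, rng.normal(size=(8, 8)) / np.sqrt(8))
    regions = soft_region_pool(feats, rng.normal(size=(8, 3)))  # 3 regions
    z = expmap0(regions)
    print(regions.shape, poincare_distance(z[0], z[1]))
```

Hyperbolic space is a common choice for hierarchy-aware losses because distances grow rapidly toward the ball's boundary, which lets tree-like category hierarchies embed with low distortion. In a trained model the projection matrices used here at random would be learned parameters.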

Similar Papers

HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning

CV and Pattern Recognition

Helps computers see scenes like humans do.

3 Apr 2025

A Semantics-Aware Hierarchical Self-Supervised Approach to Classification of Remote Sensing Images

CV and Pattern Recognition

Teaches computers to sort satellite pictures better.

6 Oct 2025

Cross-Hierarchical Bidirectional Consistency Learning for Fine-Grained Visual Classification

CV and Pattern Recognition

Teaches computers to tell very similar things apart.

18 Apr 2025

Country of Origin

🇨🇳 China

Page Count

12 pages