Video-based Generalized Category Discovery via Memory-Guided Consistency-Aware Contrastive Learning

Published: September 8, 2025 | arXiv ID: 2509.06306v1

By: Zhang Jing, Pu Nan, Xie Yu Xiang and more

Potential Business Impact:

Helps computers find new things in videos.

Business Areas:

Image Recognition, Data and Analytics, Software

Generalized Category Discovery (GCD) is an emerging and challenging open-world problem that has garnered increasing attention in recent years. Most existing GCD methods focus on discovering categories in static images. However, static visual content alone is often insufficient to reliably discover novel categories. To bridge this gap, we extend the GCD problem to the video domain and introduce a new setting, termed Video-GCD. In this setting, effectively integrating multi-perspective information across time is crucial for accurate category discovery. To tackle this challenge, we propose a novel Memory-guided Consistency-aware Contrastive Learning (MCCL) framework, which explicitly captures temporal-spatial cues and incorporates them into contrastive learning through a consistency-guided voting mechanism. MCCL consists of two core components: Consistency-Aware Contrastive Learning (CACL) and Memory-Guided Representation Enhancement (MGRE). CACL exploits multi-perspective temporal features to estimate consistency scores between unlabeled instances, which are then used to weight the contrastive loss. MGRE introduces a dual-level memory buffer that maintains both feature-level and logit-level representations, providing global context to enhance intra-class compactness and inter-class separability. This in turn refines the consistency estimation in CACL, forming a mutually reinforcing feedback loop between representation learning and consistency modeling. To facilitate a comprehensive evaluation, we construct a new and challenging Video-GCD benchmark, which includes action recognition and bird classification video datasets. Extensive experiments demonstrate that our method significantly outperforms competitive GCD approaches adapted from image-based settings, highlighting the importance of temporal information for discovering novel categories in videos. The code will be publicly available.
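
The abstract describes CACL only at a high level: consistency scores between unlabeled instances are estimated from temporal features and then used to weight the contrastive loss. The sketch below illustrates that general idea and is not the authors' implementation. Every detail in it is an assumption made for illustration: the per-frame cosine-similarity vote, the `vote_threshold` and `temperature` values, the mean-pooled video embedding, and the function names `consistency_scores` and `weighted_contrastive_loss`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def consistency_scores(videos, vote_threshold=0.8):
    """Assumed voting rule: videos[i][t] is the feature of frame t of video i.

    Each aligned frame pair casts one vote for "same category" when its cosine
    similarity exceeds vote_threshold. The score is the fraction of such votes.
    """
    n = len(videos)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            votes = [cosine(fi, fj) > vote_threshold
                     for fi, fj in zip(videos[i], videos[j])]
            scores[i][j] = sum(votes) / len(votes)
    return scores


def weighted_contrastive_loss(videos, weights, temperature=0.1):
    """InfoNCE-style loss in which each candidate positive j of anchor i
    contributes in proportion to weights[i][j] (hypothetical formulation)."""
    # Video-level embedding: mean over frames, then L2-normalized (assumption).
    embeds = [l2_normalize([sum(dim) / len(dim) for dim in zip(*frames)])
              for frames in videos]
    n = len(embeds)
    total, anchors = 0.0, 0
    for i in range(n):
        logits = {j: sum(x * y for x, y in zip(embeds[i], embeds[j])) / temperature
                  for j in range(n) if j != i}
        weight_sum = sum(weights[i][j] for j in logits)
        if weight_sum == 0:
            continue  # no instance is considered consistent with this anchor
        # Numerically stable log of the softmax denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total -= sum(weights[i][j] * (logits[j] - log_denom)
                     for j in logits) / weight_sum
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy input: three "videos" with two 2-D frame features each (made-up numbers).
toy_videos = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[1.0, 0.1], [0.8, 0.2]],
    [[0.0, 1.0], [0.1, 0.9]],
]
toy_weights = consistency_scores(toy_videos)
toy_loss = weighted_contrastive_loss(toy_videos, toy_weights)
```

In this toy setup the first two videos agree on every frame and the third agrees with neither, so only the first two act as positives for each other. A memory buffer like the one attributed to MGRE could supply additional candidates for the scoring step, but the abstract does not specify how, so it is left out of the sketch.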

Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction

CV and Pattern Recognition

Helps computers learn new things like humans do.

14 Aug 2025 0

91%

ClearGCD: Mitigating Shortcut Learning For Robust Generalized Category Discovery

CV and Pattern Recognition

Helps computers learn new things without forgetting old ones.

28 Nov 2025 1

90%

Learning Part Knowledge to Facilitate Category Understanding for Fine-Grained Generalized Category Discovery

CV and Pattern Recognition

Helps computers tell apart very similar things.

21 Mar 2025 1

Country of Origin

🇨🇳 🇮🇹 China, Italy

Page Count

14 pages
