Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation

Published: September 1, 2025 | arXiv ID: 2509.01275v2

By: Jiahao Li, Yang Lu, Yachao Zhang and more

Potential Business Impact:

Teaches computers to see anything described in words.

Business Areas:

Semantic Search, Internet Services

Open-vocabulary semantic segmentation (OVSS) performs pixel-level classification via text-driven alignment, where the domain discrepancy between base-category training and open-vocabulary inference makes it difficult to model latent unseen categories discriminatively. To address this challenge, existing vision-language model (VLM)-based approaches achieve commendable performance through pre-trained multi-modal representations. However, the fundamental mechanisms of latent semantic comprehension remain underexplored, which has become a bottleneck for OVSS. In this work, we conduct a probing experiment to explore the distribution patterns and dynamics of latent semantics in VLMs under inductive learning paradigms. Building on these insights, we propose X-Agent, an innovative OVSS framework that employs a latent semantic-aware "agent" to orchestrate cross-modal attention mechanisms, simultaneously optimizing latent semantic dynamics and amplifying their perceptibility. Extensive benchmark evaluations demonstrate that X-Agent achieves state-of-the-art performance while effectively enhancing latent semantic saliency.
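To make "pixel-level classification via text-driven alignment" concrete, here is a minimal sketch of the generic idea only, not of X-Agent itself: each pixel embedding is compared with one text embedding per category name, and the pixel takes the label of the closest match. The function names (`cosine`, `segment`) and the toy 2-D embeddings are assumptions made up for illustration. In a real system the embeddings would come from a VLM's image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def segment(pixel_embeddings, text_embeddings):
    """Label each pixel with the category whose text embedding is most similar.

    pixel_embeddings: H x W grid of feature vectors (hypothetical toy values).
    text_embeddings: dict mapping category name -> feature vector.
    """
    return [
        [max(text_embeddings, key=lambda name: cosine(px, text_embeddings[name]))
         for px in row]
        for row in pixel_embeddings
    ]

# Toy 2-D embeddings, invented for illustration only.
text_embeddings = {"cat": [1.0, 0.0], "grass": [0.0, 1.0]}
pixel_embeddings = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
label_map = segment(pixel_embeddings, text_embeddings)
print(label_map)
```

Because the category set is just a dictionary of text embeddings, a new category can be added at inference time without retraining. That is what makes the setting open-vocabulary, and it is also where the gap between base-category training and unseen categories described in the abstract arises.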

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

CV and Pattern Recognition

Teaches computers to find any object in pictures.

19 Jun 2025 1

88%

Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation

CV and Pattern Recognition

Helps computers see and name anything, anywhere.

11 Jun 2025 2

88%

OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection

Artificial Intelligence

Teaches computers to find any object, even new ones.

26 Nov 2025 0

Repos / Data Links

github.com

Page Count

10 pages
