
Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning

Published: January 16, 2026 | arXiv ID: 2601.11393v1

By: Haomiao Tang, Jinpeng Wang, Minyi Zhao, and others

Potential Business Impact:

Finds images better by understanding what you want.

Business Areas:

Image Recognition, Data and Analytics, Software

Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Noise inherent in CIR triplets introduces uncertainty and threatens the model's robustness. Probabilistic learning approaches have shown promise in addressing such issues; however, they fall short for CIR because they model each instance holistically and treat queries and targets homogeneously. This paper introduces a Heterogeneous Uncertainty-Guided (HUG) paradigm to overcome these limitations. HUG uses a fine-grained probabilistic learning framework in which queries and targets are represented by Gaussian embeddings that capture detailed concepts and their uncertainties. We tailor separate uncertainty estimates to multi-modal queries and uni-modal targets. For a query, we capture uncertainty not only in the quality of each uni-modal input but also in the coordination between modalities, and then apply a provable dynamic weighting mechanism to derive a comprehensive query uncertainty. We further design uncertainty-guided objectives, including a holistic query-target contrast and fine-grained contrasts with comprehensive negative sampling strategies, which strengthen discriminative learning. Experiments on benchmarks show that HUG outperforms state-of-the-art baselines, and further analyses support each technical contribution.
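The abstract names three ingredients (Gaussian embeddings, a dynamic weighting of several uncertainty sources, and uncertainty-guided contrastive objectives) without giving formulas. The sketch below is a hypothetical illustration of those general ideas, not the authors' implementation. Every name in it (`Gaussian`, `combine_query_uncertainty`, `uncertainty_weighted_infonce`) is invented for illustration, and so are these modeling choices: diagonal Gaussians, similarity taken as the negative expected squared distance, softmax weights standing in for the paper's dynamic weighting, and a 1/(1 + u) down-weighting of uncertain queries inside an InfoNCE loss.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Gaussian:
    """Diagonal Gaussian embedding: a mean vector and per-dimension variances."""
    mu: List[float]
    var: List[float]

    def uncertainty(self) -> float:
        # Hypothetical scalar uncertainty: the average per-dimension variance.
        return sum(self.var) / len(self.var)


def expected_sq_distance(a: Gaussian, b: Gaussian) -> float:
    # For independent x ~ N(mu_a, diag(var_a)) and y ~ N(mu_b, diag(var_b)):
    # E||x - y||^2 = ||mu_a - mu_b||^2 + sum(var_a) + sum(var_b)
    mean_term = sum((p - q) ** 2 for p, q in zip(a.mu, b.mu))
    return mean_term + sum(a.var) + sum(b.var)


def combine_query_uncertainty(components: Sequence[float], logits: Sequence[float]) -> float:
    # Hypothetical dynamic weighting: softmax weights over uncertainty sources
    # (e.g. image quality, text quality, cross-modal coordination).
    # A convex combination always lies between min(components) and max(components).
    if len(components) != len(logits):
        raise ValueError("need one logit per uncertainty component")
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return sum(e / total * u for e, u in zip(exps, components))


def uncertainty_weighted_infonce(queries, targets, query_unc, temperature=1.0):
    # InfoNCE: query i's positive is targets[i]; all other targets are negatives.
    # Each query's term is scaled by 1 / (1 + u_i), so uncertain queries count less
    # (an assumption); the result is a weighted average over queries.
    total_loss, total_weight = 0.0, 0.0
    for i, q in enumerate(queries):
        logits = [-expected_sq_distance(q, t) / temperature for t in targets]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))  # stable log-sum-exp
        w = 1.0 / (1.0 + query_unc[i])
        total_loss += w * (log_z - logits[i])
        total_weight += w
    return total_loss / total_weight


# Toy usage: two queries, each paired with the target at the same index.
q1 = Gaussian([1.0, 0.0], [0.1, 0.1])
q2 = Gaussian([0.0, 1.0], [0.5, 0.5])  # a noisier query
t1 = Gaussian([1.0, 0.1], [0.1, 0.1])
t2 = Gaussian([0.1, 1.0], [0.1, 0.1])
# Placeholder values 0.2 and 0.3 stand in for text and coordination uncertainties.
unc = [combine_query_uncertainty([q.uncertainty(), 0.2, 0.3], [0.0, 0.0, 0.0]) for q in (q1, q2)]
loss = uncertainty_weighted_infonce([q1, q2], [t1, t2], unc, temperature=0.5)
```

The convex-combination form is used only because it makes a simple property easy to verify (the combined uncertainty never leaves the range of its inputs); the abstract does not state which property the paper's weighting mechanism provably satisfies.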

HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval

CV and Pattern Recognition

Finds videos by combining video and text clues.

2 Dec 2025

90%

HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval

CV and Pattern Recognition

Finds videos by matching descriptions and examples.

2 Dec 2025

89%

CroBIM-U: Uncertainty-Driven Referring Remote Sensing Image Segmentation

CV and Pattern Recognition

Find objects in pictures using words.

7 Jan 2026


Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

10 pages
