Split-Fuse-Transport: Annotation-Free Saliency via Dual Clustering and Optimal Transport Alignment
By: Muhammad Umer Ramzan, Ali Zia, Abdelwahed Khamis and more
Potential Business Impact:
Finds important parts of pictures without labels.
Salient object detection (SOD) aims to segment visually prominent regions in images and serves as a foundational task for various computer vision applications. We posit that SOD can now reach near-supervised accuracy without a single pixel-level label, but only when reliable pseudo-masks are available. We revisit the prototype-based line of work and make two key observations. First, boundary pixels and interior pixels obey markedly different geometry; second, the global consistency enforced by optimal transport (OT) is underutilized if prototype quality is weak. To address this, we introduce POTNet, an adaptation of Prototypical Optimal Transport that replaces POT's single k-means step with an entropy-guided dual-clustering head: high-entropy pixels are organized by spectral clustering, low-entropy pixels by k-means, and the two prototype sets are subsequently aligned by OT. This split-fuse-transport design yields sharper, part-aware pseudo-masks in a single forward pass, without handcrafted priors. Those masks supervise a standard MaskFormer-style encoder-decoder, giving rise to AutoSOD, an end-to-end unsupervised SOD pipeline that eliminates SelfMask's offline voting yet improves both accuracy and training efficiency. Extensive experiments on five benchmarks show that AutoSOD outperforms unsupervised methods by up to 26% and weakly supervised methods by up to 36% in F-measure, further narrowing the gap to fully supervised models.
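The split-fuse-transport idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' POTNet implementation. The function names, the median entropy threshold, the Gaussian affinity, the Sinkhorn solver, and the barycentric averaging used to fuse the two prototype sets are all assumptions made for illustration; the abstract states only that high-entropy pixels go to spectral clustering, low-entropy pixels go to k-means, and the two prototype sets are aligned by OT.

```python
import numpy as np

def pixel_entropy(probs):
    """Shannon entropy of each row of an (N, C) probability matrix."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(0)
    return centers, labels

def spectral_prototypes(x, k, sigma=1.0):
    """Spectral clustering (assumed Gaussian affinity), then mean feature per cluster."""
    d2 = ((x[:, None, :] - x[None]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    deg = w.sum(1)
    lap = np.eye(len(x)) - w / np.sqrt(np.outer(deg, deg))  # normalized Laplacian
    _, vecs = np.linalg.eigh(lap)                           # eigenvalues ascending
    emb = vecs[:, :k]
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    _, labels = kmeans(emb, k)
    return np.stack([x[labels == j].mean(0) if np.any(labels == j) else x.mean(0)
                     for j in range(k)])

def sinkhorn(cost, eps=0.1, iters=200):
    """Entropy-regularized OT plan between two uniform marginals."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    kernel = np.exp(-cost / eps)
    u = np.ones(n)
    for _ in range(iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

def split_fuse_transport(feats, probs, k=2, tau=None):
    """Split pixels by entropy, cluster each group, align the prototypes with OT."""
    ent = pixel_entropy(probs)
    tau = np.median(ent) if tau is None else tau   # assumed threshold rule
    high = ent > tau
    p_high = spectral_prototypes(feats[high], k)   # uncertain (e.g. boundary) pixels
    p_low, _ = kmeans(feats[~high], k)             # confident (e.g. interior) pixels
    cost = ((p_high[:, None] - p_low[None]) ** 2).sum(-1)
    plan = sinkhorn(cost / (cost.max() + 1e-12))   # rescale cost to avoid underflow
    # Assumed fusion rule: average each low-entropy prototype with the
    # barycentric projection of the high-entropy prototypes onto it.
    mapped = (plan.T @ p_high) / plan.sum(0)[:, None]
    return 0.5 * (p_low + mapped), plan
```

Here `feats` would be an (N, D) array of per-pixel features and `probs` an (N, C) array of per-pixel class probabilities, both hypothetical inputs. The returned `plan` shows how strongly each high-entropy prototype is matched to each low-entropy one, and the fused prototypes could then be used to assign pixels to a pseudo-mask.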
Similar Papers
S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
CV and Pattern Recognition
Finds important things in pictures better.
DualGazeNet: A Biologically Inspired Dual-Gaze Query Network for Salient Object Detection
CV and Pattern Recognition
Finds important things in pictures faster.
Small Object Detection: A Comprehensive Survey on Challenges, Techniques and Real-World Applications
CV and Pattern Recognition
Finds tiny things in pictures better.