S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
By: Orest Kupyn, Hirokatsu Kataoka, Christian Rupprecht
Potential Business Impact:
Finds important things in pictures better.
Salient object detection exemplifies data-bounded tasks where expensive pixel-precise annotations force separate model training for related subtasks such as dichotomous image segmentation (DIS) and high-resolution salient object detection (HR-SOD). We present a method that dramatically improves generalization through large-scale synthetic data generation and an ambiguity-aware architecture. We introduce S3OD, a dataset of over 139,000 high-resolution images created through our multi-modal diffusion pipeline, which extracts labels from diffusion and DINO-v3 features. The iterative generation framework prioritizes challenging categories based on model performance. We propose a streamlined multi-mask decoder that naturally handles the inherent ambiguity in salient object detection by predicting multiple valid interpretations. Models trained solely on synthetic data achieve a 20-50% error reduction in cross-dataset generalization, while fine-tuned versions reach state-of-the-art performance across DIS and HR-SOD benchmarks.
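Only the abstract is available here, so the sketch below merely illustrates the general idea behind an ambiguity-aware multi-mask decoder: predict several candidate saliency masks and supervise only the one that best matches the annotation (a "best-of-K" objective, similar in spirit to SAM-style multi-mask heads). All names and shapes (MultiMaskDecoder, feat_dim, num_masks) are illustrative assumptions, not the authors' implementation.

```python
# Conceptual sketch only: generic multi-mask head with a best-of-K
# ("winner takes all") loss, a common way to model annotation ambiguity.
# Not the S3OD implementation; all names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiMaskDecoder(nn.Module):
    """Predicts K candidate saliency masks from backbone features."""

    def __init__(self, feat_dim: int = 256, num_masks: int = 3):
        super().__init__()
        self.num_masks = num_masks
        self.head = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(feat_dim, num_masks, 1),  # one logit map per hypothesis
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) -> logits: (B, K, H, W)
        return self.head(feats)


def best_of_k_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Supervise only the hypothesis closest to the ground-truth mask.

    logits: (B, K, H, W) mask logits; target: (B, 1, H, W) binary mask.
    """
    b, k, h, w = logits.shape
    target_k = target.expand(b, k, h, w).float()
    # Per-hypothesis BCE averaged over pixels: shape (B, K)
    per_mask = F.binary_cross_entropy_with_logits(
        logits, target_k, reduction="none"
    ).mean(dim=(2, 3))
    # Only the best-matching hypothesis receives gradient.
    return per_mask.min(dim=1).values.mean()


if __name__ == "__main__":
    decoder = MultiMaskDecoder(feat_dim=256, num_masks=3)
    feats = torch.randn(2, 256, 64, 64)    # dummy backbone features
    gt = torch.rand(2, 1, 64, 64) > 0.5    # dummy binary saliency mask
    loss = best_of_k_loss(decoder(feats), gt)
    loss.backward()
    print(f"loss: {loss.item():.4f}")
```

Taking the minimum over hypotheses lets different output heads specialize on different plausible interpretations of "the salient object" instead of averaging them into a single blurry compromise, which is the intuition behind predicting multiple valid masks.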
Similar Papers
Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models
CV and Pattern Recognition
Helps computers find important things in pictures faster.
Practical Insights into Semi-Supervised Object Detection Approaches
CV and Pattern Recognition
Teaches computers to find things with few examples.
Hyperspectral Remote Sensing Images Salient Object Detection: The First Benchmark Dataset and Baseline
CV and Pattern Recognition
Finds important things in pictures from space.