AutoQ-VIS: Improving Unsupervised Video Instance Segmentation via Automatic Quality Assessment
By: Kaixuan Lu, Mehmet Onurcan Kaya, Dim P. Papadopoulos
Potential Business Impact:
Teaches computers to track objects in videos without help.
Video Instance Segmentation (VIS) faces significant annotation challenges due to its dual requirements of pixel-level masks and temporal consistency labels. While recent unsupervised methods like VideoCutLER eliminate optical flow dependencies through synthetic data, they remain constrained by the synthetic-to-real domain gap. We present AutoQ-VIS, a novel unsupervised framework that bridges this gap through quality-guided self-training. Our approach establishes a closed-loop system between pseudo-label generation and automatic quality assessment, enabling progressive adaptation from synthetic to real videos. Experiments demonstrate state-of-the-art performance with 52.6 $\text{AP}_{50}$ on the YouTubeVIS-2019 val set, surpassing the previous state of the art, VideoCutLER, by 4.4$\%$ while requiring no human annotations. This demonstrates the viability of quality-aware self-training for unsupervised VIS. The source code of our method is available at https://github.com/wcbup/AutoQ-VIS.
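To make the closed-loop idea concrete, here is a minimal conceptual sketch of quality-guided self-training in Python. It is not the authors' implementation: the function names (`segment_videos`, `score_quality`, `retrain`), the quality threshold, and the number of rounds are illustrative assumptions, not part of the AutoQ-VIS codebase.

```python
# Conceptual sketch of quality-guided self-training (assumed structure,
# not the AutoQ-VIS reference implementation).
import random


def segment_videos(model, videos):
    """Stand-in for running the current model to produce pseudo-label masks."""
    return [{"video": v, "masks": f"pseudo_masks_for_{v}"} for v in videos]


def score_quality(model, pseudo_label):
    """Stand-in for the automatic quality assessor; returns a score in [0, 1]."""
    return random.random()


def retrain(model, labeled_data):
    """Stand-in for one round of training on the selected pseudo-labels."""
    print(f"training on {len(labeled_data)} pseudo-labeled videos")
    return model


def self_train(model, real_videos, rounds=3, threshold=0.7):
    # Closed loop: generate pseudo-labels on real videos, keep only those the
    # quality assessor rates highly, retrain, and repeat so the model
    # progressively adapts from synthetic pretraining to real videos.
    for _ in range(rounds):
        pseudo = segment_videos(model, real_videos)
        kept = [p for p in pseudo if score_quality(model, p) >= threshold]
        model = retrain(model, kept)
    return model


if __name__ == "__main__":
    self_train(model=None, real_videos=["vid_0", "vid_1", "vid_2"])
```

The key design point the sketch illustrates is that the quality assessor acts as a filter between pseudo-label generation and retraining, so only confident pseudo-labels feed the next training round.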
Similar Papers
Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training
CV and Pattern Recognition
Teaches computers to track objects in videos automatically.
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision
CV and Pattern Recognition
Makes videos look better without human help.
Semi-Supervised Multi-Task Learning for Interpretable Quality Assessment of Fundus Images
CV and Pattern Recognition
Helps doctors see eye problems in pictures.