FVOS for MOSE Track of 4th PVUW Challenge: 3rd Place Solution
By: Mengjiao Wang, Junpei Zhang, Xu Liu, and more
Potential Business Impact:
Helps computers track moving things in videos more accurately, even in complex scenes.
Video Object Segmentation (VOS) is one of the most fundamental and challenging tasks in computer vision, with a wide range of applications. Most existing methods rely on spatiotemporal memory networks to extract frame-level features and have achieved promising results on commonly used datasets. However, these methods often struggle in more complex real-world scenarios. This paper addresses that gap, aiming to segment video objects accurately in challenging scenes. We propose fine-tuning VOS (FVOS), which adapts existing methods to specific datasets through tailored training. We also introduce a morphological post-processing strategy to address excessively large gaps between adjacent objects in single-model predictions. Finally, we apply a voting-based fusion method to multi-scale segmentation results to produce the final output. Our approach achieves J&F scores of 76.81% on the validation set and 83.92% on the test set, securing third place overall in the MOSE Track of the 4th PVUW Challenge 2025.
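The abstract does not specify the exact morphological operator or voting rule. The following minimal NumPy sketch illustrates the two post-processing ideas under our own assumptions. Each prediction is assumed to be an integer label map (0 = background, k = object k). The gap-closing step is approximated by growing each object into adjacent background pixels with a 3×3 dilation. Fusion is assumed to be a pixel-wise majority vote across scales, with ties going to the smaller label. The names `shift`, `dilate`, `fill_gaps`, and `majority_vote` are hypothetical and do not come from the authors' code.

```python
import numpy as np


def shift(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a 2D boolean mask by (dy, dx), filling vacated pixels with False.
    Assumes |dy| <= 1 and |dx| <= 1, as used by dilate()."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        mask[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out


def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a 3x3 square structuring element (8-connectivity)."""
    out = mask.copy()
    for _ in range(iterations):
        grown = out.copy()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                grown |= shift(out, dy, dx)
        out = grown
    return out


def fill_gaps(labels: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Hypothetical gap-closing step: grow each object into adjacent
    background pixels only. Objects never overwrite each other; when two
    objects compete for a background pixel, the smaller id claims it first."""
    out = labels.copy()
    for obj in np.unique(labels):  # np.unique returns ids in ascending order
        if obj == 0:
            continue
        grown = dilate(labels == obj, iterations)
        out[grown & (out == 0)] = obj
    return out


def majority_vote(label_maps: list) -> np.ndarray:
    """Pixel-wise majority vote over K label maps of shape (H, W).
    Ties resolve to the smallest label because argmax returns the first max."""
    stack = np.stack(label_maps, axis=0)  # (K, H, W)
    num_labels = int(stack.max()) + 1
    # counts[k, y, x] = number of maps assigning label k to pixel (y, x)
    counts = np.stack([(stack == k).sum(axis=0) for k in range(num_labels)], axis=0)
    return counts.argmax(axis=0)


# Hypothetical usage: post-process each scale's prediction, then fuse them.
preds = [
    np.array([[1, 0, 0, 0, 2]]),
    np.array([[1, 1, 0, 2, 2]]),
    np.array([[1, 0, 0, 2, 2]]),
]
fused = majority_vote([fill_gaps(p) for p in preds])  # [[1, 1, 0, 2, 2]]
```

Voting after the per-scale cleanup means a single scale cannot close a gap alone. In the usage example, the middle pixel stays background because the three scales disagree (0, 1, and 2), and the tie resolves to label 0.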
Similar Papers
LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation
CV and Pattern Recognition
Helps computers track many moving things in videos.
SAMSON: 3rd Place Solution of LSVOS 2025 VOS Challenge
CV and Pattern Recognition
Tracks moving things in videos, even when hidden.
4th PVUW MeViS 3rd Place Report: Sa2VA
CV and Pattern Recognition
Helps computers find objects in videos using words.