Distractor-Aware Memory-Based Visual Object Tracking
By: Jovana Videnovic, Matej Kristan, Alan Lukezic
Potential Business Impact:
Helps computers track moving objects better.
Recent emergence of memory-based video segmentation methods such as SAM2 has led to models with excellent performance in segmentation tasks, achieving leading results on numerous benchmarks. However, these modes are not fully adjusted for visual object tracking, where distractors (i.e., objects visually similar to the target) pose a key challenge. In this paper we propose a distractor-aware drop-in memory module and introspection-based management method for SAM2, leading to DAM4SAM. Our design effectively reduces the tracking drift toward distractors and improves redetection capability after object occlusion. To facilitate the analysis of tracking in the presence of distractors, we construct DiDi, a Distractor-Distilled dataset. DAM4SAM outperforms SAM2.1 on thirteen benchmarks and sets new state-of-the-art results on ten. Furthermore, integrating the proposed distractor-aware memory into a real-time tracker EfficientTAM leads to 11% improvement and matches tracking quality of the non-real-time SAM2.1-L on multiple tracking and segmentation benchmarks, while integration with edge-based tracker EdgeTAM delivers 4% performance boost, demonstrating a very good generalization across architectures.
Similar Papers
Rethinking Memory Design in SAM-Based Visual Object Tracking
CV and Pattern Recognition
Helps computers remember objects better when they disappear.
Evaluating SAM2 for Video Semantic Segmentation
CV and Pattern Recognition
Lets computers perfectly cut out any object in videos.
Fast SAM2 with Text-Driven Token Pruning
CV and Pattern Recognition
Makes videos easier to edit by focusing on important parts.