Learning Association via Track-Detection Matching for Multi-Object Tracking
By: Momir Adžemović
Potential Business Impact:
Tracks moving things in videos better.
Multi-object tracking aims to maintain object identities over time by associating detections across video frames. Two dominant paradigms exist in literature: tracking-by-detection methods, which are computationally efficient but rely on handcrafted association heuristics, and end-to-end approaches, which learn association from data at the cost of higher computational complexity. We propose Track-Detection Link Prediction (TDLP), a tracking-by-detection method that performs per-frame association via link prediction between tracks and detections, i.e., by predicting the correct continuation of each track at every frame. TDLP is architecturally designed primarily for geometric features such as bounding boxes, while optionally incorporating additional cues, including pose and appearance. Unlike heuristic-based methods, TDLP learns association directly from data without handcrafted rules, while remaining modular and computationally efficient compared to end-to-end trackers. Extensive experiments on multiple benchmarks demonstrate that TDLP consistently surpasses state-of-the-art performance across both tracking-by-detection and end-to-end methods. Finally, we provide a detailed analysis comparing link prediction with metric learning-based association and show that link prediction is more effective, particularly when handling heterogeneous features such as detection bounding boxes. Our code is available at \href{https://github.com/Robotmurlock/TDLP}{https://github.com/Robotmurlock/TDLP}.
Similar Papers
From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking
CV and Pattern Recognition
Helps cameras track many moving things better.
Deep Learning-Based Multi-Object Tracking: A Comprehensive Survey from Foundations to State-of-the-Art
CV and Pattern Recognition
Tracks many moving things in videos better.
StableTrack: Stabilizing Multi-Object Tracking on Low-Frequency Detections
CV and Pattern Recognition
Tracks moving things better with less computer power.