
Enhancing Self-Supervised Fine-Grained Video Object Tracking with Dynamic Memory Prediction

Published: April 30, 2025 | arXiv ID: 2504.21692v1

By: Zihan Zhou, Changrui Dai, Aibo Song and more

Potential Business Impact:

Improves video tracking by using more past pictures.

Business Areas:

Image Recognition, Data and Analytics, Software

Successful video analysis relies on accurate recognition of pixels across frames, and frame reconstruction methods based on video correspondence learning are popular because they are efficient. Existing frame reconstruction methods, however, do not directly use multiple reference frames for reconstruction and decision-making, which matters most in complex situations such as occlusion or fast movement. In this paper, we introduce a Dynamic Memory Prediction (DMP) framework that uses multiple reference frames to enhance frame reconstruction concisely and directly. Its core component is a Reference Frame Memory Engine that dynamically selects frames based on object pixel features to improve tracking accuracy. In addition, a Bidirectional Target Prediction Network uses multiple reference frames to improve the robustness of the model. In experiments, our algorithm outperforms state-of-the-art self-supervised techniques on two fine-grained video object tracking tasks: object segmentation and keypoint tracking.
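To make the idea of reconstructing a frame from several reference frames more concrete, here is a minimal sketch of generic affinity-based reconstruction, the common pattern in video correspondence learning. It is not the DMP implementation: the function names (`reconstruct_pixel`, `reconstruct_frame`), the dot-product similarity, the `temperature` parameter, and the toy inputs are all assumptions chosen for illustration. The paper's frame selection and bidirectional prediction are not modeled.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of similarity scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def reconstruct_pixel(query_feat, ref_feats, ref_values, temperature=0.1):
    """Rebuild one target pixel as an affinity-weighted mix of reference values.

    ref_values may be colors (self-supervised training) or one-hot labels
    (mask or keypoint propagation at test time). Hypothetical helper.
    """
    weights = softmax([dot(query_feat, f) for f in ref_feats], temperature)
    channels = len(ref_values[0])
    return [
        sum(w * v[c] for w, v in zip(weights, ref_values))
        for c in range(channels)
    ]


def reconstruct_frame(target_feats, reference_frames, temperature=0.1):
    """Pool pixels from all reference frames, then reconstruct each target pixel.

    reference_frames is a list of (features, values) pairs, one per frame.
    """
    ref_feats = [f for feats, _ in reference_frames for f in feats]
    ref_values = [v for _, values in reference_frames for v in values]
    return [
        reconstruct_pixel(q, ref_feats, ref_values, temperature)
        for q in target_feats
    ]


# Toy example: two reference frames carrying one-hot labels.
frame_a = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
frame_b = ([[1.0, 0.0]], [[1.0, 0.0]])
target_feats = [[1.0, 0.0], [0.0, 1.0]]
propagated = reconstruct_frame(target_feats, [frame_a, frame_b])
```

In this sketch every reference pixel from every frame competes in one softmax, so a target pixel that is occluded in one reference frame can still find a match in another. A memory mechanism like the one described above would decide which frames enter `reference_frames`; here that choice is left to the caller.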

Learning Multi-frame and Monocular Prior for Estimating Geometry in Dynamic Scenes

CV and Pattern Recognition

Makes videos show 3D shapes of moving things.

3 May 2025 1

88%

FRAME: Pre-Training Video Feature Representations via Anticipation and Memory

CV and Pattern Recognition

Helps computers understand videos better.

5 Jun 2025 1

88%

Leveraging Motion Information for Better Self-Supervised Video Correspondence Learning

CV and Pattern Recognition

Helps computers track moving things in videos.

15 Mar 2025 1


Page Count

8 pages
