DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass
By: Vivek Alumootil, Tuan-Anh Vu, M. Khalid Jawed
Potential Business Impact:
Makes 3D movies from many pictures.
Current methods for dense 3D point tracking in dynamic scenes typically rely on pairwise processing, require known camera poses, or assume a temporal ordering to input frames, constraining their flexibility and applicability. Additionally, recent advances have successfully enabled efficient 3D reconstruction from large-scale, unposed image collections, underscoring opportunities for unified approaches to dynamic scene understanding. Motivated by this, we propose DePT3R, a novel framework that simultaneously performs dense point tracking and 3D reconstruction of dynamic scenes from multiple images in a single forward pass. This multi-task learning is achieved by extracting deep spatio-temporal features with a powerful backbone and regressing pixel-wise maps with dense prediction heads. Crucially, DePT3R operates without requiring camera poses, substantially enhancing its adaptability and efficiency-especially important in dynamic environments with rapid changes. We validate DePT3R on several challenging benchmarks involving dynamic scenes, demonstrating strong performance and significant improvements in memory efficiency over existing state-of-the-art methods. Data and codes are available via the open repository: https://github.com/StructuresComp/DePT3R
Similar Papers
PointSt3R: Point Tracking through 3D Grounded Correspondence
CV and Pattern Recognition
Tracks moving objects in videos without seeing them move.
4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos
CV and Pattern Recognition
Creates realistic 3D videos from regular videos.
Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
CV and Pattern Recognition
Reconstructs 3D scenes from videos quickly.