SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting
By: Yonghan Lee , Tsung-Wei Huang , Shiv Gehlot and more
Potential Business Impact:
Makes shaky videos look like smooth 3D movies.
Modeling dynamic 3D scenes is challenging due to their high-dimensional nature, which requires aggregating information from multiple views to reconstruct time-evolving 3D geometry and motion. We present a novel multi-video 4D Gaussian Splatting (4DGS) approach designed to handle real-world, unsynchronized video sets. Our approach, SyncTrack4D, directly leverages dense 4D track representation of dynamic scene parts as cues for simultaneous cross-video synchronization and 4DGS reconstruction. We first compute dense per-video 4D feature tracks and cross-video track correspondences by Fused Gromov-Wasserstein optimal transport approach. Next, we perform global frame-level temporal alignment to maximize overlapping motion of matched 4D tracks. Finally, we achieve sub-frame synchronization through our multi-video 4D Gaussian splatting built upon a motion-spline scaffold representation. The final output is a synchronized 4DGS representation with dense, explicit 3D trajectories, and temporal offsets for each video. We evaluate our approach on the Panoptic Studio and SyncNeRF Blender, demonstrating sub-frame synchronization accuracy with an average temporal error below 0.26 frames, and high-fidelity 4D reconstruction reaching 26.3 PSNR scores on the Panoptic Studio dataset. To the best of our knowledge, our work is the first general 4D Gaussian Splatting approach for unsynchronized video sets, without assuming the existence of predefined scene objects or prior models.
Similar Papers
Cross-Temporal 3D Gaussian Splatting for Sparse-View Guided Scene Update
CV and Pattern Recognition
Builds 3D worlds from old and new pictures.
Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding
CV and Pattern Recognition
Makes videos show 3D worlds without flickering.
Geometry-Consistent 4D Gaussian Splatting for Sparse-Input Dynamic View Synthesis
CV and Pattern Recognition
Creates realistic 3D scenes from few pictures.