Stereo Any Video: Temporally Consistent Stereo Matching
By: Junpeng Jing, Weixun Luo, Ye Mao and more
Potential Business Impact:
Makes 3D videos look real without special cameras.
This paper introduces Stereo Any Video, a powerful framework for video stereo matching. It can estimate spatially accurate and temporally consistent disparities without relying on auxiliary information such as camera poses or optical flow. The strong capability is driven by rich priors from monocular video depth models, which are integrated with convolutional features to produce stable representations. To further enhance performance, key architectural innovations are introduced: all-to-all-pairs correlation, which constructs smooth and robust matching cost volumes, and temporal convex upsampling, which improves temporal coherence. These components collectively ensure robustness, accuracy, and temporal consistency, setting a new standard in video stereo matching. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple datasets both qualitatively and quantitatively in zero-shot settings, as well as strong generalization to real-world indoor and outdoor scenarios.
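For readers unfamiliar with matching cost volumes, the following is a minimal sketch of a conventional all-pairs correlation volume along horizontal scanlines, the kind of baseline construction used in many stereo networks. It is not the paper's all-to-all-pairs correlation, whose details the abstract does not give. The function names `correlation_volume` and `argmax_disparity`, the `(C, H, W)` feature layout, and the winner-take-all readout are all assumptions made for illustration.

```python
import numpy as np

def correlation_volume(feat_left, feat_right):
    """Dot-product correlation between every left/right pixel pair on each row.

    feat_left, feat_right: arrays of shape (C, H, W) (assumed layout).
    Returns an array of shape (H, W, W) where out[y, xl, xr] is the scaled
    inner product of the left feature at (y, xl) and the right feature at (y, xr).
    """
    channels = feat_left.shape[0]
    return np.einsum("chi,chj->hij", feat_left, feat_right) / np.sqrt(channels)

def argmax_disparity(volume):
    """Winner-take-all disparity (illustrative readout, not a learned one).

    Only candidates with xr <= xl are allowed, i.e. non-negative disparity
    d = xl - xr for a rectified pair.
    """
    _, width, _ = volume.shape
    xl = np.arange(width)[:, None]
    xr = np.arange(width)[None, :]
    masked = np.where(xr <= xl, volume, -np.inf)
    best_xr = masked.argmax(axis=-1)          # shape (H, W)
    return np.arange(width)[None, :] - best_xr
```

In a video setting this volume would be built per frame; how the paper smooths it across frames and upsamples the result is not covered by this sketch.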
Similar Papers
Lite Any Stereo: Efficient Zero-Shot Stereo Matching
CV and Pattern Recognition
Makes computers see depth with less power.
StereoSync: Spatially-Aware Stereo Audio Generation from Video
Sound
Makes video sound match what you see.
Beyond Audio and Pose: A General-Purpose Framework for Video Synchronization
CV and Pattern Recognition
Aligns videos from different cameras automatically.