Stereo Any Video: Temporally Consistent Stereo Matching

Published: March 7, 2025 | arXiv ID: 2503.05549v3

By: Junpeng Jing, Weixun Luo, Ye Mao and more

Potential Business Impact:

Makes 3D videos look real without special cameras.

Business Areas:
Image Recognition, Data and Analytics, Software

This paper introduces Stereo Any Video, a powerful framework for video stereo matching. It can estimate spatially accurate and temporally consistent disparities without relying on auxiliary information such as camera poses or optical flow. The strong capability is driven by rich priors from monocular video depth models, which are integrated with convolutional features to produce stable representations. To further enhance performance, key architectural innovations are introduced: all-to-all-pairs correlation, which constructs smooth and robust matching cost volumes, and temporal convex upsampling, which improves temporal coherence. These components collectively ensure robustness, accuracy, and temporal consistency, setting a new standard in video stereo matching. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple datasets both qualitatively and quantitatively in zero-shot settings, as well as strong generalization to real-world indoor and outdoor scenarios.
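For readers unfamiliar with matching cost volumes, the following is a minimal sketch of a plain single-frame correlation volume for a rectified stereo pair, with a naive winner-takes-all disparity readout. It is not the paper's all-to-all-pairs correlation or its temporal convex upsampling, whose details are not given in this summary. The function names `correlation_volume` and `argmax_disparity`, the `(C, H, W)` feature layout, and the scaling by the square root of the channel count are assumptions made for illustration only.

```python
import numpy as np

def correlation_volume(left_feat, right_feat):
    """Dot-product correlation along each image row (hypothetical helper).

    left_feat, right_feat: arrays of shape (C, H, W) from a rectified pair.
    Returns corr of shape (H, W, W), where corr[y, x1, x2] is the similarity
    between left pixel (y, x1) and right pixel (y, x2).
    """
    channels = left_feat.shape[0]
    # Contract over the channel axis; scale by sqrt(C) to keep values bounded.
    return np.einsum("chi,chj->hij", left_feat, right_feat) / np.sqrt(channels)

def argmax_disparity(corr):
    """Winner-takes-all disparity d = x1 - x2, restricted to d >= 0."""
    _, width, _ = corr.shape
    x1 = np.arange(width)[:, None]
    x2 = np.arange(width)[None, :]
    valid = x2 <= x1  # a left pixel matches a right pixel at the same column or further left
    masked = np.where(valid[None], corr, -np.inf)
    best_x2 = masked.argmax(axis=-1)  # shape (H, W)
    return np.arange(width)[None, :] - best_x2
```

Learned methods typically refine such a volume iteratively rather than taking a hard argmax; the readout here only shows what the volume encodes.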

Country of Origin
🇬🇧 United Kingdom

Page Count
19 pages

Category
Computer Science:
CV and Pattern Recognition