Lite Any Stereo: Efficient Zero-Shot Stereo Matching
By: Junpeng Jing, Weixun Luo, Ye Mao and more
Potential Business Impact:
Makes computers see depth with less power.
Recent advances in stereo matching have focused on accuracy, often at the cost of significantly increased model size. Traditionally, the community has regarded efficient models as incapable of zero-shot generalization due to their limited capacity. In this paper, we introduce Lite Any Stereo, a stereo depth estimation framework that achieves strong zero-shot generalization while remaining highly efficient. To this end, we design a compact yet expressive backbone to ensure scalability, along with a carefully crafted hybrid cost aggregation module. We further propose a three-stage training strategy on million-scale data to effectively bridge the sim-to-real gap. Together, these components demonstrate that an ultra-light model can deliver strong generalization, ranking 1st across four widely used real-world benchmarks. Remarkably, our model attains accuracy comparable to or exceeding that of state-of-the-art accuracy-focused, non-prior-based methods while requiring less than 1% of their computational cost, setting a new standard for efficient stereo matching.
Similar Papers
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
CV and Pattern Recognition
Makes 3D cameras work super fast and smart.
Stereo Any Video: Temporally Consistent Stereo Matching
CV and Pattern Recognition
Makes 3D videos look real without special cameras.
LeanStereo: A Leaner Backbone based Stereo Network
CV and Pattern Recognition
Makes 3D cameras faster and less power-hungry.