Self-Supervised Spatial Correspondence Across Modalities
By: Ayush Shrivastava, Andrew Owens
Potential Business Impact:
Matches the same physical points across different kinds of images, such as RGB, depth, and thermal.
We present a method for finding cross-modal space-time correspondences. Given two images from different visual modalities, such as an RGB image and a depth map, our model identifies which pairs of pixels correspond to the same physical points in the scene. To solve this problem, we extend the contrastive random walk framework to simultaneously learn cycle-consistent feature representations for both cross-modal and intra-modal matching. The resulting model is simple and has no explicit photo-consistency assumptions. It can be trained entirely using unlabeled data, without the need for any spatially aligned multimodal image pairs. We evaluate our method on both geometric and semantic correspondence tasks. For geometric matching, we consider challenging tasks such as RGB-to-depth and RGB-to-thermal matching (and vice versa); for semantic matching, we evaluate on photo-sketch and cross-style image alignment. Our method achieves strong performance across all benchmarks.
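To make the cycle-consistency idea behind the contrastive random walk concrete, here is a minimal PyTorch sketch of a cross-modal round-trip loss: features from modality A "walk" to modality B via a softmax affinity matrix and back again, and the round-trip transition matrix is pushed toward the identity. All names (affinity, cycle_consistency_loss, the temperature value, the random stand-in features) are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of a cross-modal contrastive random walk loss (illustrative only).
    import torch
    import torch.nn.functional as F

    def affinity(src_feats, dst_feats, temperature=0.07):
        """Row-stochastic transition matrix between two sets of pixel features.

        src_feats, dst_feats: (N, D) feature vectors.
        Returns an (N, N) matrix whose rows are softmax-normalized similarities.
        """
        src = F.normalize(src_feats, dim=-1)
        dst = F.normalize(dst_feats, dim=-1)
        sim = src @ dst.t() / temperature
        return sim.softmax(dim=-1)

    def cycle_consistency_loss(feats_a, feats_b):
        """Walk A -> B -> A and penalize deviation from the identity mapping.

        feats_a: (N, D) features from modality A (e.g., RGB pixels).
        feats_b: (N, D) features from modality B (e.g., depth pixels).
        """
        walk = affinity(feats_a, feats_b) @ affinity(feats_b, feats_a)  # (N, N)
        target = torch.arange(walk.size(0), device=walk.device)
        # Each pixel should return to itself after the round trip.
        return F.nll_loss(torch.log(walk + 1e-8), target)

    if __name__ == "__main__":
        # Toy usage: random features standing in for encoder outputs.
        N, D = 256, 128
        feats_rgb = torch.randn(N, D, requires_grad=True)
        feats_depth = torch.randn(N, D, requires_grad=True)
        loss = cycle_consistency_loss(feats_rgb, feats_depth)
        loss.backward()
        print(f"cycle-consistency loss: {loss.item():.4f}")

In the paper's setting, the same kind of round-trip objective is applied both across modalities and within a single modality, which is what lets the model train without spatially aligned multimodal pairs.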
Similar Papers
Model alignment using inter-modal bridges
Machine Learning (CS)
Lets different AI skills work together easily.
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
CV and Pattern Recognition
Helps computers match objects in pictures better.
Self-Supervised Cross-Modal Learning for Image-to-Point Cloud Registration
CV and Pattern Recognition
Helps cars see the world in 3D.