SplatSearch: Instance Image Goal Navigation for Mobile Robots using 3D Gaussian Splatting and Diffusion Models
By: Siddarth Narasimhan, Matthew Lisondra, Haitong Wang and more
Potential Business Impact:
Robot finds lost things using a picture.
The Instance Image Goal Navigation (IIN) problem requires mobile robots deployed in unknown environments to search for specific objects or people of interest using only a single reference goal image of the target. This problem is especially challenging when: 1) the reference image is captured from an arbitrary viewpoint, and 2) the robot must operate with sparse-view scene reconstructions. In this paper, we address the IIN problem by introducing SplatSearch, a novel architecture that leverages sparse-view 3D Gaussian Splatting (3DGS) reconstructions. SplatSearch renders multiple viewpoints around candidate objects using a sparse online 3DGS map, and uses a multi-view diffusion model to complete missing regions of the rendered images, enabling robust feature matching against the goal image. We also introduce a novel frontier exploration policy that combines visual context from the synthesized viewpoints with semantic context from the goal image to evaluate frontier locations, allowing the robot to prioritize frontiers that are semantically and visually relevant to the goal image. Extensive experiments in photorealistic home environments and real-world environments show that SplatSearch outperforms current state-of-the-art methods in terms of Success Rate and Success Path Length. An ablation study confirms the design choices of SplatSearch.
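To make the frontier-evaluation idea concrete, here is a minimal sketch of how visual and semantic relevance to the goal image might be combined into one score per frontier. It is not the paper's method. The abstract does not give the scoring function, so everything in the block is an assumption made for illustration: the cosine similarity measure, the weighted sum, the `alpha` weight, the embedding vectors, and the names `score_frontiers` and `select_frontier`.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors; 0.0 if either has zero norm."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def score_frontiers(frontiers, goal_visual, goal_semantic, alpha=0.5):
    """Score each frontier by a weighted sum of two similarities to the goal.

    Hypothetical inputs: each frontier is a dict with a "visual" embedding
    (e.g. pooled features from synthesized views near the frontier) and a
    "semantic" embedding (e.g. features of the surrounding scene context).
    alpha is an assumed weight balancing the two terms.
    """
    scores = []
    for frontier in frontiers:
        visual = cosine_similarity(frontier["visual"], goal_visual)
        semantic = cosine_similarity(frontier["semantic"], goal_semantic)
        scores.append(alpha * visual + (1.0 - alpha) * semantic)
    return scores


def select_frontier(frontiers, goal_visual, goal_semantic, alpha=0.5):
    """Return the index of the highest-scoring frontier and all scores."""
    scores = score_frontiers(frontiers, goal_visual, goal_semantic, alpha)
    return int(np.argmax(scores)), scores


# Toy example with 2-D embeddings (purely illustrative values).
frontiers = [
    {"visual": [1.0, 0.0], "semantic": [0.0, 1.0]},
    {"visual": [0.0, 1.0], "semantic": [1.0, 0.0]},
]
best, scores = select_frontier(frontiers, goal_visual=[1.0, 0.0], goal_semantic=[0.0, 1.0])
```

In this toy setup the first frontier matches the goal on both terms, so it would be explored first. A real system would obtain the embeddings from learned feature extractors and would likely also account for travel cost.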
Similar Papers
Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation
CV and Pattern Recognition
Finds targets faster by picking smart views.
GSplatVNM: Point-of-View Synthesis for Visual Navigation Models Using Gaussian Splatting
Robotics
Robot finds its way using fewer pictures.
Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation
Robotics
Robots learn better from fake 3D scenes.