A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics
By: Simindokht Jahangard, Mehrzad Mohammadi, Abhinav Dhall, and more
Potential Business Impact:
Helps robots understand where things are.
Visual reasoning, particularly spatial reasoning, is a challenging cognitive task that requires understanding object relationships and their interactions within complex environments, especially in the robotics domain. Existing vision-language models (VLMs) excel at perception tasks but struggle with fine-grained spatial reasoning because their reasoning is implicit and correlation-driven and relies solely on images. We propose a novel neuro-symbolic framework that integrates both panoramic-image and 3D point cloud information, combining neural perception with symbolic reasoning to explicitly model spatial and logical relationships. Our framework consists of a perception module for detecting entities and extracting attributes, and a reasoning module that constructs a structured scene graph to support precise, interpretable queries. Evaluated on the JRDB-Reasoning dataset, our approach demonstrates superior performance and reliability in crowded, human-built environments while maintaining a lightweight design suitable for robotics and embodied AI applications.
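The abstract describes the reasoning module only at a high level: detected entities become nodes of a scene graph, explicit spatial relations become edges, and grounding queries are answered symbolically over that graph. The sketch below is a minimal illustration of that general idea, not the authors' implementation; the class names, the "near"/"left_of" predicates, the distance threshold, and the coordinate convention are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A detected object with attributes, standing in for the perception module's output."""
    name: str
    centroid: tuple                      # (x, y, z) centroid from the 3D point cloud, in metres (assumed)
    attributes: dict = field(default_factory=dict)

@dataclass
class SceneGraph:
    """Structured scene graph: entities as nodes, symbolic spatial relations as triples."""
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)    # (subject, predicate, object) triples

    def add_entity(self, entity: Entity) -> None:
        self.entities.append(entity)

    def infer_spatial_relations(self, near_threshold: float = 1.5) -> None:
        """Derive explicit relations (near, left_of) from 3D geometry instead of image correlations."""
        for a in self.entities:
            for b in self.entities:
                if a is b:
                    continue
                dx = b.centroid[0] - a.centroid[0]
                dy = b.centroid[1] - a.centroid[1]
                dz = b.centroid[2] - a.centroid[2]
                dist = (dx ** 2 + dy ** 2 + dz ** 2) ** 0.5
                if dist < near_threshold:
                    self.relations.append((a.name, "near", b.name))
                # Hypothetical convention: +x points to the robot's right in the sensor frame.
                if dx > 0.3:
                    self.relations.append((a.name, "left_of", b.name))

    def query(self, predicate: str, obj: str) -> list:
        """Answer a symbolic query such as 'what is near table_1?'."""
        return [s for (s, p, o) in self.relations if p == predicate and o == obj]


# Toy usage: two hand-written detections stand in for the perception module.
graph = SceneGraph()
graph.add_entity(Entity("person_1", (0.5, 0.0, 2.0), {"pose": "standing"}))
graph.add_entity(Entity("table_1", (1.2, 0.0, 2.3), {"material": "wood"}))
graph.infer_spatial_relations()
print(graph.query("near", "table_1"))    # -> ['person_1']
```

Because the relations are stored as explicit triples, a grounding query reduces to a transparent lookup, which is what makes the answers interpretable compared with purely implicit VLM reasoning.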
Similar Papers
Vision-Language Memory for Spatial Reasoning
CV and Pattern Recognition
Robots understand 3D space better from videos.
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
CV and Pattern Recognition
Helps robots understand things from many camera views.
Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective
Artificial Intelligence
Helps computers understand 3D spaces from different views.