A Deep Learning Model of Mental Rotation Informed by Interactive VR Experiments
By: Raymond Khazoum, Daniela Fernandes, Aleksandr Krylov, and more
Mental rotation, the ability to compare objects seen from different viewpoints, is a fundamental example of mental simulation and spatial world modelling in humans. Here we propose a mechanistic model of human mental rotation that leverages advances in deep, equivariant, and neuro-symbolic learning. Our model consists of three stacked components: (1) an equivariant neural encoder, which takes images as input and produces 3D spatial representations of objects; (2) a neuro-symbolic object encoder, which derives symbolic descriptions of objects from these spatial representations; and (3) a neural decision agent, which compares these symbolic descriptions and prescribes rotation simulations in 3D latent space via a recurrent pathway. Our model design is guided by the abundant experimental literature on mental rotation, which we complemented with VR experiments in which participants could, in some conditions, physically manipulate the objects being compared, yielding additional insight into the cognitive process of mental rotation. Our model closely captures the performance, response times, and behavior of participants in our experiments and in prior studies. The necessity of each model component is shown through systematic ablations. Our work adds to a recent collection of deep neural models of human spatial reasoning, further demonstrating the potency of integrating deep, equivariant, and symbolic representations to model the human mind.
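To make the three-component architecture concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: an encoder lifting an image to a 3D latent, a symbolic encoder discretizing that latent, and a decision agent that recurrently rotates one latent until the symbols match or a step budget runs out. This is an illustrative sketch only, not the authors' implementation; the module names, dimensions, the plain CNN standing in for an equivariant network, the z-axis rotation parameterization, and the step-count-as-response-time proxy are all assumptions.

```python
# Illustrative sketch of the three-component mental-rotation model.
# All names, shapes, and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn

class EquivariantEncoder(nn.Module):
    """(1) Image -> 3D spatial representation (here: a cloud of latent 3D points).
    A faithful version would use a rotation-equivariant network; this CNN is a stand-in."""
    def __init__(self, n_points: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_points = nn.Linear(64, n_points * 3)
        self.n_points = n_points

    def forward(self, img):                      # img: (B, 3, H, W)
        return self.to_points(self.backbone(img)).view(-1, self.n_points, 3)

class SymbolicObjectEncoder(nn.Module):
    """(2) 3D representation -> discrete symbolic description (one token per point)."""
    def __init__(self, vocab: int = 8):
        super().__init__()
        self.classify = nn.Linear(3, vocab)

    def forward(self, pts):                      # pts: (B, N, 3)
        return self.classify(pts).argmax(-1)     # (B, N) integer symbols

def rot_z(angle: torch.Tensor) -> torch.Tensor:
    """Rotation about z; a real agent would prescribe arbitrary 3D rotations."""
    c, s = torch.cos(angle), torch.sin(angle)
    return torch.stack([torch.stack([c, -s, torch.zeros_like(c)]),
                        torch.stack([s, c, torch.zeros_like(c)]),
                        torch.tensor([0.0, 0.0, 1.0])])

class DecisionAgent(nn.Module):
    """(3) Compares symbolic descriptions; while they disagree, it prescribes an
    incremental rotation of the first object's 3D latent (the recurrent pathway).
    The number of rotation steps taken serves as a proxy for the response time."""
    def __init__(self, step_deg: float = 15.0, max_steps: int = 24):
        super().__init__()
        self.step = torch.deg2rad(torch.tensor(step_deg))
        self.max_steps = max_steps

    def forward(self, pts_a, pts_b, symbolize):
        for t in range(self.max_steps):
            if torch.equal(symbolize(pts_a), symbolize(pts_b)):
                return True, t                   # "same"; t ~ response time
            pts_a = pts_a @ rot_z(self.step).T   # rotate in 3D latent space
        return False, self.max_steps             # "different"

# Usage: encode two views, then let the agent mentally rotate until it decides.
enc, sym, agent = EquivariantEncoder(), SymbolicObjectEncoder(), DecisionAgent()
img_a, img_b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
same, steps = agent(enc(img_a), enc(img_b), sym)
print(f"same={same}, simulated rotation steps={steps}")
```

The loop in DecisionAgent is where the classic mental-rotation signature would arise: larger angular disparities between the two objects require more incremental rotation steps before the symbols align, so decision latency grows with angle, mirroring human response times.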
Similar Papers
Large Vision Models Can Solve Mental Rotation Problems
CV and Pattern Recognition
Computers learn to "see" and turn objects in their minds.
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
CV and Pattern Recognition
Teaches computers to solve puzzles with pictures.
How Does a Virtual Agent Decide Where to Look? - Symbolic Cognitive Reasoning for Embodied Head Rotation
Graphics
Makes virtual characters look around realistically.