A Vision for Multisensory Intelligence: Sensing, Synergy, and Science
By: Paul Pu Liang
Potential Business Impact:
AI learns from all your senses, not just screens.
Our experience of the world is multisensory, spanning a synthesis of language, sight, sound, touch, taste, and smell. Yet artificial intelligence has primarily advanced in digital modalities like text, vision, and audio. This paper outlines a research vision for multisensory artificial intelligence over the next decade. This new set of technologies can change how humans and AI experience and interact with one another by connecting AI to the human senses and a rich spectrum of signals, from physiological and tactile cues on the body to physical and social signals in homes, cities, and the environment. We outline how this field must advance through three interrelated themes of sensing, science, and synergy. Firstly, research in sensing should extend how AI captures the world in richer ways beyond the digital medium. Secondly, we must develop a principled science of multisensory AI: quantifying multimodal heterogeneity and interactions, designing unified modeling architectures and representations, and understanding cross-modal transfer. Finally, we present new technical challenges in learning synergy between modalities and between humans and AI, covering multisensory integration, alignment, reasoning, generation, generalization, and experience. Accompanying this vision paper is a series of projects, resources, and demos of the latest advances from the Multisensory Intelligence group at the MIT Media Lab; see https://mit-mi.github.io/.
Similar Papers
Towards Robust Multimodal Learning in the Open World
Machine Learning (CS)
Helps AI understand the real world better.
Towards deployment-centric multimodal AI beyond vision and language
Artificial Intelligence
AI learns from many things, not just pictures.
Vision-Based Multimodal Interfaces: A Survey and Taxonomy for Enhanced Context-Aware System Design
Human-Computer Interaction
Helps computers understand what you mean better.