Enhancing Explainability with Multimodal Context Representations for Smarter Robots
By: Anargh Viswanath, Lokesh Veeramacheneni, Hendrik Buschmeier
Potential Business Impact:
Robots that understand both what you say and what they see.
Artificial Intelligence (AI) has advanced significantly in recent years, driving innovation across many fields, especially robotics. Even though robots can perform complex tasks with increasing autonomy, challenges remain in ensuring explainability and user-centered design for effective interaction. A key issue in Human-Robot Interaction (HRI) is enabling robots to perceive and reason over multimodal inputs, such as audio and vision, in a way that fosters trust and seamless collaboration. In this paper, we propose a generalized and explainable multimodal framework for context representation, designed to improve the fusion of speech and vision modalities. We introduce a use case on assessing 'Relevance' between the user's verbal utterances and the robot's visual scene perception. We present our methodology with a Multimodal Joint Representation module and a Temporal Alignment module, which allow robots to evaluate relevance by temporally aligning multimodal inputs. Finally, we discuss how the proposed framework for context representation can support various aspects of explainability in HRI.
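To make the two modules more concrete, below is a minimal, hypothetical sketch of the general idea, not the authors' implementation: speech and vision features are projected into a joint embedding space, softly aligned over time, and reduced to a single relevance score. PyTorch is assumed, and the names JointRepresentation and temporal_alignment_relevance, the cosine-similarity alignment, and all dimensions are illustrative choices rather than details taken from the paper.

```python
# Hypothetical sketch (not the paper's code): joint representation plus
# temporal alignment for scoring relevance between an utterance and a
# sequence of visual frames. All names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointRepresentation(nn.Module):
    """Projects speech and vision features into a shared embedding space."""

    def __init__(self, speech_dim: int, vision_dim: int, joint_dim: int = 256):
        super().__init__()
        self.speech_proj = nn.Linear(speech_dim, joint_dim)
        self.vision_proj = nn.Linear(vision_dim, joint_dim)

    def forward(self, speech_feats: torch.Tensor, vision_feats: torch.Tensor):
        # speech_feats: (T_s, speech_dim); vision_feats: (T_v, vision_dim)
        return self.speech_proj(speech_feats), self.vision_proj(vision_feats)


def temporal_alignment_relevance(speech_emb: torch.Tensor,
                                 vision_emb: torch.Tensor) -> torch.Tensor:
    """Softly aligns the modalities over time and returns a relevance score."""
    # Pairwise cosine similarity between all speech and vision time steps.
    sim = F.cosine_similarity(speech_emb.unsqueeze(1),   # (T_s, 1, D)
                              vision_emb.unsqueeze(0),   # (1, T_v, D)
                              dim=-1)                    # -> (T_s, T_v)
    # Soft temporal alignment: weight visual steps for each speech step.
    align = sim.softmax(dim=-1)                          # (T_s, T_v)
    aligned_vision = align @ vision_emb                  # (T_s, D)
    # Relevance: mean similarity of each speech step to its aligned context.
    return F.cosine_similarity(speech_emb, aligned_vision, dim=-1).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    joint = JointRepresentation(speech_dim=128, vision_dim=512)
    s, v = joint(torch.randn(12, 128), torch.randn(30, 512))
    print(f"relevance score: {temporal_alignment_relevance(s, v).item():.3f}")
```

In this reading, a high score would indicate that what the user is saying temporally and semantically matches what the robot currently sees; how the paper actually computes and explains 'Relevance' is specified in the full text.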
Similar Papers
Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models
Machine Learning (CS)
Helps understand how AI uses different information.
Rethinking Explainability in the Era of Multimodal AI
Artificial Intelligence
Explains how different data types work together.
Accessible and Pedagogically-Grounded Explainability for Human-Robot Interaction: A Framework Based on UDL and Symbolic Interfaces
Robotics
Helps robots explain themselves to everyone.