Decoding the Multimodal Mind: Generalizable Brain-to-Text Translation via Multimodal Alignment and Adaptive Routing

Published: May 15, 2025 | arXiv ID: 2505.10356v2

By: Chunyu Ye, Yunhao Zhang, Jingyuan Sun, and more

Potential Business Impact:

Decodes text from brain activity evoked by visual, auditory, and linguistic stimuli.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Decoding language from the human brain remains a grand challenge for Brain-Computer Interfaces (BCIs). Current approaches typically rely on unimodal brain representations, neglecting the brain's inherently multimodal processing. Inspired by the brain's associative mechanisms, where viewing an image can evoke related sounds and linguistic representations, we propose a unified framework that leverages Multimodal Large Language Models (MLLMs) to align brain signals with a shared semantic space encompassing text, images, and audio. A router module dynamically selects and fuses modality-specific brain features according to the characteristics of each stimulus. Experiments on various fMRI datasets with textual, visual, and auditory stimuli demonstrate state-of-the-art performance, achieving an 8.48% improvement on the most commonly used benchmark. We further extend our framework to EEG and MEG data, demonstrating flexibility and robustness across varying temporal and spatial resolutions. To our knowledge, this is the first unified BCI architecture capable of robustly decoding multimodal brain activity across diverse brain signals and stimulus types, offering a flexible solution for real-world applications.
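The router described in the abstract dynamically selects and fuses modality-specific brain features per stimulus. The paper does not publish implementation details here, so the following is a minimal sketch of one plausible realization: a learned gate that softmax-weights hypothetical text-, image-, and audio-aligned brain features before they are passed to a language decoder. All names and shapes are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumption, not the authors' implementation): a router that
# scores each modality-specific brain feature and fuses them by a softmax-weighted sum.
import torch
import torch.nn as nn


class ModalityRouter(nn.Module):
    """Soft routing over modality-aligned brain features (text / image / audio)."""

    def __init__(self, feature_dim: int):
        super().__init__()
        # One scalar gate logit per modality, computed from the feature itself.
        self.gate = nn.Linear(feature_dim, 1)

    def forward(self, modality_feats: torch.Tensor) -> torch.Tensor:
        # modality_feats: (batch, num_modalities, feature_dim)
        logits = self.gate(modality_feats).squeeze(-1)            # (batch, num_modalities)
        weights = torch.softmax(logits, dim=-1)                   # per-stimulus routing weights
        fused = (weights.unsqueeze(-1) * modality_feats).sum(1)   # (batch, feature_dim)
        return fused


# Toy usage: fuse hypothetical text/image/audio-aligned features for 4 samples.
router = ModalityRouter(feature_dim=256)
feats = torch.randn(4, 3, 256)
fused = router(feats)  # (4, 256) vector that a downstream MLLM decoder could consume
print(fused.shape)
```

The soft (weighted-sum) gate shown here is one common routing design; a hard top-1 selection per stimulus would be an equally plausible reading of the abstract.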

Page Count
9 pages

Category
Computer Science:
Computation and Language