Score: 1

Decoding Visual Neural Representations by Multimodal with Dynamic Balancing

Published: September 3, 2025 | arXiv ID: 2509.03433v1

By: Kaili Sun, Xingyu Miao, Bing Zhai, and more

Potential Business Impact:

Identifies which image a person is viewing by matching EEG brain signals to pictures.

Business Areas:
Visual Search, Internet Services

In this work, we propose a framework that integrates EEG, image, and text data to decode visual neural representations from low signal-to-noise ratio EEG signals. Specifically, we introduce the text modality to enhance the semantic correspondence between EEG signals and visual content. With the explicit semantic labels provided by text, image and EEG features of the same category can be more closely aligned with the corresponding text representations in a shared multimodal space. To fully utilize pre-trained visual and textual representations, we propose an adapter module that alleviates the instability of high-dimensional representations while facilitating the alignment and fusion of cross-modal features. Additionally, to alleviate the imbalance in multimodal feature contributions introduced by the textual representations, we propose a Modal Consistency Dynamic Balance (MCDB) strategy that dynamically adjusts the contribution weight of each modality. We further propose a stochastic perturbation regularization (SPR) term that enhances the model's generalization by introducing dynamic Gaussian noise during modality optimization. Evaluation on the ThingsEEG dataset shows that our method surpasses previous state-of-the-art methods in both Top-1 and Top-5 accuracy, improving them by 2.0% and 4.7%, respectively.
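To make the abstract's pipeline concrete, here is a minimal PyTorch sketch of the general idea: adapters project EEG, image, and text features into a shared space, pairwise contrastive losses align them, Gaussian noise perturbs features during optimization (one plausible reading of SPR), and the pairwise losses are re-weighted each step so no modality dominates (in the spirit of MCDB). The architecture, dimensions, noise scale, and weighting rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Small bottleneck adapter mapping a frozen backbone feature
    into the shared multimodal space (hypothetical architecture)."""
    def __init__(self, in_dim, shared_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, shared_dim),
            nn.GELU(),
            nn.Linear(shared_dim, shared_dim),
        )

    def forward(self, x):
        # Unit-normalize so dot products act as cosine similarities.
        return F.normalize(self.net(x), dim=-1)

def contrastive(a, b, tau=0.07):
    """Symmetric InfoNCE loss between two batches of unit-norm features."""
    logits = a @ b.t() / tau
    targets = torch.arange(a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical feature dims: EEG encoder 512-d, CLIP-like image/text 768-d.
eeg_adapter, img_adapter, txt_adapter = Adapter(512), Adapter(768), Adapter(768)

eeg = torch.randn(32, 512)   # stand-ins for pre-trained encoder outputs
img = torch.randn(32, 768)
txt = torch.randn(32, 768)

# Stochastic perturbation: inject Gaussian noise into the features
# during optimization (assumed form of the SPR term).
sigma = 0.05
z_eeg = eeg_adapter(eeg + sigma * torch.randn_like(eeg))
z_img = img_adapter(img + sigma * torch.randn_like(img))
z_txt = txt_adapter(txt + sigma * torch.randn_like(txt))

# Pairwise alignment losses anchored on the EEG modality.
l_ei = contrastive(z_eeg, z_img)
l_et = contrastive(z_eeg, z_txt)

# Dynamic balancing (illustrative rule): softly down-weight whichever
# pair currently dominates, so the text modality cannot overwhelm
# the EEG-image alignment.
w = F.softmax(-torch.stack([l_ei, l_et]).detach(), dim=0)
loss = w[0] * l_ei + w[1] * l_et
loss.backward()
```

In practice the weighting rule and noise schedule would be tuned; the softmax-over-negative-losses choice here is just one simple way to dampen a dominating modality.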

Country of Origin
🇬🇧 United Kingdom

Page Count
36 pages

Category
Computer Science:
Computer Vision and Pattern Recognition