Decoding Visual Neural Representations by Multimodal with Dynamic Balancing
By: Kaili sun , Xingyu Miao , Bing Zhai and more
Potential Business Impact:
Reads minds by matching brain waves to pictures.
In this work, we propose an innovative framework that integrates EEG, image, and text data, aiming to decode visual neural representations from low signal-to-noise ratio EEG signals. Specifically, we introduce text modality to enhance the semantic correspondence between EEG signals and visual content. With the explicit semantic labels provided by text, image and EEG features of the same category can be more closely aligned with the corresponding text representations in a shared multimodal space. To fully utilize pre-trained visual and textual representations, we propose an adapter module that alleviates the instability of high-dimensional representation while facilitating the alignment and fusion of cross-modal features. Additionally, to alleviate the imbalance in multimodal feature contributions introduced by the textual representations, we propose a Modal Consistency Dynamic Balance (MCDB) strategy that dynamically adjusts the contribution weights of each modality. We further propose a stochastic perturbation regularization (SPR) term to enhance the generalization ability of semantic perturbation-based models by introducing dynamic Gaussian noise in the modality optimization process. The evaluation results on the ThingsEEG dataset show that our method surpasses previous state-of-the-art methods in both Top-1 and Top-5 accuracy metrics, improving by 2.0\% and 4.7\% respectively.
Similar Papers
DynaMind: Reconstructing Dynamic Visual Scenes from EEG by Aligning Temporal Dynamics and Multimodal Semantics to Guided Diffusion
CV and Pattern Recognition
Shows what you're seeing from brain waves.
MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding
CV and Pattern Recognition
Helps online stores understand products better.
ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge
CV and Pattern Recognition
Helps computers understand your feelings from faces, voices, words.