
Decoding Visual Neural Representations by Multimodal with Dynamic Balancing

Published: September 3, 2025 | arXiv ID: 2509.03433v1

By: Kaili Sun, Xingyu Miao, Bing Zhai and more

Potential Business Impact:

Reads minds by matching brain waves to pictures.

Business Areas:

Visual Search Internet Services

In this work, we propose a framework that integrates EEG, image, and text data to decode visual neural representations from EEG signals with a low signal-to-noise ratio. Specifically, we introduce a text modality to strengthen the semantic correspondence between EEG signals and visual content. With the explicit semantic labels provided by text, image and EEG features of the same category can be aligned more closely with the corresponding text representations in a shared multimodal space. To make full use of pre-trained visual and textual representations, we propose an adapter module that alleviates the instability of high-dimensional representations while facilitating the alignment and fusion of cross-modal features. Additionally, to alleviate the imbalance in multimodal feature contributions introduced by the textual representations, we propose a Modal Consistency Dynamic Balance (MCDB) strategy that dynamically adjusts the contribution weight of each modality. We further propose a stochastic perturbation regularization (SPR) term that enhances the generalization ability of semantic perturbation-based models by introducing dynamic Gaussian noise into the modality optimization process. Evaluation on the ThingsEEG dataset shows that our method surpasses previous state-of-the-art methods in both Top-1 and Top-5 accuracy, improving them by 2.0% and 4.7%, respectively.
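The Top-1 and Top-5 figures quoted above are retrieval accuracies. As a rough illustration only, the sketch below shows how a Top-k accuracy of this kind can be computed once EEG and image features live in a shared embedding space. It assumes cosine similarity as the matching score and that the EEG query at index i corresponds to the image at index i; the abstract does not state the paper's exact evaluation protocol, and the function names and toy vectors here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_accuracy(eeg_embeddings, image_embeddings, k):
    """Fraction of EEG queries whose matching image ranks among the k most similar.

    Assumption (not from the paper): query i matches image i.
    """
    hits = 0
    for i, query in enumerate(eeg_embeddings):
        sims = [cosine(query, img) for img in image_embeddings]
        # Candidate image indices ordered from most to least similar.
        ranked = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(eeg_embeddings)


# Toy embeddings, made up for illustration only.
eeg = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.6, 0.0, 0.5]]
img = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [1.0, 0.0, 0.0]]

top1 = top_k_accuracy(eeg, img, 1)
top2 = top_k_accuracy(eeg, img, 2)
```

With these toy vectors only the second query retrieves its own image first, while every query finds it within the top two candidates, which is why Top-k accuracy can only stay the same or grow as k increases.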



Country of Origin

🇬🇧 United Kingdom

Page Count

36 pages


