Score: 0

NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning

Published: November 12, 2025 | arXiv ID: 2511.09250v1

By: Jiyuan Wang , Li Zhang , Haipeng Lin and more

Potential Business Impact:

Lets computers understand what you see from brain waves.

Business Areas:

Visual Search Internet Services

Recent advances in brain-inspired artificial intelligence have sought to align neural signals with visual semantics using multimodal models such as CLIP. However, existing methods often treat CLIP as a static feature extractor, overlooking its adaptability to neural representations and the inherent physiological-symbolic gap in EEG-image alignment. To address these challenges, we present NeuroCLIP, a prompt tuning framework tailored for EEG-to-image contrastive learning. Our approach introduces three core innovations: (1) We design a dual-stream visual embedding pipeline that combines dynamic filtering and token-level fusion to generate instance-level adaptive prompts, which guide the adjustment of patch embedding tokens based on image content, thereby enabling fine-grained modulation of visual representations under neural constraints; (2) We are the first to introduce visual prompt tokens into EEG-image alignment, acting as global, modality-level prompts that work in conjunction with instance-level adjustments. These visual prompt tokens are inserted into the Transformer architecture to facilitate neural-aware adaptation and parameter optimization at a global level; (3) Inspired by neuroscientific principles of human visual encoding, we propose a refined contrastive loss that better model the semantic ambiguity and cross-modal noise present in EEG signals. On the THINGS-EEG2 dataset, NeuroCLIP achieves a Top-1 accuracy of 63.2% in zero-shot image retrieval, surpassing the previous best method by +12.3%, and demonstrates strong generalization under inter-subject conditions (+4.6% Top-1), highlighting the potential of physiology-aware prompt tuning for bridging brain signals and visual semantics.

DistilCLIP-EEG: Enhancing Epileptic Seizure Detection Through Multi-modal Learning and Knowledge Distillation

Machine Learning (CS)

Finds seizures using brain waves and words.

15 Oct 2025 0

90%

Image-to-Brain Signal Generation for Visual Prosthesis with CLIP Guided Multimodal Diffusion Models

CV and Pattern Recognition

Helps blind people see by turning pictures into brain signals.

31 Aug 2025 1

90%

Cross-domain EEG-based Emotion Recognition with Contrastive Learning

CV and Pattern Recognition

Reads your feelings from brain waves.

7 Nov 2025 0

View PDF Login to Bookmark

Page Count

26 pages

NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning

Lets computers understand what you see from brain waves.

Technical Abstract

DistilCLIP-EEG: Enhancing Epileptic Seizure Detection Through Multi-modal Learning and Knowledge Distillation

Image-to-Brain Signal Generation for Visual Prosthesis with CLIP Guided Multimodal Diffusion Models

Cross-domain EEG-based Emotion Recognition with Contrastive Learning