Score: 1

BrainMCLIP: Brain Image Decoding with Multi-Layer feature Fusion of CLIP

Published: October 22, 2025 | arXiv ID: 2510.19332v1

By: Tian Xia , Zihan Ma , Xinlong Wang and more

Potential Business Impact:

Reads minds to see detailed pictures.

Business Areas:

Image Recognition Data and Analytics, Software

Decoding images from fMRI often involves mapping brain activity to CLIP's final semantic layer. To capture finer visual details, many approaches add a parameter-intensive VAE-based pipeline. However, these approaches overlook rich object information within CLIP's intermediate layers and contradicts the brain's functionally hierarchical. We introduce BrainMCLIP, which pioneers a parameter-efficient, multi-layer fusion approach guided by human visual system's functional hierarchy, eliminating the need for such a separate VAE pathway. BrainMCLIP aligns fMRI signals from functionally distinct visual areas (low-/high-level) to corresponding intermediate and final CLIP layers, respecting functional hierarchy. We further introduce a Cross-Reconstruction strategy and a novel multi-granularity loss. Results show BrainMCLIP achieves highly competitive performance, particularly excelling on high-level semantic metrics where it matches or surpasses SOTA(state-of-the-art) methods, including those using VAE pipelines. Crucially, it achieves this with substantially fewer parameters, demonstrating a reduction of 71.7\%(Table.\ref{tab:compare_clip_vae}) compared to top VAE-based SOTA methods, by avoiding the VAE pathway. By leveraging intermediate CLIP features, it effectively captures visual details often missed by CLIP-only approaches, striking a compelling balance between semantic accuracy and detail fidelity without requiring a separate VAE pipeline.

Image-to-Brain Signal Generation for Visual Prosthesis with CLIP Guided Multimodal Diffusion Models

CV and Pattern Recognition

Helps blind people see by turning pictures into brain signals.

31 Aug 2025 1

89%

NeuroSwift: A Lightweight Cross-Subject Framework for fMRI Visual Reconstruction of Complex Scenes

CV and Pattern Recognition

Shows what someone sees from their brain.

2 Oct 2025 1

89%

NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning

Information Retrieval

Lets computers understand what you see from brain waves.

12 Nov 2025 0

View PDF Login to Bookmark

Country of Origin

🇨🇳 China

Page Count

10 pages

BrainMCLIP: Brain Image Decoding with Multi-Layer feature Fusion of CLIP

Reads minds to see detailed pictures.

Technical Abstract

Image-to-Brain Signal Generation for Visual Prosthesis with CLIP Guided Multimodal Diffusion Models

NeuroSwift: A Lightweight Cross-Subject Framework for fMRI Visual Reconstruction of Complex Scenes

NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning