Score: 1

Image-to-Brain Signal Generation for Visual Prosthesis with CLIP Guided Multimodal Diffusion Models

Published: August 31, 2025 | arXiv ID: 2509.00787v3

By: Ganxi Xu, Jinyi Long, Jia Zhang

Potential Business Impact:

Could help blind people see by turning pictures into the brain signals a visual prosthesis uses.

Business Areas:
Image Recognition Data and Analytics, Software

Visual prostheses hold great promise for restoring vision in blind individuals. While researchers have successfully used M/EEG (magneto-/electroencephalography) signals to evoke visual perceptions during the brain decoding stage of visual prostheses, the complementary process of converting images into M/EEG signals in the brain encoding stage remains largely unexplored, hindering the formation of a complete functional pipeline. In this work, we present, to our knowledge, the first image-to-brain-signal framework, which generates M/EEG from images by leveraging denoising diffusion probabilistic models (DDPMs) enhanced with cross-attention mechanisms. The framework comprises two key components: a pretrained CLIP visual encoder that extracts rich semantic representations from input images, and a cross-attention-enhanced U-Net diffusion model that reconstructs brain signals through iterative denoising. Unlike conventional generative models that rely on simple concatenation for conditioning, our cross-attention modules capture the complex interplay between visual features and brain-signal representations, enabling fine-grained alignment during generation. We evaluate the framework on two multimodal benchmark datasets and demonstrate that it generates biologically plausible brain signals. We also visualize M/EEG topographies across all subjects in both datasets, giving an intuitive view of intra-subject and inter-subject variation in brain signals.
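For readers who want a concrete picture of the conditioning scheme the abstract describes, the sketch below shows CLIP image tokens steering a diffusion denoiser over 1-D brain signals through cross-attention rather than simple concatenation. It is a minimal PyTorch illustration, not the authors' implementation: the layer sizes, the 1000-step linear noise schedule, the reduction of the U-Net to a plain stack of cross-attention blocks, and the shapes of the M/EEG segments and CLIP tokens are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttentionBlock(nn.Module):
    """Brain-signal tokens (queries) attend over CLIP image tokens (keys/values)."""

    def __init__(self, signal_dim: int, cond_dim: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(signal_dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=signal_dim, kdim=cond_dim, vdim=cond_dim,
            num_heads=n_heads, batch_first=True)

    def forward(self, x, cond):
        # x:    (B, T, signal_dim) noisy M/EEG feature tokens
        # cond: (B, S, cond_dim)   CLIP visual tokens conditioning the denoiser
        attn_out, _ = self.attn(query=self.norm(x), key=cond, value=cond)
        return x + attn_out  # residual connection


class TinyDenoiser(nn.Module):
    """Stand-in for the cross-attention U-Net: predicts the noise added at step t."""

    def __init__(self, channels: int = 64, cond_dim: int = 512,
                 n_blocks: int = 4, n_steps: int = 1000):
        super().__init__()
        self.in_proj = nn.Conv1d(1, channels, kernel_size=3, padding=1)
        self.t_embed = nn.Embedding(n_steps, channels)
        self.blocks = nn.ModuleList(
            CrossAttentionBlock(channels, cond_dim) for _ in range(n_blocks))
        self.out_proj = nn.Conv1d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x_t, t, cond):
        # x_t: (B, 1, L) noisy 1-D brain signal at diffusion step t
        h = self.in_proj(x_t).transpose(1, 2)        # (B, L, channels)
        h = h + self.t_embed(t).unsqueeze(1)         # broadcast timestep embedding
        for blk in self.blocks:
            h = blk(h, cond)
        return self.out_proj(h.transpose(1, 2))      # predicted noise, (B, 1, L)


def ddpm_training_step(model, x0, cond, alphas_cumprod):
    """One DDPM step: corrupt x0 at a random t, train the model to predict the noise."""
    B = x0.size(0)
    t = torch.randint(0, alphas_cumprod.numel(), (B,), device=x0.device)
    a_bar = alphas_cumprod[t].view(B, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # q(x_t | x_0)
    return F.mse_loss(model(x_t, t, cond), eps)


# Illustrative usage with random stand-ins for real data and CLIP features.
betas = torch.linspace(1e-4, 0.02, 1000)                   # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
model = TinyDenoiser()
x0 = torch.randn(8, 1, 256)     # batch of 1-D M/EEG segments (hypothetical length)
cond = torch.randn(8, 50, 512)  # stands in for frozen CLIP ViT patch tokens
loss = ddpm_training_step(model, x0, cond, alphas_cumprod)
loss.backward()
```

The residual cross-attention block mirrors the standard latent-diffusion conditioning pattern: signal tokens act as queries over the visual tokens, so each denoising step can pull in the image features most relevant to the waveform being reconstructed, which is what enables the fine-grained alignment the abstract contrasts with concatenation.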

Country of Origin
🇨🇳 🇨🇦 China, Canada

Page Count
17 pages

Category
Computer Science:
Computer Vision and Pattern Recognition