HAVIR: HierArchical Vision to Image Reconstruction using CLIP-Guided Versatile Diffusion
By: Shiyi Zhang, Dong Liang, Hairong Zheng, and more
Potential Business Impact:
Reconstructs pictures of what someone saw from their brain activity.
The reconstruction of visual information from brain activity fosters interdisciplinary integration between neuroscience and computer vision. However, existing methods still face challenges in accurately recovering highly complex visual stimuli. This difficulty stems from the characteristics of natural scenes: low-level features exhibit heterogeneity, while high-level features show semantic entanglement due to contextual overlaps. Inspired by the hierarchical representation theory of the visual cortex, we propose the HAVIR model, which separates the visual cortex into two hierarchical regions and extracts distinct features from each. Specifically, the Structural Generator extracts structural information from spatial processing voxels and converts it into latent diffusion priors, while the Semantic Extractor converts semantic processing voxels into CLIP embeddings. These components are integrated via the Versatile Diffusion model to synthesize the final image. Experimental results demonstrate that HAVIR enhances both the structural and semantic quality of reconstructions, even in complex scenes, and outperforms existing models.
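To make the two-branch design concrete, here is a minimal sketch of the pipeline the abstract describes. All module names, layer sizes, and shapes below are illustrative assumptions, not details from the paper; the actual HAVIR architecture, training objectives, and Versatile Diffusion conditioning may differ.

```python
# Hedged sketch of a HAVIR-style pipeline (assumed shapes and layer sizes).
# Two fMRI voxel groups are mapped separately: spatial-processing voxels to a
# latent diffusion prior, semantic-processing voxels to CLIP-style embeddings.
import torch
import torch.nn as nn


class StructuralGenerator(nn.Module):
    """Regresses a latent-diffusion prior (here an assumed 4x64x64 VAE latent)
    from spatial-processing voxels."""
    def __init__(self, n_voxels: int, latent_shape=(4, 64, 64)):
        super().__init__()
        self.latent_shape = latent_shape
        out_dim = latent_shape[0] * latent_shape[1] * latent_shape[2]
        self.proj = nn.Sequential(
            nn.Linear(n_voxels, 2048), nn.GELU(), nn.Linear(2048, out_dim)
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        return self.proj(voxels).view(-1, *self.latent_shape)


class SemanticExtractor(nn.Module):
    """Regresses CLIP-style embedding tokens (assumed 77 x 768 here)
    from semantic-processing voxels."""
    def __init__(self, n_voxels: int, n_tokens: int = 77, clip_dim: int = 768):
        super().__init__()
        self.n_tokens, self.clip_dim = n_tokens, clip_dim
        self.proj = nn.Sequential(
            nn.Linear(n_voxels, 2048), nn.GELU(),
            nn.Linear(2048, n_tokens * clip_dim)
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        return self.proj(voxels).view(-1, self.n_tokens, self.clip_dim)


# Usage sketch: each branch would be trained to match the VAE latent and the
# CLIP embedding of the viewed image; at inference, the structural prior and
# the semantic embedding together condition a Versatile Diffusion model to
# synthesize the reconstructed image (conditioning mechanism assumed).
structural = StructuralGenerator(n_voxels=8000)
semantic = SemanticExtractor(n_voxels=6000)
latent_prior = structural(torch.randn(1, 8000))      # (1, 4, 64, 64)
clip_embedding = semantic(torch.randn(1, 6000))       # (1, 77, 768)
```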
Similar Papers
NeuroSwift: A Lightweight Cross-Subject Framework for fMRI Visual Reconstruction of Complex Scenes
CV and Pattern Recognition
Shows what someone sees from their brain.
ViEEG: Hierarchical Visual Neural Representation for EEG Brain Decoding
CV and Pattern Recognition
Lets computers see what you see.
Hi-DREAM: Brain Inspired Hierarchical Diffusion for fMRI Reconstruction via ROI Encoder and visuAl Mapping
CV and Pattern Recognition
Reconstructs images from brain scans better.