Uncovering Brain-Like Hierarchical Patterns in Vision-Language Models through fMRI-Based Neural Encoding
By: Yudan Ren, Xinlong Wang, Kexin Wang and more
Potential Business Impact:
Shows that AI models which process pictures and words develop brain-like activity patterns.
While brain-inspired artificial intelligence (AI) has demonstrated promising results, current understanding of the parallels between artificial neural networks (ANNs) and human brain processing remains limited: (1) unimodal ANN studies fail to capture the brain's inherent multimodal processing capabilities, and (2) multimodal ANN research primarily focuses on high-level model outputs, neglecting the crucial role of individual neurons. To address these limitations, we propose a novel neuron-level analysis framework that investigates the multimodal information processing mechanisms in vision-language models (VLMs) through the lens of human brain activity. Our approach uniquely combines fine-grained artificial neuron (AN) analysis with fMRI-based voxel encoding to examine two architecturally distinct VLMs: CLIP and METER. Our analysis reveals four key findings: (1) ANs successfully predict the activities of biological neurons (BNs) across multiple functional networks (including language, vision, attention, and default mode), demonstrating shared representational mechanisms; (2) both ANs and BNs exhibit functional redundancy through overlapping neural representations, mirroring the brain's fault-tolerant and collaborative information processing mechanisms; (3) ANs exhibit polarity patterns that parallel those of BNs, with oppositely activated BNs showing mirrored activation trends across VLM layers, reflecting the complexity and bidirectional nature of neural information processing; (4) the architectures of CLIP and METER drive distinct BN activation patterns: CLIP's independent branches show modality-specific specialization, whereas METER's cross-modal design yields unified cross-modal activation, highlighting the architecture's influence on an ANN's brain-like properties. These results provide compelling evidence for brain-like hierarchical processing in VLMs at the neuronal level.
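The abstract describes predicting BN (voxel) activity from AN activations via fMRI-based voxel encoding. Since the paper's implementation details are not given here, the following is only a minimal sketch of one common way to fit such an encoding model, assuming ridge regression from per-stimulus AN features to voxel responses; all array shapes, variable names, and hyperparameters are illustrative, not the authors' pipeline.

```python
# Minimal sketch of an fMRI voxel-encoding analysis (not the authors' code).
# Assumptions: per-stimulus artificial-neuron (AN) activations have already been
# extracted from one VLM layer (e.g., CLIP or METER), and fMRI voxel responses
# for the same stimuli are available as a (stimuli x voxels) matrix.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder shapes: 800 stimuli, 512 ANs from one VLM layer, 2000 voxels.
an_activations = rng.standard_normal((800, 512))    # hypothetical AN features
voxel_responses = rng.standard_normal((800, 2000))  # hypothetical BOLD responses

X_train, X_test, y_train, y_test = train_test_split(
    an_activations, voxel_responses, test_size=0.2, random_state=0
)

# A single ridge model predicts all voxels jointly; per-voxel fits with
# cross-validated regularization are an equally common choice.
encoder = Ridge(alpha=1.0)
encoder.fit(X_train, y_train)
pred = encoder.predict(X_test)

# Encoding accuracy: Pearson correlation between predicted and measured
# responses, computed separately for each voxel.
pred_c = pred - pred.mean(axis=0)
true_c = y_test - y_test.mean(axis=0)
r = (pred_c * true_c).sum(axis=0) / (
    np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0) + 1e-8
)
print("median voxel-wise r:", np.median(r))
```

In practice, the per-voxel correlations from held-out data are what allow statements like finding (1): voxels in language, vision, attention, and default-mode networks whose activity is reliably predicted by AN features.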
Similar Papers
Language models align with brain regions that represent concepts across modalities
Computation and Language
Computers understand ideas, not just words.
fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding
Computation and Language
Reads thoughts from brain scans using language.
EEG-VLM: A Hierarchical Vision-Language Model with Multi-Level Feature Alignment and Visually Enhanced Language-Guided Reasoning for EEG Image-Based Sleep Stage Prediction
Artificial Intelligence
Helps doctors understand sleep by looking at brain waves.