Fine-grained Analysis of Brain-LLM Alignment through Input Attribution
By: Michela Proietti, Roberto Capobianco, Mariya Toneva
Potential Business Impact:
Shows which words AI language models and human brains rely on when processing language.
Understanding the alignment between large language models (LLMs) and human brain activity can reveal computational principles underlying language processing. We introduce a fine-grained input attribution method that identifies the specific words most important for brain-LLM alignment, and apply it to a contentious question in the field: the relationship between brain alignment (BA) and next-word prediction (NWP). Our findings reveal that BA and NWP rely on largely distinct word subsets: NWP exhibits recency and primacy biases with a focus on syntax, while BA prioritizes semantic and discourse-level information with a more targeted recency effect. This work advances our understanding of how LLMs relate to human language processing and highlights differences in feature reliance between BA and NWP. Beyond this study, our attribution method can be broadly applied to explore the cognitive relevance of model predictions in diverse language processing tasks.
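The abstract names the technique only at a high level. As a concrete illustration, the sketch below shows one common form of input attribution, gradient-times-input, applied to a toy brain-alignment score. This is a minimal sketch under stated assumptions, not the authors' exact method: the `gpt2` backbone, the random `brain_response` vector, and the untrained linear `encoder` are placeholders; in practice the encoder would be fit to real fMRI or MEG recordings.

```python
# Minimal sketch (not the paper's exact pipeline) of gradient-x-input
# attribution: score each input token by its contribution to a
# brain-alignment objective. The encoder and brain_response below are
# hypothetical stand-ins for a fitted encoding model and real recordings.

import torch
from transformers import GPT2Model, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

text = "The quick brown fox jumps over the lazy dog"
inputs = tokenizer(text, return_tensors="pt")

# Embed tokens manually so gradients can be taken w.r.t. the embeddings.
embeds = model.wte(inputs["input_ids"]).detach().requires_grad_(True)
hidden = model(inputs_embeds=embeds,
               attention_mask=inputs["attention_mask"]).last_hidden_state

# Placeholder linear encoding model mapping the last token's hidden state
# to a simulated brain recording (200 "voxels").
n_voxels = 200
encoder = torch.nn.Linear(model.config.hidden_size, n_voxels)
brain_response = torch.randn(n_voxels)  # stand-in for recorded data

# Brain-alignment score: similarity between predicted and recorded response.
alignment = torch.nn.functional.cosine_similarity(
    encoder(hidden[0, -1]), brain_response, dim=0)
alignment.backward()

# Gradient-x-input attribution: one importance score per input token.
scores = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
for tok, s in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
                  scores.tolist()):
    print(f"{tok:>12s}  {s:+.4f}")
```

Ranking tokens by such scores is what lets one ask which words drive brain alignment versus next-word prediction: for the NWP side of the comparison, the gradient would instead be taken through the language-modeling loss rather than the alignment score.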
Similar Papers
Scaling and context steer LLMs along the same computational path as the human brain
Machine Learning (CS)
Brain and AI process information in a similar order.
Language models align with brain regions that represent concepts across modalities
Computation and Language
Computers understand ideas, not just words.
Exploring Similarity between Neural and LLM Trajectories in Language Processing
Human-Computer Interaction
Shows how computers "think" like brains.