Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models
By: Xiutian Zhao, Björn Schuller, Berrak Sisman
Potential Business Impact:
Finds brain cells in AI that understand feelings.
Emotion is a central dimension of spoken communication, yet we still lack a mechanistic account of how modern large audio-language models (LALMs) encode it internally. We present the first neuron-level interpretability study of emotion-sensitive neurons (ESNs) in LALMs and provide causal evidence that such units exist in Qwen2.5-Omni, Kimi-Audio, and Audio Flamingo 3. Across these three widely used open-source models, we compare frequency-, entropy-, magnitude-, and contrast-based neuron selectors on multiple emotion recognition benchmarks. Using inference-time interventions, we reveal a consistent emotion-specific signature: ablating neurons selected for a given emotion disproportionately degrades recognition of that emotion while largely preserving other classes, whereas gain-based amplification steers predictions toward the target emotion. These effects arise with modest identification data and scale systematically with intervention strength. We further observe that ESNs exhibit non-uniform layer-wise clustering with partial cross-dataset transfer. Taken together, our results offer a causal, neuron-level account of emotion decisions in LALMs and highlight targeted neuron interventions as an actionable handle for controllable affective behaviors.
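To make the intervention idea concrete, here is a minimal sketch of an inference-time neuron intervention of the kind the abstract describes: rescaling a set of selected hidden units in one layer, where a gain of 0.0 ablates them and a gain above 1.0 amplifies them. Everything here is an illustrative assumption rather than the paper's implementation: the toy MLP stands in for an LALM block, and the neuron indices and gain value are hypothetical.

```python
# Minimal sketch of an inference-time neuron intervention (ablation or gain
# amplification) on selected hidden units of one layer. All specifics below
# (toy module, neuron indices, alpha) are illustrative assumptions.
import torch
import torch.nn as nn

def make_intervention_hook(neuron_ids, alpha):
    """Return a forward hook that rescales selected output dimensions.

    alpha = 0.0 ablates the chosen neurons; alpha > 1.0 amplifies them.
    """
    ids = torch.as_tensor(neuron_ids, dtype=torch.long)

    def hook(module, inputs, output):
        output = output.clone()              # avoid in-place edits of shared tensors
        output[..., ids] = output[..., ids] * alpha
        return output

    return hook

# Toy stand-in for an LALM MLP block so the sketch runs end to end.
mlp = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))

# Hypothetical emotion-sensitive neuron indices, e.g. from a contrast-based
# selector (largest mean-activation gap between target-emotion and
# other-emotion utterances).
esn_ids = [3, 7, 11]

# alpha=0.0 ablates the selected units; try alpha=2.0 for amplification.
handle = mlp[1].register_forward_hook(make_intervention_hook(esn_ids, alpha=0.0))
with torch.no_grad():
    hidden = torch.randn(2, 10, 16)          # (batch, time, dim) dummy activations
    out = mlp(hidden)                        # selected activations are zeroed
handle.remove()                              # restore the unmodified model
```

In a real LALM the hook would be registered on the chosen transformer layer's MLP activation, and the per-emotion neuron sets would come from one of the selectors the paper compares; the forward-hook mechanics stay the same.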
Similar Papers
Decoding Neural Emotion Patterns through Large Language Model Embeddings
Computation and Language
Maps words to brain parts showing feelings.
EASL: Multi-Emotion Guided Semantic Disentanglement for Expressive Sign Language Generation
CV and Pattern Recognition
Makes sign language videos show feelings.
Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition
Audio and Speech Processing
Helps computers understand your feelings from your voice.