Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues

Published: November 12, 2025 | arXiv ID: 2511.11691v1

By: Seham Nasr, Zhao Ren, David Johnson

Potential Business Impact:

Shows why computers understand emotions in voices.

Business Areas:

Semantic Search, Internet Services

Explainable AI (XAI) for Speech Emotion Recognition (SER) is critical for building transparent, trustworthy models. Current saliency-based methods, adapted from vision, highlight spectrogram regions but fail to show whether these regions correspond to meaningful acoustic markers of emotion, which limits their faithfulness and interpretability. We propose a framework that overcomes these limitations by quantifying the magnitudes of acoustic cues within salient regions. This clarifies "what" is highlighted and connects it to "why" it matters, linking saliency to expert-referenced acoustic cues of speech emotions. Experiments on benchmark SER datasets show that our approach improves explanation quality by explicitly linking salient regions to theory-driven, expert-referenced acoustic cues of speech emotion. Compared to standard saliency methods, it provides more understandable and plausible explanations of SER models, offering a foundational step towards trustworthy speech-based affective computing.
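
The idea of quantifying cue magnitudes within salient regions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the top-fraction thresholding rule, and the choice of frame energy and spectral centroid as cues are all assumptions made for illustration, and the input is a tiny synthetic spectrogram rather than real speech.

```python
import numpy as np


def salient_frame_mask(saliency, top_fraction=0.2):
    """Mark the time frames with the highest summed saliency.

    saliency: array of shape (freq_bins, frames).
    The top-fraction rule is an illustrative assumption.
    """
    frame_scores = saliency.sum(axis=0)
    n_keep = max(1, int(round(top_fraction * frame_scores.size)))
    threshold = np.sort(frame_scores)[-n_keep]
    return frame_scores >= threshold


def _mean_or_nan(values):
    # Avoid a warning when a region contains no frames.
    return float(values.mean()) if values.size else float("nan")


def cue_magnitudes(spectrogram, freqs, mask):
    """Compare two simple acoustic cues inside and outside salient frames.

    spectrogram: magnitude spectrogram, shape (freq_bins, frames).
    freqs: centre frequency of each bin in Hz, shape (freq_bins,).
    mask: boolean array over frames, True where the frame is salient.
    """
    power = spectrogram ** 2
    energy = power.sum(axis=0)  # per-frame energy
    centroid = (freqs[:, None] * power).sum(axis=0) / (energy + 1e-12)

    def summarize(values):
        return {
            "salient": _mean_or_nan(values[mask]),
            "rest": _mean_or_nan(values[~mask]),
        }

    return {
        "energy": summarize(energy),
        "spectral_centroid_hz": summarize(centroid),
    }


# Synthetic example: 4 frequency bins, 10 frames.
freqs = np.array([100.0, 200.0, 400.0, 800.0])
spec = np.ones((4, 10))
spec[3, 3:5] = 3.0          # louder high-frequency bin in frames 3 and 4
saliency = np.zeros((4, 10))
saliency[3, 3:5] = 1.0      # the (hypothetical) saliency map highlights that region

mask = salient_frame_mask(saliency, top_fraction=0.2)
report = cue_magnitudes(spec, freqs, mask)
print(report)
```

In this toy case the salient frames show higher energy and a higher spectral centroid than the remaining frames, which is the kind of statement ("the highlighted region is louder and brighter") that a raw saliency map alone does not provide. A real pipeline would replace the two cues with whichever expert-referenced acoustic measures are relevant.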

Semantic Differentiation in Speech Emotion Recognition: Insights from Descriptive and Expressive Speech Roles

Computation and Language

Helps computers understand your feelings in speech.

3 Oct 2025 1

89%

Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition

Audio and Speech Processing

Helps computers understand your feelings from your voice.

26 Aug 2025 1

88%

Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features

Sound

Helps computers understand emotions in Urdu speech.

28 Oct 2025 0

Page Count

5 pages