EmoHRNet: High-Resolution Neural Network Based Speech Emotion Recognition
By: Akshay Muppidi, Martin Radfar
Potential Business Impact:
Helps computers understand how you feel from your voice.
Speech emotion recognition (SER) is pivotal for enhancing human-machine interactions. This paper introduces "EmoHRNet", a novel adaptation of High-Resolution Networks (HRNet) tailored for SER. Audio samples are transformed into spectrograms, and EmoHRNet leverages the HRNet architecture to extract high-level features from them while maintaining high-resolution representations from the initial layers to the final ones, capturing both granular and overarching emotional cues in the speech signal. The model outperforms leading models, achieving accuracies of 92.45% on RAVDESS, 80.06% on IEMOCAP, and 92.77% on EMOVO, setting a new benchmark in the SER domain.
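The pipeline described in the abstract (raw audio, then spectrogram, then an HRNet-style classifier) can be sketched as below. This is a minimal illustration, not the authors' implementation: the mel-spectrogram settings, the 224x224 resize, the off-the-shelf `hrnet_w18` backbone from `timm`, and the 8-class output head are all assumptions made for the sake of a runnable example.

```python
# Minimal sketch of the spectrogram -> HRNet classification pipeline.
# NOTE: all hyperparameters here (mel settings, 224x224 resize, hrnet_w18
# width, 8 output classes) are illustrative assumptions, not EmoHRNet's
# published configuration.
import torch
import torch.nn.functional as F
import torchaudio
import timm

NUM_EMOTIONS = 8  # e.g. RAVDESS defines 8 emotion classes

# Waveform -> log-mel spectrogram, treated as a single-channel "image".
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=128)
to_db = torchaudio.transforms.AmplitudeToDB()

# Off-the-shelf HRNet backbone (keeps a high-resolution branch through all
# stages); in_chans=1 because the spectrogram has a single channel.
model = timm.create_model("hrnet_w18", pretrained=False,
                          in_chans=1, num_classes=NUM_EMOTIONS)

waveform = torch.randn(1, 16000 * 3)            # 3 s of dummy 16 kHz audio
spec = to_db(mel(waveform)).unsqueeze(0)        # (1, 1, n_mels, frames)
spec = F.interpolate(spec, size=(224, 224),     # fixed input size so HRNet's
                     mode="bilinear",           # multi-resolution fusion
                     align_corners=False)       # stages align cleanly
logits = model(spec)                            # (1, NUM_EMOTIONS)
print(logits.argmax(dim=-1))                    # predicted emotion index
```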
Similar Papers
DeepEmoNet: Building Machine Learning Models for Automatic Emotion Recognition in Human Speeches
Audio and Speech Processing
Helps computers understand feelings in voices.
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection
Computation and Language
Helps computers understand feelings in voices.
EmoAugNet: A Signal-Augmented Hybrid CNN-LSTM Framework for Speech Emotion Recognition
Sound
Helps computers understand how you feel when you talk.