Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics
By: Ziqian Zhang, Min Huang, Zhongzhe Xiao
Potential Business Impact:
Reads emotions from how someone talks and moves their mouth.
Speech emotion recognition (SER) has advanced significantly thanks to deep-learning methods, and textual information further enhances its performance. However, few studies have focused on the physiological information involved in speech production, which also encompasses speaker traits, including emotional states. To bridge this gap, we conducted a series of experiments to investigate the potential of phonation excitation information and articulatory kinematics for SER. Given the scarcity of training data for this purpose, we introduce a portrayed emotional dataset, STEM-E2VA, which includes audio and physiological data such as electroglottography (EGG) and electromagnetic articulography (EMA). EGG and EMA capture phonation excitation and articulatory kinematics, respectively. Additionally, we performed emotion recognition using physiological data estimated from speech through inversion methods, rather than the collected EGG and EMA signals, to explore the feasibility of applying such physiological information in real-world SER. Experimental results confirm the effectiveness of incorporating physiological information about speech production into SER and demonstrate its potential for practical use in real-world scenarios.
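To make the fusion idea concrete, below is a minimal sketch in PyTorch of a late-fusion SER classifier that combines an acoustic stream with EGG-like and EMA-like physiological streams. This is not the authors' architecture: the stream dimensions (40-d acoustic features, 2-d EGG-derived excitation features, 12-d EMA sensor trajectories), the GRU encoders, and the four-class emotion head are all illustrative assumptions. In a deployed system, as the abstract suggests, the EGG and EMA inputs would come from inversion models that estimate them from speech rather than from recorded sensors.

```python
# Minimal late-fusion SER sketch (illustrative; not the paper's model).
# Each modality is encoded separately, then the encodings are concatenated
# and classified. Feature dimensions below are assumptions.
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Encodes one feature stream (acoustic, EGG, or EMA) with a GRU."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)

    def forward(self, x):            # x: (batch, time, in_dim)
        _, h = self.rnn(x)           # h: (1, batch, hidden)
        return h.squeeze(0)          # (batch, hidden)

class MultimodalSER(nn.Module):
    """Late-fusion emotion classifier over three feature streams."""
    def __init__(self, acoustic_dim=40, egg_dim=2, ema_dim=12,
                 hidden=64, n_emotions=4):
        super().__init__()
        self.enc_acoustic = StreamEncoder(acoustic_dim, hidden)
        self.enc_egg = StreamEncoder(egg_dim, hidden)
        self.enc_ema = StreamEncoder(ema_dim, hidden)
        self.head = nn.Linear(3 * hidden, n_emotions)

    def forward(self, acoustic, egg, ema):
        fused = torch.cat([self.enc_acoustic(acoustic),
                           self.enc_egg(egg),
                           self.enc_ema(ema)], dim=-1)
        return self.head(fused)      # emotion logits

if __name__ == "__main__":
    model = MultimodalSER()
    batch, frames = 8, 100
    logits = model(torch.randn(batch, frames, 40),   # e.g. mel features
                   torch.randn(batch, frames, 2),    # EGG-derived features
                   torch.randn(batch, frames, 12))   # EMA trajectories
    print(logits.shape)              # torch.Size([8, 4])
```

Swapping the recorded EGG/EMA tensors for the outputs of speech-to-physiology inversion models leaves this fusion design unchanged, which is what makes the real-world feasibility experiment in the abstract a drop-in comparison.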
Similar Papers
Semantic Differentiation in Speech Emotion Recognition: Insights from Descriptive and Expressive Speech Roles
Computation and Language
Helps computers understand your feelings in speech.
Human Feedback Driven Dynamic Speech Emotion Recognition
Sound
Makes cartoon characters show real feelings.
Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
Machine Learning (CS)
Shows why computers understand emotions in voices.