Combining facial videos and biosignals for stress estimation during driving
By: Paraskevi Valergaki, Vassilis C. Nicodemou, Iason Oikonomidis, and more
Potential Business Impact:
Detects driver stress from facial video and biosignals more accurately than prior approaches.
Reliable stress recognition from facial videos is challenging due to stress's subjective nature and voluntary facial control. While most methods rely on Facial Action Units, the role of disentangled 3D facial geometry remains underexplored. We address this by analyzing stress during distracted driving using EMOCA-derived 3D expression and pose coefficients. Paired hypothesis tests between baseline and stressor phases reveal that 41 of 56 coefficients show consistent, phase-specific stress responses comparable to physiological markers. Building on this, we propose a Transformer-based temporal modeling framework and assess unimodal, early-fusion, and cross-modal attention strategies. Cross-Modal Attention fusion of EMOCA and physiological signals achieves the best performance (AUROC 92%, Accuracy 86.7%), with EMOCA-gaze fusion also competitive (AUROC 91.8%). These results highlight the effectiveness of temporal modeling and cross-modal attention for stress recognition.
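Two short sketches may help make the pipeline concrete; neither reproduces the authors' code. The first illustrates the per-coefficient paired comparison between baseline and stressor phases. The abstract does not name the exact test, so the Wilcoxon signed-rank test and a Bonferroni correction are assumptions here, and the data are simulated.

```python
# Minimal sketch: paired tests over per-participant EMOCA coefficient means.
# Assumptions (not from the paper): Wilcoxon signed-rank as the paired test,
# Bonferroni correction, and simulated data in place of real recordings.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n_participants, n_coeffs = 24, 56              # illustrative sizes
baseline = rng.normal(size=(n_participants, n_coeffs))
stressor = baseline + rng.normal(0.3, 1.0, size=(n_participants, n_coeffs))

# One paired test per coefficient; Bonferroni keeps family-wise error at 5%.
p_values = np.array([
    wilcoxon(baseline[:, i], stressor[:, i]).pvalue for i in range(n_coeffs)
])
responsive = np.flatnonzero(p_values < 0.05 / n_coeffs)
print(f"{responsive.size}/{n_coeffs} coefficients respond to the stressor")
```

The second is a hedged sketch of Cross-Modal Attention fusion: EMOCA coefficient sequences act as queries over physiological sequences before a small Transformer encoder models the fused sequence over time. All dimensions, layer counts, and the mean-pooled classification head are illustrative assumptions, not the authors' exact architecture.

```python
# Hedged sketch of cross-modal attention fusion (PyTorch); sizes are made up.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, emoca_dim=56, physio_dim=4, d_model=64, n_heads=4):
        super().__init__()
        self.emoca_proj = nn.Linear(emoca_dim, d_model)
        self.physio_proj = nn.Linear(physio_dim, d_model)
        # Queries come from one modality, keys/values from the other.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(d_model, 1)      # binary stress logit

    def forward(self, emoca_seq, physio_seq):
        q = self.emoca_proj(emoca_seq)         # (B, T_e, d_model)
        kv = self.physio_proj(physio_seq)      # (B, T_p, d_model)
        fused, _ = self.cross_attn(q, kv, kv)  # EMOCA attends to physiology
        fused = self.encoder(fused)            # temporal modeling
        return self.head(fused.mean(dim=1))    # pool over time -> stress logit

model = CrossModalFusion()
logit = model(torch.randn(2, 120, 56), torch.randn(2, 120, 4))
print(logit.shape)                             # torch.Size([2, 1])
```

Letting one modality's tokens attend to the other's is the standard cross-modal attention pattern; the early-fusion baseline mentioned in the abstract would instead concatenate the projected sequences before a single encoder.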
Similar Papers
Modelling Emotions in Face-to-Face Setting: The Interplay of Eye-Tracking, Personality, and Temporal Dynamics
Human-Computer Interaction
Helps computers understand how you feel.
Realtime Multimodal Emotion Estimation using Behavioral and Neurophysiological Data
Human-Computer Interaction
Helps people understand feelings by reading body signals.
Stress Detection from Multimodal Wearable Sensor Data
Human-Computer Interaction
Helps computers detect stress from body signals.