An Audio-Visual Fusion Emotion Generation Model Based on Neuroanatomical Alignment
By: Haidong Wang, Qia Shan, JianHua Zhang, and more
Potential Business Impact:
Makes computers understand feelings better from sights and sounds.
In the field of affective computing, traditional methods for generating emotions predominantly rely on deep learning techniques and large-scale emotion datasets. However, deep learning techniques are often complex and difficult to interpret, and standardized large-scale emotional datasets are difficult and costly to establish. To tackle these challenges, we introduce a novel framework named Audio-Visual Fusion for Brain-like Emotion Learning (AVF-BEL). In contrast to conventional brain-inspired emotion learning methods, this approach improves the audio-visual emotion fusion and generation model through the integration of modular components, enabling a more lightweight and interpretable emotion learning and generation process. The framework simulates the integration of the visual, auditory, and emotional pathways of the brain, optimizes the fusion of emotional features across the visual and auditory modalities, and improves upon the traditional Brain Emotional Learning (BEL) model. The experimental results indicate a significant improvement in the similarity achieved by the audio-visual fusion emotion learning and generation model compared to the single-modality visual and auditory models. This aligns with the fundamental phenomenon that the combined impact of visual and auditory stimuli heightens emotion generation. This contribution not only enhances the interpretability and efficiency of affective intelligence but also provides new insights and pathways for advancing affective computing technology. Our source code can be accessed here: https://github.com/OpenHUTB/emotion
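The abstract builds on the classic Brain Emotional Learning (BEL) model and on fusing emotional features across the visual and auditory modalities. The sketch below is a minimal, hypothetical illustration of those two ingredients, not the paper's actual AVF-BEL implementation: BELUnit follows the standard Moren-Balkenius amygdala/orbitofrontal formulation, while fuse_features and its fusion weights are assumptions introduced here for illustration.

```python
# Minimal sketch: audio-visual feature fusion feeding a Brain Emotional
# Learning (BEL) unit. BELUnit follows the Moren-Balkenius formulation;
# fuse_features is a hypothetical weighted-concatenation fusion step.
import numpy as np

class BELUnit:
    """Simplified BEL unit: amygdala (excitatory) vs. orbitofrontal (inhibitory)."""
    def __init__(self, n_inputs, alpha=0.1, beta=0.1):
        self.v = np.zeros(n_inputs)   # amygdala weights
        self.w = np.zeros(n_inputs)   # orbitofrontal weights
        self.alpha, self.beta = alpha, beta

    def forward(self, s):
        a = self.v * s                # amygdala activations
        o = self.w * s                # orbitofrontal activations
        return a.sum() - o.sum()      # emotional output E

    def update(self, s, reward):
        a_sum = (self.v * s).sum()
        # Amygdala learns monotonically toward the reinforcement signal.
        self.v += self.alpha * s * max(0.0, reward - a_sum)
        # Orbitofrontal weights correct over-responding relative to the reward.
        e = self.forward(s)
        self.w += self.beta * s * (e - reward)

def fuse_features(visual, audio, w_v=0.5, w_a=0.5):
    """Late fusion by weighted concatenation (one plausible choice)."""
    return np.concatenate([w_v * visual, w_a * audio])

# Usage: a fused audio-visual stimulus drives one BEL unit.
rng = np.random.default_rng(0)
vis, aud = rng.random(4), rng.random(4)
s = fuse_features(vis, aud)
bel = BELUnit(n_inputs=s.size)
for _ in range(50):
    bel.update(s, reward=1.0)         # target emotional intensity
print(f"emotion output after training: {bel.forward(s):.3f}")
```

In this toy setup the fused stimulus carries both modalities, so the unit's learned response reflects their combined contribution, mirroring (in a very reduced form) the paper's observation that joint audio-visual input heightens the generated emotion.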
Similar Papers
Interactive Multimodal Fusion with Temporal Modeling
CV and Pattern Recognition
Lets computers guess your feelings from faces and voices.
Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation
Computation and Language
Makes computer voices sound more real.