An Audio-Visual Fusion Emotion Generation Model Based on Neuroanatomical Alignment
By: Haidong Wang, Qia Shan, JianHua Zhang, and more
Potential Business Impact:
Makes computers understand feelings better from sights and sounds.
In the field of affective computing, traditional methods for generating emotions predominantly rely on deep learning techniques and large-scale emotion datasets. However, deep learning techniques are often complex and difficult to interpret, and standardized large-scale emotional datasets are difficult and costly to establish. To tackle these challenges, we introduce a novel framework named Audio-Visual Fusion for Brain-like Emotion Learning (AVF-BEL). In contrast to conventional brain-inspired emotion learning methods, this approach improves the audio-visual emotion fusion and generation model through the integration of modular components, enabling a more lightweight and interpretable emotion learning and generation process. The framework simulates the integration of the visual, auditory, and emotional pathways of the brain, optimizes the fusion of emotional features across the visual and auditory modalities, and improves upon the traditional Brain Emotional Learning (BEL) model. The experimental results indicate a significant improvement in the similarity achieved by the audio-visual fusion emotion learning and generation model compared to the single-modality visual and auditory models. This aligns with the fundamental phenomenon that the combined impact of visual and auditory stimuli heightens emotion generation. This contribution not only enhances the interpretability and efficiency of affective intelligence but also provides new insights and pathways for advancing affective computing technology. Our source code can be accessed here: https://github.com/OpenHUTB/emotion
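The abstract builds on the classic Brain Emotional Learning (BEL) model and on fusing emotional features across the visual and auditory modalities. The sketch below is a minimal, hypothetical illustration of those two ingredients, not the paper's actual AVF-BEL implementation: BELUnit follows the standard Moren-Balkenius amygdala/orbitofrontal formulation, while fuse_features and its fusion weights are assumptions introduced here for illustration.

```python
# Minimal sketch: audio-visual feature fusion feeding a Brain Emotional
# Learning (BEL) unit. BELUnit follows the Moren-Balkenius formulation;
# fuse_features is a hypothetical weighted-concatenation fusion step.
import numpy as np

class BELUnit:
    """Simplified BEL unit: amygdala (excitatory) vs. orbitofrontal (inhibitory)."""
    def __init__(self, n_inputs, alpha=0.1, beta=0.1):
        self.v = np.zeros(n_inputs)   # amygdala weights
        self.w = np.zeros(n_inputs)   # orbitofrontal weights
        self.alpha, self.beta = alpha, beta

    def forward(self, s):
        a = self.v * s                # amygdala activations
        o = self.w * s                # orbitofrontal activations
        return a.sum() - o.sum()      # emotional output E

    def update(self, s, reward):
        a_sum = (self.v * s).sum()
        # Amygdala learns monotonically toward the reinforcement signal.
        self.v += self.alpha * s * max(0.0, reward - a_sum)
        # Orbitofrontal weights correct over-responding relative to the reward.
        e = self.forward(s)
        self.w += self.beta * s * (e - reward)

def fuse_features(visual, audio, w_v=0.5, w_a=0.5):
    """Late fusion by weighted concatenation (one plausible choice)."""
    return np.concatenate([w_v * visual, w_a * audio])

# Usage: a fused audio-visual stimulus drives one BEL unit.
rng = np.random.default_rng(0)
vis, aud = rng.random(4), rng.random(4)
s = fuse_features(vis, aud)
bel = BELUnit(n_inputs=s.size)
for _ in range(50):
    bel.update(s, reward=1.0)         # target emotional intensity
print(f"emotion output after training: {bel.forward(s):.3f}")
```

In this toy setup the fused stimulus carries both modalities, so the unit's learned response reflects their combined contribution, mirroring (in a very reduced form) the paper's observation that joint audio-visual input heightens the generated emotion.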
Similar Papers
Interactive Multimodal Fusion with Temporal Modeling
CV and Pattern Recognition
Lets computers guess your feelings from faces and voices.
Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation
Computation and Language
Makes computer voices sound more real.