DyFuLM: An Advanced Multimodal Framework for Sentiment Analysis
By: Ruohan Zhou, Jiachen Yuan, Churui Yang, and more
Potential Business Impact:
Helps computers understand feelings in writing better.
Understanding sentiment in complex textual expressions remains a fundamental challenge in affective computing. To address this, we propose the Dynamic Fusion Learning Model (DyFuLM), a multimodal framework designed to capture both hierarchical semantic representations and fine-grained emotional nuances. DyFuLM introduces two key modules: a Hierarchical Dynamic Fusion module that adaptively integrates multi-level features, and a Gated Feature Aggregation module that regulates cross-layer information flow to achieve balanced representation learning. Comprehensive experiments on multi-task sentiment datasets demonstrate that DyFuLM achieves 82.64% coarse-grained and 68.48% fine-grained accuracy, yielding the lowest regression errors (MAE = 0.0674, MSE = 0.0082) and the highest coefficient of determination (R^2 = 0.6903). Furthermore, an ablation study validates the effectiveness of each module in DyFuLM. When all modules are removed, accuracy drops by 0.91% on the coarse-grained task and 0.68% on the fine-grained task. Keeping only the gated fusion module causes decreases of 0.75% and 0.55%, while removing the dynamic loss mechanism results in drops of 0.78% and 0.26% for coarse-grained and fine-grained sentiment classification, respectively. These results demonstrate that each module contributes significantly to feature interaction and task balance. Overall, the experimental findings further confirm that DyFuLM enhances sentiment representation and overall performance through effective hierarchical feature fusion.
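The abstract does not spell out how the Gated Feature Aggregation module is implemented, so the following is only an illustrative sketch of what gated cross-layer aggregation of hierarchical features typically looks like in PyTorch. The class name GatedFeatureAggregation, the per-layer sigmoid gates, and all dimensions are assumptions for illustration, not the authors' actual architecture.

```python
# Illustrative sketch (assumed, not from the paper): each encoder layer's
# features are modulated by a learned sigmoid gate before being averaged,
# so the model can regulate how much each hierarchical level contributes.
import torch
import torch.nn as nn


class GatedFeatureAggregation(nn.Module):
    """Fuses a list of layer features, each of shape [batch, seq_len, hidden_dim]."""

    def __init__(self, hidden_dim: int, num_layers: int):
        super().__init__()
        # One gate network per layer; sizes are illustrative only.
        self.gates = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid())
            for _ in range(num_layers)
        )

    def forward(self, layer_feats: list[torch.Tensor]) -> torch.Tensor:
        # Element-wise gate each layer's features, then average across layers.
        gated = [gate(feat) * feat for gate, feat in zip(self.gates, layer_feats)]
        return torch.stack(gated, dim=0).mean(dim=0)


# Usage: fuse the last three hidden states of a hypothetical text encoder.
feats = [torch.randn(2, 16, 768) for _ in range(3)]
fused = GatedFeatureAggregation(hidden_dim=768, num_layers=3)(feats)
print(fused.shape)  # torch.Size([2, 16, 768])
```

In this sketch the gates play the role the abstract assigns to the module, regulating cross-layer information flow so that no single level dominates the fused representation; the paper's actual fusion and dynamic loss weighting may differ.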
Similar Papers
FINE: Factorized multimodal sentiment analysis via mutual INformation Estimation
Multimedia
Helps computers understand feelings from text and pictures.
Rethinking Multimodal Sentiment Analysis: A High-Accuracy, Simplified Fusion Architecture
Computation and Language
Helps computers understand feelings from talking, seeing, and hearing.
Robust Multimodal Sentiment Analysis with Distribution-Based Feature Recovery and Fusion
Computation and Language
Helps computers understand feelings from broken pictures and words.