DaQ-MSA: Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis

Published: January 11, 2026 | arXiv ID: 2601.06870v1

By: Jiazhang Liang, Jianheng Dai, Miaosen Luo, and more

Potential Business Impact:

Makes AI understand feelings in videos better.

Business Areas:
Text Analytics, Data and Analytics, Software

Multimodal large language models (MLLMs) have demonstrated strong performance on vision-language tasks, yet their effectiveness on multimodal sentiment analysis remains constrained by the scarcity of high-quality training data, which limits accurate multimodal understanding and generalization. To alleviate this bottleneck, we leverage diffusion models to perform semantics-preserving augmentation on the video and audio modalities, expanding the multimodal training distribution. However, increasing data quantity alone is insufficient, as diffusion-generated samples exhibit substantial quality variation and noisy augmentations may degrade performance. We therefore propose DaQ-MSA (Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis), which introduces a quality scoring module to evaluate the reliability of augmented samples and assign adaptive training weights. By down-weighting low-quality samples and emphasizing high-fidelity ones, DaQ-MSA enables more stable learning. By integrating the generative capability of diffusion models with the semantic understanding of MLLMs, our approach provides a robust and generalizable automated augmentation strategy for training MLLMs without any human annotation or additional supervision.
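The core idea is to score each diffusion-augmented sample and scale its contribution to the training loss, so that noisy augmentations count for less. The abstract does not specify how the scoring module computes its scores or how they become weights. The sketch below is therefore only an illustration under stated assumptions: the quality scores are hypothetical inputs in [0, 1], real (non-augmented) samples keep weight 1, augmented samples are weighted by their score, and those below a hypothetical `threshold` are dropped. The names `Sample`, `sample_weight`, and `weighted_loss` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    loss: float      # per-sample training loss produced by the model
    quality: float   # hypothetical quality score in [0, 1] from a scoring module
    augmented: bool  # True if the sample was generated by diffusion augmentation


def sample_weight(s: Sample, threshold: float = 0.3) -> float:
    """Assumed weighting rule (not the paper's): real samples keep weight 1.0;
    augmented samples are weighted by their clamped quality score, and those
    scoring below `threshold` are excluded entirely."""
    if not s.augmented:
        return 1.0
    q = min(max(s.quality, 0.0), 1.0)  # clamp the score to [0, 1]
    return q if q >= threshold else 0.0


def weighted_loss(batch: list[Sample], threshold: float = 0.3) -> float:
    """Quality-weighted mean loss: low-quality augmentations contribute less."""
    weights = [sample_weight(s, threshold) for s in batch]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # nothing usable in this batch
    return sum(w * s.loss for w, s in zip(weights, batch)) / total


batch = [
    Sample(loss=1.0, quality=1.0, augmented=False),  # original sample
    Sample(loss=2.0, quality=0.9, augmented=True),   # high-fidelity augmentation
    Sample(loss=5.0, quality=0.1, augmented=True),   # noisy augmentation
]
print(weighted_loss(batch))
```

Here the weights are 1.0, 0.9, and 0.0, giving (1.0 + 1.8) / 1.9 ≈ 1.47. An unweighted mean would be about 2.67, dominated by the noisy sample. A soft weighting like this, rather than simply filtering, is one plausible way to "down-weight" rather than discard borderline augmentations. The actual method may differ.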

Country of Origin
🇨🇳 China

Page Count
11 pages

Category
Computer Science:
Machine Learning (CS)