FANoise: Singular Value-Adaptive Noise Modulation for Robust Multimodal Representation Learning
By: Jiaoyang Li, Jun Fang, Tianhao Gao and more
Potential Business Impact:
Makes AI understand pictures and words better.
Representation learning is fundamental to modern machine learning, powering applications such as text retrieval and multimodal understanding. However, learning robust and generalizable representations remains challenging. While prior work has demonstrated that active noise injection, a form of data augmentation, can enhance encoding performance, most existing methods rely on heuristic or static noise, overlooking the dynamic nature of feature distributions during training. In this work, we systematically study the role of noise in representation learning from both gradient-based and feature distribution perspectives, using InfoNCE loss as a representative example. Focusing on multimodal representation learning, we propose FANoise, a novel feature-adaptive noise injection strategy. By leveraging the dynamics of contrastive learning, FANoise effectively mitigates the negative impacts of noise while preserving its benefits. Under this theoretically grounded framework, comprehensive experiments demonstrate that FANoise consistently improves overall performance on multimodal tasks across various base vision-language models (VLMs).
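For readers unfamiliar with the setup the abstract refers to, the following is a minimal sketch of an InfoNCE loss over paired embeddings, with static Gaussian noise added to one side. It illustrates only the heuristic, fixed-scale noise that the abstract says most existing methods rely on; it is not the FANoise method, whose singular value-adaptive modulation is not specified in this summary. The function names, the temperature of 0.07, the noise scale of 0.5, and the toy features are all assumptions made for illustration.

```python
import math
import random


def l2_normalize(v):
    # Scale a vector to unit length; a zero vector is left unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def add_gaussian_noise(v, sigma, rng):
    # Static isotropic Gaussian noise (assumed baseline, NOT FANoise).
    return [x + rng.gauss(0.0, sigma) for x in v]


def info_nce(queries, keys, temperature=0.07):
    # Mean InfoNCE loss where keys[i] is the positive for queries[i]
    # and every other key in the batch acts as a negative.
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


# Toy "image" and "text" features (hypothetical data, for illustration only).
rng = random.Random(0)
image_feats = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
text_feats = [[x + rng.gauss(0, 0.1) for x in v] for v in image_feats]

clean_loss = info_nce(image_feats, text_feats)
noisy_feats = [add_gaussian_noise(v, 0.5, rng) for v in image_feats]
noisy_loss = info_nce(noisy_feats, text_feats)
```

Comparing `clean_loss` with `noisy_loss` for different noise scales gives a rough feel for how fixed-scale noise perturbs the contrastive objective, which is the behavior a feature-adaptive strategy would aim to control.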
Similar Papers
Explore How to Inject Beneficial Noise in MLLMs
CV and Pattern Recognition
Makes AI better at understanding pictures and words together.
Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models
Computation and Language
Makes AI less likely to make up fake answers.
DLRREC: Denoising Latent Representations via Multi-Modal Knowledge Fusion in Deep Recommender Systems
Information Retrieval
Makes movie suggestions much smarter.