ANPrompt: Anti-noise Prompt Tuning for Vision-Language Models
By: Yansheng Gao, Yufei Zheng, Jinghan Qu and more
Potential Business Impact:
Makes AI models better at understanding images and text.
Prompt tuning has emerged as an efficient and effective technique for adapting vision-language models (VLMs) with low computational overhead. However, existing methods often overlook the vulnerability of prompt-tuned VLMs to weak semantic perturbations, such as subtle image or text noise, that degrade their generalization to unseen classes. To address this limitation, we propose ANPrompt, a novel prompt tuning framework designed to enhance robustness under such perturbations. ANPrompt first constructs weak noise text features by fusing original and noise-perturbed text embeddings, which are then clustered to form noise prompts. These noise prompts are integrated with learnable prompt tokens to generate anti-noise prompts, which are injected into the deeper layers of both the image and text encoders. To further capture noise-aware visual semantics, ANPrompt computes the Noise-Resistant Visual Prompt Prototype (NRVPP) by averaging the output prompt tokens from the vision encoder. Finally, ANPrompt introduces alignment, robustness, and anti-noise objectives by computing a Weak semantic noise Alignment Loss (WALoss) alongside the standard cross-entropy and sim losses. Experiments across 11 benchmarks demonstrate that ANPrompt consistently outperforms existing prompt tuning approaches, achieving superior robustness to semantic noise and improved generalization to novel categories.
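The abstract packs several steps into one paragraph. The dependency-free Python sketch below illustrates the data flow it describes, under loudly stated assumptions: the Gaussian perturbation, the fusion weight `alpha`, k-means as the clustering method, element-wise addition as the way noise prompts are "integrated" with learnable tokens, and the cosine-based `alignment_term` are all hypothetical choices for illustration, not the authors' definitions. In particular, `alignment_term` is a stand-in and not WALoss. Only the NRVPP step (averaging the vision encoder's output prompt tokens) follows directly from the abstract. All function names here are hypothetical.

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, s: float) -> Vector:
    return [x * s for x in a]


def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def weak_noise_features(text_emb: List[Vector], sigma: float = 0.05, alpha: float = 0.5,
                        rng: Optional[random.Random] = None) -> List[Vector]:
    """Fuse each clean text embedding with a Gaussian-perturbed copy.
    ASSUMPTION: Gaussian noise and a convex combination with weight alpha."""
    rng = rng if rng is not None else random.Random(0)
    fused = []
    for e in text_emb:
        noisy = [x + rng.gauss(0.0, sigma) for x in e]
        fused.append(add(scale(e, alpha), scale(noisy, 1.0 - alpha)))
    return fused


def kmeans(points: List[Vector], k: int, iters: int = 20,
           rng: Optional[random.Random] = None) -> List[Vector]:
    """Plain k-means; the k centroids serve as the noise prompts.
    ASSUMPTION: the abstract only says 'clustered', not which algorithm."""
    rng = rng if rng is not None else random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def anti_noise_prompts(learnable: List[Vector], noise_prompts: List[Vector]) -> List[Vector]:
    """ASSUMPTION: 'integration' is an element-wise sum of paired tokens."""
    return [add(t, n) for t, n in zip(learnable, noise_prompts)]


def nrvpp(vision_prompt_outputs: List[Vector]) -> Vector:
    """Noise-Resistant Visual Prompt Prototype: mean of the vision encoder's output prompt tokens."""
    return mean(vision_prompt_outputs)


def alignment_term(clean: List[Vector], perturbed: List[Vector]) -> float:
    """Stand-in alignment objective (NOT the paper's WALoss): 1 - mean cosine similarity."""
    sims = [cosine(a, b) for a, b in zip(clean, perturbed)]
    return 1.0 - sum(sims) / len(sims)


if __name__ == "__main__":
    rng = random.Random(42)
    d, n_classes, k = 8, 6, 2
    text_emb = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_classes)]
    fused = weak_noise_features(text_emb, rng=rng)
    noise_prompts = kmeans(fused, k, rng=rng)
    learnable = [[0.0] * d for _ in range(k)]  # would be trained in practice
    prompts = anti_noise_prompts(learnable, noise_prompts)
    # Placeholder for prompt tokens returned by a real vision encoder.
    vision_outputs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    print(len(prompts), len(nrvpp(vision_outputs)), alignment_term(text_emb, fused))
```

Keeping the noise prompts as cluster centroids means a handful of prompt vectors summarize the perturbed text space rather than one prompt per class; in a real VLM these vectors would be tensors passed into deeper encoder layers, which this sketch does not model.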
Similar Papers
NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models
CV and Pattern Recognition
Makes AI understand pictures and words better, safely.
Beyond Human-prompting: Adaptive Prompt Tuning with Semantic Alignment for Anomaly Detection
CV and Pattern Recognition
Finds weird things in pictures automatically.