R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning

Published: April 15, 2025 | arXiv ID: 2504.11195v2

By: Lijun Sheng, Jian Liang, Zilei Wang, and others

Potential Business Impact:

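Protects image-and-text AI models from deliberately manipulated (adversarial) pictures at prediction time, without retraining on labeled data.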

Business Areas:

A/B Testing, Data and Analytics

Vision-language models (VLMs), such as CLIP, have gained significant popularity as foundation models, with numerous fine-tuning methods developed to enhance performance on downstream tasks. However, due to their inherent vulnerability and the common practice of selecting from a limited set of open-source models, VLMs face a higher risk of adversarial attacks than traditional vision models. Existing defense techniques typically rely on adversarial fine-tuning during training, which requires labeled data and lacks flexibility for downstream tasks. To address these limitations, we propose robust test-time prompt tuning (R-TPT), which mitigates the impact of adversarial attacks during the inference stage. We first reformulate the classic marginal entropy objective by eliminating the term that introduces conflicts under adversarial conditions, retaining only the pointwise entropy minimization. Furthermore, we introduce a plug-and-play reliability-based weighted ensembling strategy, which aggregates useful information from reliable augmented views to strengthen the defense. R-TPT strengthens defense against adversarial attacks without requiring labeled training data while offering high flexibility for inference tasks. Extensive experiments on widely used benchmarks with various attacks demonstrate the effectiveness of R-TPT. The code is available at https://github.com/TomSheng21/R-TPT.
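The abstract says R-TPT drops one term of the marginal entropy objective and keeps only pointwise entropy. A standard identity shows how the two relate: H(p̄) = (1/N) Σ_i H(p_i) + (1/N) Σ_i KL(p_i ‖ p̄), where p_i is the class distribution predicted for augmented view i and p̄ is their average. This sketch assumes the dropped term is the cross-view KL (consistency) term. It also illustrates reliability-weighted ensembling. The reliability score used here (a view's mean cosine similarity to the other views' embeddings, passed through a softmax with a hypothetical `temperature`) is an assumption and not necessarily the paper's definition. All function names are hypothetical. The code only evaluates objective values in plain Python. Real test-time prompt tuning would backpropagate the chosen objective into learnable prompt tokens inside a deep-learning framework.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def entropy(p: Vector) -> float:
    """Shannon entropy (natural log) of one class-probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl_divergence(p: Vector, q: Vector) -> float:
    """KL(p || q); assumes q[c] > 0 wherever p[c] > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def mean_distribution(views: List[Vector]) -> List[float]:
    """Average the per-view class probabilities (the marginal p-bar)."""
    n = len(views)
    return [sum(column) / n for column in zip(*views)]


def marginal_entropy(views: List[Vector]) -> float:
    """Marginal entropy objective: entropy of the averaged prediction."""
    return entropy(mean_distribution(views))


def pointwise_entropy(views: List[Vector]) -> float:
    """Mean entropy of each view's own prediction (the term kept)."""
    return sum(entropy(p) for p in views) / len(views)


def consistency_term(views: List[Vector]) -> float:
    """Mean KL(p_i || p-bar): assumed to be the term that is dropped."""
    p_bar = mean_distribution(views)
    return sum(kl_divergence(p, p_bar) for p in views) / len(views)


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reliability_weights(features: List[Vector], temperature: float = 0.1) -> List[float]:
    """Hypothetical reliability: mean cosine similarity of each view's embedding
    to the other views, turned into weights by a temperature softmax.
    Requires at least two views."""
    n = len(features)
    scores = []
    for i in range(n):
        sims = [cosine(features[i], features[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_ensemble(views: List[Vector], weights: List[float]) -> List[float]:
    """Weighted average of per-view class probabilities."""
    num_classes = len(views[0])
    return [sum(w * p[c] for w, p in zip(weights, views)) for c in range(num_classes)]


# Usage: three augmented views of one test image, two classes.
probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]  # per-view softmax outputs
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # per-view image embeddings
weights = reliability_weights(feats)
prediction = weighted_ensemble(probs, weights)
```

In this toy input the third view disagrees with the others in embedding space, so it receives the smallest weight and contributes little to the final prediction. Because entropy is concave, the pointwise term is never larger than the marginal entropy. The difference between them is exactly the consistency term.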

Related Papers:

MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models
CV and Pattern Recognition | 13 Dec 2025 | 92%
Helps AI understand new pictures better.

NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models
CV and Pattern Recognition | 15 Jun 2025 | 91%
Makes AI understand pictures and words better, safely.

O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
CV and Pattern Recognition | 15 Mar 2025
Makes AI image guesses more trustworthy and accurate.

Country of Origin:

🇨🇳 China

Repos / Data Links:

github.com

Page Count:

12 pages
