Multiple Stochastic Prompt Tuning for Few-shot Adaptation under Extreme Domain Shift
By: Debarshi Brahma, Soma Biswas
Potential Business Impact:
Helps AI understand new things with few examples.
Foundation Vision-Language Models (VLMs) like CLIP exhibit strong generalization capabilities due to large-scale pretraining on diverse image-text pairs. However, their performance often degrades when applied to target datasets with significant distribution shifts in both visual appearance and class semantics. Recent few-shot learning approaches adapt CLIP to downstream tasks using limited labeled data via adapter or prompt tuning, but are not specifically designed to handle such extreme domain shifts. Conversely, some works addressing cross-domain few-shot learning consider such domain-shifted scenarios but operate in an episodic setting with only a few classes per episode, limiting their applicability to real-world deployment, where all classes must be handled simultaneously. To address this gap, we propose a novel framework, MIST (Multiple Stochastic Prompt Tuning), for efficiently adapting CLIP to datasets with extreme distribution shifts using only a few labeled examples, in scenarios involving all classes at once. Specifically, we introduce multiple learnable prompts per class to effectively capture diverse modes in visual representations arising from distribution shifts. To further enhance generalization, these prompts are modeled as learnable Gaussian distributions, enabling efficient exploration of the prompt parameter space and reducing overfitting caused by limited supervision. Extensive experiments and comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed framework.
Similar Papers
VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models
CV and Pattern Recognition
Teaches computers to understand pictures and words better.
Generalizable Prompt Learning of CLIP: A Brief Overview
CV and Pattern Recognition
Teaches computers to understand pictures and words.
Vision-aware Multimodal Prompt Tuning for Uploadable Multi-source Few-shot Domain Adaptation
CV and Pattern Recognition
Lets computers learn from less data, faster.