Advancing Cache-Based Few-Shot Classification via Patch-Driven Relational Gated Graph Attention
By: Tasweer Ahmad, Arindam Sikdar, Sandip Pradhan and more
Potential Business Impact:
Helps computers learn new things from few examples.
Few-shot image classification remains difficult under limited supervision and visual domain shift. Recent cache-based adaptation approaches (e.g., Tip-Adapter) partly address this challenge by learning lightweight residual adapters over frozen features, yet they still inherit CLIP's tendency to encode global, general-purpose representations, which are not discriminative enough to adapt a generalist model to a specialist domain in low-data regimes. We address this limitation with a novel patch-driven relational refinement that learns cache adapter weights from intra-image patch dependencies rather than treating an image embedding as a monolithic vector. Specifically, we introduce a relational gated graph attention network that constructs a patch graph and performs edge-aware attention to emphasize informative inter-patch interactions, producing context-enriched patch embeddings. A learnable multi-aggregation pooling then composes these embeddings into compact, task-discriminative representations that better align cache keys with the target few-shot classes. Crucially, the proposed graph refinement is used only during training to distil relational structure into the cache, so inference incurs no additional cost beyond the standard cache lookup. Final predictions are obtained by a residual fusion of cache similarity scores with CLIP zero-shot logits. Extensive evaluations on 11 benchmarks show consistent gains over state-of-the-art CLIP adapter and cache-based baselines while preserving zero-shot efficiency. We further validate battlefield relevance by introducing an Injured vs. Uninjured Soldier dataset for casualty recognition. The dataset is motivated by the operational need to support triage decisions within the "platinum minutes" and the broader "golden hour" window in time-critical UAV-driven search-and-rescue and combat casualty care.
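The inference step described in the abstract, a cache lookup fused residually with CLIP zero-shot logits, can be illustrated with a minimal NumPy sketch. It assumes the Tip-Adapter-style formulation, in which cache affinities are sharpened with exp(-beta * (1 - similarity)) and added to the zero-shot logits with weight alpha; the abstract does not state the exact fusion used in this paper. The function name `cache_fused_logits`, the default values of `alpha`, `beta`, and `logit_scale`, and the random features in the demo are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def cache_fused_logits(query, cache_keys, cache_values, text_weights,
                       alpha=1.0, beta=5.5, logit_scale=100.0):
    """Residual fusion of cache scores with zero-shot logits (Tip-Adapter style).

    query:        (B, D) image features of the test images
    cache_keys:   (N, D) features of the few-shot training images
    cache_values: (N, C) one-hot labels of the few-shot training images
    text_weights: (C, D) class text embeddings (the zero-shot classifier)
    alpha, beta, logit_scale: hypothetical hyperparameter values
    """
    query = l2_normalize(query)
    cache_keys = l2_normalize(cache_keys)
    text_weights = l2_normalize(text_weights)

    # Cosine similarity between each query and every cached key: (B, N).
    affinity = query @ cache_keys.T
    # Sharpen the affinities and aggregate the cached labels: (B, C).
    cache_logits = np.exp(-beta * (1.0 - affinity)) @ cache_values
    # Standard CLIP zero-shot logits: (B, C).
    zero_shot_logits = logit_scale * (query @ text_weights.T)
    # The cache term acts as a residual on top of the zero-shot prediction.
    return zero_shot_logits + alpha * cache_logits


# Toy demo with random stand-ins for CLIP features (3 classes, 2 shots each).
rng = np.random.default_rng(0)
num_classes, shots, dim = 3, 2, 16
demo_keys = rng.normal(size=(num_classes * shots, dim))
demo_values = np.eye(num_classes)[np.repeat(np.arange(num_classes), shots)]
demo_text = rng.normal(size=(num_classes, dim))
demo_query = rng.normal(size=(4, dim))
demo_logits = cache_fused_logits(demo_query, demo_keys, demo_values, demo_text)
demo_predictions = demo_logits.argmax(axis=1)
```

In this sketch the graph refinement never appears at inference time: under the abstract's description it would only influence how `cache_keys` are learned during training, so the lookup above costs the same as a plain cache-based adapter.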
Similar Papers
Attn-Adapter: Attention Is All You Need for Online Few-shot Learner of Vision-Language Model
CV and Pattern Recognition
Teaches computers to learn from few pictures.
Adapting Multimodal Foundation Models for Few-Shot Learning: A Comprehensive Study on Contrastive Captioners
CV and Pattern Recognition
Helps AI learn from very few pictures.
Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution
CV and Pattern Recognition
Finds fake pictures made by AI.