
EfficientFSL: Enhancing Few-Shot Classification via Query-Only Tuning in Vision Transformers

Published: January 13, 2026 | arXiv ID: 2601.08499v1

By: Wenwen Liao, Hang Ruan

Large models such as Vision Transformers (ViTs) have demonstrated remarkable superiority over smaller architectures like ResNet in few-shot classification, owing to their powerful representational capacity. However, fine-tuning such large models demands extensive GPU memory and prolonged training time, making them impractical for many real-world low-resource scenarios. To bridge this gap, we propose EfficientFSL, a query-only fine-tuning framework tailored specifically for few-shot classification with ViT, which achieves competitive performance while significantly reducing computational overhead. EfficientFSL fully leverages the knowledge embedded in the pre-trained model and its strong comprehension ability, achieving high classification accuracy with an extremely small number of tunable parameters. Specifically, we introduce a lightweight trainable Forward Block to synthesize task-specific queries that extract informative features from the intermediate representations of the pre-trained model in a query-only manner. We further propose a Combine Block to fuse multi-layer outputs, enhancing the depth and robustness of feature representations. Finally, a Support-Query Attention Block mitigates distribution shift by adjusting prototypes to align with the query set distribution. With minimal trainable parameters, EfficientFSL achieves state-of-the-art performance on four in-domain few-shot datasets and six cross-domain datasets, demonstrating its effectiveness in real-world applications.
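The two attention-based ideas in the abstract can be illustrated with a small sketch: task-specific queries read from frozen intermediate features, and class prototypes are shifted toward the query-set distribution. The code below is not the authors' implementation. The function names, the single-head dot-product attention, the mean-based prototypes, and the `mix` blending weight are all assumptions made for illustration; the abstract does not specify the internals of the Forward, Combine, or Support-Query Attention Blocks.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, tokens):
    """Each query attends over `tokens`, which act as both keys and values.

    `tokens` stand in for frozen backbone features; in a real model only
    the queries would come from trainable parameters (an assumption here).
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        weights = softmax([dot(q, t) * scale for t in tokens])
        outputs.append(
            [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs


def class_prototypes(support_feats, support_labels):
    """Mean support feature per class label."""
    grouped = {}
    for feat, label in zip(support_feats, support_labels):
        grouped.setdefault(label, []).append(feat)
    return {
        label: [sum(col) / len(feats) for col in zip(*feats)]
        for label, feats in grouped.items()
    }


def adjust_prototypes(prototypes, query_feats, mix=0.5):
    """Blend each prototype with the query-set features it attends to.

    `mix` is a hypothetical blending weight, not a value from the paper.
    """
    attended = cross_attend(prototypes, query_feats)
    return [
        [(1.0 - mix) * p + mix * a for p, a in zip(proto, att)]
        for proto, att in zip(prototypes, attended)
    ]


if __name__ == "__main__":
    # Toy 2-D features: two support samples for class 0, one for class 1.
    protos = class_prototypes([[1.0, 1.0], [3.0, 3.0], [0.0, 2.0]], [0, 0, 1])
    shifted = adjust_prototypes(list(protos.values()), [[2.5, 2.0], [0.5, 1.5]])
    print(shifted)
```

Because the features are only read as keys and values, no gradient would need to flow through the backbone in this arrangement, which is one plausible reading of how a query-only design keeps memory and training cost low.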

Adaptive Additive Parameter Updates of Vision Transformers for Few-Shot Continual Learning

CV and Pattern Recognition

Teaches AI new things without forgetting old ones.

11 Apr 2025 0

89%

CascadedViT: Cascaded Chunk-FeedForward and Cascaded Group Attention Vision Transformer

CV and Pattern Recognition

Makes AI see better using less power.

18 Nov 2025 1

88%

Rethinking Vision Transformer for Large-Scale Fine-Grained Image Retrieval

Multimedia

Finds exact picture matches faster and better.

23 Apr 2025 1

