GA2-CLIP: Generic Attribute Anchor for Efficient Prompt Tuning in Video-Language Models
By: Bin Wang, Ruotong Hu, Wenqian Wang, and more
Potential Business Impact:
Helps AI remember old lessons when learning new ones.
Visual and textual soft prompt tuning can effectively improve the adaptability of Vision-Language Models (VLMs) to downstream tasks. However, fine-tuning on video tasks impairs the model's generalization ability to unseen classes. Existing methods attempt to mitigate this forgetting effect by regularizing the gap between hand-crafted prompts and soft prompts, but doing so also weakens the learning capacity of the soft prompts. To address this challenge, we propose a plug-and-play coupling prompt learning framework that optimizes the generalization performance of VLMs on video tasks; the core motivation is to mitigate the narrowing of the semantic space during fine-tuning by introducing an externally supervised prompt. Specifically, for textual prompts, we introduce pre-trained prompts from other datasets as hard prompt tokens. These are concatenated with soft prompt tokens and coupled via a learnable mapping layer. This competitive prompting approach prevents the semantic space from overfitting to the supervised categories. In addition, we introduce carefully designed sets of irrelevant videos and negative prompts as generic attribute anchors that maintain the generic relevance of attributes in the pre-trained semantic space, thereby preserving generalization ability. Experiments on video tasks demonstrate that our method significantly outperforms state-of-the-art prompt tuning approaches across generalization benchmarks, particularly on base-to-new class prediction.
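The coupling mechanism described in the abstract (fixed hard prompt tokens from other datasets concatenated with learnable soft prompt tokens through a learnable mapping layer) can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: module and parameter names such as CoupledPromptLearner, n_soft, and ctx_dim are hypothetical, and the exact placement of the mapping layer in GA2-CLIP may differ.

```python
# Minimal sketch (assumptions): frozen "hard" prompt tokens pre-trained on another
# dataset are coupled with learnable soft prompt tokens via a learnable mapping
# layer, then concatenated to form the textual context. Identifiers are illustrative.
import torch
import torch.nn as nn


class CoupledPromptLearner(nn.Module):
    def __init__(self, hard_prompt_tokens: torch.Tensor, n_soft: int = 4):
        super().__init__()
        n_hard, ctx_dim = hard_prompt_tokens.shape
        # Hard prompt tokens act as the externally supervised prompt and stay fixed.
        self.register_buffer("hard_tokens", hard_prompt_tokens)
        # Soft prompt tokens are the directly optimized context vectors.
        self.soft_tokens = nn.Parameter(torch.randn(n_soft, ctx_dim) * 0.02)
        # Learnable mapping layer that couples the two prompt streams.
        self.coupler = nn.Linear(ctx_dim, ctx_dim)

    def forward(self) -> torch.Tensor:
        # Map the soft tokens, then concatenate with the fixed hard tokens
        # so both compete within the same textual context.
        coupled_soft = self.coupler(self.soft_tokens)
        return torch.cat([self.hard_tokens, coupled_soft], dim=0)


# Usage: prepend the coupled context to class-name token embeddings before a
# frozen CLIP text encoder; only soft_tokens and coupler receive gradients.
learner = CoupledPromptLearner(hard_prompt_tokens=torch.randn(4, 512), n_soft=4)
context = learner()  # shape: (n_hard + n_soft, ctx_dim)
```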
Similar Papers
Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors
CV and Pattern Recognition
Teaches computers to learn new things without forgetting.
Constrained Prompt Enhancement for Improving Zero-Shot Generalization of Vision-Language Models
CV and Pattern Recognition
Helps computers understand pictures and words better.
TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors
CV and Pattern Recognition
Helps videos tell stories with exact moments.