
Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing

Published: March 13, 2025 | arXiv ID: 2503.10111v2

By: Zecheng Zhao, Zhi Chen, Zi Huang, and others

Potential Business Impact:

Finds videos matching new words without forgetting old ones.

Business Areas:
Video Streaming Content and Publishing, Media and Entertainment, Video

Text-to-Video Retrieval (TVR) aims to retrieve relevant videos based on textual queries. However, as video content evolves continuously, adapting TVR systems to new data remains a critical yet under-explored challenge. In this paper, we introduce the first benchmark for Continual Text-to-Video Retrieval (CTVR) to address the limitations of existing approaches. Current Pre-Trained Model (PTM)-based TVR methods struggle to maintain model plasticity when adapting to new tasks, while existing Continual Learning (CL) methods suffer from catastrophic forgetting, which leads to semantic misalignment between historical queries and stored video features. To address these two challenges, we propose FrameFusionMoE, a novel CTVR framework with two key components: (1) the Frame Fusion Adapter (FFA), which captures temporal video dynamics while preserving model plasticity, and (2) the Task-Aware Mixture-of-Experts (TAME), which keeps queries from different tasks semantically aligned with the stored video features. Together, these components let FrameFusionMoE adapt effectively to new video content while preserving historical text-video relevance, mitigating catastrophic forgetting. We comprehensively evaluate FrameFusionMoE on two benchmark datasets under various task settings. The results show that FrameFusionMoE outperforms existing CL and TVR methods, achieving superior retrieval performance with minimal degradation on earlier tasks when handling continuous video streams. Our code is available at: https://github.com/JasonCodeMaker/CTVR.
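The abstract describes the pipeline only at a high level. To make the moving parts concrete (fusing frame embeddings into a stored video embedding, routing a text query through task-conditioned experts, then ranking stored videos by similarity), here is a minimal Python sketch under loudly stated assumptions. It is not the authors' FFA or TAME: mean pooling stands in for frame fusion, each "expert" is a hypothetical per-dimension scaling vector, the per-task gate logits are made-up values, and every name (`mean_pool`, `route_query`, `retrieve`, and so on) is hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Fuse per-frame embeddings into one video embedding by averaging.

    Hypothetical stand-in for frame fusion; NOT the paper's Frame Fusion Adapter.
    """
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_query(query: Vector, experts: List[Vector], task_logits: Sequence[float]) -> Vector:
    """Mix expert outputs for a query using task-specific gate weights.

    Assumption: each expert just rescales every dimension of the query, and the
    gate is a softmax over logits chosen per task. NOT the paper's TAME.
    """
    gates = softmax(task_logits)
    out = [0.0] * len(query)
    for gate, scale in zip(gates, experts):
        for d, q in enumerate(query):
            out[d] += gate * scale[d] * q
    return out


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query: Vector, video_index: Dict[str, Vector]) -> List[str]:
    """Rank stored video IDs by cosine similarity to the query, highest first."""
    return sorted(video_index, key=lambda vid: cosine(query, video_index[vid]), reverse=True)


# Toy usage with made-up 2-D embeddings (illustrative values only).
video_index = {
    "cooking": mean_pool([[1.0, 0.0], [0.8, 0.2]]),  # fused to about [0.9, 0.1]
    "soccer": mean_pool([[0.0, 1.0], [0.2, 0.8]]),   # fused to about [0.1, 0.9]
}
experts = [[1.0, 1.0], [1.0, 0.5]]
task_logits = {"task1": [2.0, 0.0], "task2": [0.0, 2.0]}
q = route_query([0.95, 0.05], experts, task_logits["task1"])
print(retrieve(q, video_index))  # prints ['cooking', 'soccer']
```

In this toy setup only the query path is task-conditioned; the stored video embeddings in `video_index` are computed once and never re-encoded. That loosely mirrors the abstract's concern that queries from earlier tasks must stay aligned with features already in the index, though how the paper's TAME actually achieves this is not specified here.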

Country of Origin
🇦🇺 Australia

Page Count
11 pages

Category
Computer Science:
Computer Vision and Pattern Recognition