IVCR-200K: A Large-Scale Multi-turn Dialogue Benchmark for Interactive Video Corpus Retrieval

Published: December 1, 2025 | arXiv ID: 2512.01312v1

By: Ning Han, Yawen Zeng, Shaohua Long and more

Potential Business Impact:

Helps you find videos by talking to the computer.

Business Areas:

Image Recognition, Data and Analytics, Software

In recent years, significant progress has been made in both video retrieval and video moment retrieval, which retrieve complete videos or moments, respectively, for a given text query. These advances have greatly improved user satisfaction during search. However, previous work has failed to establish meaningful "interaction" between the retrieval system and the user, and its one-way retrieval paradigm can no longer fully meet the personalized and dynamic needs of at least 80.8% of users. In this paper, we introduce the Interactive Video Corpus Retrieval (IVCR) task, a more realistic setting that enables multi-turn, conversational interactions between the user and the retrieval system. To facilitate research on this challenging task, we introduce IVCR-200K, a high-quality, bilingual, multi-turn, conversational, and abstract semantic dataset that supports video retrieval and even moment retrieval. Furthermore, we propose a comprehensive framework based on multi-modal large language models (MLLMs) to help users interact in several modes with more explainable solutions. Extensive experiments demonstrate the effectiveness of our dataset and framework.

Hierarchical Indexing with Knowledge Enrichment for Multilingual Video Corpus Retrieval

Computation and Language

Finds right medical videos in any language.

10 Oct 2025 2

88%

VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding

CV and Pattern Recognition

Helps computers find answers in any language document.

10 Aug 2025 1

87%

MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning

Artificial Intelligence

Helps AI understand conversations with many pictures.

24 Mar 2025 0

Country of Origin

🇨🇳 China

Page Count

11 pages
