ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System
By: Sungguk Cha, DongWook Kim, Mintae Kim and more
Potential Business Impact:
Shrinks search indexes dramatically while keeping most of the search accuracy.
Multi-vector embedding models have emerged as a powerful paradigm for document retrieval, preserving fine-grained visual and textual details through token-level representations. However, this expressiveness comes at a steep cost: storing an embedding for every token inflates index sizes by over 1000× compared to single-vector approaches, severely limiting scalability. We introduce ReinPool, a reinforcement learning framework that learns to dynamically filter and pool multi-vector embeddings into compact, retrieval-optimized representations. Trained with an inverse retrieval objective and NDCG-based rewards, ReinPool identifies and retains only the most discriminative vectors without requiring manual importance annotations. On the ViDoRe V2 benchmark, across three vision-language embedding models, ReinPool compresses multi-vector representations by 746× to 1249× into single vectors while recovering 76–81% of full multi-vector retrieval performance. Compared to static mean pooling baselines, ReinPool achieves a 22–33% absolute improvement in NDCG@3, demonstrating that learned selection significantly outperforms heuristic aggregation.
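To make the contrast between static mean pooling and selective pooling concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (mean_pool, masked_pool, ndcg_at_k) are hypothetical, the keep mask is supplied by hand as a stand-in for a learned selection policy, and the NDCG variant shown uses linear gain, which may differ from the reward used in the paper.

```python
import math

def mean_pool(vectors):
    """Average all token vectors into one vector (the static baseline)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def masked_pool(vectors, keep):
    """Average only the vectors whose keep flag is True.

    `keep` stands in for the output of a learned selection policy; here it
    is supplied by hand, which is an assumption made for illustration.
    """
    kept = [v for v, k in zip(vectors, keep) if k]
    if not kept:  # fall back to plain mean pooling if nothing is selected
        kept = vectors
    return mean_pool(kept)

def ndcg_at_k(relevances, k):
    """NDCG@k for a ranked list of graded relevance labels (linear gain)."""
    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Toy document with three 2-dimensional token vectors.
doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled_all = mean_pool(doc)
pooled_selected = masked_pool(doc, [True, False, True])

# Relevance of the top-3 results for one query; the relevant item is ranked second.
reward = ndcg_at_k([0, 1, 0], k=3)
```

In a reinforcement learning setup of the kind the abstract describes, a score such as ndcg_at_k would presumably serve as the reward signal for the policy that produces the keep mask; the abstract does not spell out how that policy is parameterized or trained.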
Similar Papers
Multivector Reranking in the Era of Strong First-Stage Retrievers
Information Retrieval
Finds information faster without losing accuracy.
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
Computation and Language
Finds matching pictures and words in any language.
Investigating Multi-layer Representations for Dense Passage Retrieval
Information Retrieval
Finds better answers by using more of a computer's brain.