CLAP: Coreference-Linked Augmentation for Passage Retrieval
By: Huanwei Xu, Lin Xu, Liang Yuan
Potential Business Impact:
Helps search engines find the right passage by spelling out what pronouns like "it" and "they" refer to.
Large Language Model (LLM)-based passage expansion has shown promise for enhancing first-stage retrieval, but often underperforms with dense retrievers due to semantic drift and misalignment with their pretrained semantic space. Beyond this, only a portion of a passage is typically relevant to a query, while the rest introduces noise, an issue compounded by chunking techniques that break coreference continuity. We propose Coreference-Linked Augmentation for Passage Retrieval (CLAP), a lightweight LLM-based expansion framework that segments passages into coherent chunks, resolves coreference chains, and generates localized pseudo-queries aligned with dense retriever representations. A simple fusion of global topical signals and fine-grained subtopic signals achieves robust performance across domains. CLAP yields consistent gains even as retriever strength increases, enabling dense retrievers to match or surpass second-stage rankers such as BM25 + MonoT5-3B, with up to 20.68% absolute nDCG@10 improvement. These improvements are especially notable in out-of-domain settings, where conventional LLM-based expansion methods relying on domain knowledge often falter. CLAP instead adopts a logic-centric pipeline that enables robust, domain-agnostic generalization.
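The sketch below illustrates the pipeline the abstract describes: segment a passage, resolve coreference so each chunk stands alone, generate one localized pseudo-query per chunk, then fuse a global passage-level similarity with the best chunk-level similarity at scoring time. It is a minimal sketch, not the authors' implementation: the function names, the naive sentence split (CLAP's segmentation is LLM-driven), the prompts, and the fusion weight `alpha` are all illustrative assumptions, and the LLM and encoder are injected as plain callables so the code stays self-contained.

```python
"""Minimal sketch of a CLAP-style expansion pipeline (assumptions noted inline)."""

import math
from dataclasses import dataclass
from typing import Callable, List

Encoder = Callable[[str], List[float]]  # text -> dense vector (assumed interface)
LLM = Callable[[str], str]              # prompt -> completion (assumed interface)


@dataclass
class ExpandedPassage:
    passage: str
    chunks: List[str]          # coreference-resolved chunks
    pseudo_queries: List[str]  # one localized pseudo-query per chunk


def expand_passage(passage: str, llm: LLM) -> ExpandedPassage:
    """Segment -> resolve coreference -> generate localized pseudo-queries."""
    # 1. Segment into chunks. Naive sentence split as a stand-in for
    #    the coherent LLM-based segmentation described in the paper.
    chunks = [s.strip() for s in passage.split(".") if s.strip()]

    # 2. Resolve coreference so each chunk is self-contained: rewrite
    #    pronouns/ellipses using their antecedents from the full passage.
    resolved = [
        llm(
            "Rewrite the chunk so all pronouns are resolved using the passage.\n"
            f"Passage: {passage}\nChunk: {c}\nRewritten chunk:"
        )
        for c in chunks
    ]

    # 3. Generate one localized pseudo-query per resolved chunk.
    queries = [
        llm(f"Write a short question this text answers.\nText: {c}\nQuestion:")
        for c in resolved
    ]
    return ExpandedPassage(passage, resolved, queries)


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def score(query: str, doc: ExpandedPassage, encode: Encoder,
          alpha: float = 0.5) -> float:
    """Fuse the global topical signal (whole passage) with the strongest
    fine-grained subtopic signal (localized pseudo-queries). `alpha` is an
    assumed interpolation weight, not a value from the paper."""
    q = encode(query)
    global_sim = cosine(q, encode(doc.passage))
    local_sim = max((cosine(q, encode(pq)) for pq in doc.pseudo_queries),
                    default=0.0)
    return alpha * global_sim + (1 - alpha) * local_sim
```

Taking the maximum over pseudo-query similarities lets a single relevant subtopic surface a passage even when the rest of it is noise, while the global term keeps the overall topic anchored; this mirrors the abstract's fusion of global topical and fine-grained subtopic signals.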
Similar Papers
Cross-Layer Attention Probing for Fine-Grained Hallucination Detection
Computation and Language
Detects when AI makes up wrong answers.
Toward Structured Knowledge Reasoning: Contrastive Retrieval-Augmented Generation on Experience
Computation and Language
Helps computers understand tables and databases.