PairSem: LLM-Guided Pairwise Semantic Matching for Scientific Document Retrieval
By: Wonbin Kweon, Runchu Tian, SeongKu Kang and more
Potential Business Impact:
Finds science papers by matching ideas, not just words.
Scientific document retrieval is a critical task for enabling knowledge discovery and supporting research across diverse domains. However, existing dense retrieval methods often struggle to capture fine-grained scientific concepts in texts due to their reliance on holistic embeddings and limited domain understanding. Recent approaches leverage large language models (LLMs) to extract fine-grained semantic entities and enhance semantic matching, but they typically treat entities as independent fragments, overlooking the multi-faceted nature of scientific concepts. To address this limitation, we propose Pairwise Semantic Matching (PairSem), a framework that represents relevant semantics as entity-aspect pairs, capturing complex, multi-faceted scientific concepts. PairSem is unsupervised, base retriever-agnostic, and plug-and-play, enabling precise and context-aware matching without requiring query-document labels or entity annotations. Extensive experiments on multiple datasets and retrievers demonstrate that PairSem significantly improves retrieval performance, highlighting the importance of modeling multi-aspect semantics in scientific information retrieval.
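To make the entity-aspect idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify how pairs are extracted or scored, so the pairs below are written by hand (standing in for LLM output), and the names pair_overlap, rerank, and the weight alpha are hypothetical. It only illustrates how a pair-level match could be added on top of any base retriever's score, and why matching on pairs differs from matching on entities alone.

```python
from typing import Dict, List, Set, Tuple

# An (entity, aspect) pair, e.g. ("transformer", "efficiency").
# Assumption: pairs are already extracted and normalized to lowercase.
Pair = Tuple[str, str]


def pair_overlap(query_pairs: Set[Pair], doc_pairs: Set[Pair]) -> float:
    """Fraction of the query's pairs that also appear in the document."""
    if not query_pairs:
        return 0.0
    return len(query_pairs & doc_pairs) / len(query_pairs)


def rerank(
    base_scores: Dict[str, float],
    query_pairs: Set[Pair],
    doc_pairs: Dict[str, Set[Pair]],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Add a weighted pair-overlap bonus to each base retriever score.

    base_scores can come from any retriever; alpha is a hypothetical
    mixing weight, not a value taken from the paper.
    """
    combined = {
        doc_id: score + alpha * pair_overlap(query_pairs, doc_pairs.get(doc_id, set()))
        for doc_id, score in base_scores.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hand-written toy data (illustrative only).
query = {("transformer", "efficiency"), ("attention", "sparsity")}
docs = {
    "doc_a": {("transformer", "efficiency"), ("attention", "sparsity")},
    # Same entity as the query, but a different aspect: no pair match.
    "doc_b": {("transformer", "interpretability")},
}
base = {"doc_a": 0.60, "doc_b": 0.65}

ranking = rerank(base, query, docs, alpha=0.5)
print(ranking)
```

In this toy data, doc_b shares the entity "transformer" with the query but under a different aspect, so it earns no bonus; doc_a matches both pairs and moves ahead despite its lower base score. An entity-only matcher would have credited both documents for "transformer".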
Similar Papers
Scientific Paper Retrieval with LLM-Guided Semantic-Based Ranking
Information Retrieval
Finds the best science papers for your questions.
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts
Computation and Language
Helps computers understand science papers better.
ReMatch: Boosting Representation through Matching for Multimodal Retrieval
CV and Pattern Recognition
Helps computers find matching pictures and text.