Ontology-Guided Query Expansion for Biomedical Document Retrieval using Large Language Models

Published: August 15, 2025 | arXiv ID: 2508.11784v1

By: Zabir Al Nazi, Vagelis Hristidis, Aaron Lawson McLean and more

Potential Business Impact:

Helps find medical answers in science papers.

Effective Question Answering (QA) on large biomedical document collections depends on strong document retrieval. Retrieval remains challenging because of domain-specific vocabulary and semantic ambiguity in user queries. We propose BMQExpander, a novel ontology-aware query expansion pipeline that combines medical knowledge (definitions and relationships) from the UMLS Metathesaurus with the generative capabilities of large language models (LLMs) to enhance retrieval effectiveness. We implemented several state-of-the-art baselines, including sparse and dense retrievers, query expansion methods, and biomedical-specific solutions. We show that BMQExpander has superior retrieval performance on three popular biomedical Information Retrieval (IR) benchmarks (NFCorpus, TREC-COVID, and SciFact), with improvements of up to 22.1% in NDCG@10 over sparse baselines and up to 6.5% over the strongest baseline. Further, BMQExpander generalizes robustly under query perturbation settings, in contrast to supervised baselines, achieving up to 15.7% improvement over the strongest baseline. As a side contribution, we publish our paraphrased benchmarks. Finally, our qualitative analysis shows that BMQExpander has fewer hallucinations than other LLM-based query expansion baselines.
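
For readers unfamiliar with the NDCG@10 metric quoted above, the following is a minimal sketch of how it is commonly computed for a single query. It is not taken from the paper: the function names are hypothetical, and the linear-gain form of DCG is an assumption (some evaluations use an exponential gain instead).

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k graded relevance labels.

    Assumes linear gain: each label is divided by log2(rank + 1),
    where rank starts at 1 for the top result.
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k for one query.

    ranked_relevances: labels of the retrieved documents, in ranked order.
    all_relevances: labels of every judged document for the query,
        used to build the ideal (best possible) ranking.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal == 0:
        # No relevant documents are judged for this query.
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal


if __name__ == "__main__":
    # Hypothetical labels: the single relevant document is ranked second.
    print(ndcg_at_k([0, 1, 0], [1, 0, 0]))
```

A benchmark-level score is typically the mean of this value over all queries, so the percentage improvements reported above would compare such averages between systems.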

Country of Origin
🇺🇸 United States

Page Count
11 pages

Category
Computer Science: Information Retrieval