Ontology-Guided Query Expansion for Biomedical Document Retrieval using Large Language Models
By: Zabir Al Nazi, Vagelis Hristidis, Aaron Lawson McLean and more
Potential Business Impact:
Helps find medical answers in science papers.
Effective Question Answering (QA) over large biomedical document collections depends on effective document retrieval, which remains challenging because of domain-specific vocabulary and semantic ambiguity in user queries. We propose BMQExpander, a novel ontology-aware query expansion pipeline that combines medical knowledge (definitions and relationships) from the UMLS Metathesaurus with the generative capabilities of large language models (LLMs) to enhance retrieval effectiveness. We implemented several state-of-the-art baselines, including sparse and dense retrievers, query expansion methods, and biomedical-specific solutions. BMQExpander achieves superior retrieval performance on three popular biomedical Information Retrieval (IR) benchmarks (NFCorpus, TREC-COVID, and SciFact), with improvements of up to 22.1% in NDCG@10 over sparse baselines and up to 6.5% over the strongest baseline. Further, BMQExpander generalizes robustly under query perturbation settings, in contrast to supervised baselines, achieving up to 15.7% improvement over the strongest baseline. As a side contribution, we publish our paraphrased benchmarks. Finally, our qualitative analysis shows that BMQExpander produces fewer hallucinations than other LLM-based query expansion baselines.
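For readers unfamiliar with the NDCG@10 metric quoted in the abstract, the following is a minimal sketch of how it is commonly computed for a single query. It is not taken from the paper: the function names, the `qrels` dictionary layout, and the choice of linear gain with a log2 rank discount are assumptions made for illustration, and the authors' evaluation tooling may differ in these details.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k gains.

    Assumption: linear gain and a log2(rank + 1) discount, with ranks
    starting at 1 (enumerate starts at 0, hence the `+ 2`).
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: List[str], qrels: Dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query.

    ranked_doc_ids: document ids in the order the retriever returned them.
    qrels: hypothetical mapping of doc id -> graded relevance for this query;
           documents missing from the mapping count as non-relevant (0).
    """
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_doc_ids]
    # The ideal ranking places the most relevant judged documents first.
    ideal_gains = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Toy relevance judgments, invented for this example only.
    toy_qrels = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
    print(ndcg_at_k(["d1", "d2", "d3"], toy_qrels))  # perfect ordering
    print(ndcg_at_k(["d2", "d1", "d3"], toy_qrels))  # top two swapped
```

A benchmark score is typically the mean of this per-query value over all test queries, so an expansion method raises NDCG@10 when it moves relevant documents higher within the top ten results.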
Similar Papers
Query Expansion in the Age of Pre-trained and Large Language Models: A Comprehensive Survey
Information Retrieval
Helps computers find better answers to your questions.
ThinkQE: Query Expansion via an Evolving Thinking Process
Information Retrieval
Finds better search results by thinking more.
Harnessing Collective Intelligence of LLMs for Robust Biomedical QA: A Multi-Model Approach
Computation and Language
Helps doctors find answers in medical books faster.