Generative Query Expansion with Multilingual LLMs for Cross-Lingual Information Retrieval
By: Olivia Macmillan-Scott, Roksana Goworek, Eda B. Özyiğit
Potential Business Impact:
Helps computers find information in different languages.
Query expansion is the reformulation of a user query by adding semantically related information, and is an essential component of monolingual and cross-lingual information retrieval used to ensure that relevant documents are not missed. Recently, multilingual large language models (mLLMs) have shifted query expansion from semantic augmentation with synonyms and related words to pseudo-document generation. Pseudo-documents both introduce additional relevant terms and bridge the gap between short queries and long documents, which is particularly beneficial in dense retrieval. This study evaluates recent mLLMs and fine-tuned variants across several generative expansion strategies to identify factors that drive cross-lingual retrieval performance. Results show that query length largely determines which prompting technique is effective, and that more elaborate prompts often do not yield further gains. Substantial linguistic disparities persist: cross-lingual query expansion can produce the largest improvements for languages with the weakest baselines, yet retrieval is especially poor between languages written in different scripts. Fine-tuning is found to lead to performance gains only when the training and test data are of similar format. These outcomes underline the need for more balanced multilingual and cross-lingual training and evaluation resources.
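To make the idea of pseudo-document expansion concrete, the following is a minimal sketch, not the authors' implementation. It assumes a zero-shot prompt and simple concatenation of the original query with the generated passage. The names build_prompt, expand_query, fake_generate and the query_repeats parameter are hypothetical, as is the prompt wording. In practice the generate callable would wrap a call to a multilingual LLM; a canned stand-in is used here so the example runs offline.

```python
from typing import Callable


def build_prompt(query: str, target_language: str) -> str:
    """Build a zero-shot prompt asking for a passage that answers the query.

    The wording is an illustrative assumption, not a prompt from the paper.
    """
    return (
        f"Write a short passage in {target_language} that answers the query.\n"
        f"Query: {query}\n"
        "Passage:"
    )


def expand_query(
    query: str,
    target_language: str,
    generate: Callable[[str], str],
    query_repeats: int = 1,
) -> str:
    """Return the query concatenated with a generated pseudo-document.

    `generate` is any function mapping a prompt to model output (assumed to
    wrap an mLLM). `query_repeats` optionally repeats the original query so
    that a long pseudo-document does not swamp it in term-based retrieval.
    """
    if query_repeats < 1:
        raise ValueError("query_repeats must be at least 1")
    pseudo_document = generate(build_prompt(query, target_language)).strip()
    return " ".join([query] * query_repeats + [pseudo_document])


def fake_generate(prompt: str) -> str:
    """Stand-in for an mLLM call; always returns a fixed German passage."""
    return "Paris ist die Hauptstadt von Frankreich."


# English query expanded with a German pseudo-document for a German collection.
expanded = expand_query("capital of France", "German", fake_generate, query_repeats=2)
print(expanded)
```

The expanded string would then be passed to a sparse or dense retriever in place of the original query. Generating the pseudo-document in the language of the document collection is one way such expansion could bridge the cross-lingual gap, though the abstract does not specify this detail.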
Similar Papers
Query Expansion in the Age of Pre-trained and Large Language Models: A Comprehensive Survey
Information Retrieval
Helps computers find better answers to your questions.
Bridging Language Gaps: Advances in Cross-Lingual Information Retrieval with Multilingual LLMs
Information Retrieval
Finds information in any language.
What Drives Cross-lingual Ranking? Retrieval Approaches with Multilingual Language Models
Information Retrieval
Find information in any language, even rare ones.