
Evaluating the Retrieval Robustness of Large Language Models

Published: May 28, 2025 | arXiv ID: 2505.21870v1

By: Shuyang Cao, Karthik Radhakrishnan, David Rosenberg and more

Potential Business Impact:

Makes AI smarter by checking its facts.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Retrieval-augmented generation (RAG) generally enhances the ability of large language models (LLMs) to solve knowledge-intensive tasks. However, RAG may also degrade performance because of imperfect retrieval and the model's limited ability to leverage retrieved content. In this work, we evaluate the robustness of LLMs in practical RAG setups (henceforth, retrieval robustness). We focus on three research questions: (1) whether RAG is always better than non-RAG; (2) whether more retrieved documents always lead to better performance; and (3) whether document order impacts results. To facilitate this study, we establish a benchmark of 1500 open-domain questions, each with retrieved documents from Wikipedia. We introduce three robustness metrics, each corresponding to one research question. Our comprehensive experiments, involving 11 LLMs and 3 prompting strategies, reveal that all of these LLMs exhibit surprisingly high retrieval robustness; nonetheless, varying degrees of imperfect robustness hinder them from fully utilizing the benefits of RAG.
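The abstract names three robustness metrics but does not define them, so the following is only a minimal sketch of how metrics of this kind could be computed, not the paper's definitions. The function names (`no_degradation_rate`, `size_robustness`, `order_robustness`) are hypothetical, and the sketch assumes each question already has a numeric correctness score (for example 0 or 1) under every setup being compared.

```python
from typing import Dict, Sequence


def no_degradation_rate(rag_scores: Sequence[float],
                        base_scores: Sequence[float]) -> float:
    """Fraction of questions where RAG scores at least as well as non-RAG.

    Hypothetical metric for question (1); scores are aligned per question.
    """
    if len(rag_scores) != len(base_scores) or not rag_scores:
        raise ValueError("need two non-empty score lists of equal length")
    kept = sum(1 for r, b in zip(rag_scores, base_scores) if r >= b)
    return kept / len(rag_scores)


def size_robustness(scores_by_k: Dict[int, Sequence[float]]) -> float:
    """Fraction of (question, step) pairs where retrieving more documents
    does not lower the score.

    Hypothetical metric for question (2); keys are retrieval sizes k and
    each step compares one k with the next larger k.
    """
    ks = sorted(scores_by_k)
    if len(ks) < 2:
        raise ValueError("need at least two retrieval sizes")
    total = 0
    ok = 0
    for smaller, larger in zip(ks, ks[1:]):
        for s, l in zip(scores_by_k[smaller], scores_by_k[larger]):
            total += 1
            if l >= s:
                ok += 1
    if total == 0:
        raise ValueError("need at least one question")
    return ok / total


def order_robustness(scores_by_order: Sequence[Sequence[float]]) -> float:
    """Fraction of questions whose score is identical under every ordering.

    Hypothetical metric for question (3); each inner sequence holds the
    per-question scores for one ordering of the same retrieved documents.
    """
    if len(scores_by_order) < 2:
        raise ValueError("need at least two orderings")
    per_question = list(zip(*scores_by_order))
    if not per_question:
        raise ValueError("need at least one question")
    stable = sum(1 for scores in per_question if min(scores) == max(scores))
    return stable / len(per_question)


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the functions.
    print(no_degradation_rate([1, 0, 1, 1], [1, 1, 0, 1]))
    print(size_robustness({5: [1, 0, 1], 10: [1, 1, 0], 20: [1, 1, 1]}))
    print(order_robustness([[1, 0, 1], [1, 1, 1]]))
```

Each function returns a value between 0 and 1, where 1 means no question was hurt by the corresponding change in setup. Comparing adjacent retrieval sizes and requiring exact score agreement across orderings are design choices made for this sketch; stricter or looser criteria are equally plausible.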

Investigating the Robustness of Retrieval-Augmented Generation at the Query Level

Computation and Language

Makes AI smarter by improving how it finds answers.

9 Jul 2025 2

94%

Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey

Computation and Language

Tests how AI uses outside facts to answer questions.

21 Apr 2025 0

94%

Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers

Information Retrieval

Helps computers answer questions with real-world facts.

28 May 2025 1


Country of Origin

🇺🇸 United States

Repos / Data Links

github.com

Page Count

19 pages
