Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature
By: Ranul Dayarathne, Uvini Ranaweera, Upeksha Ganegoda
Potential Business Impact:
Helps AI answer questions more truthfully and accurately.
Retrieval-Augmented Generation (RAG) is emerging as a powerful technique for enhancing generative AI models by reducing hallucination. The growing prominence of RAG alongside Large Language Models (LLMs) has sparked interest in comparing how different LLMs perform on question-answering (QA) across diverse domains. This study compares four open-source LLMs (Mistral-7b-instruct, LLaMa2-7b-chat, Falcon-7b-instruct, and Orca-mini-v3-7b) and OpenAI's widely used GPT-3.5 on QA tasks over the computer science literature, with RAG support. Binary questions are evaluated with accuracy and precision, while long-answer questions are evaluated with cosine similarity alongside rankings from a human expert and from Google's Gemini model. GPT-3.5 paired with RAG answers both binary and long-answer questions effectively, reaffirming its status as an advanced LLM. Among the open-source models, Mistral AI's Mistral-7b-instruct paired with RAG outperforms the rest on both question types, while Orca-mini-v3-7b records the shortest average response latency and Meta's LLaMa2-7b-chat the longest. The findings underscore that, given adequate infrastructure, open-source LLMs can keep pace with proprietary models such as GPT-3.5.
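To make the evaluation setup concrete, the sketch below shows one plausible way these metrics could be computed. It is an illustrative example only: it assumes scikit-learn for the binary-question metrics, and the toy vectors stand in for sentence embeddings of a reference answer and an LLM-generated answer; the paper's actual evaluation code and embedding model are not specified here.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score

# --- Binary (yes/no) questions: accuracy and precision ---
# Toy gold labels vs. model predictions (1 = "yes", 0 = "no").
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 1]
print("accuracy: ", accuracy_score(gold, pred))   # fraction of questions answered correctly
print("precision:", precision_score(gold, pred))  # correct "yes" answers / predicted "yes" answers

# --- Long-answer questions: cosine similarity of answer embeddings ---
def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors stand in for embeddings of the reference answer and the
# generated answer; any sentence-embedding model could produce these.
reference_emb = np.array([0.21, 0.70, 0.12])
generated_emb = np.array([0.25, 0.61, 0.15])
print("cosine similarity:", cosine_sim(reference_emb, generated_emb))
```

A score near 1.0 indicates the generated long answer is semantically close to the reference; the human-expert and Gemini rankings described above complement this automatic measure with judgment-based comparisons.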
Similar Papers
Aligning LLMs for the Classroom with Knowledge-Based Retrieval -- A Comparative RAG Study
Artificial Intelligence
Makes AI answers for school more truthful.
Agri-Query: A Case Study on RAG vs. Long-Context LLMs for Cross-Lingual Technical Question Answering
Computation and Language
Helps computers answer questions from manuals better.
Knowledge-Graph Based RAG System Evaluation Framework
Computation and Language
Tests AI answers by checking the reasoning behind them.