M-Eval: A Heterogeneity-Based Framework for Multi-evidence Validation in Medical RAG Systems

Published: October 28, 2025 | arXiv ID: 2510.23995v1

By: Mengzhou Sun, Sendong Zhao, Jianyu Chen and more

Potential Business Impact:

Finds mistakes in AI medical answers.

Business Areas:

Electronic Health Record (EHR), Health Care

Retrieval-augmented Generation (RAG) has demonstrated potential in enhancing medical question-answering systems through the integration of large language models (LLMs) with external medical literature. LLMs can retrieve relevant medical articles to generate more professional responses efficiently. However, current RAG applications still face problems. They generate incorrect information, such as hallucinations, and they fail to use external knowledge correctly. To solve these issues, we propose a new method named M-Eval. This method is inspired by the heterogeneity analysis approach used in Evidence-Based Medicine (EBM). Our approach can check for factual errors in RAG responses using evidence from multiple sources. First, we extract additional medical literature from external knowledge bases. Then, we retrieve the evidence documents generated by the RAG system. We use heterogeneity analysis to check whether the evidence supports different viewpoints in the response. In addition to verifying the accuracy of the response, we also assess the reliability of the evidence provided by the RAG system. Our method shows an improvement of up to 23.31% accuracy across various LLMs. This work can help detect errors in current RAG-based medical systems. It also makes the applications of LLMs more reliable and reduces diagnostic errors.
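
The abstract does not say how M-Eval computes heterogeneity, so the following is only a sketch of the classical heterogeneity statistics used in Evidence-Based Medicine meta-analysis (Cochran's Q and I²), not the authors' implementation. The function name `heterogeneity` and the idea of representing each evidence document as a numeric effect estimate with a variance are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def heterogeneity(effects: Sequence[float],
                  variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Cochran's Q, I^2 in percent) for k effect estimates.

    Hypothetical helper: assumes each piece of evidence has already been
    reduced to an effect estimate and its variance.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Inverse-variance weights: more precise evidence counts for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviation from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))

    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


if __name__ == "__main__":
    q, i2 = heterogeneity([0.0, 1.0], [0.01, 0.01])
    print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

A high I² would suggest that the evidence sources disagree with one another, which is the kind of signal a multi-evidence validator could use to flag a claim for closer review.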

META-RAG: Meta-Analysis-Inspired Evidence-Re-Ranking Method for Retrieval-Augmented Generation in Evidence-Based Medicine

Computation and Language

Helps doctors find the best medical facts.

28 Oct 2025 0

90%

Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey

Computation and Language

Tests how AI uses outside facts to answer questions.

21 Apr 2025 0

90%

Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights

Computation and Language

Makes AI doctors more truthful and helpful.

10 Nov 2025 1

Country of Origin

🇨🇳 China

Page Count

8 pages
