Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs
By: Julia Belikova, Konstantin Polev, Rauf Parchiev, and more
Potential Business Impact:
Finds fake answers from smart computer programs.
Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly deployed in industry applications, yet their reliability remains hampered by challenges in detecting hallucinations. While supervised state-of-the-art (SOTA) methods that leverage LLM hidden states -- such as activation tracing and representation analysis -- show promise, their dependence on extensively annotated datasets limits scalability in real-world applications. This paper addresses the critical bottleneck of data annotation by investigating the feasibility of reducing training data requirements for two SOTA hallucination detection frameworks: Lookback Lens, which analyzes attention head dynamics, and probing-based approaches, which decode internal model representations. We propose a methodology combining efficient classification algorithms with dimensionality reduction techniques to minimize sample size demands while maintaining competitive performance. Evaluations on standardized question-answering RAG benchmarks show that our approach achieves performance comparable to strong proprietary LLM-based baselines with only 250 training samples. These results highlight the potential of lightweight, data-efficient paradigms for industrial deployment, particularly in annotation-constrained scenarios.
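The core idea, combining dimensionality reduction with an efficient classifier over LLM internal features, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the synthetic features, the signal strength, the PCA rank `k`, and the pure-NumPy logistic-regression probe are all assumptions standing in for real hidden-state or attention features and the authors' chosen algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for LLM hidden-state features: grounded vs.
# hallucinated answers differ slightly along one latent direction.
d_hidden = 512          # feature dimensionality (assumed)
n_train, n_test = 250, 500   # 250 mirrors the paper's sample budget
signal = rng.normal(size=d_hidden)

def make_split(n):
    y = rng.integers(0, 2, size=n)            # 1 = hallucinated (toy label)
    X = rng.normal(size=(n, d_hidden)) + 0.1 * np.outer(y * 2 - 1, signal)
    return X, y

X_train, y_train = make_split(n_train)
X_test, y_test = make_split(n_test)

# Dimensionality reduction: PCA via SVD of the centered training features.
k = 32
mu = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
P = Vt[:k].T                                  # top-k principal directions
Z_train, Z_test = (X_train - mu) @ P, (X_test - mu) @ P

# Lightweight probe: L2-regularized logistic regression, gradient descent.
w, b = np.zeros(k), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Z_train @ w + b)))
    g = p - y_train                           # gradient of the log-loss
    w -= 0.1 * (Z_train.T @ g / n_train + 1e-3 * w)
    b -= 0.1 * g.mean()

acc = (((Z_test @ w + b) > 0).astype(int) == y_test).mean()
print(f"test accuracy with {n_train} training samples: {acc:.2f}")
```

The point of the sketch is the data-efficiency argument: once features are projected to a low-dimensional subspace, a simple probe has few parameters to fit, so a few hundred labeled examples can suffice where a larger model would overfit.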
Similar Papers
MetaRAG: Metamorphic Testing for Hallucination Detection in RAG Systems
Computation and Language
Finds when AI makes up wrong information.
Hybrid Retrieval for Hallucination Mitigation in Large Language Models: A Comparative Analysis
Information Retrieval
Makes AI tell the truth, not make things up.
Optimizing Medical Question-Answering Systems: A Comparative Study of Fine-Tuned and Zero-Shot Large Language Models with RAG Framework
Computation and Language
Answers medical questions accurately using reliable sources.