FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG
By: Maxime Dassen, Rebecca Kotula, Kenton Murray and more
Potential Business Impact:
Spots when AI cites sources that don't back up its claims.
Retrieval-Augmented Generation (RAG) models are critically undermined by citation hallucinations, a deceptive failure in which a model confidently cites a source that does not support its claim. Existing work often attributes hallucination to a simple over-reliance on the model's parametric knowledge. We challenge this view and introduce FACTUM (Framework for Attesting Citation Trustworthiness via Underlying Mechanisms), a framework of four mechanistic scores that measure the distinct contributions of a model's attention and feed-forward network (FFN) pathways, as well as the alignment between them. Our analysis reveals two consistent signatures of correct citation: a significantly stronger contribution from the model's parametric knowledge and greater use of the attention sink for information synthesis. Crucially, we find that the signature of a correct citation is not static but evolves with model scale. For example, for Llama-3.2-3B a correct citation is marked by higher pathway alignment, whereas for Llama-3.1-8B it is marked by lower alignment, with the two pathways contributing more distinct, orthogonal information. By capturing this complex, evolving signature, FACTUM outperforms state-of-the-art baselines by up to 37.5% in AUC. Our findings reframe citation hallucination as a complex, scale-dependent interplay between internal mechanisms, paving the way for more nuanced and reliable RAG systems.
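To make the kinds of quantities described above more concrete, here is a minimal Python sketch of simplified, hypothetical stand-ins for them. It is not FACTUM's actual set of four scores, whose exact definitions are not given in this summary. It assumes that, for the token where a citation is emitted, the update vectors that one layer's attention block and FFN block write into the residual stream have already been extracted, along with per-head attention weights. The names `pathway_scores`, `attention_sink_mass`, and `roc_auc` are illustrative. By assumption, the FFN share of update magnitude serves as a proxy for the parametric-knowledge contribution, cosine similarity serves as a proxy for pathway alignment, the attention sink is taken to be position 0, and AUC is computed with the pairwise (Mann-Whitney) formulation.

```python
import math
from typing import Dict, List, Sequence


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    nu, nv = l2_norm(u), l2_norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def pathway_scores(attn_out: Sequence[float], ffn_out: Sequence[float]) -> Dict[str, float]:
    """Hypothetical per-layer scores for one citation token.

    attn_out / ffn_out: the updates the attention block and the FFN block
    write into the residual stream (assumed to be already extracted).
    """
    a, f = l2_norm(attn_out), l2_norm(ffn_out)
    total = a + f
    return {
        # Assumption: FFN share of update magnitude as a proxy for parametric knowledge.
        "ffn_share": f / total if total > 0.0 else 0.0,
        # Assumption: cosine similarity as a proxy for pathway alignment
        # (near 1 = aligned, near 0 = orthogonal, i.e. distinct information).
        "alignment": cosine(attn_out, ffn_out),
    }


def attention_sink_mass(head_weights: List[Sequence[float]], sink_index: int = 0) -> float:
    """Mean attention (averaged over heads) that the query places on the sink position.

    Assumption: the sink is the first token (index 0) unless told otherwise.
    """
    return sum(w[sink_index] for w in head_weights) / len(head_weights)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(pathway_scores([1.0, 0.0], [0.0, 3.0]))  # {'ffn_share': 0.75, 'alignment': 0.0}
    print(attention_sink_mass([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]))  # 0.375
    # Labels: 1 = correct citation, 0 = hallucinated (illustrative toy data).
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

In a real model these input vectors would have to be captured during the forward pass (for example with PyTorch forward hooks). Features like these would then be fed to a classifier, and its scores would be evaluated with AUC. The pure-Python implementation is deliberate: it keeps the arithmetic visible and avoids claiming any particular library's API.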
Similar Papers
Detecting Hallucinations in Retrieval-Augmented Generation via Semantic-level Internal Reasoning Graph
Computation and Language
Finds when AI lies about facts it learned.
MetaRAG: Metamorphic Testing for Hallucination Detection in RAG Systems
Computation and Language
Finds when AI makes up wrong information.
FATHOMS-RAG: A Framework for the Assessment of Thinking and Observation in Multimodal Systems that use Retrieval Augmented Generation
Artificial Intelligence
Helps computers answer questions using text, pictures, and tables.