Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models

Published: October 7, 2025 | arXiv ID: 2510.06107v2

By: Gagan Bhatia, Somayajulu G Sripada, Kevin Allan and more

Potential Business Impact:

Finds why AI makes up fake facts.

Business Areas:

Semantic Search, Internet Services

Large Language Models (LLMs) are prone to hallucination, the generation of plausible yet factually incorrect statements. This work investigates the intrinsic, architectural origins of this failure mode through three primary contributions. First, to enable the reliable tracing of internal semantic failures, we propose Distributional Semantics Tracing (DST), a unified framework that integrates established interpretability techniques to produce a causal map of a model's reasoning, treating meaning as a function of context (distributional semantics). Second, we pinpoint the model's layer at which a hallucination becomes inevitable, identifying a specific commitment layer where a model's internal representations irreversibly diverge from factuality. Third, we identify the underlying mechanism for these failures. We observe a conflict between distinct computational pathways, which we interpret using the lens of dual-process theory: a fast, heuristic associative pathway (akin to System 1) and a slow, deliberate, contextual pathway (akin to System 2), leading to predictable failure modes such as Reasoning Shortcut Hijacks. Our framework's ability to quantify the coherence of the contextual pathway reveals a strong negative correlation ($\rho = -0.863$) with hallucination rates, implying that these failures are predictable consequences of internal semantic weakness. The result is a mechanistic account of how, when, and why hallucinations occur within the Transformer architecture.
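
The reported correlation between contextual-pathway coherence and hallucination rate can be illustrated with a small sketch. The abstract does not say which coefficient $\rho$ denotes; the example below assumes a Spearman rank correlation, and the function names (`average_ranks`, `pearson`, `spearman_rho`) and all numeric values are hypothetical placeholders, not data or code from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic placeholder values, NOT results from the paper:
# one coherence score and one hallucination rate per hypothetical setting.
coherence = [0.91, 0.84, 0.77, 0.62, 0.55, 0.40]
hallucination_rate = [0.05, 0.12, 0.09, 0.30, 0.27, 0.45]

rho = spearman_rho(coherence, hallucination_rate)
print(f"Spearman rho on toy data: {rho:.3f}")
```

A strongly negative value on such data means that settings with higher coherence tend to rank lower in hallucination rate, which is the direction of the relationship the abstract describes.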

Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models

Computation and Language

Finds why AI makes up fake facts.

7 Oct 2025 0

90%

Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in multimodal LLMs

Machine Learning (CS)

Makes AI tell the truth, not make things up.

26 Aug 2025 0

89%

The Geometry of Truth: Layer-wise Semantic Dynamics for Hallucination Detection in Large Language Models

Computation and Language

Stops AI from making up false information.

6 Oct 2025 0

Country of Origin

🇬🇧 United Kingdom

Page Count

19 pages
