Measuring and Analyzing Intelligence via Contextual Uncertainty in Large Language Models using Information-Theoretic Metrics
By: Jae Wan Shim
Potential Business Impact:
Shows how an AI thinks by tracking its guesses.
The remarkable capabilities of Large Language Models (LLMs) are now extensively documented on task-specific benchmarks, yet the internal mechanisms that produce these results remain the subject of intense scientific inquiry. This paper contributes to this inquiry by moving beyond metrics that measure what models can do to a methodology that characterizes how they process information. We introduce a novel, task-agnostic approach that probes these dynamics by constructing a quantitative "Cognitive Profile" for any given model. This profile is centered on the Entropy Decay Curve, a visualization that traces how a model's normalized predictive uncertainty changes as a function of context length. Applying this methodology to several state-of-the-art LLMs across diverse texts, we uncover distinct and consistent cognitive profiles that are sensitive to both model scale and text complexity. We also introduce the Information Gain Span (IGS) index, which summarizes the desirability of the decay trajectory in a single number. This work thus provides a new, principled lens for analyzing and comparing the intrinsic operational dynamics of artificial intelligence.
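The abstract does not include an implementation, but the core measurement is simple enough to sketch. The snippet below is a minimal, hypothetical illustration, not the paper's released code: it scores a text with a causal language model, computes the Shannon entropy of the next-token distribution at each position, and normalizes by log |V| so the curve lies in [0, 1]. The `information_gain_span` function is a guess at one plausible summary (mean entropy drop relative to the initial uncertainty); the paper's exact IGS definition may differ, and the model name "gpt2" is only a placeholder.

```python
# Minimal sketch of an Entropy Decay Curve, assuming the setup described
# in the abstract: normalized next-token entropy as a function of how
# much context the model has seen. Not the authors' implementation.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def entropy_decay_curve(model_name: str, text: str) -> list[float]:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0]           # (seq_len, vocab)

    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Shannon entropy (in nats) of the next-token distribution; entry t
    # is the model's uncertainty after seeing t + 1 tokens of context.
    entropy = -(probs * log_probs).sum(dim=-1)  # (seq_len,)
    # Normalize by the maximum possible entropy, log |V|, so the curve
    # is comparable across models with different vocabulary sizes.
    return (entropy / math.log(model.config.vocab_size)).tolist()

def information_gain_span(curve: list[float]) -> float:
    # Hypothetical IGS: mean drop from the initial uncertainty level,
    # i.e. the average gap between H(1) and the rest of the curve.
    # Larger values mean the model converts context into confidence
    # more aggressively. The paper's exact formula may differ.
    h0 = curve[0]
    return sum(h0 - h for h in curve) / len(curve)

curve = entropy_decay_curve("gpt2", "A long passage of text goes here ...")
print(f"IGS = {information_gain_span(curve):.4f}")
```

Plotting `curve` against position would reproduce the Entropy Decay Curve described above; comparing such curves across models and texts is presumably what yields the paper's per-model "Cognitive Profile".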
Similar Papers
GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models
Computation and Language
Teaches computers to ask smart questions to guess things.
Revisiting Long-context Modeling from Context Denoising Perspective
Computation and Language
Cleans computer brains to understand long stories better.
Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning
Artificial Intelligence
AI struggles with too much information.