LLMLagBench: Identifying Temporal Training Boundaries in Large Language Models
By: Piotr Pęzik, Konrad Kaczyński, Maria Szymańska, and more
Potential Business Impact:
Tests how up-to-date an AI model's knowledge is.
Large Language Models (LLMs) are pretrained on textual data up to a specific temporal cutoff. This creates a strict knowledge boundary beyond which models cannot provide accurate information without querying external sources. More subtly, when this limitation is unknown or ignored, LLMs may inadvertently blend outdated time-sensitive information with general knowledge during reasoning tasks, potentially compromising response accuracy. We introduce LLMLagBench, an LLM freshness benchmark, as a systematic approach for identifying the earliest probable temporal boundaries of an LLM's training data by evaluating its knowledge of recent events. We then apply this benchmark to evaluate a large set of LLMs, including models with both explicitly declared and undeclared training cutoffs. The reliability of the benchmark is assessed by manual validation and comparison with publicly released information about LLM pretraining.
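The core idea of probing an LLM's training cutoff with questions about dated events can be sketched as follows. This is a hypothetical illustration, not the paper's actual benchmark code: `ask_model` is a stand-in stub for a real LLM query, and the probe dates, threshold logic, and function names are assumptions made for the sketch.

```python
from datetime import date

# Hidden "true" cutoff used only by the stub below, so the sketch is runnable.
TRUE_CUTOFF = date(2023, 10, 1)

def ask_model(event_date, question):
    # Stand-in for a real LLM call (assumption, not the paper's method):
    # the stub model only "knows" events that happened before its cutoff.
    return event_date < TRUE_CUTOFF

def estimate_cutoff(event_dates):
    """Return the earliest probe date the model fails on, i.e. the
    probable start of its knowledge gap."""
    for d in sorted(event_dates):
        if not ask_model(d, question=f"What notable event happened on {d}?"):
            return d
    return None  # model answered every probe: cutoff lies after all of them

# Probe one event per month of 2023.
probes = [date(2023, m, 15) for m in range(1, 13)]
print(estimate_cutoff(probes))  # prints 2023-10-15
```

A real benchmark would aggregate accuracy over many events per month rather than a single question, since an LLM can fail individual questions for reasons unrelated to its cutoff.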
Similar Papers
Question Answering under Temporal Conflict: Evaluating and Organizing Evolving Knowledge with LLMs
Computation and Language
Helps computers remember and use new facts.
On the Fundamental Limits of LLMs at Scale
Machine Learning (CS)
Limits how much big computer brains can learn.
Realizing LLMs' Causal Potential Requires Science-Grounded, Novel Benchmarks
Machine Learning (CS)
Helps AI understand cause and effect better.