The Imperfective Paradox in Large Language Models
By: Bolei Ma, Yusuke Miyao
Do Large Language Models (LLMs) genuinely grasp the compositional semantics of events, or do they rely on surface-level probabilistic heuristics? We investigate the Imperfective Paradox, a logical phenomenon where the past progressive aspect entails event realization for activities (e.g., running $\to$ ran) but not for accomplishments (e.g., building $\nrightarrow$ built). We introduce ImperfectiveNLI, a diagnostic dataset designed to probe this distinction across diverse semantic classes. Evaluating state-of-the-art open-weight models, we uncover a pervasive Teleological Bias: models systematically hallucinate completion for goal-oriented events, often overriding explicit textual negation. Representational analyses show that while internal embeddings often distinguish process from result, inference decisions are dominated by strong priors about goal attainment. We further find that prompting-based interventions reduce hallucinated completions but also increase incorrect rejections of valid entailments. Our findings suggest that current LLMs lack structural aspectual awareness, operating as predictive narrative engines rather than faithful logical reasoners.
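Below is a minimal sketch of the kind of premise-hypothesis probe the paradox suggests. The example sentences, gold labels, and the naive "goal-attainment" baseline are illustrative assumptions for exposition, not the paper's actual ImperfectiveNLI items, label scheme, or evaluated models.

```python
# Hypothetical imperfective-paradox NLI probes: past-progressive premises paired
# with completion hypotheses. Activities license the entailment (running -> ran);
# accomplishments do not (building -/-> built).
PROBES = [
    # (premise, hypothesis, gold label)
    ("Mary was running in the park.", "Mary ran in the park.", "entailment"),
    ("Mary was building a house.", "Mary built a house.", "neutral"),
    # Explicit negation of culmination, the case the abstract says models often override:
    ("Mary was building a house, but she never finished it.",
     "Mary built a house.", "contradiction"),
]

def teleological_baseline(premise: str, hypothesis: str) -> str:
    """A pure goal-attainment prior: always assume the event culminated."""
    return "entailment"

def accuracy(predict, probes=PROBES):
    return sum(predict(p, h) == gold for p, h, gold in probes) / len(probes)

if __name__ == "__main__":
    # The baseline gets the activity item right but hallucinates completion for
    # both accomplishment items, mirroring the reported Teleological Bias.
    print(f"teleological baseline accuracy: {accuracy(teleological_baseline):.2f}")
```

Replacing `teleological_baseline` with a call to an actual LLM would turn this toy harness into the sort of diagnostic evaluation the abstract describes.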