Metaphors are a Source of Cross-Domain Misalignment of Large Reasoning Models

Published: January 6, 2026 | arXiv ID: 2601.03388v1

By: Zhibo Hu, Chen Wang, Yanfeng Shu, and more

Potential Business Impact:

Metaphors in training data can push AI models toward misaligned reasoning across domains.

Business Areas:
Semantic Search, Internet Services

Earlier research has shown that metaphors influence humans' decision-making, which raises the question of whether metaphors also influence large language models' (LLMs') reasoning pathways, given that their training data contain a large number of metaphors. In this work, we investigate this question within the scope of emergent misalignment, where LLMs generalize patterns learned from misaligned content in one domain to other domains. We discover a strong causal relationship between metaphors in training data and the degree of misalignment in LLMs' reasoning content. When we intervene with metaphors during the pre-training, fine-tuning, and re-alignment phases, models' degree of cross-domain misalignment changes significantly. As we delve deeper into the causes of this phenomenon, we observe a connection between metaphors and the activation of global and local latent features in large reasoning models. By monitoring these latent features, we design a detector that predicts misaligned content with high accuracy.
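
The abstract does not specify how the latent-feature detector is built, so the following is only a minimal sketch of the general probe-style approach: extract hidden-state activations from a transformer and train a simple classifier on them to flag misaligned reasoning. The model name, the `latent_features` helper, the layer choice, and the labeled examples are all illustrative assumptions, not the paper's method.

```python
# Sketch of a latent-feature misalignment probe (assumptions: model name,
# mean-pooled hidden states as "latent features", and toy labeled data).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"  # placeholder; any causal LM exposing hidden states works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def latent_features(text: str, layer: int = -1) -> torch.Tensor:
    """Mean-pool one layer's hidden states into a fixed-size feature vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    # out.hidden_states: one (1, seq_len, hidden_dim) tensor per layer
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

# Hypothetical labeled reasoning samples: 1 = misaligned, 0 = aligned.
texts = ["...aligned reasoning sample...", "...misaligned reasoning sample..."]
labels = [0, 1]

X = torch.stack([latent_features(t) for t in texts]).numpy()
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Score new reasoning content by its latent-feature signature.
feats = latent_features("new chain-of-thought text").numpy().reshape(1, -1)
print(f"misalignment probability: {clf.predict_proba(feats)[0, 1]:.3f}")
```

In practice such a probe would be trained on many labeled reasoning traces and could use richer features (e.g., per-layer or sparse-autoencoder activations) in place of simple mean pooling.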

Country of Origin
🇦🇺 Australia

Page Count
17 pages

Category
Computer Science:
Computation and Language