Tug-of-war between idiom's figurative and literal meanings in LLMs
By: Soyoung Oh, Xinting Huang, Mathis Pink, and more
Potential Business Impact:
Helps computers understand tricky sayings and jokes.
Idioms present a unique challenge for language models due to their non-compositional figurative meanings, which often diverge strongly from the idiom's literal interpretation. This duality requires a model to learn to represent both meanings and to decide whether an idiom should be read figuratively or literally. In this paper, we employ tools from mechanistic interpretability to trace how a large pretrained causal transformer (LLama3.2-1B-base) resolves this ambiguity. We localize three steps of idiom processing: first, the idiom's figurative meaning is retrieved in early attention and MLP sublayers. We identify specific attention heads that boost the figurative meaning of the idiom while suppressing its literal interpretation. The model then propagates the figurative representation along an intermediate path, while a parallel bypass route forwards the literal interpretation, ensuring that both readings remain available. Overall, our findings provide mechanistic evidence for idiom comprehension in an autoregressive transformer.
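The kind of layer-by-layer tracing described in the abstract can be illustrated with a simple logit-lens style readout: project each layer's residual stream through the model's unembedding and compare how strongly a figurative versus a literal continuation is favored at each depth. The sketch below is not the paper's actual pipeline; the checkpoint name, prompt, and probe tokens (" died" vs. " kicked") are illustrative assumptions.

```python
# Minimal logit-lens style sketch (assumptions: checkpoint id, prompt, probe tokens).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # assumed checkpoint; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Hypothetical probe: after an idiom, does the model prefer a figurative or a
# literal paraphrase of what happened?
prompt = "When they said he kicked the bucket, they meant that he"
figurative_id = tok(" died", add_special_tokens=False).input_ids[0]
literal_id = tok(" kicked", add_special_tokens=False).input_ids[0]

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"))

# Project each layer's last-position hidden state through the final norm and
# unembedding, then compare the logits of the two candidate continuations.
final_norm, unembed = model.model.norm, model.lm_head
for layer, h in enumerate(out.hidden_states):
    logits = unembed(final_norm(h[0, -1]))
    gap = (logits[figurative_id] - logits[literal_id]).item()
    print(f"layer {layer:2d}: figurative - literal logit gap = {gap:+.2f}")
```

A readout like this only shows where the figurative reading begins to dominate; localizing the responsible attention heads and bypass routes, as in the paper, additionally requires interventions such as ablation or activation patching.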
Similar Papers
Unveiling LLMs' Metaphorical Understanding: Exploring Conceptual Irrelevance, Context Leveraging and Syntactic Influence
Computation and Language
Computers still struggle to understand word pictures.
Beyond Understanding: Evaluating the Pragmatic Gap in LLMs' Cultural Processing of Figurative Language
Computation and Language
Computers understand jokes and sayings better.
Visual Puns from Idioms: An Iterative LLM-T2IM-MLLM Framework
Computation and Language
Creates funny pictures from sayings.