It Depends: Resolving Referential Ambiguity in Minimal Contexts with Commonsense Knowledge
By: Lukas Ellinger, Georg Groh
Potential Business Impact:
Helps computers understand confusing words in chats.
Ambiguous words or underspecified references require interlocutors to resolve them, often by relying on shared context and commonsense knowledge. Therefore, we systematically investigate whether Large Language Models (LLMs) can leverage commonsense to resolve referential ambiguity in multi-turn conversations and analyze their behavior when ambiguity persists. Further, we study how requests for simplified language affect this capacity. Using a novel multilingual evaluation dataset, we test DeepSeek v3, GPT-4o, Qwen3-32B, GPT-4o-mini, and Llama-3.1-8B via LLM-as-Judge and human annotations. Our findings indicate that current LLMs struggle to resolve ambiguity effectively: they tend to commit to a single interpretation or cover all possible references, rather than hedging or seeking clarification. This limitation becomes more pronounced under simplification prompts, which drastically reduce the use of commonsense reasoning and diverse response strategies. Fine-tuning Llama-3.1-8B with Direct Preference Optimization substantially improves ambiguity resolution across all request types. These results underscore the need for advanced fine-tuning to improve LLMs' handling of ambiguity and to ensure robust performance across diverse communication styles.
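The abstract's evaluation hinges on classifying how a model responds to a still-ambiguous reference: committing to one interpretation, covering all candidates, hedging, or asking for clarification. As a rough illustration of what an LLM-as-Judge pass over such responses could look like, here is a minimal sketch assuming the OpenAI Python client (v1) with GPT-4o as judge; the prompt wording, the strategy label strings, and the judge_strategy helper are illustrative assumptions, not the authors' actual rubric or code.

```python
# Minimal LLM-as-Judge sketch: classify an assistant reply's
# ambiguity-handling strategy. Assumes the OpenAI Python client (v1)
# and an OPENAI_API_KEY in the environment. Prompt text and helper
# names are hypothetical, not taken from the paper.
from openai import OpenAI

client = OpenAI()

# Strategy labels mirror the taxonomy named in the abstract.
STRATEGIES = {
    "commit",     # commits to a single interpretation
    "cover_all",  # enumerates every possible referent
    "hedge",      # answers tentatively, flagging the ambiguity
    "clarify",    # asks the user a clarifying question
}

JUDGE_PROMPT = """You are evaluating how an assistant handled an ambiguous reference.

Conversation:
{conversation}

Assistant response:
{response}

Classify the response as exactly one of: commit, cover_all, hedge, clarify.
Answer with the label only."""


def judge_strategy(conversation: str, response: str) -> str:
    """Ask the judge model which strategy the response used."""
    result = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                conversation=conversation, response=response
            ),
        }],
    )
    label = result.choices[0].message.content.strip().lower()
    return label if label in STRATEGIES else "unparsable"


if __name__ == "__main__":
    convo = 'User: "Put it next to the bat." (no prior turn says which kind of bat)'
    reply = "Do you mean the baseball bat or the animal? Could you clarify?"
    print(judge_strategy(convo, reply))  # expected label: clarify
```

Aggregating these labels per model and per prompt condition (plain vs. simplified-language requests) would reproduce the kind of strategy-distribution comparison the abstract describes; the paper additionally validates such automatic judgments against human annotations.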
Similar Papers
Correct-Detect: Balancing Performance and Ambiguity Through the Lens of Coreference Resolution in LLMs
Computation and Language
Computers can't always tell who "he" or "she" is.
Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?
Computation and Language
Computers struggle to understand tricky sentences.
Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey
Computation and Language
Helps computers understand confusing words better.