LVLMs and Humans Ground Differently in Referential Communication
By: Peter Zeng, Weiling Li, Amie Paige, and more
Potential Business Impact:
Helps AI understand what people mean when they talk.
For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. Yet this ability to collaborate remains limited by a key deficit: an inability to model common ground. Here, we present a referential communication experiment with a factorial design involving director-matcher pairs (human-human, human-AI, AI-human, and AI-AI) that interact over multiple turns in repeated rounds to match pictures of objects with no obvious lexicalized labels. We release the online pipeline for data collection; the tools and analyses for accuracy, efficiency, and lexical overlap; and a corpus of 356 dialogues (89 pairs over 4 rounds each) that reveals LVLMs' limitations in interactively resolving referring expressions, a crucial skill underlying human language use.
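As a rough illustration of the kind of analysis the abstract mentions, the sketch below shows one plausible way to measure lexical overlap between a director's referring expressions across rounds. This is not the authors' released tooling; the dialogue format, tokenizer, and example utterances are hypothetical assumptions for illustration only.

```python
# Minimal sketch of a lexical-overlap analysis (assumed format; not the paper's code).
def tokenize(utterance: str) -> list[str]:
    # Lowercase whitespace tokenization; the released tools may tokenize differently.
    return utterance.lower().split()

def lexical_overlap(round_a: list[str], round_b: list[str]) -> float:
    # Jaccard overlap between the word types a director uses in two rounds.
    types_a = {tok for utt in round_a for tok in tokenize(utt)}
    types_b = {tok for utt in round_b for tok in tokenize(utt)}
    if not types_a and not types_b:
        return 0.0
    return len(types_a & types_b) / len(types_a | types_b)

# Hypothetical director utterances for one target picture in two rounds:
round1 = ["the one that looks like a dancer with arms up", "kind of like an ice skater"]
round4 = ["the ice skater"]
print(f"Lexical overlap (round 1 vs round 4): {lexical_overlap(round1, round4):.2f}")
```

In human-human pairs, one would typically expect referring expressions to shorten and converge over rounds, so an overlap measure like this (alongside accuracy and efficiency) offers one way to compare grounding behavior across pair types.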
Similar Papers
LVLMs are Bad at Overhearing Human Referential Communication
Computation and Language
Computers learn to understand what people are talking about.
Investigating the Development of Task-Oriented Communication in Vision-Language Models
Artificial Intelligence
AI learns secret codes to work together.