Coreference as an indicator of context scope in multimodal narrative

Published: March 7, 2025 | arXiv ID: 2503.05298v2

By: Nikolai Ilinykh, Shalom Lappin, Asad Sayeed, et al.

Potential Business Impact:

Helps AI systems keep track of the characters and objects they mention when telling stories about images, making machine-written narratives more coherent and human-like.

Business Areas:
Semantic Search, Internet Services

We demonstrate that large multimodal language models differ substantially from humans in the distribution of coreferential expressions in a visual storytelling task. We introduce a number of metrics to quantify the characteristics of coreferential patterns in both human- and machine-written texts. Humans distribute coreferential expressions in a way that maintains consistency across texts and images, interleaving references to different entities in a highly varied way. Machines are less able to track mixed references, despite achieving perceived improvements in generation quality. Materials, metrics, and code for our study are available at https://github.com/GU-CLASP/coreference-context-scope.
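The abstract contrasts how humans interleave references to different entities with machines' tendency to track them less flexibly. The paper's own metrics are in the linked repository; as a minimal illustrative sketch (the function names and the two specific measures below are assumptions, not the authors' definitions), one can quantify interleaving with a switch rate between adjacent mentions and an entropy over the entity-mention distribution:

```python
import math
from collections import Counter

def switch_rate(mentions):
    """Fraction of adjacent mention pairs that refer to different entities.

    `mentions` is an ordered list of entity IDs, one per referring
    expression in the story. A high rate means references to different
    entities are heavily interleaved; a low rate means mentions of the
    same entity are clustered into blocks.
    """
    if len(mentions) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(mentions, mentions[1:]) if a != b)
    return switches / (len(mentions) - 1)

def mention_entropy(mentions):
    """Shannon entropy (bits) of the entity-mention distribution.

    Higher entropy means mentions are spread more evenly across entities.
    """
    counts = Counter(mentions)
    total = len(mentions)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Illustrative comparison (toy data, not from the paper):
# a human-like, interleaved chain vs. a blocked, entity-by-entity chain.
human_like = ["girl", "dog", "girl", "dog", "ball", "girl"]
blocked = ["girl", "girl", "girl", "dog", "dog", "ball"]

print(switch_rate(human_like))  # → 1.0
print(switch_rate(blocked))     # → 0.4
```

Both sequences mention the same entities the same number of times (so their entropies are equal), yet their switch rates differ sharply; this is the kind of distributional difference between human- and machine-written texts that the abstract describes.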

Country of Origin
🇬🇧 🇸🇪 United Kingdom, Sweden

Page Count
19 pages

Category
Computer Science:
Computation and Language