Large Language Models as Span Annotators
By: Zdeněk Kasner, Vilém Zouhar, Patrícia Schmidtová, and more
Potential Business Impact:
Computers can now find and label specific parts of text.
Span annotation is the task of localizing and classifying text spans according to custom guidelines. Annotated spans can be used to analyze and evaluate high-quality texts for which single-score metrics fail to provide actionable feedback. Until recently, span annotation was limited to human annotators or fine-tuned models. In this study, we show that large language models (LLMs) can serve as flexible and cost-effective span annotation backbones. To demonstrate their utility, we compare LLMs to skilled human annotators on three diverse span annotation tasks: evaluating data-to-text generation, identifying translation errors, and detecting propaganda techniques. We demonstrate that LLMs achieve inter-annotator agreement (IAA) comparable to human annotators at a fraction of the cost per annotation. We also manually analyze model outputs, finding that LLMs make errors at a similar rate to human annotators. We release a dataset of more than 40k model and human annotations for further research.
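To make the task concrete, here is a minimal Python sketch of how annotated spans might be represented and how agreement between two annotators (say, an LLM and a human) could be scored with a character-overlap F1. The `Span` fields, the example text, and the metric are illustrative assumptions; the paper's actual data format and IAA measure may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    label: str  # category from the annotation guidelines

def char_f1(spans_a: list[Span], spans_b: list[Span]) -> float:
    """Character-overlap F1 between two annotators' span sets.

    A character counts as matched only if both annotators cover it
    with the same label.
    """
    chars_a = {(i, s.label) for s in spans_a for i in range(s.start, s.end)}
    chars_b = {(i, s.label) for s in spans_b for i in range(s.start, s.end)}
    if not chars_a and not chars_b:
        return 1.0  # both annotators marked nothing: perfect agreement
    overlap = len(chars_a & chars_b)
    precision = overlap / len(chars_a) if chars_a else 0.0
    recall = overlap / len(chars_b) if chars_b else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: marking a translation error in an output sentence.
text = "The cat sat on the math."
model_spans = [Span(19, 23, "mistranslation")]  # "math"
human_spans = [Span(15, 23, "mistranslation")]  # "the math"
print(char_f1(model_spans, human_spans))  # partial overlap -> ~0.67
```

Character-level overlap is one common choice for span tasks because it gives partial credit when annotators agree on the error but draw slightly different boundaries, as in the example above.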
Similar Papers
Linguistic Blind Spots of Large Language Models
Computation and Language
AI struggles to understand sentence parts.
Evaluating Large Language Models as Expert Annotators
Computation and Language
Computers learn to label text like experts.
A Review on Large Language Models for Visual Analytics
Human-Computer Interaction
Computers learn to understand pictures and words together.