SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents
By: Michelle Wastl, Jannis Vamvas, Rico Sennrich
Potential Business Impact:
Helps computers find differences between texts in different languages.
Recognizing semantic differences across documents, especially in different languages, is crucial for text generation evaluation and multilingual content alignment. However, as a standalone task it has received little attention. We address this by introducing SwissGov-RSD, the first naturalistic, document-level, cross-lingual dataset for semantic difference recognition. It encompasses a total of 224 multi-parallel documents in English-German, English-French, and English-Italian with token-level difference annotations by human annotators. We evaluate a variety of open-source and closed-source large language models, as well as encoder models across different fine-tuning settings, on this new benchmark. Our results show that current automatic approaches perform poorly compared to their performance on monolingual, sentence-level, and synthetic benchmarks, revealing a considerable gap for both LLMs and encoder models. We make our code and datasets publicly available.
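To make "token-level difference annotations" concrete, here is a minimal sketch of how a system's per-token difference scores might be compared against human labels. It assumes binary gold labels (1 = token expresses content not matched in the other-language document) and a token-level F1 over the "different" class with a hypothetical `threshold` of 0.5. The function name `token_f1`, the metric, and the toy sentence are illustrative assumptions, not the benchmark's official evaluation protocol or data.

```python
from typing import List


def token_f1(gold: List[int], scores: List[float], threshold: float = 0.5) -> float:
    """Token-level F1 for the 'different' class (label 1).

    Hypothetical evaluation: a token is predicted as a semantic
    difference when its score exceeds `threshold`.
    """
    if len(gold) != len(scores):
        raise ValueError("gold labels and scores must align token by token")
    pred = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy English side of a document pair (illustrative only). The other-language
# side is assumed to say a different weekday and to omit "online".
tokens = ["The", "office", "opens", "on", "Friday", "and", "accepts", "online", "forms", "."]
gold = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
scores = [0.1, 0.2, 0.1, 0.3, 0.9, 0.2, 0.1, 0.4, 0.2, 0.0]

print([t for t, g in zip(tokens, gold) if g])  # gold difference tokens
print(token_f1(gold, scores))
```

Here the scorer finds "Friday" but misses "online", so precision is perfect while recall is halved. A thresholded F1 is only one possible choice; a ranking-based measure over the raw scores would avoid picking a threshold.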