
Corpus of Cross-lingual Dialogues with Minutes and Detection of Misunderstandings

Published: December 23, 2025 | arXiv ID: 2512.20204v1

By: Marko Čechovič, Natália Komorníková, Dominik Macháček and more

Potential Business Impact:

Helps people talk across languages, finds misunderstandings.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Speech processing and translation technology have the potential to facilitate meetings of individuals who do not share any common language. To evaluate automatic systems for such a task, a versatile and realistic evaluation corpus is needed. Therefore, we create and present a corpus of cross-lingual dialogues between individuals without a common language, whose conversations were facilitated by automatic simultaneous speech translation. The corpus consists of 5 hours of speech recordings with ASR and gold transcripts in 12 original languages, together with automatic and corrected translations into English. For the purposes of research into cross-lingual summarization, our corpus also includes written summaries (minutes) of the meetings. Moreover, we propose the task of automatic detection of misunderstandings. To give an overview of this task and its complexity, we attempt to quantify misunderstandings in cross-lingual meetings. We annotate misunderstandings manually and also test the ability of current large language models to detect them automatically. The results show that the Gemini model is able to identify text spans with misunderstandings with a recall of 77% and a precision of 47%.
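To make the reported recall and precision figures concrete, here is a minimal sketch of how such span-level scores can be computed. It assumes exact matching of `(start, end)` spans, which the abstract does not specify, so the paper's actual matching criterion may differ. The function names and the toy spans are hypothetical. The F1 value at the end is derived here from the two reported figures and is not a result stated in the abstract.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over two collections of spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: (start, end) offsets of annotated misunderstandings.
gold_spans = {(0, 5), (10, 14), (20, 27), (30, 33)}
predicted_spans = {(0, 5), (10, 14), (20, 27), (40, 44), (50, 52), (60, 66)}

# 3 of 6 predictions are correct, and 3 of 4 gold spans are found.
p, r = precision_recall(predicted_spans, gold_spans)

# F1 derived from the abstract's figures (precision 47%, recall 77%).
derived_f1 = f1_score(0.47, 0.77)

print(f"toy precision={p:.2f} recall={r:.2f}")
print(f"F1 derived from reported figures: {derived_f1:.2f}")
```

Read this way, a recall of 77% with a precision of 47% describes a detector that finds most annotated misunderstandings but flags roughly as many spurious spans as correct ones.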

Cross-Lingual Interleaving for Speech Language Models

Computation and Language

Helps computers understand many languages from talking.

1 Dec 2025

Multilingual corpora for the study of new concepts in the social sciences and humanities:

Computation and Language

Helps computers understand new ideas from company websites.

8 Dec 2025

Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models

Audio and Speech Processing

Helps people with speech problems talk to computers.

19 Dec 2025

Country of Origin

🇨🇿 Czech Republic

Page Count

12 pages
