Pushing the Limits of End-to-End Diarization

Published: September 18, 2025 | arXiv ID: 2509.14737v1

By: Samuel J. Broughton, Lahiru Samarakoon

Potential Business Impact:

Helps computers know who is talking when.

Business Areas:
Speech Recognition, Data and Analytics, Software

In this paper, we present state-of-the-art diarization error rates (DERs) on multiple publicly available datasets, including AliMeeting-far, AliMeeting-near, AMI-Mix, AMI-SDM, DIHARD III, and MagicData RAMC. Leveraging EEND-TA, a single unified non-autoregressive model for end-to-end speaker diarization, we achieve new benchmark results, most notably a DER of 14.49% on DIHARD III. Our approach scales pretraining through 8-speaker simulation mixtures, ensuring each generated speaker mixture configuration is sufficiently represented. These experiments highlight that EEND-based architectures possess a greater capacity for learning than previously explored, surpassing many existing diarization solutions while maintaining efficient speeds during inference.
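For readers unfamiliar with the metric, the following is a minimal sketch of how a frame-level DER can be computed. It uses the standard definition (missed speech plus false alarm plus speaker confusion, divided by total reference speaker time) with a brute-force search over speaker mappings. The function name `frame_der` and the 0/1 frame-matrix input format are assumptions made for illustration; this is not the paper's scoring setup, and it omits details such as forgiveness collars that benchmark scoring tools may apply.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Frame-level DER sketch (hypothetical helper, not from the paper).

    ref, hyp: lists of frames; each frame is a list of 0/1 activity flags,
    one flag per speaker. Returns errors / total reference speaker frames.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    n_spk = max(len(ref[0]), len(hyp[0]))
    # Pad both sides with silent speakers so the label sets have equal size.
    ref = [list(f) + [0] * (n_spk - len(f)) for f in ref]
    hyp = [list(f) + [0] * (n_spk - len(f)) for f in hyp]

    total_ref = sum(sum(f) for f in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    best = None
    # Try every mapping of reference speakers to hypothesis speakers.
    # This is factorial in the speaker count, so it only suits small examples.
    for perm in permutations(range(n_spk)):
        err = 0
        for r, h in zip(ref, hyp):
            n_ref, n_hyp = sum(r), sum(h)
            n_correct = sum(1 for s in range(n_spk) if r[s] and h[perm[s]])
            # miss + false alarm + confusion == max(n_ref, n_hyp) - n_correct
            err += max(n_ref, n_hyp) - n_correct
        best = err if best is None else min(best, err)
    return best / total_ref


# Hypothesis labels are swapped relative to the reference and the last frame is missed.
ref = [[1, 0], [1, 0], [0, 1], [0, 1]]
hyp = [[0, 1], [0, 1], [1, 0], [0, 0]]
print(frame_der(ref, hyp))  # 0.25
```

The permutation search is what makes the score independent of how the system happens to label speakers; for recordings with many speakers, an assignment solver would replace the brute-force loop.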

Page Count
5 pages

Category
Computer Science:
Sound