Pushing the Limits of End-to-End Diarization
By: Samuel J. Broughton, Lahiru Samarakoon
Potential Business Impact:
Helps computers determine who is speaking and when.
In this paper, we present state-of-the-art diarization error rates (DERs) on multiple publicly available datasets, including AliMeeting-far, AliMeeting-near, AMI-Mix, AMI-SDM, DIHARD III, and MagicData RAMC. Leveraging EEND-TA, a single unified non-autoregressive model for end-to-end speaker diarization, we achieve new benchmark results, most notably a DER of 14.49% on DIHARD III. Our approach scales pretraining through 8-speaker simulation mixtures, ensuring each generated speaker mixture configuration is sufficiently represented. These experiments highlight that EEND-based architectures possess a greater capacity for learning than previously explored, surpassing many existing diarization solutions while maintaining efficient speeds during inference.
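For readers unfamiliar with the metric, the sketch below shows one common way a diarization error rate can be computed at the frame level: missed speech, false alarms, and speaker confusion are summed and divided by the total reference speaker time. It is an illustration only, not the scoring protocol used in the paper. The function name `frame_level_der` and the toy data are hypothetical, and the sketch assumes hypothesis speaker labels have already been mapped to reference labels and that no forgiveness collar is applied.

```python
def frame_level_der(reference, hypothesis):
    """Compute a simplified frame-level diarization error rate.

    reference, hypothesis: lists of sets of speaker labels, one set per frame.
    Assumes hypothesis labels are already mapped to reference labels
    (no optimal speaker assignment is performed here) and no collar is used.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same frames")

    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)  # speakers present in both
        total += n_ref
        missed += max(0, n_ref - n_hyp)        # reference speakers with no hypothesis slot
        false_alarm += max(0, n_hyp - n_ref)   # extra hypothesized speakers
        confusion += min(n_ref, n_hyp) - n_correct  # matched slots with the wrong label

    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


# Toy example: five frames, two speakers "A" and "B".
reference = [{"A"}, {"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"B"}, {"A"}, {"B"}, {"A"}]
der = frame_level_der(reference, hypothesis)
print(f"DER = {der:.2%}")
```

In this toy case there is one confused frame, one missed overlapping speaker, and one false alarm against five frames of reference speaker time. Published benchmark scores typically rely on established scoring tools and dataset-specific conventions (for example, whether a collar is applied), so numbers from a sketch like this are not directly comparable to the results reported above.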
Similar Papers
Probabilistic Fusion and Calibration of Neural Speaker Diarization Models
Sound
Makes AI better at knowing who is talking.