O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization
By: Elio Gruttadauria, Mathieu Fontaine, Jonathan Le Roux and more
We introduce O-EENC-SD, an end-to-end online speaker diarization system based on EEND-EDA that features a novel RNN-based stitching mechanism for online prediction. In particular, we develop a novel centroid refinement decoder whose usefulness is assessed through a rigorous ablation study. Our system offers two key advantages over existing methods: it is hyperparameter-free, unlike unsupervised clustering approaches, and it is more efficient than current online end-to-end methods, which are computationally costly. We demonstrate that O-EENC-SD is competitive with the state of the art in the two-speaker conversational telephone speech domain, as tested on the CallHome dataset. Our results show that O-EENC-SD provides a favorable trade-off between diarization error rate (DER) and computational complexity, even when operating on independent chunks with no overlap, which makes the system highly efficient.
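For readers unfamiliar with the DER metric named in the abstract, the sketch below shows one common frame-level formulation: missed speech, false alarm, and speaker confusion, summed and divided by the total reference speech. It is an illustration only and not the paper's evaluation code; the function name `frame_level_der` and the binary (frames, speakers) input layout are assumptions, and standard scoring tools additionally apply details such as a forgiveness collar that are omitted here.

```python
from itertools import permutations

import numpy as np


def frame_level_der(reference, hypothesis):
    """Simplified frame-level DER (hypothetical helper, no collar).

    Both inputs are binary arrays of shape (frames, speakers), where a 1
    means the speaker is active in that frame.
    """
    ref = np.asarray(reference, dtype=int)
    hyp = np.asarray(hypothesis, dtype=int)
    if ref.shape != hyp.shape:
        raise ValueError("reference and hypothesis must have the same shape")

    n_ref = ref.sum(axis=1)  # active reference speakers per frame
    n_hyp = hyp.sum(axis=1)  # active hypothesis speakers per frame
    total = n_ref.sum()
    if total == 0:
        raise ValueError("reference contains no speech")

    # Missed speech and false alarm do not depend on the label mapping.
    miss = np.maximum(n_ref - n_hyp, 0).sum()
    false_alarm = np.maximum(n_hyp - n_ref, 0).sum()

    # Speaker labels are arbitrary, so take the permutation of hypothesis
    # speakers that minimizes confusion (cheap for two speakers).
    overlap = np.minimum(n_ref, n_hyp)
    confusion = min(
        (overlap - (ref & hyp[:, list(perm)]).sum(axis=1)).sum()
        for perm in permutations(range(ref.shape[1]))
    )
    return float(miss + false_alarm + confusion) / float(total)
```

Because the score is minimized over speaker permutations, a hypothesis that is correct up to a swap of speaker labels scores a DER of zero. Enumerating permutations is only practical for small speaker counts such as the two-speaker setting mentioned above.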
Similar Papers
Probabilistic Fusion and Calibration of Neural Speaker Diarization Models
Sound
Makes AI better at knowing who is talking.