
A Comparative Study of Controllability, Explainability, and Performance in Dysfluency Detection Models

Published: August 25, 2025 | arXiv ID: 2509.00058v1

By: Eric Zhang, Li Wei, Sarah Chen, and more

Potential Business Impact:

Helps clinicians detect and interpret patients' speech dysfluencies more reliably.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Recent advances in dysfluency detection have introduced a variety of modeling paradigms, ranging from lightweight, object-detection-inspired networks (YOLO-Stutter) to modular interpretable frameworks (UDM). While performance on benchmark datasets continues to improve, clinical adoption requires more than accuracy: models must also be controllable and explainable. In this paper, we present a systematic comparative analysis of four representative approaches (YOLO-Stutter, FluentNet, UDM, and SSDM) along three dimensions: performance, controllability, and explainability. Through comprehensive evaluation on multiple datasets and expert clinician assessment, we find that YOLO-Stutter and FluentNet provide efficiency and simplicity but offer limited transparency; UDM achieves the best balance of accuracy and clinical interpretability; and SSDM, while promising, could not be fully reproduced in our experiments. Our analysis highlights the trade-offs among competing approaches and identifies future directions for clinically viable dysfluency modeling. We also provide detailed implementation insights and practical deployment considerations for each approach.
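As a hedged illustration of how detection performance for such models might be scored, the sketch below matches predicted dysfluency time spans against reference annotations using interval IoU and reports precision, recall, and F1. The interval representation, function names, and 0.5 IoU threshold are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of span-level evaluation for dysfluency detection.
# Assumes each model outputs time intervals (start_s, end_s) for detected
# dysfluencies; names and the 0.5 IoU threshold are illustrative only.

def interval_iou(a, b):
    """Intersection-over-union of two (start, end) time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def span_f1(predicted, reference, iou_threshold=0.5):
    """Greedy one-to-one matching of predicted spans to reference spans."""
    matched_refs = set()
    true_positives = 0
    for pred in predicted:
        best_iou, best_idx = 0.0, None
        for idx, ref in enumerate(reference):
            if idx in matched_refs:
                continue
            iou = interval_iou(pred, ref)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched_refs.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

if __name__ == "__main__":
    # Hypothetical detections (in seconds) from one model vs. annotated spans.
    preds = [(0.4, 0.9), (2.1, 2.6), (5.0, 5.3)]
    refs = [(0.5, 1.0), (2.0, 2.5)]
    print(span_f1(preds, refs))  # -> approximately (0.667, 1.0, 0.8)
```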

Page Count
8 pages

Category
Computer Science: Artificial Intelligence