A Comparative Study of Controllability, Explainability, and Performance in Dysfluency Detection Models
By: Eric Zhang, Li Wei, Sarah Chen, et al.
Recent advances in dysfluency detection have introduced a variety of modeling paradigms, ranging from lightweight object-detection-inspired networks (YOLO-Stutter) to modular interpretable frameworks (UDM). While performance on benchmark datasets continues to improve, clinical adoption requires more than accuracy: models must also be controllable and explainable. In this paper, we present a systematic comparative analysis of four representative approaches--YOLO-Stutter, FluentNet, UDM, and SSDM--along three dimensions: performance, controllability, and explainability. Through comprehensive evaluation on multiple datasets and expert clinician assessment, we find that YOLO-Stutter and FluentNet offer efficiency and simplicity but limited transparency; UDM achieves the best balance of accuracy and clinical interpretability; and SSDM, while promising, could not be fully reproduced in our experiments. Our analysis highlights the trade-offs among competing approaches and identifies future directions for clinically viable dysfluency modeling. We also provide detailed implementation insights and practical deployment considerations for each approach.