An Effective Strategy for Modeling Score Ordinality and Non-uniform Intervals in Automated Speaking Assessment
By: Tien-Hong Lo, Szu-Yu Chen, Yao-Ting Sung and more
Potential Business Impact:
Helps computers judge how well people speak English.
A recent line of research on automated speaking assessment (ASA) has benefited from self-supervised learning (SSL) representations, which capture rich acoustic and linguistic patterns in non-native speech without relying on hand-curated features. However, speech-based SSL models capture acoustic traits but overlook linguistic content, while text-based SSL models depend on ASR output and fail to encode prosodic nuances. Moreover, most prior work treats proficiency levels as nominal classes, ignoring their ordinal structure and the non-uniform intervals between proficiency labels. To address these limitations, we propose an effective ASA approach that combines SSL representations with handcrafted indicator features via a novel modeling paradigm. We further introduce a multi-margin ordinal loss that jointly models the score ordinality and the non-uniform intervals of proficiency labels. Extensive experiments on the TEEMI corpus show that our method consistently outperforms strong baselines and generalizes well to unseen prompts.
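The abstract does not give the form of the multi-margin ordinal loss, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a scalar proficiency score, an all-threshold hinge loss, and one margin per boundary between adjacent levels. The function name `multi_margin_ordinal_loss` and the values in `THRESHOLDS` and `MARGINS` are hypothetical, chosen to show unevenly spaced levels.

```python
from typing import Sequence


def multi_margin_ordinal_loss(
    score: float,
    label: int,
    thresholds: Sequence[float],
    margins: Sequence[float],
) -> float:
    """All-threshold hinge loss with a separate margin per level boundary.

    Assumption (not from the paper): `label` is in 0..K-1 and `thresholds`
    holds K-1 increasing cut points separating adjacent proficiency levels.
    """
    if len(thresholds) != len(margins):
        raise ValueError("need exactly one margin per threshold")
    loss = 0.0
    for k, (theta, margin) in enumerate(zip(thresholds, margins)):
        # The score should sit above boundary k when the true level lies
        # beyond it, and below boundary k otherwise.
        sign = 1.0 if label > k else -1.0
        # Hinge penalty: zero once the score clears the boundary by `margin`.
        loss += max(0.0, margin - sign * (score - theta))
    return loss


# Hypothetical setup: four levels with unevenly spaced boundaries,
# and a wider margin at the first boundary than at the others.
THRESHOLDS = [1.0, 2.5, 3.0]
MARGINS = [0.5, 0.25, 0.25]

inside = multi_margin_ordinal_loss(1.75, 1, THRESHOLDS, MARGINS)
near_miss = multi_margin_ordinal_loss(2.75, 1, THRESHOLDS, MARGINS)
far_miss = multi_margin_ordinal_loss(3.5, 1, THRESHOLDS, MARGINS)
```

Under these assumptions, a prediction that crosses more boundaries accumulates more hinge terms, which reflects ordinality, while the uneven thresholds and per-boundary margins stand in for non-uniform intervals between levels. A nominal cross-entropy loss would penalize a one-level miss and a two-level miss without regard to their distance from the true level.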
Similar Papers
Layer-wise Analysis for Quality of Multilingual Synthesized Speech
Audio and Speech Processing
Makes computer voices sound more human-like.
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions
Computation and Language
Teaches computers to judge speaking skills from voice.