Employing self-supervised learning models for cross-linguistic child speech maturity classification
By: Theo Zhang, Madurya Suresh, Anne S. Warlaumont, and more
Potential Business Impact:
Helps computers understand babies' sounds better.
Speech technology systems struggle with many downstream tasks for child speech due to small training corpora and the difficulties that child speech poses. We apply a novel dataset, SpeechMaturity, to state-of-the-art transformer models to address a fundamental classification task: identifying child vocalizations. Unlike previous corpora, our dataset captures maximally ecologically valid child vocalizations across an unprecedented sample, comprising children acquiring 25+ languages in the U.S., Bolivia, Vanuatu, Papua New Guinea, the Solomon Islands, and France. The dataset contains 242,004 labeled vocalizations, orders of magnitude larger than previous work. Models were trained to distinguish between cry, laughter, mature speech (containing both a consonant and a vowel), and immature speech (a consonant or vowel alone). Models trained on the dataset outperformed state-of-the-art models trained on previous datasets, achieved classification accuracy comparable to humans, and were robust across rural and urban settings.
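The paper itself does not include code here, but the task it describes, fine-tuning a self-supervised speech transformer for four-way vocalization classification, can be sketched briefly. The checkpoint name, label ordering, and helper function below are illustrative assumptions, not the authors' exact setup:

```python
# Minimal sketch: a self-supervised speech transformer fine-tuned for
# 4-way child-vocalization classification (cry / laughter / mature / immature).
# The wav2vec 2.0 checkpoint and label order are assumptions for illustration.
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

LABELS = ["cry", "laughter", "mature", "immature"]  # assumed ordering

extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base",
    num_labels=len(LABELS),  # classification head is newly initialized
)
model.eval()

def classify(waveform_16khz):
    """Classify one mono clip (a list/array of float samples at 16 kHz)."""
    inputs = extractor(waveform_16khz, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)
    return LABELS[int(logits.argmax(dim=-1))]

# Example call with a 1-second dummy clip of silence:
print(classify([0.0] * 16_000))
```

In practice the classification head would be trained on labeled clips (here, the SpeechMaturity annotations) before inference; the sketch above only shows the model and input pipeline.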
Similar Papers
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
Sound
Helps computers understand kids' voices better.
Speech transformer models for extracting information from baby cries
Sound
Helps computers understand baby cries and emotions.
Towards Data-Efficient Language Models: A Child-Inspired Approach to Language Learning
Computation and Language
Teaches computers to learn language like kids.