Speech transformer models for extracting information from baby cries
By: Guillem Bonafos, Jéremy Rouch, Lény Lego, and more
Potential Business Impact:
Helps computers understand baby cries and emotions.
Transfer learning using latent representations from pre-trained speech models achieves outstanding performance in tasks where labeled data is scarce. However, the applicability of these representations to non-speech data, and the specific acoustic properties they encode, remain largely unexplored. In this study, we investigate both aspects. We evaluate five pre-trained speech models on eight baby cry datasets, encompassing 115 hours of audio from 960 babies. For each dataset, we assess the latent representations of each model across all available classification tasks. Our results demonstrate that the latent representations of these models can effectively classify human baby cries and encode key information related to vocal source instability and the identity of the crying baby. In addition, a comparison of the architectures and training strategies of these models offers valuable insights for the design of future models tailored to similar tasks, such as emotion detection.
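The pipeline the abstract describes (freeze a pre-trained speech encoder, pool its frame-level latents into one vector per recording, then train a lightweight classifier on the scarce labels) can be sketched as follows. This is a minimal illustration, not the authors' code: the embeddings here are synthetic stand-ins for what a real encoder such as wav2vec 2.0 or HuBERT would produce, and the nearest-centroid probe is one common choice of downstream classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for mean-pooled latent vectors: in the real
# pipeline these would come from a frozen pre-trained speech encoder
# (hypothetical choice: wav2vec 2.0), one 16-dim vector per cry.
dim, n = 16, 40
class0 = rng.normal(loc=0.0, scale=0.5, size=(n, dim))  # e.g. "pain" cries
class1 = rng.normal(loc=1.0, scale=0.5, size=(n, dim))  # e.g. "hunger" cries
X = np.vstack([class0, class1])
y = np.array([0] * n + [1] * n)

def fit_centroids(X, y):
    # Nearest-centroid probe: cheap to fit when labeled data is scarce.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    labels = sorted(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in labels])
    return np.array(labels)[dists.argmin(axis=0)]

centroids = fit_centroids(X, y)
acc = (predict(centroids, X) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

Because the encoder is frozen, only the tiny downstream model is trained, which is why this strategy works even with small labeled cry datasets; swapping the synthetic `X` for real pooled encoder outputs leaves the rest of the code unchanged.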
Similar Papers
Making deep neural networks work for medical audio: representation, compression and domain adaptation
Sound
Helps doctors hear sickness in baby cries.
Infant Cry Detection Using Causal Temporal Representation
Sound
Helps machines hear baby cries in noisy places.
Employing self-supervised learning models for cross-linguistic child speech maturity classification
Computation and Language
Helps computers understand babies' sounds better.