Score: 2

Speech transformer models for extracting information from baby cries

Published: September 2, 2025 | arXiv ID: 2509.02259v1

By: Guillem Bonafos, Jéremy Rouch, Lény Lego and more

Potential Business Impact:

Helps computers understand baby cries and emotions.

Business Areas:
Speech Recognition, Data and Analytics, Software

Transfer learning using latent representations from pre-trained speech models achieves outstanding performance in tasks where labeled data is scarce. However, the applicability of these representations to non-speech data, and the specific acoustic properties they encode, remain largely unexplored. In this study, we investigate both aspects. We evaluate five pre-trained speech models on eight baby cry datasets, encompassing 115 hours of audio from 960 babies. For each dataset, we assess the latent representations of each model across all available classification tasks. Our results demonstrate that the latent representations of these models can effectively classify human baby cries and that they encode key information related to vocal source instability and the identity of the crying baby. In addition, a comparison of the architectures and training strategies of these models offers valuable insights for the design of future models tailored to similar tasks, such as emotion detection.
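
The evaluation idea in the abstract (freeze a pre-trained model, take its latent representations, and test how well a simple classifier separates the labels) can be illustrated with a minimal sketch. This is not the authors' pipeline: the mean pooling, the nearest-centroid classifier, the labels `baby_a` and `baby_b`, and the `fake_latents` helper that generates synthetic frames in place of real model outputs are all assumptions made so that the example runs with the Python standard library alone.

```python
import random
from statistics import fmean


def mean_pool(frames):
    """Average frame-level vectors (T x D) into one clip-level vector (D)."""
    return [fmean(column) for column in zip(*frames)]


def fit_centroids(embeddings, labels):
    """Compute one centroid per class from pooled embeddings."""
    by_class = {}
    for embedding, label in zip(embeddings, labels):
        by_class.setdefault(label, []).append(embedding)
    return {label: mean_pool(vectors) for label, vectors in by_class.items()}


def predict(centroids, embedding):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], embedding))


def fake_latents(rng, label, n_frames=20, dim=8):
    """Hypothetical stand-in for a frozen speech model's output.

    Returns synthetic frames whose mean depends on the label; a real
    experiment would use the model's hidden states for a cry recording.
    """
    shift = 1.0 if label == "baby_a" else -1.0
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n_frames)]


rng = random.Random(0)
labels = ["baby_a", "baby_b"] * 20
pooled = [mean_pool(fake_latents(rng, label)) for label in labels]

# Hold out the last 10 clips to evaluate the probe.
train_x, train_y = pooled[:30], labels[:30]
test_x, test_y = pooled[30:], labels[30:]

centroids = fit_centroids(train_x, train_y)
correct = sum(predict(centroids, x) == y for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"probe accuracy on synthetic data: {accuracy:.2f}")
```

A deliberately simple classifier keeps the focus on the representation: if a probe this weak separates the classes, the information is readily accessible in the latent space. The accuracy printed here says nothing about real cry data, since the inputs are synthetic.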

Country of Origin
🇫🇷 France

Page Count
5 pages

Category
Computer Science: Sound