Heterogeneous Self-Supervised Acoustic Pre-Training with Local Constraints
By: Xiaodong Cui, A F M Saif, Brian Kingsbury, and more
Potential Business Impact:
Teaches computers to understand many kinds of speech.
Self-supervised pre-training on unlabeled data is widely used in automatic speech recognition. In this paper, we propose a new self-supervised pre-training approach for dealing with heterogeneous data. Instead of mixing all the data and minimizing the averaged global loss in the conventional way, we impose additional local constraints to ensure that the model optimizes each source of heterogeneous data to its local optimum after $K$ steps of gradient descent initialized from the model. We formulate this as a bilevel optimization problem and use a first-order approximation method to solve it. We also discuss its connection to model-agnostic meta-learning. Experiments are carried out on self-supervised pre-training using multi-domain and multilingual datasets, demonstrating that the proposed approach can significantly improve the adaptivity of the self-supervised pre-trained model for downstream supervised fine-tuning tasks.
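To make the idea concrete, below is a minimal sketch of one outer update under this kind of local constraint: for each heterogeneous data source, the shared model is adapted with K steps of gradient descent on the self-supervised loss, and the shared parameters are then moved toward the adapted solutions with a Reptile-style first-order update. This is only an illustrative approximation, not the authors' implementation; the model, ssl_loss, data-source layout, and learning rates are hypothetical, and the paper's exact first-order scheme may differ.

```python
# Sketch: heterogeneous self-supervised pre-training with K-step local
# adaptation per data source and a first-order (Reptile-style) outer update.
# All names (ssl_loss, source_batches, learning rates) are illustrative.
import copy
import torch


def local_adapt(model, batches, ssl_loss, k_steps, inner_lr):
    """Clone the shared model and take K gradient steps on one data source."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for batch in batches[:k_steps]:
        opt.zero_grad()
        # Self-supervised loss, e.g. a contrastive or masked-prediction objective.
        loss = ssl_loss(adapted, batch)
        loss.backward()
        opt.step()
    return adapted


def heterogeneous_pretrain_step(model, source_batches, ssl_loss,
                                k_steps=3, inner_lr=1e-4, outer_lr=1e-3):
    """One outer update: adapt to each source locally, then move the shared
    model toward the average of the locally adapted parameters."""
    adapted_models = [
        local_adapt(model, batches, ssl_loss, k_steps, inner_lr)
        for batches in source_batches  # one list of batches per heterogeneous source
    ]
    with torch.no_grad():
        for name, p in model.named_parameters():
            # First-order update: average direction toward each source's
            # K-step local optimum, applied to the shared parameters.
            delta = torch.stack([
                dict(m.named_parameters())[name].detach() - p
                for m in adapted_models
            ]).mean(dim=0)
            p.add_(outer_lr * delta)
```

In this sketch the inner K-step loop plays the role of the lower-level problem, while the averaged parameter movement stands in for the upper-level update; a full bilevel treatment would differentiate through (or approximate) the inner adaptation rather than simply averaging parameter differences.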
Similar Papers
Self-Supervised Pre-Training with Equilibrium Constraints
Machine Learning (CS)
Teaches computers to learn from mixed data better.
Controlling Contrastive Self-Supervised Learning with Knowledge-Driven Multiple Hypothesis: Application to Beat Tracking
Sound
Helps music AI understand rhythm better.
MT-HuBERT: Self-Supervised Mix-Training for Few-Shot Keyword Spotting in Mixed Speech
Sound
Helps voice assistants hear many words at once.