Evaluating Self-Supervised Speech Models via Text-Based LLMs
By: Takashi Maekaku, Keita Goto, Jinchuan Tian, and more
Potential Business Impact:
Lets computers check how well other computers learned.
Self-Supervised Learning (SSL) has gained traction for its ability to learn rich representations at low labeling cost, applicable across diverse downstream tasks. However, assessing downstream-task performance remains challenging due to the cost of extra training and evaluation. Existing methods for task-agnostic evaluation also require extra training or hyperparameter tuning. We propose a novel evaluation metric using large language models (LLMs). By inputting discrete token sequences and minimal domain cues derived from SSL models into LLMs, we obtain the mean log-likelihood; these cues guide in-context learning, making the score more reliable without extra training or hyperparameter tuning. Experimental results show a correlation between LLM-based scores and performance on the automatic speech recognition task. Additionally, our findings reveal that LLMs not only function as SSL evaluation tools but also provide inference-time embeddings that are useful for the speaker verification task.
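To make the scoring idea concrete, here is a minimal sketch (not the authors' implementation) of computing the mean log-likelihood of a discretized speech-unit sequence with a causal LLM. It assumes SSL-derived units are rendered as space-separated integers and a short textual domain cue is prepended as an in-context prompt; the model name, cue text, and unit values are illustrative placeholders.

```python
# Sketch: mean log-likelihood of discrete SSL units under a causal LLM.
# Assumptions: Hugging Face transformers, a generic causal LM ("gpt2" as a
# stand-in), and units serialized as space-separated integers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the specific LLM is not fixed here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def mean_log_likelihood(unit_sequence, domain_cue=""):
    """Mean per-token log-likelihood of a discrete unit sequence,
    optionally conditioned on a short textual domain cue."""
    text = (domain_cue + " " if domain_cue else "") + " ".join(map(str, unit_sequence))
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy over predicted tokens.
        out = model(**inputs, labels=inputs["input_ids"])
    return -out.loss.item()  # negative mean loss = mean log-likelihood

# Illustrative usage: a higher score suggests units that the LLM finds
# more predictable, i.e., more "language-like".
units = [12, 5, 5, 87, 3, 44, 44, 19]
print(f"mean log-likelihood: {mean_log_likelihood(units, 'Speech units:'):.3f}")
```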
Similar Papers
Assessment of L2 Oral Proficiency using Speech Large Language Models
Computation and Language
Helps computers grade how well people speak English.
Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation
Computation and Language
Helps find science papers by understanding changing topics.
Spoken Language Understanding on Unseen Tasks With In-Context Learning
Computation and Language
Teaches computers to understand new spoken words.