Population-Aligned Audio Reproduction With LLM-Based Equalizers
By: Ioannis Stylianou, Jon Francombe, Pablo Martinez-Nuevo and more
Conventional audio equalization is a static process that requires manual and cumbersome adjustments to adapt to changing listening contexts (e.g., mood, location, or social setting). In this paper, we introduce a Large Language Model (LLM)-based alternative that maps natural language text prompts to equalization settings. This enables a conversational approach to sound system control. By utilizing data collected from a controlled listening experiment, our models exploit in-context learning and parameter-efficient fine-tuning techniques to reliably align with population-preferred equalization settings. Our evaluation methods, which leverage distributional metrics that capture users' varied preferences, show statistically significant improvements in distributional alignment over random sampling and static preset baselines. These results indicate that LLMs could function as "artificial equalizers," contributing to the development of more accessible, context-aware, and expert-level audio tuning methods.
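To make the idea of distributional alignment concrete, here is a minimal sketch of how equalizer settings sampled from a model could be compared against listener preferences and a random-sampling baseline. The abstract does not name the metrics used, so the empirical 1-Wasserstein distance, the single bass-gain parameter, the gain range of -12 to 12 dB, and all data below are illustrative assumptions rather than the authors' method or results.

```python
import random


def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size samples this is the mean absolute difference between
    the sorted values (an assumed metric, not taken from the paper).
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 200

# Hypothetical listener data: preferred bass gain (dB) for one text prompt.
population = [rng.gauss(4.0, 2.0) for _ in range(n)]

# Hypothetical model outputs: gains sampled from an LLM-based equalizer,
# simulated here as a distribution close to the population's.
model = [rng.gauss(3.5, 2.2) for _ in range(n)]

# Random-sampling baseline: uniform over an assumed gain range.
baseline = [rng.uniform(-12.0, 12.0) for _ in range(n)]

d_model = wasserstein_1d(population, model)
d_random = wasserstein_1d(population, baseline)

print(f"model vs population:    {d_model:.2f} dB")
print(f"baseline vs population: {d_random:.2f} dB")
```

A lower distance means the sampled settings follow the spread of listener preferences more closely, which is different from matching only the average preference. A real evaluation would cover several equalizer bands and many prompts, and would test whether the difference is statistically significant.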