Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation
By: Tongxi Wang, Yang Yu, Qing Wang and more
Potential Business Impact:
Lets AI generate longer, better-sounding songs from musical scores that people can edit.
Song generation is regarded as the most challenging problem in music AIGC; nonetheless, existing approaches have yet to fully overcome persistent limitations in four areas: controllability, generalizability, perceptual quality, and duration. We argue that these shortcomings stem primarily from the prevailing paradigm of attempting to learn music theory directly from raw audio, a task that remains prohibitively difficult for current models. To address this, we present Bar-level AI Composing Helper (BACH), the first model explicitly designed for song generation through human-editable symbolic scores. BACH introduces a tokenization strategy and a symbolic generative procedure tailored to hierarchical song structure. Consequently, it achieves substantial gains in the efficiency, duration, and perceptual quality of song generation. Experiments demonstrate that BACH, with a small model size, establishes a new state of the art (SOTA) among all publicly reported song generation systems, even surpassing commercial solutions such as Suno. Human evaluations further confirm its superiority across multiple subjective metrics.
Similar Papers
Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores
Sound
Helps computers understand music scores like a human.
XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework
Sound
Makes computers create music from pictures or humming.
SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
Sound
Makes AI create music faster without losing quality.