Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation

Published: August 2, 2025 | arXiv ID: 2508.01394v1

By: Tongxi Wang, Yang Yu, Qing Wang and more

Potential Business Impact:

Makes AI create music that sounds good and lasts longer.

Song generation is regarded as the most challenging problem in music AIGC; nonetheless, existing approaches have yet to fully overcome four persistent limitations: controllability, generalizability, perceptual quality, and duration. We argue that these shortcomings stem primarily from the prevailing paradigm of attempting to learn music theory directly from raw audio, a task that remains prohibitively difficult for current models. To address this, we present Bar-level AI Composing Helper (BACH), the first model explicitly designed for song generation through human-editable symbolic scores. BACH introduces a tokenization strategy and a symbolic generative procedure tailored to hierarchical song structure. Consequently, it achieves substantial gains in the efficiency, duration, and perceptual quality of song generation. Experiments demonstrate that BACH, with a small model size, establishes a new SOTA among all publicly reported song generation systems, even surpassing commercial solutions such as Suno. Human evaluations further confirm its superiority across multiple subjective metrics.

Country of Origin
🇸🇬 Singapore

Page Count
10 pages

Category
Computer Science: Sound