SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing

Published: October 1, 2025 | arXiv ID: 2510.00395v1

By: Jiaye Tan, Haonan Luo, Linfeng Song and more

Potential Business Impact:

Makes AI create music faster without losing quality.

Business Areas:

A/B Testing, Data and Analytics

Low-latency symbolic music generation is essential for real-time improvisation and human-AI co-creation. Existing transformer-based models, however, face a trade-off between inference speed and musical quality. Traditional acceleration techniques such as embedding pooling significantly degrade quality, while recently proposed Byte Pair Encoding (BPE) methods, though effective on single-track piano data, suffer large performance drops in multi-track settings, as revealed by our analysis. We propose Attribute-Specialized Key-Value Head Sharing (AS-KVHS), adapted to music's structured symbolic representation, which achieves about a 30% inference speedup with only a negligible (about 0.4%) quality drop in objective evaluations and slight improvements in subjective listening tests. Our main contributions are (1) the first systematic study of BPE's generalizability in multi-track symbolic music, and (2) the introduction of AS-KVHS for low-latency symbolic music generation. Beyond these, we also release SAGE-Music, an open-source benchmark that matches or surpasses state-of-the-art models in generation quality.
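The abstract does not spell out how AS-KVHS assigns heads to musical attributes, so the following is only a minimal sketch of generic key-value head sharing (the grouped style in which several query heads read from one shared key-value head), not the authors' method. The function names and the configuration values (12 layers, 8 query heads, 2 shared key-value heads, head dimension 64, sequence length 1024) are hypothetical and chosen purely for illustration. It shows how sharing shrinks the key-value cache that must be read at each decoding step, which is one reason such schemes can lower latency.

```python
def kv_head_for_query_head(query_head: int, num_query_heads: int, num_kv_heads: int) -> int:
    """Map a query head to the shared key-value head it reads from.

    Generic grouped sharing: consecutive query heads form one group.
    The attribute-based grouping used by AS-KVHS is NOT modeled here.
    """
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a multiple of num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def kv_cache_elements(num_layers: int, num_kv_heads: int, head_dim: int, seq_len: int) -> int:
    """Number of cached scalars; keys and values are both stored, hence the factor 2."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len


# Hypothetical configuration, for illustration only (not taken from the paper).
NUM_LAYERS, NUM_QUERY_HEADS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN = 12, 8, 2, 64, 1024

# Cache size when every query head has its own key-value head.
full = kv_cache_elements(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Cache size when the 8 query heads share 2 key-value heads.
shared = kv_cache_elements(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)
# Which shared key-value head each query head uses.
mapping = [
    kv_head_for_query_head(h, NUM_QUERY_HEADS, NUM_KV_HEADS)
    for h in range(NUM_QUERY_HEADS)
]

print(mapping)
print(full // shared)
```

In this toy setting the cache shrinks by the ratio of query heads to key-value heads. The speedup and quality figures reported in the abstract come from the paper's own evaluation and cannot be derived from this arithmetic.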

Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation

Sound

Makes AI create music that sounds good and lasts longer.

2 Aug 2025 1

86%

Generating Piano Music with Transformers: A Comparative Study of Scale, Data, and Metrics

Sound

Makes computer-made music sound like a human wrote it.

10 Nov 2025 3

86%

PhraseVAE and PhraseLDM: Latent Diffusion for Full-Song Multitrack Symbolic Music Generation

Sound

Makes computers write whole songs in seconds.

12 Dec 2025 0

Page Count

12 pages
