Score: 2

Efficient Transformer-Based Piano Transcription With Sparse Attention Mechanisms

Published: September 11, 2025 | arXiv ID: 2509.09318v1

By: Weixing Wei, Kazuyoshi Yoshii

Potential Business Impact:

Turns piano music into notes faster.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

This paper investigates automatic piano transcription based on computationally efficient yet high-performing variants of the Transformer that can capture longer-term dependencies over the whole musical piece. Recently, Transformer-based sequence-to-sequence models have demonstrated excellent performance in piano transcription. These models, however, cannot process a whole piece at once because of the quadratic complexity of the self-attention mechanism, so in practice music signals are typically processed in a sliding-window manner. To overcome this limitation, we propose an efficient architecture with sparse attention mechanisms. Specifically, we introduce sliding-window self-attention mechanisms for both the encoder and decoder, and a hybrid global-local cross-attention mechanism that attends to various spans according to the MIDI token types. We also use a hierarchical pooling strategy between the encoder and decoder to further reduce computational load. Our experiments on the MAESTRO dataset showed that the proposed model significantly reduced computational cost and memory usage and accelerated inference, while maintaining transcription performance comparable to the full-attention baseline. This allows for training with longer audio contexts on the same hardware, demonstrating the viability of sparse attention for building efficient and high-performance piano transcription systems. The code is available at https://github.com/WX-Wei/efficient-seq2seq-piano-trans.
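To make the idea of sliding-window self-attention concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sliding_window_mask`, `masked_softmax`, `count_attended_pairs`), the window size, and the sequence length are all hypothetical and chosen for readability. It shows how restricting each position to a fixed neighbourhood reduces the number of query-key pairs compared with full attention.

```python
import math


def sliding_window_mask(seq_len, window, causal=False):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. when |i - j| <= window (and j <= i if causal is set).
    The window size is an illustrative assumption, not a value from the paper.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            allowed = abs(i - j) <= window
            if causal:
                allowed = allowed and j <= i
            row.append(allowed)
        mask.append(row)
    return mask


def masked_softmax(scores, mask_row):
    """Softmax over the allowed positions only; masked positions get weight 0."""
    peak = max(s for s, m in zip(scores, mask_row) if m)
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


def count_attended_pairs(mask):
    """Number of (query, key) pairs that are actually computed."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    seq_len, window = 8, 2  # hypothetical toy sizes
    local = sliding_window_mask(seq_len, window)
    print("full attention pairs:  ", seq_len * seq_len)
    print("sliding-window pairs:  ", count_attended_pairs(local))
    print("weights for position 0:", masked_softmax([0.0] * seq_len, local[0]))
```

With a fixed window, the number of attended pairs grows roughly linearly with sequence length (about `seq_len * (2 * window + 1)`) rather than quadratically, which is the general reason sparse attention of this kind can lower compute and memory cost. The hybrid global-local cross-attention and hierarchical pooling described in the abstract are not modelled here.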

Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training

Sound

Makes music sound like a real person played it.

2 Dec 2025 2

Efficient Attention Mechanisms for Large Language Models: A Survey

Computation and Language

Makes computers understand long stories faster.

25 Jul 2025 0

Generating Piano Music with Transformers: A Comparative Study of Scale, Data, and Metrics

Sound

Makes computer-made music sound like a human wrote it.

10 Nov 2025 3

Country of Origin

🇯🇵 Japan

Repos / Data Links

github.com

Page Count

6 pages
