Score: 2

MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation

Published: November 6, 2025 | arXiv ID: 2511.03942v1

By: Shih-Lun Wu, Yoon Kim, Cheng-Zhi Anna Huang

BigTech Affiliations: Massachusetts Institute of Technology

Potential Business Impact:

Writes music from your words.

Business Areas:
Translation Service, Professional Services

We present MIDI-LLM, an LLM for generating multitrack MIDI music from free-form text prompts. Our approach expands a text LLM's vocabulary to include MIDI tokens, and uses a two-stage training recipe to endow the model with text-to-MIDI abilities. By preserving the original LLM's parameter structure, we can directly leverage the vLLM library for accelerated inference. Experiments show that MIDI-LLM achieves higher quality, better text control, and faster inference compared to the recent Text2midi model. Live demo at https://midi-llm-demo.vercel.app.
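The vocabulary-expansion idea can be illustrated with a small sketch. This is an assumption-laden illustration, not the paper's actual tokenizer: it shows one common way to append new MIDI event tokens (hypothetical names like `NOTE_ON_60`) after an existing text vocabulary, so every original token keeps its ID and the pretrained model's parameter structure stays intact.

```python
# Sketch (not the paper's exact scheme): append MIDI event tokens after the
# existing text vocabulary so original token IDs are preserved, which is what
# lets the pretrained embedding rows be reused unchanged.

def expand_vocab(text_vocab, midi_events):
    """Return a combined vocab; MIDI tokens get fresh IDs after the text IDs."""
    vocab = dict(text_vocab)            # copy; original IDs untouched
    next_id = max(vocab.values()) + 1   # first unused ID
    for ev in midi_events:
        if ev not in vocab:
            vocab[ev] = next_id
            next_id += 1
    return vocab

# Hypothetical tiny example: three text tokens plus a few MIDI event tokens.
text_vocab = {"<bos>": 0, "hello": 1, "music": 2}
midi_events = [f"NOTE_ON_{p}" for p in (60, 64, 67)] + ["TIME_SHIFT_120"]
combined = expand_vocab(text_vocab, midi_events)
# Original IDs unchanged; MIDI tokens continue the ID sequence.
```

In practice (e.g., with Hugging Face transformers) the analogous step is adding tokens to the tokenizer and resizing the model's embedding matrix, which appends new rows while leaving existing rows untouched.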

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
6 pages

Category
Computer Science:
Sound