MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation
By: Shih-Lun Wu, Yoon Kim, Cheng-Zhi Anna Huang
Potential Business Impact:
Writes music from your words.
We present MIDI-LLM, an LLM for generating multitrack MIDI music from free-form text prompts. Our approach expands a text LLM's vocabulary to include MIDI tokens, and uses a two-stage training recipe to endow the model with text-to-MIDI abilities. By preserving the original LLM's parameter structure, we can directly leverage the vLLM library for accelerated inference. Experiments show that MIDI-LLM achieves higher quality, better text control, and faster inference than the recent Text2midi model. Live demo at https://midi-llm-demo.vercel.app.
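The core idea of vocabulary expansion can be illustrated with a minimal sketch: new MIDI event tokens are appended after the existing text vocabulary so every original token keeps its id (and thus its trained embedding). The token names and toy vocabulary below are hypothetical, for illustration only, not the paper's actual token set.

```python
def expand_vocab(vocab, midi_tokens):
    """Append MIDI tokens after the existing text vocabulary,
    preserving all original token ids."""
    expanded = dict(vocab)
    next_id = max(vocab.values()) + 1
    for tok in midi_tokens:
        if tok not in expanded:
            expanded[tok] = next_id
            next_id += 1
    return expanded

# Toy text vocabulary (real LLMs have tens of thousands of entries).
text_vocab = {"<bos>": 0, "<eos>": 1, "music": 2, "piano": 3}

# Hypothetical MIDI event tokens (pitch, duration, instrument).
midi_tokens = ["<pitch_60>", "<dur_480>", "<inst_0>"]

vocab = expand_vocab(text_vocab, midi_tokens)
print(vocab["<pitch_60>"])  # 4: new ids continue after the text vocab
print(vocab["music"])       # 2: original ids are unchanged
```

In practice the model's embedding and output matrices are resized to match the new vocabulary size, which keeps the parameter structure compatible with standard inference stacks such as vLLM.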
Similar Papers
Large Language Models' Internal Perception of Symbolic Music
Computation and Language
Computers learn music from text descriptions.
Integrating Large Language Models into Text Animation: An Intelligent Editing System with Inline and Chat Interaction
Human-Computer Interaction
Makes creating animated text videos easy for anyone.
Training-Free Multimodal Large Language Model Orchestration
Computation and Language
Lets AI understand and talk using pictures and words.