Story2MIDI: Emotionally Aligned Music Generation from Text
By: Mohammad Shokri, Alexandra C. Salem, Gabriel Levine, and more
Potential Business Impact:
Turns stories into music that matches feelings.
In this paper, we introduce Story2MIDI, a sequence-to-sequence Transformer-based model for generating emotion-aligned music from a given piece of text. To develop this model, we construct the Story2MIDI dataset by merging existing datasets for sentiment analysis from text and emotion classification in music. The resulting dataset contains pairs of text blurbs and music pieces that evoke the same emotions in the reader or listener. Despite the small scale of our dataset and limited computational resources, our results indicate that our model effectively learns emotion-relevant features in music and incorporates them into its generation process, producing samples that evoke diverse emotional responses. We evaluate the generated outputs using objective musical metrics and a human listening study, confirming the model's ability to capture intended emotional cues.
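To illustrate the dataset-construction step described above, here is a minimal, hypothetical Python sketch of pairing text blurbs with MIDI clips that carry the same emotion label. The record fields, label names, and file paths are illustrative assumptions, not the authors' actual pipeline.

```python
import random
from collections import defaultdict

# Hypothetical records: each text blurb and each MIDI clip carries an emotion
# label assigned by its source dataset (field and label names are assumptions).
text_samples = [
    {"text": "The old house stood silent under the storm.", "emotion": "fear"},
    {"text": "She laughed as the kite lifted into the blue sky.", "emotion": "joy"},
]
midi_samples = [
    {"midi_path": "clips/ominous_01.mid", "emotion": "fear"},
    {"midi_path": "clips/bright_03.mid", "emotion": "joy"},
]

def pair_by_emotion(texts, midis, seed=0):
    """Pair each text blurb with a randomly chosen MIDI clip sharing its emotion label."""
    rng = random.Random(seed)
    midis_by_emotion = defaultdict(list)
    for clip in midis:
        midis_by_emotion[clip["emotion"]].append(clip)

    pairs = []
    for blurb in texts:
        candidates = midis_by_emotion.get(blurb["emotion"], [])
        if candidates:
            clip = rng.choice(candidates)
            pairs.append({
                "text": blurb["text"],
                "midi_path": clip["midi_path"],
                "emotion": blurb["emotion"],
            })
    return pairs

if __name__ == "__main__":
    for pair in pair_by_emotion(text_samples, midi_samples):
        print(pair)
```

Each resulting pair could then serve as one (text, MIDI) training example for a sequence-to-sequence model; how the paper actually samples or balances such pairs is not specified here.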
Similar Papers
Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
Sound
Creates music from pictures and words.
Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach
Sound
Turns pictures into music with explanations.
MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation
Sound
Writes music from your words.