Score: 2

A Controllable Perceptual Feature Generative Model for Melody Harmonization via Conditional Variational Autoencoder

Published: November 18, 2025 | arXiv ID: 2511.14600v1

By: Dengyun Huang, Yonghua Zhu

Potential Business Impact:

Generates expressive, stylistically controllable chord accompaniments for melodies.

Business Areas:
Musical Instruments, Media and Entertainment, Music and Audio

While Large Language Models (LLMs) have made symbolic music generation increasingly accessible, producing music with distinctive composition and rich expressiveness remains a significant challenge. Many studies have introduced emotion models to guide the generative process, but these approaches still fall short of delivering novelty and creativity. In the field of Music Information Retrieval (MIR), auditory perception is recognized as a key dimension of musical experience, offering insights into both compositional intent and emotional patterns. To this end, we propose a neural network, CPFG-Net, along with a transformation algorithm that maps perceptual feature values to chord representations, enabling melody harmonization. The system controllably predicts sequences of perceptual features and tonal structures from a given melody and subsequently generates harmonically coherent chord progressions. The network is trained on BCPT-220K, our newly constructed perceptual feature dataset derived from classical music. Experimental results show that our model achieves state-of-the-art perceptual feature prediction and demonstrates musical expressiveness and creativity in chord inference. This work offers a novel perspective on melody harmonization and contributes to broader music generation tasks; our symbolic model can be readily extended to audio-based models.
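To make the conditional-VAE idea behind the abstract concrete, below is a minimal sketch of a CVAE that predicts a per-timestep perceptual feature sequence conditioned on a melody encoding. The paper's actual CPFG-Net architecture, feature set, and training objective are not specified in this summary; the module choices (GRUs), dimensions, and all names here are illustrative assumptions, not the authors' implementation.

```python
# Minimal conditional-VAE sketch (illustrative, not CPFG-Net itself):
# encode (melody, features) into a latent posterior, then decode
# features from the melody plus a sampled latent code.
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    def __init__(self, melody_dim=64, feat_dim=8, hidden=128, latent=32):
        super().__init__()
        # Posterior encoder q(z | melody, features).
        self.post_enc = nn.GRU(melody_dim + feat_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        # Decoder p(features | melody, z).
        self.dec = nn.GRU(melody_dim + latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, melody, feats):
        # Encode both streams jointly; use the final hidden state.
        _, h = self.post_enc(torch.cat([melody, feats], dim=-1))
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # Broadcast z across time steps and decode alongside the melody.
        z_seq = z.unsqueeze(1).expand(-1, melody.size(1), -1)
        y, _ = self.dec(torch.cat([melody, z_seq], dim=-1))
        return self.out(y), mu, logvar

def elbo_loss(pred, target, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    recon = nn.functional.mse_loss(pred, target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Usage: a batch of 4 melodies, 16 time steps each.
melody = torch.randn(4, 16, 64)   # hypothetical melody embeddings
feats = torch.randn(4, 16, 8)     # hypothetical perceptual features
model = ConditionalVAE()
pred, mu, logvar = model(melody, feats)
print(elbo_loss(pred, feats, mu, logvar))
```

At generation time such a model would sample z from the prior and decode features from the melody alone; the paper's separate transformation algorithm would then map the predicted perceptual feature values to chord representations, a step not sketched here.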

Country of Origin
🇨🇳 China

Repos / Data Links

Page Count
13 pages

Category
Computer Science:
Sound