On the Joint Minimization of Regularization Loss Functions in Deep Variational Bayesian Methods for Attribute-Controlled Symbolic Music Generation
By: Matteo Pettenó, Alessandro Ilic Mezza, Alberto Bernardini
Potential Business Impact:
Makes music generators create songs with specific feelings.
Explicit latent variable models provide a flexible yet powerful framework for data synthesis, enabling controlled manipulation of generative factors. With latent variables drawn from a tractable probability density function that can be further constrained, these models enable continuous and semantically rich exploration of the output space by navigating their latent spaces. Structured latent representations are typically obtained through the joint minimization of regularization loss functions. In variational information bottleneck models, reconstruction loss and Kullback-Leibler Divergence (KLD) are often linearly combined with an auxiliary Attribute-Regularization (AR) loss. However, balancing KLD and AR turns out to be a very delicate matter. When KLD dominates over AR, generative models tend to lack controllability; when AR dominates over KLD, the stochastic encoder is encouraged to violate the standard normal prior. We explore this trade-off in the context of symbolic music generation with explicit control over continuous musical attributes. We show that existing approaches struggle to jointly minimize both regularization objectives, whereas suitable attribute transformations can help achieve both controllability and regularization of the target latent dimensions.
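The loss described above (reconstruction plus a weighted KLD term plus a weighted attribute-regularization term) can be sketched numerically. This is a minimal illustration, not the paper's implementation: the AR term here follows the common AR-VAE formulation, which matches the sign pattern of pairwise differences along one latent dimension to the sign pattern of pairwise attribute differences; the weights `beta` and `gamma`, the scale `delta`, and all function names are illustrative assumptions.

```python
import numpy as np

def kld(mu, logvar):
    # Closed-form KL divergence between the diagonal Gaussian
    # N(mu, exp(logvar)) and the standard normal prior N(0, I).
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def ar_loss(z_dim, attr, delta=10.0):
    # Attribute regularization in the style of AR-VAE:
    # encourage the ordering of values along one latent dimension
    # (z_dim) to match the ordering of the target attribute (attr).
    dz = z_dim[:, None] - z_dim[None, :]   # pairwise latent differences
    da = attr[:, None] - attr[None, :]     # pairwise attribute differences
    return np.mean(np.abs(np.tanh(delta * dz) - np.sign(da)))

def total_loss(recon, mu, logvar, z_dim, attr, beta=1.0, gamma=1.0):
    # Linear combination of the three objectives; balancing beta
    # (prior regularization) against gamma (controllability) is the
    # trade-off discussed in the abstract.
    return recon + beta * kld(mu, logvar) + gamma * ar_loss(z_dim, attr)

# A batch whose latent dimension is monotonically aligned with the
# attribute incurs a much smaller AR penalty than a misaligned one.
z = np.array([0.0, 1.0, 2.0, 3.0])
a = np.array([0.1, 0.4, 0.7, 0.9])
aligned = ar_loss(z, a)
misaligned = ar_loss(z[::-1], a)
```

Note that when `gamma` is large, minimizing `ar_loss` can pull the posterior means away from zero along the regularized dimension, which is exactly the conflict with the KLD term that the abstract highlights.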
Similar Papers
Shortcut Invariance: Targeted Jacobian Regularization in Disentangled Latent Space
Machine Learning (CS)
Makes AI ignore fake clues and learn better.
Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality
Machine Learning (CS)
Makes AI understand how it makes things.
Toward Architecture-Agnostic Local Control of Posterior Collapse in VAEs
Machine Learning (CS)
Makes computer art look more real and varied.