Steering Autoregressive Music Generation with Recursive Feature Machines
By: Daniel Zhao, Daniel Beaglehole, Taylor Berg-Kirkpatrick, et al.
Potential Business Impact:
Steers a music-generation AI toward specific notes without retraining.
Controllable music generation remains a significant challenge, with existing methods often requiring model retraining or introducing audible artifacts. We introduce MusicRFM, a framework that adapts Recursive Feature Machines (RFMs) to enable fine-grained, interpretable control over frozen, pre-trained music models by directly steering their internal activations. RFMs analyze a model's internal gradients to produce interpretable "concept directions", specific axes in the activation space that correspond to musical attributes such as notes or chords. We first train lightweight RFM probes to discover these directions within MusicGen's hidden states; then, during inference, we inject them back into the model to guide the generation process in real time without per-step optimization. We present advanced mechanisms for this control, including dynamic, time-varying schedules and methods for the simultaneous enforcement of multiple musical properties. Our method successfully navigates the trade-off between control and generation quality: we can increase the accuracy of generating a target musical note from 0.23 to 0.82, while text prompt adherence remains within approximately 0.02 of the unsteered baseline, demonstrating effective control with minimal impact on prompt fidelity. We release code to encourage further exploration of RFMs in the music domain.
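To make the steering idea concrete, below is a minimal sketch of activation steering with a PyTorch forward hook, assuming a concept direction has already been extracted (for example, from an RFM probe over MusicGen's hidden states). The layer path, `alpha`, and `note_direction` are illustrative assumptions, not the authors' released implementation or MusicGen's actual module layout.

```python
import torch

def make_steering_hook(direction: torch.Tensor, alpha: float = 4.0):
    """Return a forward hook that nudges hidden states along a concept direction.

    `direction` is assumed to be a d_model-sized vector (e.g., an RFM probe's
    concept direction); `alpha` controls steering strength and could follow a
    time-varying schedule instead of a constant.
    """
    direction = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # Shift the most recent position's activation toward the concept
        # direction at every autoregressive decoding step.
        hidden[:, -1, :] += alpha * direction.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return hook

# Usage sketch (hypothetical layer path and inputs; the frozen model is steered
# in real time, then the hook is removed):
# layer = model.decoder.layers[12]
# handle = layer.register_forward_hook(make_steering_hook(note_direction, alpha=4.0))
# audio = model.generate(**inputs)
# handle.remove()
```

Multiple musical properties could in principle be enforced simultaneously by summing several scaled concept directions inside the same hook.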
Similar Papers
Fine-Grained Control over Music Generation with Activation Steering
Sound
Changes music's sound, style, and genre.
ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
Artificial Intelligence
Teaches robots new tasks from simple words.
A Controllable Perceptual Feature Generative Model for Melody Harmonization via Conditional Variational Autoencoder
Sound
Creates new music with feeling and style.