Automatic Music Mixing using a Generative Model of Effect Embeddings
By: Eloi Moliner, Marco A. Martínez-Ramírez, Junghyun Koo, and more
Potential Business Impact:
Makes music sound better automatically.
Music mixing involves combining individual tracks into a cohesive mixture, a task characterized by subjectivity where multiple valid solutions exist for the same input. Existing automatic mixing systems treat this task as a deterministic regression problem, thus ignoring this multiplicity of solutions. Here we introduce MEGAMI (Multitrack Embedding Generative Auto MIxing), a generative framework that models the conditional distribution of professional mixes given unprocessed tracks. MEGAMI uses a track-agnostic effects processor conditioned on per-track generated embeddings, handles arbitrary unlabeled tracks through a permutation-equivariant architecture, and enables training on both dry and wet recordings via domain adaptation. Our objective evaluation using distributional metrics shows consistent improvements over existing methods, while listening tests indicate performance approaching human-level quality across diverse musical genres.
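To make the architecture described in the abstract more concrete, here is a minimal sketch, not the authors' implementation, of the two ideas it names: a permutation-equivariant module that produces one effect embedding per unlabeled track, and a single track-agnostic processor that is applied to every track, conditioned on that track's embedding. All module names, dimensions, and the FiLM-style conditioning are illustrative assumptions.

```python
# Minimal sketch of per-track effect embeddings feeding a shared effects
# processor; the real MEGAMI architecture likely differs in every detail.
import torch
import torch.nn as nn


class PermEquivariantEmbedder(nn.Module):
    """Maps per-track features to per-track effect embeddings.

    Self-attention over the track axis is permutation-equivariant, so each
    track's embedding does not depend on the order the tracks arrive in.
    """

    def __init__(self, feat_dim: int = 128, emb_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(feat_dim, emb_dim)
        self.attn = nn.MultiheadAttention(emb_dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(emb_dim, emb_dim)

    def forward(self, track_feats: torch.Tensor) -> torch.Tensor:
        # track_feats: (batch, n_tracks, feat_dim) -> (batch, n_tracks, emb_dim)
        h = self.proj(track_feats)
        ctx, _ = self.attn(h, h, h)   # each track attends to all other tracks
        return self.out(h + ctx)


class TrackAgnosticProcessor(nn.Module):
    """One shared network processes every track, conditioned on its embedding
    via a FiLM-like scale/bias (an assumption about the conditioning scheme)."""

    def __init__(self, emb_dim: int = 64, hidden: int = 32):
        super().__init__()
        self.film = nn.Linear(emb_dim, 2 * hidden)
        self.net = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=15, padding=7),
            nn.Tanh(),
        )
        self.head = nn.Conv1d(hidden, 1, kernel_size=1)

    def forward(self, audio: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        # audio: (batch, n_tracks, samples), emb: (batch, n_tracks, emb_dim)
        b, t, n = audio.shape
        x = self.net(audio.reshape(b * t, 1, n))          # same weights for all tracks
        scale, bias = self.film(emb.reshape(b * t, -1)).chunk(2, dim=-1)
        x = x * scale.unsqueeze(-1) + bias.unsqueeze(-1)  # per-track conditioning
        wet = self.head(x).reshape(b, t, n)
        return wet.sum(dim=1)                             # sum processed tracks into a mix


if __name__ == "__main__":
    feats = torch.randn(2, 5, 128)    # e.g. encoder features for 5 unlabeled tracks
    audio = torch.randn(2, 5, 44100)  # one second of audio per track at 44.1 kHz
    embs = PermEquivariantEmbedder()(feats)
    mix = TrackAgnosticProcessor()(audio, embs)
    print(mix.shape)                  # torch.Size([2, 44100])
```

In the full system the embeddings would be sampled from a generative model rather than computed deterministically, which is how the framework captures multiple valid mixes for the same input tracks.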
Similar Papers
AV-Edit: Multimodal Generative Sound Effect Editing via Audio-Visual Semantic Joint Control
Multimedia
Changes video sounds using pictures and words.
AutoMashup: Automatic Music Mashups Creation
Sound
Makes music mixes by matching sounds automatically.
Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
Sound
Creates music from pictures and words.