CAK: Emergent Audio Effects from Minimal Deep Learning
By: Austin Rockman
Potential Business Impact:
Learns new, personalized audio effects from only a few hundred sound examples.
We demonstrate that a single 3x3 convolutional kernel can produce emergent audio effects when trained on 200 samples from a personalized corpus. We achieve this through two key techniques: (1) Conditioning Aware Kernels (CAK), where output = input + (learned_pattern × control), with a soft-gate mechanism that preserves identity at zero control; and (2) AuGAN (Audit GAN), which reframes adversarial training from "is this real?" to "did you apply the requested value?" Rather than learning to generate or detect forgeries, our networks cooperate to verify that the requested control was applied, discovering unique transformations in the process. The learned kernel exhibits a diagonal structure that creates frequency-dependent temporal shifts, producing musical effects that depend on the input's characteristics. Our results show the potential of adversarial training to discover audio transformations from minimal data, enabling new approaches to effect design.
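The CAK formulation above — output = input + (learned_pattern × control), with a soft gate that vanishes at zero control — can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the paper's implementation: the `conv2d_same` helper, the specific sigmoid-based soft gate, and the `tau` temperature are assumptions made here for clarity; only the residual form and the identity-at-zero property come from the abstract.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 3x3 convolution with zero padding, stride 1 (helper assumed here)."""
    H, W = x.shape
    xp = np.pad(x, 1)  # pad 1 pixel on every side
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def cak(x, kernel, control, tau=0.1):
    """Conditioning Aware Kernel (sketch):
    output = input + learned_pattern * soft_gate(control).

    The soft gate below (control scaled by a sigmoid, an assumed form) is
    exactly 0 at control = 0, so the input passes through unchanged —
    the identity-preservation property described in the abstract.
    """
    pattern = conv2d_same(x, kernel)           # learned 3x3 pattern response
    gate = control / (1.0 + np.exp(-control / tau))  # soft gate, 0 at control=0
    return x + pattern * gate
```

With a diagonal kernel such as `np.eye(3) / 3.0` (echoing the diagonal structure the paper reports), `cak(x, k, 0.0)` returns the input spectrogram patch unchanged, while nonzero control blends in the kernel's frequency-dependent shift.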