Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation
By: Jaechul Roh, Zachary Novack, Yuefeng Peng and more
Potential Business Impact:
AI copies music and videos from altered lyrics.
Lyrics-to-Song (LS2) generation models promise end-to-end music synthesis from text, yet their vulnerability to training data memorization remains underexplored. We introduce Adversarial PhoneTic Prompting (APT), a novel attack in which lyrics are semantically altered while their acoustic structure is preserved through homophonic substitutions (e.g., Eminem's famous "mom's spaghetti" $\rightarrow$ "Bob's confetti"). Despite these distortions, we uncover a powerful form of sub-lexical memorization: models like SUNO and YuE regenerate outputs strikingly similar to known training content, achieving high similarity across audio-domain metrics, including CLAP, AudioJudge, and CoverID. This vulnerability persists across multiple languages and genres. More surprisingly, we discover that phoneme-altered lyrics alone can trigger visual memorization in text-to-video models. When prompted with phonetically modified lyrics from "Lose Yourself", Veo 3 reconstructs visual elements from the original music video, including character appearance and scene composition, despite no visual cues in the prompt. We term this phenomenon phonetic-to-visual regurgitation. Together, these findings expose a critical vulnerability in transcript-conditioned multimodal generation: phonetic prompting alone can unlock memorized audiovisual content, raising urgent questions about copyright, safety, and content provenance in modern generative systems. Example generations are available on our demo page (jrohsc.github.io/music_attack/).
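To make "semantically altered while preserving acoustic structure" concrete, the sketch below compares the abstract's own example pair at the phoneme level. It is only an illustration, not the authors' method or evaluation: the lexicon `TOY_LEXICON` is hand-written for this example with approximate, stress-free ARPAbet-style transcriptions, and the helper names (`to_phonemes`, `vowel_skeleton`, `shared_words`) are hypothetical. A real analysis would presumably use a pronunciation dictionary or a grapheme-to-phoneme tool.

```python
# Toy lexicon: hand-written, stress-free ARPAbet-style transcriptions.
# These entries are illustrative assumptions, not output of a real G2P tool.
TOY_LEXICON = {
    "mom's": ["M", "AA", "M", "Z"],
    "spaghetti": ["S", "P", "AH", "G", "EH", "T", "IY"],
    "bob's": ["B", "AA", "B", "Z"],
    "confetti": ["K", "AH", "N", "F", "EH", "T", "IY"],
}

# Vowel symbols of the ARPAbet phoneme set (without stress digits).
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def to_phonemes(phrase):
    """Look up each word of the phrase in the toy lexicon and concatenate."""
    phonemes = []
    for word in phrase.lower().split():
        phonemes.extend(TOY_LEXICON[word])
    return phonemes


def vowel_skeleton(phrase):
    """Keep only the vowels: one per syllable, in order of occurrence."""
    return [p for p in to_phonemes(phrase) if p in ARPABET_VOWELS]


def shared_words(a, b):
    """Words that appear in both phrases (case-insensitive)."""
    return set(a.lower().split()) & set(b.lower().split())


original = "mom's spaghetti"
altered = "Bob's confetti"

# The two phrases have no word in common ...
print("shared words:", shared_words(original, altered))
# ... yet their syllable count and vowel sequence line up.
print("original vowels:", vowel_skeleton(original))
print("altered vowels: ", vowel_skeleton(altered))
print("same skeleton:  ", vowel_skeleton(original) == vowel_skeleton(altered))
```

Comparing vowel skeletons is just one rough proxy for acoustic structure. It ignores consonants, stress, and timing, all of which may matter for how closely a substituted lyric tracks the original when sung.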
Similar Papers
Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction
Sound
Predicts song hits using lyrics and sound.
Talking Like a Phisher: LLM-Based Attacks on Voice Phishing Classifiers
Cryptography and Security
Makes fake voice calls that trick computers.