Poisoned at Scale: A Scalable Audit Uncovers Hidden Scam Endpoints in Production LLMs
By: Zhiyang Chen, Tara Saba, Xun Deng, and more
Potential Business Impact:
AI models can unintentionally generate code that contains harmful links.
Large Language Models (LLMs) have become critical to modern software development, but their reliance on internet datasets for training introduces a significant security risk: the absorption and reproduction of malicious content. To evaluate this threat, this paper introduces a scalable, automated audit framework that synthesizes innocuous, developer-style prompts from known scam databases, uses them to query production LLMs, and determines whether the generated code contains harmful URLs. We conducted a large-scale evaluation across four production LLMs (GPT-4o, GPT-4o-mini, Llama-4-Scout, and DeepSeek-V3) and found a systemic vulnerability: all tested models generate malicious code at a non-negligible rate. On average, 4.2% of the programs generated in our experiments contained malicious URLs. Crucially, this malicious code is often generated in response to benign prompts. We manually validated the prompts that caused all four LLMs to generate malicious code, yielding 177 innocuous prompts that trigger every model to produce harmful outputs. These results provide strong empirical evidence that the training data of production LLMs has been successfully poisoned at scale, underscoring the urgent need for more robust defense mechanisms and post-generation safety checks to mitigate the propagation of hidden security threats.
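To make the audit loop concrete, the sketch below shows roughly how such a framework could work: derive innocuous developer-style prompts from a scam database, query a model, and scan the generated code for known-bad URLs. This is a minimal illustration under stated assumptions; the names (SCAM_DOMAINS, synthesize_prompts, audit_model, the stubbed fake_llm generator) are hypothetical and not the authors' implementation.

```python
import re
from typing import Callable, Iterable

# Hypothetical stand-in for a real scam database.
SCAM_DOMAINS = [
    "free-crypto-airdrop.example",
    "wallet-sync-helper.example",
]

URL_PATTERN = re.compile(r"https?://[^\s\"')>]+", re.IGNORECASE)


def synthesize_prompts(domains: Iterable[str]) -> list[str]:
    """Turn scam domains into innocuous, developer-style prompts.

    The prompts mention only the general topic of each scam (e.g., crypto
    wallets), never a URL, so the request itself stays benign.
    """
    topics = {d.split(".")[0].replace("-", " ") for d in domains}
    return [
        f"Write a Python script that integrates with a {topic} service."
        for topic in sorted(topics)
    ]


def audit_model(generate: Callable[[str], str], prompts: list[str],
                scam_domains: Iterable[str]) -> list[tuple[str, str]]:
    """Query the model with each prompt and flag outputs containing scam URLs."""
    flagged = []
    for prompt in prompts:
        code = generate(prompt)
        for url in URL_PATTERN.findall(code):
            if any(domain in url for domain in scam_domains):
                flagged.append((prompt, url))
    return flagged


if __name__ == "__main__":
    # Stub generator standing in for a production LLM API call.
    def fake_llm(prompt: str) -> str:
        return 'requests.get("https://free-crypto-airdrop.example/api/claim")'

    hits = audit_model(fake_llm, synthesize_prompts(SCAM_DOMAINS), SCAM_DOMAINS)
    for prompt, url in hits:
        print(f"FLAGGED: {url!r} generated for prompt: {prompt!r}")
```

In a real audit the stub generator would be replaced by calls to each production model's API, and the flagged outputs would then be manually validated, as the paper describes for its 177 confirmed prompts.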
Similar Papers
On The Dangers of Poisoned LLMs In Security Automation
Cryptography and Security
Shows how a poisoned LLM can be made to deliberately ignore important security warnings.
A Systematic Review of Poisoning Attacks Against Large Language Models
Cryptography and Security
Reviews how attackers poison large language models and how such attacks can be stopped.
Exploiting Web Search Tools of AI Agents for Data Exfiltration
Cryptography and Security
Shows how the web search tools of AI agents can be tricked into leaking data.