SafeGPT: Preventing Data Leakage and Unethical Outputs in Enterprise LLM Use
By: Pratyush Desai, Luoxi Tang, Yuqiao Meng, and more
Potential Business Impact:
Keeps company secrets safe from AI chatbots.
Large Language Models (LLMs) are transforming enterprise workflows but introduce security and ethics challenges when employees inadvertently share confidential data or generate policy-violating content. This paper proposes SafeGPT, a two-sided guardrail system that prevents sensitive data leakage and unethical outputs. SafeGPT integrates input-side detection and redaction, output-side moderation and reframing, and human-in-the-loop feedback. Experiments demonstrate that SafeGPT effectively reduces data leakage risk and biased outputs while maintaining user satisfaction.
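To make the two-sided pipeline concrete, below is a minimal sketch of an input-redaction / output-moderation wrapper with a human-review hook. All names (redact_input, moderate_output, guarded_query), the regex rules, and the keyword policy are illustrative assumptions, not the paper's actual implementation; a production system would back each stage with learned detectors and reviewer tooling.

```python
# Illustrative sketch of a two-sided guardrail pipeline in the spirit of
# SafeGPT: input-side detection/redaction, output-side moderation/reframing,
# and a human-in-the-loop feedback hook. Patterns and names are assumptions.
import re
from typing import Callable

# Input side: detect and redact sensitive patterns before they reach the LLM.
SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b"),
}

def redact_input(prompt: str) -> str:
    """Replace each sensitive match with a typed placeholder."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED_{label}]", prompt)
    return prompt

# Output side: flag policy-violating content and reframe instead of emitting it.
POLICY_KEYWORDS = ("credential", "password dump", "insider")

def moderate_output(response: str) -> str:
    """Return the response, or a reframed refusal if it trips a policy rule."""
    if any(kw in response.lower() for kw in POLICY_KEYWORDS):
        return "This response was withheld by policy; please rephrase your request."
    return response

def guarded_query(prompt: str,
                  llm: Callable[[str], str],
                  log_for_review: Callable[[str, str], None]) -> str:
    """Run one prompt through both guardrails, logging for human review."""
    safe_prompt = redact_input(prompt)
    response = moderate_output(llm(safe_prompt))
    log_for_review(safe_prompt, response)  # human-in-the-loop feedback hook
    return response

if __name__ == "__main__":
    fake_llm = lambda p: f"Echo: {p}"  # stand-in for a real LLM call
    review_log = []
    print(guarded_query("Contact alice@corp.com with key sk-abcdef1234567890",
                        fake_llm, lambda p, r: review_log.append((p, r))))
```

Running the example prints the echoed prompt with the email and API key replaced by typed placeholders, and review_log retains the redacted prompt/response pair so human reviewers can refine the rules over time.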
Similar Papers
Evaluating Adversarial Vulnerabilities in Modern Large Language Models
Cryptography and Security
Finds ways to trick AI into saying bad things.
A Data-Centric Approach for Safe and Secure Large Language Models against Threatening and Toxic Content
Cryptography and Security
Makes AI say safer, less harmful things.
ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected
Cryptography and Security
Finds fake science reviews using hidden words.