SafeGen: Embedding Ethical Safeguards in Text-to-Image Generation
By: Dang Phuong Nam, Nguyen Kieu, Pham Thanh Hieu
Generative Artificial Intelligence (AI) has created unprecedented opportunities for creative expression, education, and research. Text-to-image systems such as DALL·E, Stable Diffusion, and Midjourney can now convert ideas into visuals within seconds, but they also present a dual-use dilemma that raises critical ethical concerns: amplifying societal biases, producing high-fidelity disinformation, and violating intellectual property. This paper introduces SafeGen, a framework that embeds ethical safeguards directly into the text-to-image generation pipeline, grounding its design in established principles for Trustworthy AI. SafeGen integrates two complementary components: BGE-M3, a fine-tuned text classifier that filters harmful or misleading prompts, and Hyper-SD, an optimized diffusion model that produces high-fidelity, semantically aligned images. Built on a curated multilingual (English-Vietnamese) dataset and a fairness-aware training process, SafeGen demonstrates that creative freedom and ethical responsibility can be reconciled within a single workflow. Quantitative evaluations confirm its effectiveness, with Hyper-SD achieving IS = 3.52, FID = 22.08, and SSIM = 0.79, while BGE-M3 reaches an F1 score of 0.81. An ablation study further validates the importance of domain-specific fine-tuning for both modules. Case studies illustrate SafeGen's practical impact in blocking unsafe prompts, generating inclusive teaching materials, and reinforcing academic integrity.
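The two-stage design described above (a prompt classifier that gates a diffusion model) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 threshold, the blocked-term list, and both stub components are hypothetical stand-ins. In SafeGen the classifier role is played by the fine-tuned BGE-M3 model and the generator role by Hyper-SD, neither of which is loaded here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GenerationResult:
    prompt: str
    allowed: bool
    harm_score: float
    image: Optional[bytes]  # None when the prompt was blocked


def safe_generate(
    prompt: str,
    classify: Callable[[str], float],
    generate: Callable[[str], bytes],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> GenerationResult:
    """Run the classifier first; call the generator only for allowed prompts."""
    score = classify(prompt)
    if score >= threshold:
        # Blocked: the image generator is never invoked.
        return GenerationResult(prompt, False, score, None)
    return GenerationResult(prompt, True, score, generate(prompt))


# Hypothetical stand-ins so the sketch runs without any model weights.
BLOCKED_TERMS = ("forged", "fake news photo")  # illustrative only


def stub_classifier(prompt: str) -> float:
    """Toy keyword scorer standing in for a fine-tuned text classifier."""
    text = prompt.lower()
    return 1.0 if any(term in text for term in BLOCKED_TERMS) else 0.0


def stub_generator(prompt: str) -> bytes:
    """Placeholder standing in for a diffusion model; returns dummy bytes."""
    return b"image-bytes-for:" + prompt.encode("utf-8")


blocked = safe_generate("a forged diploma for my friend", stub_classifier, stub_generator)
allowed = safe_generate("a watercolor of Ha Long Bay", stub_classifier, stub_generator)
```

Passing the classifier and generator in as callables keeps the gating logic independent of any particular model, so either component could be swapped or fine-tuned separately, in line with the ablation over both modules mentioned in the abstract.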