Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM

Published: August 7, 2025 | arXiv ID: 2508.05775v2

By: Chi Zhang, Changjia Zhu, Junjie Xiong, and more

Potential Business Impact:

Helps make AI systems safer by surveying how language models can be pushed into producing harmful content and which defenses reduce that risk.

Large Language Models (LLMs) have revolutionized content creation across digital platforms, offering unprecedented capabilities in natural language generation and understanding. These models enable beneficial applications such as content generation, question answering (Q&A), programming, and code reasoning. Meanwhile, they also pose serious risks by inadvertently or intentionally producing toxic, offensive, or biased content. This dual role of LLMs, both as powerful tools for solving real-world problems and as potential sources of harmful language, presents a pressing sociotechnical challenge. In this survey, we systematically review recent studies spanning unintentional toxicity, adversarial jailbreaking attacks, and content moderation techniques. We propose a unified taxonomy of LLM-related harms and defenses, analyze emerging multimodal and LLM-assisted jailbreak strategies, and assess mitigation efforts, including reinforcement learning from human feedback (RLHF), prompt engineering, and safety alignment. Our synthesis highlights the evolving landscape of LLM safety, identifies limitations in current evaluation methodologies, and outlines future research directions to guide the development of robust and ethically aligned language technologies.
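To make the content-moderation idea mentioned in the abstract concrete, here is a minimal, hypothetical sketch of a safety gate wrapped around a text generator. The names (`check_safety`, `BLOCKLIST`, `generate`, `moderated_generate`) and the keyword-based check are illustrative assumptions standing in for a learned toxicity classifier; they are not methods described in the paper.

```python
# Toy moderation gate around a text generator.
# All identifiers below are hypothetical placeholders, not APIs from the surveyed work.

BLOCKLIST = {"slur_example", "threat_example"}  # stand-in for a learned toxicity classifier


def check_safety(text: str) -> bool:
    """Return True if the text passes the (toy) safety check."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)


def generate(prompt: str) -> str:
    """Placeholder for an LLM call; simply echoes the prompt for demonstration."""
    return f"Model response to: {prompt}"


def moderated_generate(prompt: str) -> str:
    """Gate both the user input and the model output, refusing if either fails."""
    if not check_safety(prompt):
        return "Request refused: the prompt violates the usage policy."
    response = generate(prompt)
    if not check_safety(response):
        return "Response withheld: the generated text violates the usage policy."
    return response


if __name__ == "__main__":
    print(moderated_generate("Explain what RLHF is."))
```

In practice, the survey notes that deployed systems replace such keyword checks with trained classifiers, RLHF-aligned models, and prompt-level guardrails, but the two-sided input/output gating pattern shown here is the common structure.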

Country of Origin
🇺🇸 United States

Page Count
33 pages

Category
Computer Science: Computation and Language