Score: 1

RiskAtlas: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation

Published: January 8, 2026 | arXiv ID: 2601.04740v1

By: Huawei Zheng , Xinqi Jiang , Sen Yang and more

Potential Business Impact:

Makes AI safer from tricky, hidden dangers.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) are increasingly applied in specialized domains such as finance and healthcare, where they introduce unique safety risks. Domain-specific datasets of harmful prompts remain scarce and still largely rely on manual construction; public datasets mainly focus on explicit harmful prompts, which modern LLM defenses can often detect and refuse. In contrast, implicit harmful prompts-expressed through indirect domain knowledge-are harder to detect and better reflect real-world threats. We identify two challenges: transforming domain knowledge into actionable constraints and increasing the implicitness of generated harmful prompts. To address them, we propose an end-to-end framework that first performs knowledge-graph-guided harmful prompt generation to systematically produce domain-relevant prompts, and then applies dual-path obfuscation rewriting to convert explicit harmful prompts into implicit variants via direct and context-enhanced rewriting. This framework yields high-quality datasets combining strong domain relevance with implicitness, enabling more realistic red-teaming and advancing LLM safety research. We release our code and datasets at GitHub.

GraphAttack: Exploiting Representational Blindspots in LLM Safety Mechanisms

Cryptography and Security

Finds ways to trick AI into saying bad things.

17 Apr 2025 0

91%

Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation

Computation and Language

Makes AI safer and less likely to say bad things.

7 Aug 2025 0

90%

Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM

Computation and Language

Makes AI safer and less likely to say bad things.

7 Aug 2025 0

View PDF Login to Bookmark

Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

18 pages

RiskAtlas: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation

Makes AI safer from tricky, hidden dangers.

Technical Abstract

GraphAttack: Exploiting Representational Blindspots in LLM Safety Mechanisms

Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation

Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM