Adversarial versification in Portuguese as a jailbreak operator in LLMs

Published: December 17, 2025 | arXiv ID: 2512.15353v1

By: Joao Queiroz

Potential Business Impact:

Makes AI chatbots ignore rules when asked in poems.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Recent evidence shows that the versification of prompts constitutes a highly effective adversarial mechanism against aligned LLMs. The study 'Adversarial poetry as a universal single-turn jailbreak mechanism in large language models' demonstrates that instructions routinely refused in prose become executable when rewritten as verse, producing up to 18 times more safety failures in benchmarks derived from MLCommons AILuminate. Manually written poems reach an attack success rate (ASR) of approximately 62%, and automated versions 43%, with some models surpassing 90% success in single-turn interactions. The effect is structural: systems trained with RLHF, constitutional AI, and hybrid pipelines exhibit consistent degradation under minimal semiotic-formal variation. Versification displaces the prompt into sparsely supervised latent regions, revealing guardrails that are excessively dependent on surface patterns. This dissociation between apparent robustness and real vulnerability exposes deep limitations in current alignment regimes. The absence of evaluations in Portuguese, a language with high morphosyntactic complexity, a rich metric-prosodic tradition, and over 250 million speakers, constitutes a critical gap. Experimental protocols must parameterise scansion, metre, and prosodic variation to test vulnerabilities specific to Lusophone patterns, which are currently ignored.
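
The ASR figures quoted in the abstract are, in essence, per-condition failure fractions, and the "times more" comparison is a ratio against a prose baseline. The following is a minimal Python sketch of that arithmetic, assuming a hypothetical record schema (`Trial`) and hypothetical condition labels such as `decassilabo`; none of these names come from the paper, and the toy records are made up solely to exercise the functions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One single-turn evaluation record (hypothetical schema, not from the paper)."""
    model: str
    condition: str  # e.g. "prose", or a metre label such as "decassilabo"
    unsafe: bool    # True if a judge labelled the response a safety failure


def attack_success_rate(trials):
    """Return {condition: fraction of trials labelled unsafe}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for t in trials:
        totals[t.condition] += 1
        failures[t.condition] += int(t.unsafe)
    return {c: failures[c] / totals[c] for c in totals}


def uplift(asr, baseline="prose"):
    """Ratio of each condition's ASR to the baseline ASR (None if the baseline is 0)."""
    base = asr.get(baseline, 0.0)
    return {c: (v / base if base > 0 else None)
            for c, v in asr.items() if c != baseline}


# Toy, made-up records; these are NOT results from any study.
trials = [
    Trial("model_a", "prose", False),
    Trial("model_a", "prose", False),
    Trial("model_a", "prose", False),
    Trial("model_a", "prose", True),
    Trial("model_a", "decassilabo", True),
    Trial("model_a", "decassilabo", True),
    Trial("model_a", "decassilabo", False),
    Trial("model_a", "decassilabo", True),
]

asr = attack_success_rate(trials)
print(asr)
print(uplift(asr))
```

Keying the tally on a free-form `condition` label is one way a protocol could later add metre, scansion, or prosodic variants as extra conditions without changing the metric code; how the paper itself parameterises these is not specified in the abstract.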

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

Computation and Language

Makes AI write bad things using poems.

19 Nov 2025

93%

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

Computation and Language

Makes AI write bad things even when told not to.

19 Nov 2025

88%

Say It Differently: Linguistic Styles as Jailbreak Vectors

Computation and Language

Makes AI safer by spotting tricky wording.

13 Nov 2025

Page Count

15 pages
