Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety
By: Can Jin, Rui Wu, Tong Che, and more
Ensuring that Large Language Models (LLMs) adhere to safety principles without refusing benign requests remains a significant challenge. While OpenAI introduced deliberative alignment (DA) to enhance the safety of its o-series models through reasoning over detailed "code-like" safety rules, its effectiveness in open-source LLMs, which typically lack advanced reasoning capabilities, remains understudied. In this work, we systematically evaluate the impact of explicitly specifying extensive safety codes versus demonstrating them through illustrative cases. We find that referencing explicit codes inconsistently improves harmlessness and systematically degrades helpfulness, whereas training on case-augmented simple codes yields more robust and generalized safety behaviors. By guiding LLMs with case-augmented reasoning instead of extensive code-like safety rules, we avoid rigid adherence to narrowly enumerated rules and enable broader adaptability. Building on these insights, we propose CADA, a case-augmented deliberative alignment method for LLMs that uses reinforcement learning on self-generated safety reasoning chains. CADA effectively enhances harmlessness, improves robustness against attacks, and reduces over-refusal while preserving utility across diverse benchmarks, offering a practical alternative to rule-only DA.
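To make the case-augmented setup concrete, the sketch below shows one plausible shape of the data-collection step: a prompt that pairs a deliberately simple safety code with a few illustrative cases, several self-generated reasoning chains sampled per request, and a scalar reward attached to each chain for a subsequent RL update. The prompt template, the stubbed generate function, and the toy reward heuristic are all assumptions for illustration, not the paper's actual implementation.

"""
Hypothetical sketch of case-augmented deliberative alignment (CADA) style
data collection.  Prompts, names, and the reward heuristic are illustrative
assumptions; a real pipeline would use a trained policy model and judge.
"""
import random
from dataclasses import dataclass

# A deliberately simple safety code, augmented with illustrative cases
# rather than an exhaustive list of detailed code-like rules.
SIMPLE_SAFETY_CODE = "Refuse requests that enable serious harm; otherwise be maximally helpful."

ILLUSTRATIVE_CASES = [
    ("How do I pick a lock to break into someone's house?", "refuse"),
    ("How do pin-tumbler locks work mechanically?", "comply"),
]

@dataclass
class ReasoningSample:
    prompt: str
    chain: str    # self-generated safety reasoning
    answer: str
    reward: float

def build_cada_prompt(user_request: str) -> str:
    """Augment the simple code with cases instead of enumerating detailed rules."""
    case_text = "\n".join(
        f"Case: {q}\nCorrect behavior: {label}" for q, label in ILLUSTRATIVE_CASES
    )
    return (
        f"Safety principle: {SIMPLE_SAFETY_CODE}\n\n"
        f"Illustrative cases:\n{case_text}\n\n"
        "Reason step by step about whether to comply, then answer.\n"
        f"User request: {user_request}"
    )

def generate(prompt: str) -> tuple[str, str]:
    """Stub for the policy model; returns (reasoning_chain, answer)."""
    chain = "The request resembles the benign case, so I should comply."  # placeholder
    answer = "Here is a helpful answer."                                   # placeholder
    return chain, answer

def reward(user_request: str, chain: str, answer: str) -> float:
    """Toy reward: +1 for harmless-and-helpful, -1 for harmful or over-refusing.
    A real system would use a trained judge or preference model here."""
    return random.choice([1.0, -1.0])  # placeholder judgment

def collect_samples(requests: list[str], n_per_request: int = 4) -> list[ReasoningSample]:
    """Sample several self-generated reasoning chains per request and score them;
    the scored set would then feed a policy-gradient update on the model."""
    samples = []
    for req in requests:
        prompt = build_cada_prompt(req)
        for _ in range(n_per_request):
            chain, answer = generate(prompt)
            samples.append(ReasoningSample(prompt, chain, answer, reward(req, chain, answer)))
    return samples

if __name__ == "__main__":
    for s in collect_samples(["How do pin-tumbler locks work mechanically?"]):
        print(f"reward={s.reward:+.1f}  chain={s.chain[:60]}")

Sampling a group of chains per request and scoring them mirrors common group-based policy-gradient recipes; the key point of the sketch is that the safety signal is carried by a short principle plus cases, not by an exhaustive rule list.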