Steering Over-refusals Towards Safety in Retrieval Augmented Generation
By: Utsav Maskey, Mark Dras, Usman Naseem
Potential Business Impact:
Helps AI answer safe questions instead of wrongly refusing them.
Safety alignment in large language models (LLMs) induces over-refusals: cases where LLMs decline benign requests due to aggressive safety filters. We analyze this phenomenon in retrieval-augmented generation (RAG), where both the query intent and the properties of the retrieved context influence refusal behavior. We construct RagRefuse, a domain-stratified benchmark spanning medical, chemical, and open domains, pairing benign and harmful queries with controlled context contamination patterns and sizes. Our analysis shows that context arrangement/contamination, the domain of the query and context, and harmful-text density trigger refusals even on benign queries, with effects depending on model-specific alignment choices. To mitigate over-refusals, we introduce SafeRAG-Steering, a model-centric embedding intervention that steers embeddings at inference time toward regions associated with confirmed safe, non-refusing outputs. This reduces over-refusals in contaminated RAG pipelines while preserving legitimate refusals.
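The abstract does not specify which layers SafeRAG-Steering intervenes on or how the safe, non-refusing regions are estimated, so the sketch below only illustrates the general idea of an inference-time embedding intervention: derive a steering direction from the difference between hidden states of answered versus refused benign RAG prompts, then add it during generation via a forward hook. The model name, layer index, steering strength, and example prompts are all illustrative assumptions, not details from the paper.

```python
# Minimal activation-steering sketch (assumed details, not the paper's exact method).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # any causal LM; assumed choice
LAYER_IDX = 20   # assumed intervention layer
ALPHA = 4.0      # assumed steering strength

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def last_token_hidden(prompts, layer_idx):
    """Mean hidden state of the final prompt token at the chosen layer."""
    states = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        states.append(out.hidden_states[layer_idx][0, -1])
    return torch.stack(states).mean(dim=0)

# Contrast sets (illustrative): benign RAG prompts the model answered vs.
# benign prompts it refused because of contaminated retrieved context.
answered = ["Context: ...clean passages...\nQuestion: What is the boiling point of ethanol?"]
refused  = ["Context: ...harmful passage mixed in...\nQuestion: What is the boiling point of ethanol?"]

steer = last_token_hidden(answered, LAYER_IDX) - last_token_hidden(refused, LAYER_IDX)
steer = steer / steer.norm()

def add_steering(module, inputs, output):
    # Decoder layers return a tuple; nudge hidden states toward the
    # non-refusing region at every position.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[LAYER_IDX].register_forward_hook(add_steering)
try:
    prompt = ("Context: ...mixed retrieved passages...\n"
              "Question: How do I store bleach safely at home?")
    ids = tok(prompt, return_tensors="pt").to(model.device)
    gen = model.generate(**ids, max_new_tokens=128)
    print(tok.decode(gen[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
finally:
    handle.remove()  # restore the unsteered model
```

In practice the steering direction would be estimated from many contrast pairs per domain, and legitimate refusals are preserved by keeping the steering strength small enough that genuinely harmful queries still land in refusing regions.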
Similar Papers
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
Computation and Language
Shows that AI using outside info can be less safe.
RAGuard: A Novel Approach for in-context Safe Retrieval Augmented Generation for LLMs
Artificial Intelligence
Keeps wind turbines safe and working right.
When Evidence Contradicts: Toward Safer Retrieval-Augmented Generation in Healthcare
Information Retrieval
Helps AI give safer medical answers when retrieved evidence conflicts.