Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems
By: Karthik Avinash, Nikhil Pareek, Rishav Hada
Potential Business Impact:
Keeps AI safe across text, pictures, and sounds.
The increasing deployment of Large Language Models (LLMs) across enterprise and mission-critical domains has underscored the urgent need for robust guardrailing systems that ensure safety, reliability, and compliance. Existing solutions often struggle with real-time oversight, multi-modal data handling, and explainability, limitations that hinder their adoption in regulated environments; most guardrails also operate in isolation and focus on text alone, making them inadequate for multi-modal, production-scale settings. We introduce Protect, a natively multi-modal guardrailing model that operates seamlessly across text, image, and audio inputs and is built for enterprise-grade deployment. Protect integrates fine-tuned, category-specific adapters trained via Low-Rank Adaptation (LoRA) on an extensive multi-modal dataset covering four safety dimensions: toxicity, sexism, data privacy, and prompt injection. Our teacher-assisted annotation pipeline leverages reasoning and explanation traces to generate high-fidelity, context-aware labels across modalities. Experimental results demonstrate state-of-the-art performance across all safety dimensions, surpassing existing open and proprietary models such as WildGuard, LlamaGuard-4, and GPT-4.1. Protect establishes a strong foundation for trustworthy, auditable, and production-ready safety systems capable of operating across text, image, and audio modalities.
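The abstract does not include code, but as a rough illustration of the adapter design it describes, the following is a minimal sketch of category-specific LoRA adapters using the Hugging Face peft library. The base checkpoint name, LoRA rank, alpha, and target modules below are assumptions for illustration, not details taken from the paper.

# Minimal sketch (not the authors' code): one LoRA adapter per safety
# dimension on a shared base model, using Hugging Face `peft`.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

CATEGORIES = ["toxicity", "sexism", "data_privacy", "prompt_injection"]

# Hypothetical base checkpoint; the paper's base model is not named here.
base = AutoModelForCausalLM.from_pretrained("your-multimodal-base")

lora_cfg = LoraConfig(
    r=16,                                  # assumed rank, not from the paper
    lora_alpha=32,                         # assumed scaling factor
    target_modules=["q_proj", "v_proj"],   # typical attention projections
    task_type="CAUSAL_LM",
)

# Register one adapter per safety dimension on the same base weights.
model = get_peft_model(base, lora_cfg, adapter_name=CATEGORIES[0])
for cat in CATEGORIES[1:]:
    model.add_adapter(cat, lora_cfg)

# At inference time, activate the adapter for the category being checked.
model.set_adapter("prompt_injection")

One plausible advantage of this layout, consistent with the abstract's framing, is that each safety dimension can be fine-tuned and updated independently while sharing the frozen base model's weights.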
Similar Papers
Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks
Cryptography and Security
Makes AI safer from bad instructions.
OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models
Cryptography and Security
Keeps AI from saying bad things or stealing secrets.
MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
Computation and Language
Keeps AI safe from bad words in any language.