Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems
By: Karthik Avinash, Nikhil Pareek, Rishav Hada
Potential Business Impact:
Keeps AI safe across text, pictures, and sounds.
The increasing deployment of Large Language Models (LLMs) across enterprise and mission-critical domains has underscored the urgent need for robust guardrailing systems that ensure safety, reliability, and compliance. Existing solutions often struggle with real-time oversight, multi-modal data handling, and explainability, limitations that hinder their adoption in regulated environments; most guardrails also operate in isolation and focus on text alone, making them inadequate for multi-modal, production-scale settings. We introduce Protect, a natively multi-modal guardrailing model that operates seamlessly across text, image, and audio inputs and is built for enterprise-grade deployment. Protect integrates fine-tuned, category-specific adapters trained via Low-Rank Adaptation (LoRA) on an extensive multi-modal dataset covering four safety dimensions: toxicity, sexism, data privacy, and prompt injection. Our teacher-assisted annotation pipeline leverages reasoning and explanation traces to generate high-fidelity, context-aware labels across modalities. Experimental results demonstrate state-of-the-art performance across all safety dimensions, surpassing existing open and proprietary models such as WildGuard, LlamaGuard-4, and GPT-4.1. Protect establishes a strong foundation for trustworthy, auditable, and production-ready safety systems capable of operating across text, image, and audio modalities.
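The abstract does not include code, but as a rough illustration of the adapter design it describes, the following is a minimal sketch of category-specific LoRA adapters using the Hugging Face peft library. The base checkpoint name, LoRA rank, alpha, and target modules below are assumptions for illustration, not details taken from the paper.

# Minimal sketch (not the authors' code): one LoRA adapter per safety
# dimension on a shared base model, using Hugging Face `peft`.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

CATEGORIES = ["toxicity", "sexism", "data_privacy", "prompt_injection"]

# Hypothetical base checkpoint; the paper's base model is not named here.
base = AutoModelForCausalLM.from_pretrained("your-multimodal-base")

lora_cfg = LoraConfig(
    r=16,                                  # assumed rank, not from the paper
    lora_alpha=32,                         # assumed scaling factor
    target_modules=["q_proj", "v_proj"],   # typical attention projections
    task_type="CAUSAL_LM",
)

# Register one adapter per safety dimension on the same base weights.
model = get_peft_model(base, lora_cfg, adapter_name=CATEGORIES[0])
for cat in CATEGORIES[1:]:
    model.add_adapter(cat, lora_cfg)

# At inference time, activate the adapter for the category being checked.
model.set_adapter("prompt_injection")

One plausible advantage of this layout, consistent with the abstract's framing, is that each safety dimension can be fine-tuned and updated independently while sharing the frozen base model's weights.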
Similar Papers
Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks
Cryptography and Security
Makes AI safer from bad instructions.
OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models
Cryptography and Security
Keeps AI from saying bad things or stealing secrets.
MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
Computation and Language
Keeps AI safe from bad words in any language.