A Domain-Agnostic Scalable AI Safety Ensuring Framework
By: Beomjun Kim, Kangyeon Kim, Sunwoo Kim, and more
Potential Business Impact:
Makes AI safe and reliable for any job.
Ensuring the safety of AI systems has emerged as a critical priority as these systems are increasingly deployed in real-world applications. We propose a novel domain-agnostic framework that guarantees AI systems satisfy user-defined safety constraints with specified probabilities. Our approach combines any AI model with an optimization problem that ensures outputs meet safety requirements while maintaining performance. The key challenge is handling uncertain constraints: those whose satisfaction cannot be deterministically evaluated (e.g., whether a chatbot response is "harmful"). We address this through three innovations: (1) a safety classification model that assesses constraint satisfaction probability, (2) internal test data to evaluate this classifier's reliability, and (3) conservative testing to prevent overfitting when this data is used in training. We prove our method guarantees probabilistic safety under mild conditions and establish the first scaling law in AI safety, showing that the safety-performance trade-off improves predictably with more internal test data. Experiments across production planning, reinforcement learning, and language generation demonstrate our framework achieves up to 140 times better safety than existing methods at the same performance levels. This work enables AI systems to achieve both rigorous safety guarantees and high performance across diverse domains.
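To make the setup concrete, the sketch below shows one way the framework's selection step could look in Python. It is only an illustrative sketch under stated assumptions, not the paper's implementation: `safe_select`, `conservative_threshold`, `safety_classifier`, and `performance_fn` are hypothetical names, and the Hoeffding-style margin stands in for the paper's conservative testing procedure, whose exact form the abstract does not specify.

```python
import math

def conservative_threshold(base_threshold, n_internal, delta=0.05):
    """Tighten the user-specified safety threshold by a concentration-bound
    margin so that classifier error estimated on n_internal held-out
    ("internal test") examples is not overfit away.

    NOTE: this Hoeffding-style margin is only illustrative; the paper's
    conservative testing correction may differ.
    """
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n_internal))
    return min(1.0, base_threshold + margin)

def safe_select(candidates, performance_fn, safety_classifier,
                base_threshold=0.95, n_internal=1000, delta=0.05):
    """Pick the best-performing candidate whose estimated probability of
    satisfying the safety constraint clears the conservative threshold.

    candidates        : iterable of model outputs
    performance_fn    : output -> task score (higher is better)
    safety_classifier : output -> estimated P(constraint satisfied)
    """
    threshold = conservative_threshold(base_threshold, n_internal, delta)
    # Keep only candidates the classifier judges safe enough.
    feasible = [c for c in candidates if safety_classifier(c) >= threshold]
    if not feasible:
        return None  # abstain: no candidate is confidently safe
    # Among the safe candidates, maximize task performance.
    return max(feasible, key=performance_fn)
```

In this sketch, abstention (returning None) is the fallback when no candidate clears the tightened threshold, and as the internal test set grows the margin shrinks, loosely mirroring the paper's scaling-law claim that more internal test data improves the safety-performance trade-off.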
Similar Papers
Towards provable probabilistic safety for scalable embodied AI systems
Systems and Control
Makes robots safer by predicting rare problems.
A Safety and Security Framework for Real-World Agentic Systems
Machine Learning (CS)
Makes smart computer helpers safer to use.
AGENTSAFE: A Unified Framework for Ethical Assurance and Governance in Agentic AI
Multiagent Systems
Makes AI agents safer and more trustworthy.