ASTRA: Agentic Steerability and Risk Assessment Framework
By: Itay Hazan, Yael Mathov, Guy Shtar, and more
Potential Business Impact:
Makes AI agents follow rules to prevent harm.
Securing AI agents powered by Large Language Models (LLMs) is one of the most critical challenges in AI security today. Unlike traditional software, AI agents use LLMs as their "brain" to autonomously perform actions via connected tools. This capability introduces risks that go far beyond harmful text in a chatbot, which was the main early application of LLMs. A compromised AI agent can deliberately abuse powerful tools to perform malicious, often irreversible, actions, limited solely by the guardrails on the tools themselves and the LLM's ability to enforce them. This paper presents ASTRA, a first-of-its-kind framework designed to evaluate how effectively LLMs support the creation of secure agents that enforce custom guardrails defined at the system-prompt level (e.g., "Do not send an email outside the company domain," or "Never extend the robotic arm more than 2 meters"). Our holistic framework simulates 10 diverse autonomous agents, ranging from a coding assistant to a delivery drone, equipped with 37 unique tools. We test these agents against a suite of novel attacks developed specifically for agentic threats, inspired by the OWASP Top 10 but adapted to challenge the LLM's ability to enforce policy during multi-turn planning and the execution of strict tool activation. By evaluating 13 open-source, tool-calling LLMs, we uncovered surprising and significant differences in their ability to remain secure and keep operating within their boundaries. The purpose of this work is to provide the community with a robust, unified methodology to build and validate better LLMs, ultimately pushing toward more secure and reliable agentic AI systems.
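The abstract's idea of a system-prompt guardrail mirrored by a tool-side check can be sketched as follows. This is a minimal illustration, not code from the paper: the domain, prompt text, and `send_email` tool are hypothetical names invented for this example. The point is that an agent's policy ("Do not send an email outside the company domain") lives in the system prompt for the LLM to enforce, while a deterministic check on the tool's arguments acts as the last line of defense when the LLM's enforcement fails under attack.

```python
# Minimal sketch of a system-prompt guardrail plus a tool-side check.
# All names here (ALLOWED_DOMAIN, send_email) are hypothetical examples,
# not part of the ASTRA framework itself.

ALLOWED_DOMAIN = "example.com"  # assumed company domain for illustration

SYSTEM_PROMPT = (
    "You are an email assistant. "
    f"Do not send an email outside the {ALLOWED_DOMAIN} domain."
)

def send_email(to: str, subject: str, body: str) -> str:
    """Hypothetical tool: refuses recipients outside the company domain."""
    domain = to.rsplit("@", 1)[-1].lower()
    if domain != ALLOWED_DOMAIN:
        # Tool-side guardrail: enforced even if the LLM ignores the
        # system-prompt policy during multi-turn planning.
        return f"BLOCKED: recipient domain '{domain}' is outside {ALLOWED_DOMAIN}"
    return f"SENT: email to {to}"

print(send_email("alice@example.com", "status", "All good."))
print(send_email("mallory@evil.org", "data", "Exfiltrated secrets"))
```

ASTRA's evaluation targets the first layer: whether the LLM itself respects such a system-prompt rule when deciding which tools to call and with which arguments, across adversarial multi-turn interactions.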
Similar Papers
Securing Agentic AI: Threat Modeling and Risk Analysis for Network Monitoring Agentic AI System
Cryptography and Security
Protects smart AI from being tricked or broken.
ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
Robotics
Space robot learns to control temperature better.
ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants
Cryptography and Security
Finds hidden mistakes in AI-written code.