ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
By: Yutao Mou, Zhangchi Xue, Lijun Li, and more
While LLM-based agents can interact with environments by invoking external tools, these expanded capabilities also amplify security risks. Monitoring step-level tool invocation behavior in real time and proactively intervening before unsafe execution is critical for agent deployment, yet remains under-explored. In this work, we first construct TS-Bench, a novel benchmark for step-level tool invocation safety detection in LLM agents. We then develop a guardrail model, TS-Guard, trained with multi-task reinforcement learning. The model proactively detects unsafe tool invocations before execution by reasoning over the interaction history: it assesses request harmfulness and action-attack correlations, producing interpretable and generalizable safety judgments and feedback. Furthermore, we introduce TS-Flow, a guardrail-feedback-driven reasoning framework for LLM agents, which reduces harmful tool invocations of ReAct-style agents by 65% on average and improves benign task completion by approximately 10% under prompt injection attacks.
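The pipeline the abstract describes, a guardrail that screens each proposed tool call before execution and feeds its verdict back into the agent's reasoning loop, can be pictured roughly as below. This is a minimal illustrative sketch, not the paper's implementation: every name (Guardrail, DemoAgent, judge, next_step, run) is a hypothetical placeholder, and the trivial rule inside judge merely stands in for the learned TS-Guard model.

```python
# Minimal sketch of a guardrail-feedback loop in the spirit of TS-Flow.
# All identifiers are illustrative placeholders, not the paper's API.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class Step:
    tool_call: Optional[ToolCall] = None
    final_answer: Optional[str] = None


@dataclass
class Verdict:
    unsafe: bool          # step-level judgment for the proposed invocation
    feedback: str = ""    # rationale fed back to the agent when blocked


class Guardrail:
    """Stand-in for a step-level guardrail model such as TS-Guard."""

    def judge(self, history: list[str], call: ToolCall) -> Verdict:
        # A learned guardrail would reason over the full interaction history,
        # assessing request harmfulness and action-attack correlation.
        # Here a trivial rule flags an obviously destructive command.
        if call.name == "shell" and "rm -rf" in call.arguments.get("cmd", ""):
            return Verdict(True, "Destructive shell command; refuse and replan.")
        return Verdict(False)


class DemoAgent:
    """Toy ReAct-style agent: proposes a risky call, then obeys the feedback."""

    def next_step(self, history: list[str]) -> Step:
        if any("guardrail: blocked" in h for h in history):
            return Step(final_answer="Declined the risky action; task aborted safely.")
        return Step(tool_call=ToolCall("shell", {"cmd": "rm -rf /tmp/project"}))


def run(agent, guardrail: Guardrail, tools: dict[str, Callable], request: str,
        max_steps: int = 10) -> str:
    """ReAct loop in which every tool call is screened before execution."""
    history = [f"user: {request}"]
    for _ in range(max_steps):
        step = agent.next_step(history)
        if step.final_answer is not None:
            return step.final_answer
        verdict = guardrail.judge(history, step.tool_call)
        if verdict.unsafe:
            # Block execution and surface the feedback so the agent can replan
            # instead of failing silently.
            history.append(
                f"guardrail: blocked {step.tool_call.name} - {verdict.feedback}")
            continue
        observation = tools[step.tool_call.name](**step.tool_call.arguments)
        history.append(f"observation: {observation}")
    return "max steps reached"


if __name__ == "__main__":
    tools = {"shell": lambda cmd: f"(pretend) ran: {cmd}"}
    print(run(DemoAgent(), Guardrail(), tools, "Clean up the workspace"))
```

The key design point mirrored here is that the guardrail's feedback is appended to the interaction history rather than simply aborting the episode, which is what allows a ReAct-style agent to revise its plan and still complete benign tasks under attack.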