Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution
By: Boyang Yan
The transition of Large Language Models (LLMs) from passive code generators to autonomous agents introduces significant safety risks, particularly from destructive commands and inconsistent system states. Existing commercial solutions often prioritize interactive user safety, enforcing authentication barriers that break the headless loops required for true autonomy. This paper presents a Fault-Tolerant Sandboxing framework designed to mitigate these risks through a policy-based interception layer and a transactional filesystem snapshot mechanism. We hypothesize that wrapping agent actions in atomic transactions can guarantee safety at acceptable latency, avoiding both the heavy initialization overhead of containers and the interactive friction of commercial CLIs. We validated this approach by deploying the Minimind-MoE LLM, served via nano-vllm, on a custom Proxmox-based testbed with EVPN/VXLAN network isolation. Experimental results demonstrate a 100% interception rate for high-risk commands and a 100% success rate in rolling back failed states. Crucially, our prototype incurs only a 14.5% performance overhead (approx. 1.8 s) per transaction. In contrast, benchmarking against the Gemini CLI sandbox revealed that it requires interactive authentication ("Sign in"), rendering it unusable for headless, autonomous agent workflows.
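To make the mechanism concrete, the following is a minimal Python sketch of the transaction pattern the abstract describes: a deny-list policy check followed by a snapshot/execute/rollback cycle. The blocked-command list, function names, and directory-copy snapshotting are illustrative assumptions, not the paper's implementation; the prototype presumably uses a filesystem-level snapshot on its Proxmox testbed, whose exact interface the abstract does not specify.

```python
import shutil
import subprocess
import tempfile

# Hypothetical deny-list for the policy layer; the paper's actual
# policy rules are not specified in the abstract.
BLOCKED_PREFIXES = ("rm -rf", "mkfs", "dd if=", "shutdown", "reboot")


def is_high_risk(command: str) -> bool:
    """Policy check: intercept destructive commands before they run."""
    return command.strip().startswith(BLOCKED_PREFIXES)


def run_transactional(command: str, workdir: str) -> bool:
    """Execute an agent command as an atomic transaction over workdir.

    Snapshot the directory, run the command, roll back on any failure,
    and discard the snapshot (commit) on success.
    """
    if is_high_risk(command):
        print(f"[policy] blocked high-risk command: {command}")
        return False

    snapshot = tempfile.mkdtemp(prefix="sandbox_snap_")
    shutil.copytree(workdir, snapshot, dirs_exist_ok=True)  # begin transaction
    try:
        result = subprocess.run(command, shell=True, cwd=workdir, timeout=60)
        if result.returncode != 0:
            raise RuntimeError(f"command exited with {result.returncode}")
        return True  # commit: keep the new state
    except Exception as exc:
        shutil.rmtree(workdir)              # discard the corrupted state
        shutil.copytree(snapshot, workdir)  # roll back to the snapshot
        print(f"[rollback] restored pre-transaction state: {exc}")
        return False
    finally:
        shutil.rmtree(snapshot, ignore_errors=True)  # drop the snapshot
```

In a production deployment, a copy-on-write snapshot (e.g., ZFS, Btrfs, or LVM) would replace the directory copy shown here, which is what makes a per-transaction overhead on the order of the reported ~1.8 s plausible for non-trivial working trees.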