Score: 3

Blue Teaming Function-Calling Agents

Published: January 14, 2026 | arXiv ID: 2601.09292v1

By: Greta Dolcetti, Giulio Zizzo, Sergio Maffeis

BigTech Affiliations: IBM

Potential Business Impact:

AI models can't be trusted with important jobs yet.

Business Areas:

Penetration Testing Information Technology, Privacy and Security

We present an experimental evaluation that assesses the robustness of four open source LLMs claiming function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show how these models are not safe by default, and how the defences are not yet employable in real-world scenarios.