SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents
By: Siyuan Liang , Tianmeng Fang , Zhe Liu and more
Potential Business Impact:
Stops smart assistants from doing bad things.
With the wide application of multimodal foundation models in intelligent agent systems, scenarios such as mobile device control, intelligent assistant interaction, and multimodal task execution are gradually relying on such large model-driven agents. However, the related systems are also increasingly exposed to potential jailbreak risks. Attackers may induce the agents to bypass the original behavioral constraints through specific inputs, and then trigger certain risky and sensitive operations, such as modifying settings, executing unauthorized commands, or impersonating user identities, which brings new challenges to system security. Existing security measures for intelligent agents still have limitations when facing complex interactions, especially in detecting potentially risky behaviors across multiple rounds of conversations or sequences of tasks. In addition, an efficient and consistent automated methodology to assist in assessing and determining the impact of such risks is currently lacking. This work explores the security issues surrounding mobile multimodal agents, attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information, and designs an automated assisted assessment scheme based on a large language model. Through preliminary validation in several representative high-risk tasks, the results show that the method can improve the recognition of risky behaviors to some extent and assist in reducing the probability of agents being jailbroken. We hope that this study can provide some valuable references for the security risk modeling and protection of multimodal intelligent agent systems.
Similar Papers
Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System
Cryptography and Security
Keeps smart computer helpers safe from bad guys.
Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks
Cryptography and Security
Makes AI models safer from harmful tricks.
Immunity memory-based jailbreak detection: multi-agent adaptive guard for large language models
Cryptography and Security
AI learns to remember and block bad instructions.