Uncovering the Vulnerability of Large Language Models in the Financial Domain via Risk Concealment
By: Gang Cheng , Haibo Jin , Wenbin Zhang and more
Potential Business Impact:
Makes AI break money rules without seeming to.
Large Language Models (LLMs) are increasingly integrated into financial applications, yet existing red-teaming research primarily targets harmful content, largely neglecting regulatory risks. In this work, we aim to investigate the vulnerability of financial LLMs through red-teaming approaches. We introduce Risk-Concealment Attacks (RCA), a novel multi-turn framework that iteratively conceals regulatory risks to provoke seemingly compliant yet regulatory-violating responses from LLMs. To enable systematic evaluation, we construct FIN-Bench, a domain-specific benchmark for assessing LLM safety in financial contexts. Extensive experiments on FIN-Bench demonstrate that RCA effectively bypasses nine mainstream LLMs, achieving an average attack success rate (ASR) of 93.18%, including 98.28% on GPT-4.1 and 97.56% on OpenAI o1. These findings reveal a critical gap in current alignment techniques and underscore the urgent need for stronger moderation mechanisms in financial domains. We hope this work offers practical insights for advancing robust and domain-aware LLM alignment.
Similar Papers
Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data
Cryptography and Security
Makes AI safer from hackers and bad code.
Risk Assessment and Security Analysis of Large Language Models
Cryptography and Security
Protects smart computer programs from bad uses.
Quantifying Return on Security Controls in LLM Systems
Cryptography and Security
Helps protect AI from secrets being stolen.