DRIP: Defending Prompt Injection via De-instruction Training and Residual Fusion Model Architecture

Published: November 1, 2025 | arXiv ID: 2511.00447v1

By: Ruofan Liu, Yun Lin, Jin Song Dong

Potential Business Impact:

Stops smart computer programs from being tricked.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) have demonstrated impressive instruction-following capabilities. However, these capabilities also expose models to prompt injection attacks, where maliciously crafted inputs overwrite or distract from the intended instructions. A core vulnerability lies in the model's lack of semantic role understanding: it cannot distinguish directive intent from descriptive content, leading it to execute instruction-like phrases embedded in data. We propose DRIP, a training-time defense grounded in a semantic modeling perspective, which enforces robust separation between instruction and data semantics without sacrificing utility. DRIP introduces two lightweight yet complementary mechanisms: (1) a token-wise de-instruction shift that performs semantic disentanglement, weakening directive semantics in data tokens while preserving content meaning; and (2) a residual fusion pathway that provides a persistent semantic anchor, reinforcing the influence of the true top-level instruction during generation. Experimental results on LLaMA-8B and Mistral-7B across three prompt injection benchmarks (SEP, AlpacaFarm, and InjecAgent) demonstrate that DRIP outperforms state-of-the-art defenses, including StruQ, SecAlign, ISE, and PFT, improving role separation by 49%, and reducing attack success rate by 66% for adaptive attacks. Meanwhile, DRIP's utility is on par with the undefended model across AlpacaEval, IFEval, and MT-Bench. Our findings underscore the power of lightweight representation edits and role-aware supervision in securing LLMs against adaptive prompt injection.
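The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the architecture, so the fixed shift vector, the mean-pooled instruction anchor, the scaling factor `alpha`, and every function name below are assumptions made purely for illustration. In the real method the shift is presumably learned during training.

```python
from typing import List

Vector = List[float]


def de_instruction_shift(
    embeddings: List[Vector], is_data: List[bool], shift: Vector
) -> List[Vector]:
    """Add `shift` to each data-token embedding; copy instruction tokens unchanged.

    Assumption: one fixed shift vector stands in for a learned, token-wise shift.
    """
    return [
        [e + s for e, s in zip(emb, shift)] if data else list(emb)
        for emb, data in zip(embeddings, is_data)
    ]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def residual_fusion(
    hidden: List[Vector], instruction_anchor: Vector, alpha: float
) -> List[Vector]:
    """Add a scaled copy of the instruction anchor to every hidden state.

    Assumption: a plain additive residual stands in for the fusion pathway.
    """
    return [
        [h + alpha * a for h, a in zip(vec, instruction_anchor)]
        for vec in hidden
    ]


# Toy input: one instruction token followed by two data tokens (2-d embeddings).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
is_data = [False, True, True]
shift = [-0.5, 0.0]  # hypothetical value; would be learned in practice

shifted = de_instruction_shift(embeddings, is_data, shift)
anchor = mean_pool([e for e, d in zip(embeddings, is_data) if not d])
fused = residual_fusion(shifted, anchor, alpha=0.5)

print(shifted)  # only the two data tokens have moved
print(fused)    # every position now carries part of the instruction anchor
```

In this sketch the shift touches only positions flagged as data, while the fusion step re-injects the instruction representation at every position, which mirrors the "persistent semantic anchor" idea at a very coarse level.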

Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models

Cryptography and Security

Makes AI safer from tricky instructions.

3 Nov 2025

Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis

Cryptography and Security

Stops AI from following secret bad commands.

30 Nov 2025

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction

Cryptography and Security

Stops bad instructions from tricking smart computer programs.

29 Apr 2025

Page Count

23 pages
