CacheTrap: Injecting Trojans in LLMs without Leaving any Traces in Inputs or Weights
By: Mohaiminul Al Nahian, Abeer Matar A. Almalky, Gamana Aragonda, and more
Potential Business Impact:
Makes AI models misbehave from one tiny hidden change, leaving no trace.
Adversarial weight perturbation has emerged as a concerning threat to LLMs, where attackers exploit either training privileges or system-level access to inject adversarial corruption into model weights. With the emergence of defensive solutions that place system- and algorithm-level checks and corrections on the input and weight spaces, these perturbations are increasingly susceptible to detection. This work develops a novel perspective on Trojan attacks that produces an attacker-designed model output while leaving no attack traces in the inputs or weights. Such an attack space can be unlocked by corrupting the key-value (KV) cache. In this paper, we introduce CacheTrap, a novel Trojan attack that corrupts the value vectors stored in the KV cache. These vectors capture the dynamic activations at specific token positions and therefore constitute a natural surface for transient, inference-time trigger insertion. The transient nature of these KV values and their dependence on victim input impose additional constraints on the attack, such as a lack of knowledge of the victim's data or domain and, consequently, a lack of gradient information. CacheTrap therefore develops a vulnerable-bit-searching algorithm over the KV cache so that, once the identified bit flip is employed as a trigger, the model produces targeted behavior, e.g., classifying inputs into the attacker's target class. Moreover, CacheTrap is a data- and gradient-free attack that has no impact on the model's utility. Our evaluation demonstrates that the proposed attack enables the first successful Trojan attack on LLMs with a single bit flip in the KV cache. In addition, the data-independent nature of the attack ensures that once the attacker identifies the vulnerable bit index, the location remains constant and can be transferred to a wide range of victim tasks, datasets, and queries with no additional overhead.
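To make the attack surface concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of what a single-bit flip in a cached fp16 value vector looks like mechanically. The tensor layout, the targeted layer/head/position indices, and the chosen bit position are illustrative assumptions, not details from the paper.

```python
# Hypothetical illustration of a single-bit flip in a KV-cache value tensor.
# Shapes, indices, and the bit position are placeholder assumptions.
import torch

def flip_bit_fp16(value_cache: torch.Tensor, pos: tuple, bit_index: int) -> torch.Tensor:
    """Flip one bit of a single fp16 element in a cached value tensor.

    value_cache: (num_heads, seq_len, head_dim) tensor, dtype float16
    pos:         (head, token_position, channel) of the targeted element
    bit_index:   which of the 16 bits to flip (0 = least significant)
    """
    assert value_cache.dtype == torch.float16
    # Reinterpret the fp16 storage as raw 16-bit integers so a bit can be toggled in place.
    raw = value_cache.view(torch.int16)
    h, t, d = pos
    raw[h, t, d] ^= (1 << bit_index)  # XOR flips exactly one bit of the stored value
    return value_cache

# Toy usage: a fake single-layer value cache with 8 heads, 32 cached tokens, head dim 64.
if __name__ == "__main__":
    v_cache = torch.randn(8, 32, 64, dtype=torch.float16)
    before = v_cache[3, 10, 5].item()
    flip_bit_fp16(v_cache, pos=(3, 10, 5), bit_index=14)  # bit 14 is the top exponent bit in fp16
    after = v_cache[3, 10, 5].item()
    print(f"value before: {before:.4f}, after single-bit flip: {after:.4f}")
```

The sketch only shows the mechanics of a transient, inference-time corruption of a cached value; the paper's contribution is the data- and gradient-free search that identifies which bit index to flip so the flip acts as a Trojan trigger.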
Similar Papers
Inverting Trojans in LLMs
Machine Learning (CS)
Finds hidden bad words in AI writing.
ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking
Cryptography and Security
Makes AI get stuck, stopping its work.
Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models
Cryptography and Security
Makes AI models say wrong things by messing with memory.