Invasive Context Engineering to Control Large Language Models
By: Thomas Rivasseau
Potential Business Impact:
Keeps AI from being tricked, even with long talks.
Current research on operator control of Large Language Models improves model robustness against adversarial attacks and misbehavior by training on preference examples, prompting, and input/output filtering. Despite good results, LLMs remain susceptible to abuse, and jailbreak probability increases with context length. There is a need for robust LLM security guarantees in long-context situations. We propose control sentences inserted into the LLM context as invasive context engineering to partially solve the problem. We suggest this technique can be generalized to the Chain-of-Thought process to prevent scheming. Invasive Context Engineering does not rely on LLM training, avoiding data shortage pitfalls which arise in training models for long context situations.
Similar Papers
LLM Reinforcement in Context
Computation and Language
Stops AI from being tricked by long talks.
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs
Cryptography and Security
Stops AI from being tricked into saying bad things.
Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations
Cryptography and Security
Stops AI from saying bad or unsafe things.