Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing
By: Panagiotis Theocharopoulos, Ajinkya Kulkarni, Mathew Magimai. -Doss
Potential Business Impact:
Makes AI reviewers unfairly change paper grades.
Large language models (LLMs) are increasingly considered for use in high-impact workflows, including academic peer review. However, LLMs are vulnerable to document-level hidden prompt injection attacks. In this work, we construct a dataset of approximately 500 real academic papers accepted to ICML and evaluate the effect of embedding hidden adversarial prompts within these documents. Each paper is injected with semantically equivalent instructions in four different languages and reviewed using an LLM. We find that prompt injection induces substantial changes in review scores and accept/reject decisions for English, Japanese, and Chinese injections, while Arabic injections produce little to no effect. These results highlight the susceptibility of LLM-based reviewing systems to document-level prompt injection and reveal notable differences in vulnerability across languages.
Similar Papers
Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review
Cryptography and Security
Tricks AI into writing fake science reviews.
Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review
Cryptography and Security
Tricks AI reviewers to miss hidden bad ideas.
Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications
Machine Learning (CS)
Makes AI reviewers unfairly accept almost everything.