Paraphrasing Adversarial Attack on LLM-as-a-Reviewer
By: Masahiro Kaneko
Potential Business Impact:
Tricks AI reviewers into giving papers higher scores.
The use of large language models (LLMs) in peer review systems has attracted growing attention, making it essential to examine their potential vulnerabilities. Prior attacks rely on prompt injection, which alters manuscript content and conflates injection susceptibility with evaluation robustness. We propose the Paraphrasing Adversarial Attack (PAA), a black-box optimization method that searches for paraphrased sequences yielding higher review scores while preserving semantic equivalence and linguistic naturalness. PAA leverages in-context learning, using previous paraphrases and their scores to guide candidate generation. Experiments across five ML and NLP conferences with three LLM reviewers and five attacking models show that PAA consistently increases review scores without changing the paper's claims. Human evaluation confirms that generated paraphrases maintain meaning and naturalness. We also find that attacked papers exhibit increased perplexity in reviews, offering a potential detection signal, and that paraphrasing submissions can partially mitigate attacks.
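The abstract describes PAA as a black-box search: an attacker model proposes semantics-preserving paraphrases, the LLM reviewer scores them, and previous paraphrase–score pairs are fed back in-context to steer the next round of candidates. Below is a minimal sketch of that loop under stated assumptions; the function names (`score_fn`, `paraphrase_fn`), loop structure, and candidate budget are illustrative placeholders, not the authors' implementation.

```python
from typing import Callable, List, Tuple

def paa_search(
    paper_text: str,
    score_fn: Callable[[str], float],
    # Black-box LLM reviewer (assumed interface): text -> review score.
    paraphrase_fn: Callable[[str, List[Tuple[str, float]]], List[str]],
    # Attacker LLM (assumed interface): current text plus a history of
    # (paraphrase, score) pairs shown in-context -> new candidate paraphrases.
    n_iterations: int = 10,
    n_candidates: int = 4,
) -> Tuple[str, float]:
    """Greedy black-box search over semantics-preserving paraphrases.

    Each round, the attacker model sees earlier paraphrases and the scores
    they received (in-context learning), proposes new candidates, and the
    best-scoring text found so far is kept as the next starting point.
    """
    best_text = paper_text
    best_score = score_fn(paper_text)
    history: List[Tuple[str, float]] = [(paper_text, best_score)]

    for _ in range(n_iterations):
        candidates = paraphrase_fn(best_text, history)[:n_candidates]
        for cand in candidates:
            score = score_fn(cand)  # one black-box reviewer query per candidate
            history.append((cand, score))
            if score > best_score:
                best_text, best_score = cand, score

    return best_text, best_score
```

In practice, `score_fn` would wrap a call to the LLM reviewer and `paraphrase_fn` would prompt an attacker LLM with the score history; a semantic-equivalence and naturalness filter on candidates, as the paper requires, would sit between proposal and scoring.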
Similar Papers
Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks
Computation and Language
AI reviewers can be tricked by small text changes.
When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
Artificial Intelligence
Tricks AI judges into accepting bad science papers.