LLM-REVal: Can We Trust LLM Reviewers Yet?
By: Rui Li, Jia-Chen Gu, Po-Nien Kung, and more
Potential Business Impact:
AI reviewers unfairly favor AI-written papers.
The rapid advancement of large language models (LLMs) has inspired researchers to integrate them extensively into the academic workflow, potentially reshaping how research is practiced and reviewed. While previous studies highlight the potential of LLMs to support research and peer review, their dual roles in the academic workflow and the complex interplay between research and review introduce new risks that remain largely underexplored. In this study, we focus on how the deep integration of LLMs into both the peer-review and research processes may influence scholarly fairness, examining the potential risks of using LLMs as reviewers through simulation. The simulation incorporates a research agent, which generates and revises papers, alongside a review agent, which assesses the submissions. Based on the simulation results, we conduct human annotations and identify a pronounced misalignment between LLM-based reviews and human judgments: (1) LLM reviewers systematically inflate scores for LLM-authored papers, assigning them markedly higher scores than human-authored ones; (2) LLM reviewers persistently underrate human-authored papers that contain critical statements (e.g., on risk or fairness), even after multiple revisions. Our analysis reveals that these misalignments stem from two primary biases in LLM reviewers: a linguistic-feature bias favoring LLM-generated writing styles, and an aversion to critical statements. These results highlight the risks and equity concerns posed to human authors and academic research if LLMs are deployed in the peer-review cycle without adequate caution. On the other hand, revisions guided by LLM reviews yield quality gains in both LLM-based and human evaluations, illustrating the potential of LLMs-as-reviewers to support early-stage researchers and improve low-quality papers.
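To make the simulation setup concrete, here is a minimal sketch of how such a review-revise loop might be wired up. It is an illustrative assumption, not the authors' implementation: the class names (ResearchAgent, ReviewAgent), the prompts, and the call_llm stub are all hypothetical stand-ins for whatever LLM API the paper actually uses.

```python
# A minimal, hypothetical sketch of the review-revise simulation described
# above: a research agent drafts and revises papers while a review agent
# scores each submission. All names (ResearchAgent, ReviewAgent, call_llm)
# and prompts are illustrative assumptions, not the authors' implementation.
import re
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    # Stub standing in for a real chat-completion call; returns a canned
    # reply so the loop below runs end to end for demonstration.
    return "Score: 6. Promising, but the methods section needs more detail."


def parse_score(review_text: str) -> float:
    # Pull a numeric score like "Score: 6" out of free-text feedback.
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", review_text)
    return float(match.group(1)) if match else 0.0


@dataclass
class Review:
    score: float   # e.g., a 1-10 overall rating
    comments: str  # free-text feedback used to guide revision


class ResearchAgent:
    def draft(self, topic: str) -> str:
        return call_llm(f"Write a short research paper on: {topic}")

    def revise(self, paper: str, review: Review) -> str:
        return call_llm(
            "Revise the paper below to address the reviewer comments.\n"
            f"Comments: {review.comments}\nPaper:\n{paper}"
        )


class ReviewAgent:
    def assess(self, paper: str) -> Review:
        feedback = call_llm(f"Review this paper and give a 1-10 score:\n{paper}")
        return Review(score=parse_score(feedback), comments=feedback)


def simulate(topic: str, rounds: int = 3) -> list[Review]:
    # One review-revise cycle per round; the returned score trajectory is
    # where score inflation or persistent underrating would show up.
    researcher, reviewer = ResearchAgent(), ReviewAgent()
    paper, history = researcher.draft(topic), []
    for _ in range(rounds):
        review = reviewer.assess(paper)
        history.append(review)
        paper = researcher.revise(paper, review)
    return history


if __name__ == "__main__":
    print([r.score for r in simulate("fairness risks of LLM peer review")])
```

Comparing score trajectories across author types (LLM-authored vs. human-authored submissions fed into the same loop) is what would surface the biases the study reports.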
Similar Papers
Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews
Computers and Society
Finds AI reviews unfairly favor famous schools.
When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review
Computers and Society
Helps AI review science papers, but can be tricked.
Unveiling the Merits and Defects of LLMs in Automatic Review Generation for Scientific Papers
Computation and Language
Helps computers write better science paper reviews.