Counterfactual LLM-based Framework for Measuring Rhetorical Style
By: Jingyi Qiu, Hong Chen, Zongyi Li
Potential Business Impact:
AI helps tell hype from substance in science papers.
The rise of AI has fueled growing concerns about "hype" in machine learning papers, yet a reliable way to quantify rhetorical style independently of substantive content has remained elusive. Because bold language can stem from either strong empirical results or mere rhetorical style, the two are often difficult to distinguish. To disentangle rhetorical style from substantive content, we introduce a counterfactual, LLM-based framework: multiple LLM rhetorical personas generate counterfactual writings from the same substantive content, an LLM judge compares them through pairwise evaluations, and the outcomes are aggregated with a Bradley–Terry model. Applying this method to 8,485 ICLR submissions sampled from 2017 to 2025, we generate more than 250,000 counterfactual writings and provide a large-scale quantification of rhetorical style in ML papers. We find that visionary framing significantly predicts downstream attention, including citations and media coverage, even after controlling for peer-review evaluations. We also observe a sharp rise in rhetorical strength after 2023 and provide empirical evidence that this increase is largely driven by the adoption of LLM-based writing assistance. The reliability of our framework is supported by its robustness to the choice of personas and by the high correlation between LLM judgments and human annotations. Our work demonstrates that LLMs can serve as instruments to measure and improve scientific evaluation.
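The aggregation step described above, fitting a Bradley–Terry model to the judge's pairwise outcomes, can be illustrated with a minimal sketch. This is not the authors' code: the persona names and win counts below are hypothetical, and the fit uses the standard MM (Zermelo) iteration, where `wins[i, j]` counts how often the judge preferred persona i's counterfactual writing over persona j's.

```python
import numpy as np

def bradley_terry(wins: np.ndarray, n_iter: int = 200, tol: float = 1e-8) -> np.ndarray:
    """Return Bradley-Terry strengths from a matrix of pairwise win counts."""
    n = wins.shape[0]
    comparisons = wins + wins.T        # n_ij: total judgments for each persona pair
    total_wins = wins.sum(axis=1)      # W_i: total wins for persona i
    strengths = np.ones(n)
    for _ in range(n_iter):
        # MM update: pi_i <- W_i / sum_j n_ij / (pi_i + pi_j)
        denom = comparisons / (strengths[:, None] + strengths[None, :] + 1e-12)
        np.fill_diagonal(denom, 0.0)
        new = total_wins / denom.sum(axis=1)
        new /= new.sum()               # strengths are identified only up to scale
        if np.max(np.abs(new - strengths)) < tol:
            return new
        strengths = new
    return strengths

# Toy usage with four hypothetical personas; wins[i, j] = times the judge preferred i over j.
personas = ["neutral", "cautious", "confident", "visionary"]
wins = np.array([
    [0, 6, 3, 2],
    [4, 0, 3, 1],
    [7, 7, 0, 4],
    [8, 9, 6, 0],
])
for name, s in zip(personas, bradley_terry(wins)):
    print(f"{name:>10}: {s:.3f}")
```

Under this formulation, the fitted strength for each persona serves as a scalar rhetorical-style score that can then be related to downstream outcomes such as citations.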
Similar Papers
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
Computation and Language
Makes AI understand different writing styles better.
Automatic Reviewers Fail to Detect Faulty Reasoning in Research Papers: A New Counterfactual Evaluation Framework
Computation and Language
AI can't spot bad research logic yet.
A Generalizable Rhetorical Strategy Annotation Model Using LLM-based Debate Simulation and Labelling
Computation and Language
Finds how speakers try to convince you.