The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing?
By: Sadat Shahriar, Navid Ayoobi, Arjun Mukherjee
With the increasing reliance on LLMs as research agents, distinguishing between LLM- and human-generated ideas has become crucial for understanding the cognitive nuances of LLMs' research capabilities. While detecting LLM-generated text has been studied extensively, distinguishing human- from LLM-generated scientific ideas remains an unexplored area. In this work, we systematically evaluate the ability of state-of-the-art (SOTA) machine learning models to differentiate between human- and LLM-generated ideas, particularly after successive paraphrasing stages. Our findings highlight the challenges SOTA models face in source attribution, with detection performance declining by an average of 25.4% after five consecutive paraphrasing stages. Additionally, we demonstrate that incorporating the research problem as contextual information improves detection performance by up to 2.97%. Notably, our analysis reveals that detection algorithms struggle most when ideas are paraphrased into a simplified, non-expert style, which contributes the most to the erosion of distinguishable LLM signatures.
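To make the evaluation setup concrete, the sketch below shows one way the iterative-paraphrasing protocol could be approximated: a detector is trained once on labeled ideas, and its accuracy is re-measured after each successive paraphrasing stage. This is a minimal illustration, not the paper's actual method; `paraphrase_fn` stands for any LLM-based paraphraser, and the TF-IDF + logistic-regression classifier is a hypothetical stand-in for the SOTA detectors the authors evaluate.

```python
# Minimal sketch of an iterative-paraphrasing evaluation loop (assumptions noted above).
from typing import Callable, List, Tuple

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline


def evaluate_after_paraphrasing(
    train: List[Tuple[str, int]],         # (idea text, label), 1 = LLM-generated
    test: List[Tuple[str, int]],
    paraphrase_fn: Callable[[str], str],  # hypothetical LLM paraphraser
    n_stages: int = 5,
) -> List[float]:
    """Train a stand-in detector once, then track accuracy after each paraphrasing stage."""
    train_texts, train_labels = zip(*train)
    detector = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    detector.fit(train_texts, train_labels)

    test_texts, test_labels = map(list, zip(*test))
    accuracies = []
    for _ in range(n_stages):
        # Each stage rewrites every test idea once more, eroding surface-level LLM signatures.
        test_texts = [paraphrase_fn(t) for t in test_texts]
        accuracies.append(accuracy_score(test_labels, detector.predict(test_texts)))
    return accuracies
```

The contextual variant mentioned in the abstract could be approximated by prepending the research problem to each idea string before training and prediction, though the paper's exact conditioning mechanism may differ.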