Can You Detect the Difference?
By: İsmail Tarım, Aytuğ Onan
Potential Business Impact:
Identifies AI-generated writing that slips past current detectors.
The rapid advancement of large language models (LLMs) has raised concerns about whether AI-generated text can be reliably detected. Stylometric metrics work well on autoregressive (AR) outputs, but their effectiveness on diffusion-based models is unknown. We present the first systematic comparison of diffusion-generated text (LLaDA) and AR-generated text (LLaMA) using 2,000 samples. Perplexity, burstiness, lexical diversity, readability, and BLEU/ROUGE scores show that LLaDA closely mimics human text in perplexity and burstiness, yielding high false-negative rates for AR-oriented detectors; LLaMA shows much lower perplexity but reduced lexical fidelity. No single metric reliably separates diffusion outputs from human writing. We highlight the need for diffusion-aware detectors and outline directions such as hybrid models, diffusion-specific stylometric signatures, and robust watermarking.
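Two of the metrics named above can be sketched with plain Python. The abstract does not give the paper's exact formulas, so the definitions below are common illustrative proxies, not the authors' implementations: burstiness as the coefficient of variation of sentence lengths, and lexical diversity as a type-token ratio.

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Coefficient of variation (std / mean) of sentence lengths.

    This is a common proxy for burstiness; the paper's exact
    definition may differ. Returns 0.0 for fewer than two sentences.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

def type_token_ratio(text: str) -> float:
    """Lexical diversity as unique tokens / total tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

Under these proxies, uniformly sized sentences score a burstiness near zero while highly variable sentence lengths score higher, which is the kind of signal an AR-oriented detector might lean on and which the abstract reports diffusion outputs (LLaDA) can mimic from human text.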
Similar Papers
Large Language Diffusion Models
Computation and Language
New AI learns language like magic, not just predicting words.
Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models
Machine Learning (CS)
Makes computers write faster and understand longer stories.
The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text
Computation and Language
Finds fake Arabic writing made by computers.