Can AI Recognize Its Own Reflection? Self-Detection Performance of LLMs in Computing Education

Published: December 29, 2025 | arXiv ID: 2512.23587v1

By: Christopher Burger, Karmece Talley, Christina Trotter

Potential Business Impact:

AI can't reliably tell whether student work was AI-generated, making it unsafe for cheating verdicts.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

The rapid advancement of Large Language Models (LLMs) presents a significant challenge to academic integrity within computing education. As educators seek reliable detection methods, this paper evaluates the capacity of three prominent LLMs (GPT-4, Claude, and Gemini) to identify AI-generated text in computing-specific contexts. We test their performance under both standard and 'deceptive' prompt conditions, where the models are instructed to evade detection. Our findings reveal a significant instability: while default AI-generated text was easily identified, all models struggled to correctly classify human-written work, with error rates up to 32%. Furthermore, the models were highly susceptible to deceptive prompts, with Gemini's output completely fooling GPT-4. Given that simple prompt alterations significantly degrade detection efficacy, our results demonstrate that these LLMs are currently too unreliable for high-stakes academic misconduct judgments.
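The protocol the abstract describes is, at its core, a classification experiment: a detector LLM is shown a text sample and asked to judge whether it was written by a human or an AI, and the generator may or may not have been prompted to evade detection. Below is a minimal sketch of that loop, assuming the OpenAI Python client; the prompt wording, model name, and sample data are illustrative assumptions, not the authors' actual materials.

```python
# Minimal sketch of an LLM self-detection experiment, assuming the
# OpenAI Python client (pip install openai). Prompts, model name, and
# sample data are illustrative assumptions, not the paper's materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DETECTOR_PROMPT = (
    "You are a detector of AI-generated text in computing coursework. "
    "Answer with exactly one word: HUMAN or AI."
)

def classify(text: str, model: str = "gpt-4") -> str:
    """Ask the detector model to label a text sample as HUMAN or AI."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": DETECTOR_PROMPT},
            {"role": "user", "content": text},
        ],
        temperature=0,  # keep labeling as deterministic as the API allows
    )
    return response.choices[0].message.content.strip().upper()

# Hypothetical labeled samples: (text, true label). The paper's
# 'deceptive' condition would fill this list with AI text generated
# under an evade-detection instruction.
samples = [
    ("for i in range(10): print(i)  # quick loop to check output", "HUMAN"),
    ("This solution elegantly iterates across the canonical range...", "AI"),
]

errors = sum(classify(text) != label for text, label in samples)
print(f"error rate: {errors / len(samples):.0%}")
```

The paper's headline numbers (up to 32% error on human-written work, and deceptive Gemini output fully evading GPT-4) are exactly the kind of per-condition error rates this loop would report.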

Country of Origin
🇺🇸 United States

Page Count
10 pages

Category
Computer Science: Computers and Society