Can AI Recognize Its Own Reflection? Self-Detection Performance of LLMs in Computing Education
By: Christopher Burger, Karmece Talley, Christina Trotter
Potential Business Impact:
AI can't reliably tell if students cheated.
The rapid advancement of Large Language Models (LLMs) presents a significant challenge to academic integrity in computing education. As educators seek reliable detection methods, this paper evaluates the capacity of three prominent LLMs (GPT-4, Claude, and Gemini) to identify AI-generated text in computing-specific contexts. We tested their performance under both standard and 'deceptive' prompt conditions, in which the generating model was instructed to evade detection. Our findings reveal a significant instability: while default AI-generated text was easily identified, all models struggled to correctly classify human-written work, with error rates of up to 32%. Furthermore, the models were highly susceptible to deceptive prompts, with Gemini's evasive output completely fooling GPT-4. Given that simple prompt alterations significantly degrade detection efficacy, our results demonstrate that these LLMs are currently too unreliable for high-stakes academic misconduct judgments.
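To make the evaluation protocol concrete, the sketch below shows one way such an experiment could be structured: text is generated under a standard or deceptive condition, then a second model is asked to classify it. This is a minimal illustration, not the paper's implementation; `query_llm`, the prompt wording, and the scoring rule are all assumptions.

```python
# Minimal sketch of the detection-evaluation loop described in the abstract.
# `query_llm` is a hypothetical helper standing in for the API of whichever
# model (GPT-4, Claude, or Gemini) acts as the detector; the exact prompts
# and rubric used in the paper are not reproduced here.

DETECTION_PROMPT = (
    "You will be shown a short answer to a computing assignment. "
    "Classify it as 'AI-generated' or 'human-written' and briefly justify."
)

# Generation-side conditions: a default request vs. a 'deceptive' request
# that instructs the generator to evade detection.
GENERATION_PROMPTS = {
    "standard": "Answer the following computing question: {question}",
    "deceptive": (
        "Answer the following computing question so that the response "
        "reads like a typical student's work and would not be flagged "
        "as AI-generated: {question}"
    ),
}


def evaluate_detector(detector_model, samples, query_llm):
    """Return the fraction of samples the detector labels correctly.

    `samples` is a list of (text, true_label) pairs, where true_label is
    'AI-generated' or 'human-written'.
    """
    correct = 0
    for text, true_label in samples:
        verdict = query_llm(
            model=detector_model,
            system=DETECTION_PROMPT,
            user=text,
        )
        # Naive scoring: look for the label in the detector's reply.
        predicted = (
            "AI-generated" if "ai-generated" in verdict.lower() else "human-written"
        )
        correct += predicted == true_label
    return correct / len(samples)
```

Per-condition accuracy (e.g., human-written vs. deceptively generated samples) would then expose the asymmetries the paper reports, such as high accuracy on default AI output but frequent misclassification of human work.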
Similar Papers
Benchmarking Large Language Models for Personalized Guidance in AI-Enhanced Learning
Artificial Intelligence
Helps AI tutors give better, more personalized learning support.
Defend LLMs Through Self-Consciousness
Artificial Intelligence
Keeps AI from being tricked by bad instructions.
AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models
Computation and Language
Finds fake writing made by computers.