Can LLMs Detect Their Own Hallucinations?
By: Sora Kadotani, Kosuke Nishida, Kyosuke Nishida
Potential Business Impact:
Helps computers spot when they make up facts.
Large language models (LLMs) can generate fluent responses but sometimes hallucinate facts. In this paper, we investigate whether LLMs can detect their own hallucinations. We formulate hallucination detection as a sentence-level classification task. We propose a framework for estimating an LLM's capability to detect hallucinations, along with a classification method that uses Chain-of-Thought (CoT) prompting to extract knowledge from the model's parameters. The experimental results indicated that GPT-3.5 Turbo with CoT detected 58.2% of its own hallucinations. We concluded that LLMs with CoT can detect hallucinations if sufficient knowledge is contained in their parameters.
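A minimal sketch of the kind of CoT-based sentence classification the abstract describes, assuming the OpenAI Python SDK as the interface to GPT-3.5 Turbo. The prompt wording, label format, and parsing below are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch (not the authors' code): classify one sentence as
# hallucinated or factual with a Chain-of-Thought prompt.
# Assumes the OpenAI Python SDK (`pip install openai`) and an API key
# in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

# Hypothetical CoT prompt: the model reasons first, then emits a label.
COT_PROMPT = (
    "Decide whether the following sentence is factually correct.\n"
    "Sentence: {sentence}\n"
    "First reason step by step using what you know, then answer on the "
    "last line with exactly 'Label: FACTUAL' or 'Label: HALLUCINATED'."
)

def detect_hallucination(sentence: str) -> bool:
    """Return True if the model labels the sentence as a hallucination."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": COT_PROMPT.format(sentence=sentence)}],
        temperature=0,
    )
    answer = response.choices[0].message.content
    # Parse the final label; default to "not hallucinated" if parsing fails.
    return "HALLUCINATED" in answer.splitlines()[-1].upper()

if __name__ == "__main__":
    print(detect_hallucination("The Eiffel Tower is located in Berlin."))
```

In a setup like this, the CoT reasoning step is what lets the model surface knowledge stored in its parameters before committing to a label, which is the mechanism the paper credits for the reported detection rate.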
Similar Papers
Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation?
Computation and Language
Helps computers tell if their answers are true.
Detecting Hallucinations in Authentic LLM-Human Interactions
Computation and Language
Finds when AI lies in real conversations.
The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
Computation and Language
Fixes AI mistakes that humans can't see.