CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation
By: Yee Man Choi, Xuehang Guo, Yi R., and more
Potential Business Impact:
Helps computers check if science writing uses real sources.
Large Language Models (LLMs) have emerged as promising assistants for scientific writing. However, concerns remain about the quality and reliability of the generated text, one of which is citation accuracy and faithfulness. While most recent work relies on methods such as LLM-as-a-Judge, the reliability of LLM-as-a-Judge alone is also in doubt. In this work, we reframe citation evaluation as a problem of citation attribution alignment: assessing whether LLM-generated citations match those a human author would include for the same text. We propose CiteGuard, a retrieval-aware agent framework designed to provide more faithful grounding for citation validation. CiteGuard improves over the prior baseline by 12.3% and achieves up to 65.4% accuracy on the CiteME benchmark, on par with human-level performance (69.7%). It also enables the identification of alternative but valid citations.
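To make the idea of retrieval-augmented citation validation concrete, here is a minimal sketch of the general workflow the abstract describes: retrieve candidate papers for a claim, then check whether the proposed citation aligns with the top-ranked candidates and surface alternatives. This is not the authors' implementation; the `Paper`, `retrieve`, and `validate_citation` names, and the keyword-overlap scorer standing in for a real retriever, are illustrative assumptions.

```python
# Hypothetical sketch of retrieval-augmented citation validation.
# A real system would use a literature search index and an LLM judge;
# here a toy keyword-overlap score stands in for the retriever.

from dataclasses import dataclass


@dataclass
class Paper:
    paper_id: str
    title: str
    abstract: str


def keyword_overlap(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words found in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)


def retrieve(claim: str, corpus: list[Paper], k: int = 3) -> list[Paper]:
    """Rank corpus papers by the toy relevance score and return the top k."""
    ranked = sorted(
        corpus,
        key=lambda p: keyword_overlap(claim, p.title + " " + p.abstract),
        reverse=True,
    )
    return ranked[:k]


def validate_citation(claim: str, cited_id: str, corpus: list[Paper], k: int = 3) -> dict:
    """Check whether the cited paper is among the top-k retrieved candidates,
    and report other retrieved papers as possible alternative citations."""
    candidates = retrieve(claim, corpus, k)
    candidate_ids = [p.paper_id for p in candidates]
    return {
        "supported": cited_id in candidate_ids,
        "alternatives": [pid for pid in candidate_ids if pid != cited_id],
    }


if __name__ == "__main__":
    corpus = [
        Paper("P1", "Attention Is All You Need",
              "We propose the Transformer architecture based solely on attention."),
        Paper("P2", "BERT",
              "Pre-training of deep bidirectional transformers for language understanding."),
        Paper("P3", "Dense Passage Retrieval",
              "Dense representations for open-domain question answering retrieval."),
    ]
    claim = "The Transformer architecture relies entirely on attention mechanisms."
    print(validate_citation(claim, cited_id="P1", corpus=corpus))
```

In this framing, a citation is judged not only by whether the cited paper exists, but by whether retrieval over the literature would plausibly surface it for the claim, which is also what allows alternative valid citations to be identified.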
Similar Papers
Document Attribution: Examining Citation Relationships using Large Language Models
Information Retrieval
Checks if AI answers come from the right documents.
Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models
Computation and Language
Makes AI-generated text show where its facts came from.