TokenShapley: Token Level Context Attribution with Shapley Value
By: Yingtai Xiao, Yuqing Zhu, Sirat Samyoun, and more
Potential Business Impact:
Shows where AI got its facts from.
Large language models (LLMs) demonstrate strong capabilities in in-context learning, but verifying the correctness of their generated responses remains a challenge. Prior work has explored attribution at the sentence level, but these methods fall short when users seek attribution for specific keywords within the response, such as numbers, years, or names. To address this limitation, we propose TokenShapley, a novel token-level attribution method that combines Shapley value-based data attribution with KNN-based retrieval techniques inspired by recent advances in KNN-augmented LLMs. By leveraging a precomputed datastore for contextual retrieval and computing Shapley values to quantify token importance, TokenShapley provides a fine-grained data attribution approach. Extensive evaluations on four benchmarks show that TokenShapley outperforms state-of-the-art baselines in token-level attribution, achieving an 11-23% improvement in accuracy.
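The abstract does not spell out the algorithm, but the core idea of scoring each context token by its Shapley value can be illustrated with a standard Monte Carlo permutation estimator. This is a minimal sketch, not the paper's implementation: the value function v is a hypothetical stand-in for a model query (e.g., the probability of the generated answer given a subset of the retrieved context), and the names shapley_attribution, toy_v, and num_permutations are illustrative assumptions.

import random

def shapley_attribution(tokens, v, num_permutations=200, seed=0):
    """Monte Carlo permutation estimate of per-token Shapley values.

    tokens: the context tokens to attribute over (scored by index,
        so duplicate tokens get separate scores).
    v: value function mapping a list of tokens to a score; here it
        stands in for a model query, such as the probability of the
        generated answer given that subset of context (assumed).
    """
    rng = random.Random(seed)
    n = len(tokens)
    contrib = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        included = set()
        prev = v([])  # value of the empty context
        for i in order:
            included.add(i)
            cur = v([tokens[j] for j in sorted(included)])
            contrib[i] += cur - prev  # marginal contribution of token i
            prev = cur
    return [c / num_permutations for c in contrib]

# Toy value function: the answer is recovered only when the context
# subset contains the year token (stand-in for an LLM probability query).
def toy_v(subset):
    return 1.0 if "2021" in subset else 0.0

context = ["the", "report", "was", "published", "in", "2021"]
scores = shapley_attribution(context, toy_v)
best = max(range(len(context)), key=scores.__getitem__)
print(context[best], round(scores[best], 3))  # -> 2021 1.0

Run on the toy context, the estimator assigns essentially all the credit to "2021", mirroring the keyword-level attribution (numbers, years, names) the paper targets; the actual method additionally uses a precomputed KNN datastore to retrieve the context it attributes over.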
Similar Papers
Document Valuation in LLM Summaries: A Cluster Shapley Approach
Computation and Language
Gives credit to sources used in AI summaries.
MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution
Machine Learning (CS)
Fairly pays content creators whose work appears in search answers.
Concept-Level Explainability for Auditing & Steering LLM Responses
Computation and Language
Explains which concepts make AI say good or bad things.