Score: 0

Enhancing LLM Knowledge Learning through Generalization

Published: March 5, 2025 | arXiv ID: 2503.03705v2

By: Mingkang Zhu , Xi Chen , Zhongdao Wang and more

Potential Business Impact:

Helps computers remember new facts without forgetting old ones.

Business Areas:
Natural Language Processing Artificial Intelligence, Data and Analytics, Software

As Large language models (LLMs) are increasingly deployed in diverse applications, faithfully integrating evolving factual knowledge into these models remains a critical challenge. Continued pre-training on paraphrased data has shown empirical promise for enhancing knowledge acquisition. However, this approach is often costly and unreliable, as it relies on external models or manual effort for rewriting, and may inadvertently alter the factual content. In this work, we hypothesize and empirically show that an LLM's ability to continually predict the same factual knowledge tokens given diverse paraphrased contexts is positively correlated with its capacity to extract that knowledge via question-answering. Based on this view and aiming to improve generalization to diverse paraphrased contexts, we introduce two strategies to enhance LLMs' ability to predict the same knowledge tokens given varied contexts, thereby enhancing knowledge acquisition. First, we propose formatting-based data augmentation, which diversifies documents conveying the same knowledge by altering document formats rather than their content, thereby preserving factual integrity. Second, we adopt sharpness-aware minimization as the optimizer to better improve generalization. Extensive experiments demonstrate our methods' effectiveness in both continued pre-training and instruction tuning, and further gains can be achieved by combining with paraphrased data.

Country of Origin
🇭🇰 Hong Kong

Page Count
11 pages

Category
Computer Science:
Computation and Language