Effect of Document Packing on the Latent Multi-Hop Reasoning Capabilities of Large Language Models
By: Gabriele Prato, Shagun Sodhani, Alessandro Sordoni, and more
Potential Business Impact:
Makes AI models better at multi-step reasoning by grouping related information during training.
The standard practice for training large language models (LLMs) involves packing multiple documents together to optimize computational efficiency. However, the impact of this process on models' capabilities remains largely unexplored. To address this gap, we investigate how different document-packing strategies influence the latent multi-hop reasoning abilities of LLMs. Our findings indicate that packing can improve model performance compared to training on individual documents, at the cost of additional compute. To further understand the underlying mechanisms, we conduct an ablation study that identifies key factors explaining the advantages of packing. Ultimately, our research deepens the understanding of LLM training dynamics and provides practical insights for optimizing model development.
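For readers unfamiliar with the practice the abstract refers to, below is a minimal sketch of how standard document packing works during pretraining data preparation: tokenized documents are concatenated (separated by an end-of-document token) and split into fixed-length training sequences so that no context positions are wasted. The function name `pack_documents` and the parameters `sequence_length` and `eos_token_id` are illustrative assumptions, not names from the paper, and the paper's specific packing strategies may differ from this greedy variant.

```python
from typing import Iterator, List


def pack_documents(
    tokenized_docs: List[List[int]],
    sequence_length: int = 2048,  # assumed training context length
    eos_token_id: int = 0,        # assumed end-of-document separator token
) -> Iterator[List[int]]:
    """Greedily concatenate tokenized documents into fixed-length
    training sequences, inserting an EOS token between documents."""
    buffer: List[int] = []
    for doc in tokenized_docs:
        buffer.extend(doc + [eos_token_id])
        # Emit full sequences as soon as the buffer holds enough tokens.
        while len(buffer) >= sequence_length:
            yield buffer[:sequence_length]
            buffer = buffer[sequence_length:]
    # Any leftover tokens would typically be dropped or padded; dropped here.


# Toy example: three short "documents" packed into sequences of length 8.
if __name__ == "__main__":
    docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
    for seq in pack_documents(docs, sequence_length=8):
        print(seq)  # -> [1, 2, 3, 0, 4, 5, 0, 6]
```

The alternative baseline discussed in the abstract, training on individual documents, would instead pad or truncate each document to the context length on its own, which keeps documents isolated but leaves many positions unused.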
Similar Papers
An In-depth Study of LLM Contributions to the Bin Packing Problem
Artificial Intelligence
Makes math problems easier to solve and understand.
Efficient Strategy for Improving Large Language Model (LLM) Capabilities
Computation and Language
Makes smart computer programs run faster with less power.
Lightweight Latent Reasoning for Narrative Tasks
Computation and Language
Makes AI think faster and use less power.