Generalizing Scaling Laws for Dense and Sparse Large Language Models
By: Md Arafat Hossain, Xingfu Wu, Valerie Taylor, and more
Potential Business Impact:
Predicts the optimal model size and training resources for large language models more accurately.
Over the past few years, the size of language models has grown exponentially, as has the computational cost of training them. This rapid growth has motivated researchers to develop new techniques aimed at improving the efficiency of the training process. Despite these advances, predicting the optimal model size and allocating resources accordingly remains a challenge. Several efforts have addressed this challenge by proposing scaling laws, but almost all of them are architecture-specific (dense or sparse). In this work, we revisit existing scaling laws and propose a generalized scaling law that provides a unified framework applicable to both dense and sparse large language models. We evaluate and compare the proposed scaling law against existing scaling laws to demonstrate its effectiveness.
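The abstract does not reproduce the proposed generalized law, but the existing dense scaling laws it revisits are commonly written in the Chinchilla form L(N, D) = E + A/N^alpha + B/D^beta, where N is the parameter count and D the number of training tokens. The sketch below is a minimal illustration, assuming synthetic data and made-up coefficients (not values from this paper), of how such a law is typically fit to observed training losses.

```python
import numpy as np
from scipy.optimize import curve_fit

# Chinchilla-form dense scaling law: L(N, D) = E + A / N**alpha + B / D**beta,
# where N is the parameter count and D is the number of training tokens.
def dense_loss(x, E, A, alpha, B, beta):
    N, D = x
    return E + A / N**alpha + B / D**beta

rng = np.random.default_rng(0)
N = np.logspace(8, 11, 12)                     # 100M to 100B parameters
D = np.logspace(9, 12, 12)                     # 1B to 1T training tokens
true_params = (1.7, 400.0, 0.34, 410.0, 0.28)  # illustrative values only
loss = dense_loss((N, D), *true_params) + rng.normal(0.0, 0.01, N.size)

# Recover the coefficients from the (synthetic) loss measurements.
popt, _ = curve_fit(dense_loss, (N, D), loss,
                    p0=[2.0, 300.0, 0.3, 300.0, 0.3], maxfev=50000)
print("fitted E, A, alpha, B, beta:", np.round(popt, 3))
```

In practice, such coefficients are fit to a grid of training runs and the fitted law is then used to choose a compute-optimal (N, D). A law that unifies dense and sparse models would additionally have to account for some measure of sparsity, such as the fraction of active parameters in a mixture-of-experts model; the exact functional form is defined in the paper itself.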
Similar Papers
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Machine Learning (CS)
Studies scaling-law behavior in multiple and kernel regression settings.
Scaling Laws for Code: A More Data-Hungry Regime
Computation and Language
Examines scaling laws for code models, which operate in a more data-hungry regime than natural-language models.