Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models
By: Yuval Weiss, David Demitri Africa, Paula Buttery, and more
Potential Business Impact:
Tests whether a popular low-cost training trick actually helps small AI models learn better and faster.
Parameter-efficient methods such as LoRA have revolutionised the fine-tuning of LLMs. Still, their extension to pretraining via ReLoRA is less well understood, especially for small language models (SLMs), which offer lower computational and environmental costs. This work is the first systematic study of ReLoRA in SLMs (11M-66M parameters), evaluating both performance and learning dynamics. Through ablation experiments, we find that ReLoRA generally performs worse than standard training on loss, Paloma perplexity and BLiMP, with the gap widening for the larger models. Further analysis of the learning dynamics of the models indicates that ReLoRA reinforces the rank deficiencies found in smaller models. These results indicate that low-rank update strategies may not transfer easily to SLM pretraining, highlighting the need for more research in the low-compute regime.
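For readers unfamiliar with the method under study, here is a minimal sketch of a ReLoRA-style layer: a frozen weight matrix W plus a trainable low-rank update BA that is periodically merged back into W and re-initialised, so training proceeds through a sequence of low-rank restarts. This is an illustrative assumption of how such a layer can be written, not the authors' implementation; names such as LoRALinear, rank, alpha and reset_every are hypothetical.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: y = (W + scaling * B A) x."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)          # base weights frozen between merges
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # zero-init: update starts at 0
        self.scaling = alpha / rank                     # standard LoRA scaling convention

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    @torch.no_grad()
    def merge_and_reset(self) -> None:
        """ReLoRA-style restart: fold the low-rank update into the base weights,
        then re-initialise A and B so a fresh low-rank subspace can be learned."""
        self.base.weight += self.scaling * (self.lora_B @ self.lora_A)
        self.lora_A.normal_(std=0.01)
        self.lora_B.zero_()


# Toy loop illustrating periodic merge-and-reset (the full method also partially
# resets the optimiser state for the LoRA parameters, omitted here for brevity).
layer = LoRALinear(64, 64, rank=4)
opt = torch.optim.AdamW([p for p in layer.parameters() if p.requires_grad], lr=1e-3)
reset_every = 100
for step in range(300):
    x = torch.randn(8, 64)
    loss = (layer(x) - x).pow(2).mean()   # dummy reconstruction objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (step + 1) % reset_every == 0:
        layer.merge_and_reset()
```

Because each restart only ever adds a rank-limited update, the cumulative change to W is constrained, which is why the paper's finding that ReLoRA reinforces rank deficiencies in small models is a meaningful limitation.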
Similar Papers
LoRA Is Slower Than You Think
Machine Learning (CS)
Makes AI learn faster and use less power.
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Computation and Language
Teaches AI new facts without forgetting old ones.
Less is More: Resource-Efficient Low-Rank Adaptation
Computation and Language
Makes AI learn faster and better with less effort.