Plasticity vs. Rigidity: The Impact of Low-Rank Adapters on Reasoning on a Micro-Budget
By: Zohaib Khan, Omer Tafveez, Zoha Hayat Bhatti
Potential Business Impact:
Small AI learns to solve math problems with less computing power.
Recent advances in mathematical reasoning typically rely on massive scale, yet the question remains: can strong reasoning capabilities be induced in small language models ($\leq 1.5$B parameters) under extreme constraints? We investigate this by training models on a single A40 GPU (48GB) for under 24 hours using Reinforcement Learning with Verifiable Rewards (RLVR) and Low-Rank Adaptation (LoRA). We find that the success of this ``micro-budget'' regime depends critically on the interplay between adapter capacity and model initialization. While low-rank adapters ($r=8$) consistently fail to capture the complex optimization dynamics of reasoning, high-rank adapters ($r=256$) unlock significant plasticity in standard instruction-tuned models. Our best configuration reaches 40.0\% Pass@1 on AIME 24 (an 11.1\% absolute improvement over the baseline) and pushes Pass@16 to 70.0\%, demonstrating robust exploration. However, this plasticity is not universal: while instruction-tuned models use the budget to lengthen their chain-of-thought and maximize reward, heavily math-aligned models suffer performance collapse, suggesting that noisy, low-budget RL updates can act as destructive interference for models already residing near a task-specific optimum.
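To make the setup concrete, here is a minimal sketch, not the authors' code, of the two adapter capacities contrasted above ($r=8$ vs. $r=256$) using the Hugging Face `peft` library, plus a toy verifiable reward of the kind RLVR relies on. The base model name, target modules, and `lora_alpha` scaling are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the micro-budget setup: LoRA adapters at two ranks plus a toy
# verifiable reward. Assumptions: base model, target modules, alpha heuristic.
import re

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model


def build_lora_model(rank: int):
    """Wrap an instruction-tuned base model with a LoRA adapter of the given rank."""
    base = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen2.5-1.5B-Instruct"  # assumed sub-1.5B instruction-tuned model
    )
    config = LoraConfig(
        r=rank,                 # adapter capacity: 8 (rigid) vs. 256 (plastic)
        lora_alpha=2 * rank,    # common scaling heuristic; unspecified in the paper
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # shows how few weights each rank updates
    return model


def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary reward: 1.0 iff the final boxed answer matches the ground truth."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    return 1.0 if match and match.group(1).strip() == gold_answer.strip() else 0.0


if __name__ == "__main__":
    for r in (8, 256):
        build_lora_model(r)
    print(verifiable_reward(r"... so the answer is \boxed{42}", "42"))
```

Because LoRA's parameter count scales linearly with rank, the $r=256$ adapter updates roughly $32\times$ more weights than the $r=8$ one, which is the extra plasticity the abstract credits for the AIME gains.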
Similar Papers
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Machine Learning (CS)
Makes small AI smarter with less money.
Rank-1 LoRAs Encode Interpretable Reasoning Signals
Machine Learning (CS)
Makes AI smarter with tiny, understandable changes.
Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages
Computation and Language
Helps computers understand rare languages better.