Structured Pruning for Diverse Best-of-N Reasoning Optimization
By: Hieu Trung Nguyen, Bao Nguyen, Viet Anh Nguyen
Potential Business Impact:
Improves AI accuracy on hard math problems at inference time, without extra training, by pruning selected attention heads.
Model pruning in transformer-based language models, traditionally viewed as a means of achieving computational savings, can also enhance a model's reasoning capabilities. In this work, we uncover a surprising phenomenon: selectively pruning certain attention heads improves reasoning performance, particularly on challenging tasks. Motivated by this observation, we propose SPRINT, a novel contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, SPRINT identifies the pruned-head configurations that yield more accurate reasoning. Extensive experiments demonstrate that our method significantly outperforms traditional best-of-$N$ and random head-selection strategies on the MATH500 and GSM8K datasets.
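The abstract's core mechanism, aligning a question embedding with learned head embeddings to decide which heads to prune at inference, can be illustrated with a short sketch. The code below is a toy under stated assumptions, not the paper's implementation: the random question encoder, the dimensions, and the select_heads_to_prune helper are hypothetical stand-ins for SPRINT's trained components.

import numpy as np

# Illustrative sizes; the paper does not specify these. (Hypothetical.)
N_LAYERS, N_HEADS, EMB_DIM = 24, 16, 64

rng = np.random.default_rng(0)

# One learned embedding per (layer, head) pair. In SPRINT these would be
# trained with a contrastive objective so that heads whose removal helps a
# given question score highly against that question's embedding.
head_embeddings = rng.normal(size=(N_LAYERS * N_HEADS, EMB_DIM))

def embed_question(question: str) -> np.ndarray:
    """Hypothetical stand-in for a real question encoder."""
    q_rng = np.random.default_rng(abs(hash(question)) % (2**32))
    return q_rng.normal(size=EMB_DIM)

def select_heads_to_prune(question: str, top_k: int = 4) -> list[tuple[int, int]]:
    """Rank every (layer, head) pair by cosine similarity to the question
    embedding and return the top-k candidates to prune at inference time."""
    q = embed_question(question)
    q = q / np.linalg.norm(q)
    h = head_embeddings / np.linalg.norm(head_embeddings, axis=1, keepdims=True)
    scores = h @ q  # cosine similarity of each head embedding to the question
    top = np.argsort(scores)[::-1][:top_k]
    return [(int(i) // N_HEADS, int(i) % N_HEADS) for i in top]

picks = select_heads_to_prune("If 3x + 5 = 20, what is x?")
print("Candidate (layer, head) pairs to prune:", picks)

In the full method, each selected pruned-head configuration would presumably drive one of the $N$ candidate generations, so the best-of-$N$ pool is diversified by structurally different model variants rather than by sampling randomness alone.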
Similar Papers
SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models
Artificial Intelligence
Speeds up AI reasoning by planning and executing steps in parallel.
Think Clearly: Improving Reasoning via Redundant Token Pruning
Artificial Intelligence
Prunes redundant tokens so AI reasons more clearly and answers better.
Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models
Machine Learning (CS)
Helps models solve harder math problems through a think-prune-train loop, without growing the model.