Don't Be Greedy, Just Relax! Pruning LLMs via Frank-Wolfe
By: Christophe Roux, Max Zimmer, Alexandre d'Aspremont, and more
Potential Business Impact:
Makes big computer brains smaller and faster.
Pruning is a common technique to reduce the compute and storage requirements of Neural Networks. While conventional approaches typically retrain the model to recover pruning-induced performance degradation, state-of-the-art Large Language Model (LLM) pruning methods operate layer-wise, minimizing the per-layer pruning error on a small calibration dataset to avoid full retraining, which is considered computationally prohibitive for LLMs. However, finding the optimal pruning mask is a hard combinatorial problem, and solving it to optimality is intractable. Existing methods hence rely on greedy heuristics that ignore the weight interactions in the pruning objective. In this work, we instead consider the convex relaxation of these combinatorial constraints and solve the resulting problem using the Frank-Wolfe (FW) algorithm. Our method drastically reduces the per-layer pruning error, outperforms strong baselines on state-of-the-art GPT architectures, and remains memory-efficient. We provide theoretical justification: combined with the convergence guarantees of the FW algorithm, rounding the relaxed solution back to integrality yields an approximate solution to the original combinatorial problem.
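To make the approach concrete, below is a minimal NumPy sketch of the general idea, not the authors' exact formulation. It relaxes the hard k-sparsity constraint on one row of a layer's weights to the k-sparse polytope (the convex hull of k-sparse vectors with entries bounded by M), runs vanilla Frank-Wolfe on the per-layer pruning error, and rounds the relaxed solution back to a k-sparse vector. The constraint set, linear minimization oracle, step size, and rounding rule are all illustrative assumptions.

```python
import numpy as np

def lmo_k_sparse_polytope(grad, k, M):
    """Linear minimization oracle over the k-sparse polytope
    conv{w : ||w||_0 <= k, ||w||_inf <= M}: the minimizing vertex
    places -M * sign(grad) on the k coordinates with largest |grad|."""
    v = np.zeros_like(grad)
    idx = np.argsort(-np.abs(grad))[:k]
    v[idx] = -M * np.sign(grad[idx])
    return v

def fw_prune_row(X, w, k, M, steps=200):
    """Minimize the per-layer pruning error ||X @ (w_hat - w)||^2
    over the convex relaxation with vanilla Frank-Wolfe.
    (Illustrative sketch; the rounding step below is an assumption.)"""
    w_hat = np.zeros_like(w)  # feasible starting point
    for t in range(steps):
        grad = 2.0 * X.T @ (X @ (w_hat - w))  # gradient of the quadratic
        v = lmo_k_sparse_polytope(grad, k, M)
        gamma = 2.0 / (t + 2.0)  # standard FW step size
        w_hat = (1 - gamma) * w_hat + gamma * v
    # Round the relaxed solution to integrality: keep its k
    # largest-magnitude coordinates, zero out the rest.
    keep = np.argsort(-np.abs(w_hat))[:k]
    w_sparse = np.zeros_like(w_hat)
    w_sparse[keep] = w_hat[keep]
    return w_sparse

# Toy usage: prune one weight row against calibration activations.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 32))   # calibration inputs
w = rng.standard_normal(32)         # dense weight row
w_hat = fw_prune_row(X, w, k=8, M=np.abs(w).max())
print("kept", np.count_nonzero(w_hat), "of", w.size, "weights;",
      "pruning error:", np.linalg.norm(X @ (w_hat - w)))
```

Note that the LMO here is a simple top-k selection, so each iteration costs little more than a gradient evaluation and requires no projection, which is what keeps the method memory-efficient compared to projected-gradient alternatives.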
Similar Papers
FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning
Machine Learning (CS)
Makes AI models smaller and faster to train.
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
Machine Learning (CS)
Pruning AI models can hide bad behavior.
Projection-Free CNN Pruning via Frank-Wolfe with Momentum: Sparser Models with Less Pretraining
Machine Learning (CS)
Makes computer "brains" smaller, faster, and smarter.