LLMs can Compress LLMs: Adaptive Pruning by Agents

Published: January 14, 2026 | arXiv ID: 2601.09694v1

By: Sai Varun Kodathala, Rakesh Vunnam

As Large Language Models (LLMs) continue to scale, post-training pruning has emerged as a promising approach to reduce computational costs while preserving performance. Existing methods such as SparseGPT and Wanda achieve high sparsity through layer-wise weight reconstruction or activation-aware magnitude pruning, but rely on uniform or hand-crafted heuristics to determine per-layer sparsity ratios. Moreover, recent work has shown that pruned LLMs suffer from severe factual knowledge degradation, with structured pruning methods experiencing near-total collapse in factual question-answering capabilities. We introduce agent-guided pruning, where a foundation model acts as an adaptive pruning agent to intelligently select which layers to prune at each iteration while preserving critical knowledge pathways. Our method constructs layer-wise sensitivity profiles by combining Wanda-inspired weight-activation metrics with gradient importance scores, normalized as z-scores for model-agnostic comparison. These statistics are processed by an LLM agent equipped with self-reflection capabilities, enabling it to learn from previous pruning outcomes and iteratively refine its strategy. A checkpoint rollback mechanism maintains model quality by reverting when perplexity degradation exceeds a threshold. We evaluate our approach on Qwen3 models (4B and 8B parameters) at approximately 45% sparsity, demonstrating substantial improvements over structured pruning baselines: 56% relative improvement in MMLU accuracy, 19x better factual knowledge retention on FreebaseQA, and 69% lower perplexity degradation. Notably, our framework requires no retraining, operates in a model-agnostic manner, and exhibits effective self-correction with only 2-4 rollbacks across 21-40 iterations, demonstrating that foundation models can effectively guide the compression of other foundation models.
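Two mechanisms named in the abstract, z-score normalization of layer sensitivity and checkpoint rollback on perplexity degradation, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the equal weighting of the two metrics, the relative perplexity threshold measured against the last accepted checkpoint, and the names `z_scores`, `sensitivity_profile`, `prune_with_rollback`, `choose_layers`, `apply_pruning`, and `perplexity` are all hypothetical. The `choose_layers` callback stands in for the LLM agent, which would receive the history of earlier outcomes to reflect on.

```python
import copy
import statistics


def z_scores(values):
    """Standardize per-layer scores to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def sensitivity_profile(weight_act_scores, grad_scores):
    """Combine two per-layer metrics after z-scoring each one.

    Assumption: equal weights; the abstract does not state how the
    two metrics are combined.
    """
    zw = z_scores(weight_act_scores)
    zg = z_scores(grad_scores)
    return [0.5 * a + 0.5 * b for a, b in zip(zw, zg)]


def prune_with_rollback(state, choose_layers, apply_pruning, perplexity,
                        max_iters=10, max_ppl_increase=0.10):
    """Iteratively prune, keeping a candidate only if perplexity stays in bounds.

    Assumption: the threshold is a relative increase over the perplexity
    of the last accepted checkpoint.
    """
    checkpoint = copy.deepcopy(state)
    checkpoint_ppl = perplexity(checkpoint)
    history = []
    for step in range(max_iters):
        # The agent (here a plain callback) sees past outcomes before choosing.
        layers = choose_layers(checkpoint, history)
        candidate = apply_pruning(copy.deepcopy(checkpoint), layers)
        ppl = perplexity(candidate)
        accepted = ppl <= checkpoint_ppl * (1 + max_ppl_increase)
        history.append({"step": step, "layers": layers,
                        "ppl": ppl, "accepted": accepted})
        if accepted:
            checkpoint, checkpoint_ppl = candidate, ppl
        # Otherwise the candidate is discarded, which rolls back to the checkpoint.
    return checkpoint, history
```

Because rejected candidates are built from a copy of the checkpoint, a rollback is simply the absence of an update, and the rejected step still lands in `history` so the next choice can take it into account.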

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study

Computation and Language

Makes smart AI programs smaller and faster.

28 Jul 2025

91%

Breaking Expert Knowledge Limits: Self-Pruning for Large Language Models

CV and Pattern Recognition

Lets computers shrink themselves to work better.

19 Nov 2025

91%

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

Machine Learning (CS)

Makes small AI models as smart as big ones.

5 Feb 2025
