ProCut: LLM Prompt Compression via Attribution Estimation
By: Zhentao Xu, Fengyi Li, Albert Chen, and more
Potential Business Impact:
Shrinks AI prompts by 78% without losing smarts
In large-scale industrial LLM systems, prompt templates often expand to thousands of tokens as teams iteratively incorporate sections such as task instructions, few-shot examples, and heuristic rules to enhance robustness and coverage. This expansion leads to bloated prompts that are difficult to maintain and incur significant inference latency and serving costs. To address this, we introduce Prompt Compression via Attribution Estimation (ProCut), a flexible, LLM-agnostic, training-free framework that compresses prompts through attribution analysis. ProCut segments prompt templates into semantically meaningful units, quantifies their impact on task performance, and prunes low-utility components. Through extensive experiments on five public benchmark datasets and real-world industrial prompts, we show that ProCut achieves substantial prompt size reductions (78% fewer tokens in production) while maintaining or even slightly improving task performance (up to 62% better than alternative methods). We further introduce an LLM-driven attribution estimator that reduces compression latency by over 50%, and demonstrate that ProCut integrates seamlessly with existing prompt-optimization frameworks to produce concise, high-performing prompts.
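To make the segment-attribute-prune loop concrete, here is a minimal Python sketch of attribution-based prompt compression under stated assumptions: it is not the paper's implementation, the segmentation is a naive blank-line split rather than ProCut's semantically meaningful units, the leave-one-out scoring stands in for the paper's attribution estimation (including its faster LLM-driven estimator), and `segment_prompt`, `evaluate`, and `min_utility` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    text: str
    score: float = 0.0  # estimated contribution to task performance


def segment_prompt(template: str) -> List[Segment]:
    # Naive split on blank lines; the paper segments into semantically
    # meaningful units (task instructions, few-shot examples, rules).
    return [Segment(t.strip()) for t in template.split("\n\n") if t.strip()]


def attribute(segments: List[Segment],
              evaluate: Callable[[str], float]) -> List[Segment]:
    # Leave-one-out attribution: a segment's utility is the drop in
    # dev-set performance observed when that segment is removed.
    base = evaluate("\n\n".join(s.text for s in segments))
    for i, seg in enumerate(segments):
        ablated = "\n\n".join(s.text for j, s in enumerate(segments) if j != i)
        seg.score = base - evaluate(ablated)
    return segments


def compress(template: str,
             evaluate: Callable[[str], float],
             min_utility: float = 0.0) -> str:
    # Prune low-utility segments and reassemble the compressed prompt.
    segments = attribute(segment_prompt(template), evaluate)
    return "\n\n".join(s.text for s in segments if s.score > min_utility)


if __name__ == "__main__":
    # Toy evaluator: a stand-in for running the target LLM on a dev set
    # and measuring task performance; real usage would call the model.
    def toy_evaluate(prompt: str) -> float:
        return 0.8 + (0.05 if "Task:" in prompt else 0.0)

    template = ("Task: summarize the input in one sentence.\n\n"
                "Rule: always answer politely.\n\n"
                "Example: input -> output ...")
    print(compress(template, toy_evaluate))
```

In this sketch the ablation loop calls the evaluator once per segment, which is the latency the paper's LLM-driven attribution estimator is designed to reduce.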
Similar Papers
CompactPrompt: A Unified Pipeline for Prompt Data Compression in LLM Workflows
Artificial Intelligence
Makes AI use less computer power and money.
SCOPE: A Generative Approach for LLM Prompt Compression
Computation and Language
Makes AI understand long texts with fewer words.
Dynamic Compressing Prompts for Efficient Inference of Large Language Models
Computation and Language
Makes AI understand more with fewer words.