TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

Published: October 26, 2025 | arXiv ID: 2510.22767v1

By: Omar Naim, Krish Sharma, Nicholas Asher

Potential Business Impact:

Shrinks large language models by removing unneeded layers, making them faster to run and more accurate on specific tasks.

Business Areas:
Machine Learning, Artificial Intelligence, Data and Analytics, Software

In this paper we introduce TALE (Task-Aware Layer Elimination), an inference-time algorithm that prunes entire transformer layers in an LLM by directly optimizing task-specific validation performance. We evaluate TALE on 9 tasks and 5 models, including LLaMA 3.1 8B, Qwen 2.5 7B, Qwen 2.5 0.5B, Mistral 7B, and Lucie 7B, under both zero-shot and few-shot settings. Unlike prior approaches, TALE requires no retraining and consistently improves accuracy while reducing computational cost across all benchmarks. Furthermore, applying TALE during finetuning leads to additional performance gains. Finally, TALE gives users flexible control over the trade-off between accuracy and efficiency. Mutual information analysis shows that certain layers act as bottlenecks that degrade task-relevant representations; TALE's selective layer removal remedies this problem, producing smaller, faster, and more accurate models that are also faster to fine-tune, while offering new insights into transformer interpretability.
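
The core idea is simple enough to sketch. The snippet below is a minimal, hypothetical illustration of task-aware layer elimination in PyTorch, not the authors' code: it greedily replaces each block in a layer stack with an identity map and keeps the cut whenever a task-specific validation score does not drop. The `greedy_layer_elimination` function, the `evaluate` closure, the toy layer stack, and the greedy search order are all assumptions made for illustration; the paper's actual search and scoring procedure may differ.

```python
import torch
import torch.nn as nn

def greedy_layer_elimination(layers: nn.ModuleList, evaluate):
    """Greedily drop whole layers while the validation score does not fall.

    `evaluate` is a zero-argument callable returning a task-specific
    validation score (higher is better) for the current layer stack.
    """
    best = evaluate()
    removed = []
    for i in range(len(layers)):
        saved = layers[i]
        layers[i] = nn.Identity()   # tentatively eliminate layer i
        score = evaluate()
        if score >= best:           # keep the cut if the task score holds
            best = score
            removed.append(i)
        else:
            layers[i] = saved       # otherwise restore the layer
    return removed, best

# Toy usage: a random 6-layer stack and a synthetic "validation" objective.
torch.manual_seed(0)
layers = nn.ModuleList(nn.Linear(8, 8) for _ in range(6))
x, y = torch.randn(32, 8), torch.randn(32, 8)

def evaluate() -> float:
    h = x
    for layer in layers:
        h = layer(h)
    # Negative MSE so that higher means better, mimicking accuracy.
    return -nn.functional.mse_loss(h, y).item()

removed, best = greedy_layer_elimination(layers, evaluate)
print(f"eliminated layers: {removed}, best score: {best:.4f}")
```

Because the search only queries validation performance, the same sketch extends to the trade-offs the abstract mentions: relaxing the acceptance test (e.g., tolerating a small score drop per eliminated layer) trades accuracy for efficiency, and running the search before finetuning yields a smaller stack that is correspondingly faster to train.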

Page Count
18 pages

Category
Computer Science:
Machine Learning (CS)