Efficiently Estimating Data Efficiency for Language Model Fine-tuning
By: Gyung Hyun Je, Colin Raffel
While large language models (LLMs) demonstrate reasonable zero-shot capability on many downstream tasks, fine-tuning is a common practice to improve their performance. However, a task's data efficiency--i.e., the number of fine-tuning examples needed to reach a desired level of performance--is often unknown, resulting in costly cycles of incremental annotation and retraining. Indeed, we demonstrate across a curated set of 30 specialized tasks that performant LLMs may struggle zero-shot but can attain substantially stronger performance after fine-tuning. This motivates the need for methods that predict a task's data efficiency without requiring incremental annotation. After introducing a concrete metric that quantifies a task's data efficiency, we propose using the gradient cosine similarity of low-confidence examples to predict data efficiency from a small number of labeled samples. We validate our approach on a diverse set of tasks with varying data efficiencies, attaining 8.6% error in overall data efficiency prediction and typically eliminating hundreds of unnecessary annotations per task. Our experimental results and implementation code are available on GitHub.
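The central quantity in the abstract is the gradient cosine similarity of low-confidence examples, computed from a small labeled sample. The paper's exact procedure is not specified here, so the sketch below is only a minimal PyTorch illustration of that idea, not the authors' implementation: it keeps examples whose predicted class probability falls below a threshold, extracts a flattened per-example gradient for each, and averages the pairwise cosine similarities. The function names, the 0.5 confidence threshold, and the use of full-model gradients are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions only, not the paper's implementation):
# average pairwise gradient cosine similarity over low-confidence labeled examples.
import torch
import torch.nn.functional as F


def flat_grad(model, loss):
    # Flatten the gradient of `loss` w.r.t. all trainable parameters into one vector.
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])


def gradient_cosine_score(model, examples, confidence_threshold=0.5):
    # `examples`: iterable of (inputs, label) pairs, where the model is assumed to
    # return logits of shape (1, num_classes) for a single example and `label` is a
    # LongTensor of shape (1,). The threshold is an assumed hyperparameter.
    grad_vectors = []
    for inputs, label in examples:
        logits = model(inputs)
        confidence = F.softmax(logits, dim=-1).max().item()
        if confidence >= confidence_threshold:
            continue  # keep only low-confidence examples
        loss = F.cross_entropy(logits, label)
        grad_vectors.append(flat_grad(model, loss))

    if len(grad_vectors) < 2:
        return float("nan")  # not enough low-confidence examples to compare

    G = F.normalize(torch.stack(grad_vectors), dim=1)  # (k, num_params), unit rows
    sims = G @ G.T                                      # pairwise cosine similarities
    k = G.shape[0]
    return ((sims.sum() - k) / (k * (k - 1))).item()    # mean of off-diagonal entries
```

For an actual LLM, storing a full flattened gradient per example is usually impractical; restricting the computation to last-layer gradients or a low-dimensional random projection would be a natural substitution, though the abstract does not say which variant the authors use.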