PLoRA: Efficient LoRA Hyperparameter Tuning for Large Models
By: Minghao Yan, Zhuang Wang, Zhen Jia, and more
Potential Business Impact:
Makes fine-tuning large AI models for new tasks much faster and cheaper.
Low-rank Adaptation (LoRA) has gained popularity as a fine-tuning approach for Large Language Models (LLMs) due to its low resource requirements and good performance. While a plethora of work has investigated improving LoRA serving efficiency by serving multiple LoRAs concurrently, existing methods assume that a wide range of LoRA adapters is already available for serving. In our work, we conduct extensive empirical studies showing that current training paradigms do not utilize hardware resources efficiently and incur high overhead to obtain a performant LoRA. Leveraging these insights, we propose PLoRA, which automatically orchestrates concurrent LoRA fine-tuning jobs under given hardware and model constraints and develops performant kernels to improve training efficiency. Our experimental studies show that PLoRA reduces the makespan of LoRA fine-tuning over a given hyperparameter search space by up to 7.52x and improves training throughput by up to 12.8x across a range of state-of-the-art LLMs.
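To make the setting concrete, the sketch below shows a minimal PyTorch-style LoRA layer (a frozen base weight plus a trainable low-rank update scaled by alpha/rank) and a small hyperparameter grid of the kind such a system would sweep over. This is an illustrative assumption, not PLoRA's implementation or its kernels; the class name, grid values, and model sizes are made up for the example.

# Minimal sketch of a LoRA layer and a hyperparameter grid (illustrative only;
# not the paper's implementation). Assumes PyTorch is installed.
import itertools
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update scaled by alpha/rank."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)          # base model stays frozen
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # y = x W^T + (x A^T B^T) * (alpha / rank)
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# A tiny hyperparameter search space of the kind a LoRA tuning sweep covers
# (hypothetical values: rank, alpha, and learning rate).
search_space = list(itertools.product(
    [4, 8, 16, 32],        # LoRA rank r
    [8, 16, 32],           # scaling alpha
    [1e-4, 3e-4, 1e-3],    # learning rate
))

for rank, alpha, lr in search_space[:2]:   # run two configs as a demo
    layer = LoRALinear(128, 128, rank=rank, alpha=alpha)
    opt = torch.optim.AdamW(
        [p for p in layer.parameters() if p.requires_grad], lr=lr)
    loss = layer(torch.randn(4, 128)).pow(2).mean()   # dummy objective
    loss.backward()
    opt.step()
    print(f"rank={rank} alpha={alpha} lr={lr} loss={loss.item():.4f}")

Each configuration in the grid is an independent fine-tuning job; orchestrating many such jobs concurrently on shared hardware is the scheduling problem the paper addresses.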
Similar Papers
Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems
Distributed, Parallel, and Cluster Computing
Makes AI models run faster using fewer computers.
Less is More: Resource-Efficient Low-Rank Adaptation
Computation and Language
Makes AI learn faster and better with less effort.
LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging
Computation and Language
Lets AI switch jobs instantly without retraining.