Study on LLMs for Promptagator-Style Dense Retriever Training
By: Daniel Gwon, Nour Jedidi, Jimmy Lin
Potential Business Impact:
Makes AI better at finding specific information.
Promptagator demonstrated that Large Language Models (LLMs) with few-shot prompts can be used as task-specific query generators for fine-tuning domain-specialized dense retrieval models. However, the original Promptagator approach relied on large-scale, proprietary LLMs that users may not have access to or may be prohibited from using with sensitive data. In this work, we study the impact of using open-source LLMs at accessible scales ($\leq$14B parameters) as an alternative. Our results demonstrate that open-source LLMs as small as 3B parameters can serve as effective Promptagator-style query generators. We hope our work will point practitioners to reliable alternatives for synthetic data generation and offer insights for maximizing fine-tuning results in domain-specific applications.
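To make the idea of few-shot query generation concrete, here is a minimal Python sketch of how synthetic (query, document) training pairs might be assembled. It is an illustration under stated assumptions, not the authors' pipeline: the "Passage:/Query:" template, the function names `build_few_shot_prompt` and `generate_training_pairs`, and the `generate` callable (a stand-in for whatever open-source LLM is used) are all hypothetical, and any filtering of generated queries is omitted.

```python
from typing import Callable, List, Tuple

# Hypothetical prompt template; the actual wording is task-specific
# and is NOT taken from the paper.
def build_few_shot_prompt(examples: List[Tuple[str, str]], document: str) -> str:
    """Format few-shot (passage, query) examples, then the new passage
    with an empty query slot for the LLM to complete."""
    blocks = [f"Passage: {doc}\nQuery: {query}" for doc, query in examples]
    blocks.append(f"Passage: {document}\nQuery:")
    return "\n\n".join(blocks)


def generate_training_pairs(
    corpus: List[str],
    examples: List[Tuple[str, str]],
    generate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Produce one synthetic (query, passage) pair per passage.

    `generate` is an assumed interface: any function that maps a prompt
    string to the model's text completion.
    """
    pairs = []
    for document in corpus:
        prompt = build_few_shot_prompt(examples, document)
        query = generate(prompt).strip()
        if query:  # skip passages for which the model returned nothing
            pairs.append((query, document))
    return pairs


if __name__ == "__main__":
    few_shot = [("Aspirin reduces fever and pain.", "what does aspirin treat")]
    corpus = ["Insulin regulates blood sugar levels."]

    # Placeholder generator so the sketch runs without a model.
    def fake_llm(prompt: str) -> str:
        return " what does insulin do "

    for query, passage in generate_training_pairs(corpus, few_shot, fake_llm):
        print(f"{query!r} -> {passage!r}")
```

The resulting pairs would then serve as positive examples when fine-tuning a dense retriever. Keeping the generator behind a plain callable means models of different sizes can be swapped in without changing the rest of the code.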
Similar Papers
Fine-tuning for Better Few Shot Prompting: An Empirical Comparison for Short Answer Grading
Machine Learning (CS)
Teaches computers to grade homework faster.
Green Prompting
Computation and Language
Makes AI use less electricity by changing its questions.
Are Prompts All You Need? Evaluating Prompt-Based Large Language Models (LLM)s for Software Requirements Classification
Software Engineering
Helps computers sort software ideas faster, needing less data.