Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs
By: Jacob Portes, Connor Jennings, Erica Ji Yuen, and more
Potential Business Impact:
Larger AI models become better at finding relevant information.
How does retrieval performance scale with pretraining FLOPs? We benchmark retrieval performance across LLM model sizes from 125 million parameters to 7 billion parameters pretrained on datasets ranging from 1 billion tokens to more than 2 trillion tokens. We find that retrieval performance on zero-shot BEIR tasks predictably scales with LLM size, training duration, and estimated FLOPs. We also show that In-Context Learning scores are strongly correlated with retrieval scores across retrieval tasks. Finally, we highlight the implications this has for the development of LLM-based retrievers.
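To make the scaling setup concrete, here is a minimal sketch of how pretraining compute can be estimated and related to a retrieval score. It uses the common 6 * N * D approximation for FLOPs; the model sizes, token counts, BEIR nDCG@10 values, and the log-linear fit below are illustrative assumptions, not results from the paper.

```python
# A minimal sketch: estimate pretraining FLOPs and fit a simple trend
# between compute and a zero-shot retrieval score. All numbers are
# hypothetical placeholders, not values reported by the authors.
import numpy as np

def estimate_pretraining_flops(n_params: float, n_tokens: float) -> float:
    """Approximate pretraining compute with the standard 6 * N * D rule."""
    return 6.0 * n_params * n_tokens

# Hypothetical (parameters, training tokens, zero-shot BEIR nDCG@10) triples
# spanning roughly the scales described in the abstract.
runs = [
    (125e6, 1e9,   0.18),
    (1.3e9, 100e9, 0.28),
    (7e9,   2e12,  0.40),
]

flops = np.array([estimate_pretraining_flops(n, d) for n, d, _ in runs])
scores = np.array([s for _, _, s in runs])

# Fit a simple log-linear trend: score ~ a * log10(FLOPs) + b.
a, b = np.polyfit(np.log10(flops), scores, deg=1)
print(f"score ~ {a:.3f} * log10(FLOPs) + {b:.3f}")
```

The same kind of fit could be repeated per BEIR task, or with in-context learning scores on the x-axis, to probe the correlation the abstract describes.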
Similar Papers
Retrofitting Small Multilingual Models for Retrieval: Matching 7B Performance with 300M Parameters
Computation and Language
Makes small multilingual models retrieve across languages as well as much larger ones.
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models
Hardware Architecture
Designs faster, cheaper data centers for large AI models.
Think Before You Retrieve: Learning Test-Time Adaptive Search with Small Language Models
Artificial Intelligence
Teaches small language models to search for information more effectively.