TiME: Tiny Monolingual Encoders for Efficient NLP Pipelines
By: David Schulmeister, Valentin Hartmann, Lars Klein, and others
Potential Business Impact:
Makes computer language tasks faster and uses less power.
Today, a lot of research on language models is focused on large, general-purpose models. However, many NLP pipelines only require models with a well-defined, small set of capabilities. While large models are capable of performing the tasks of those smaller models, they are simply not fast enough to process large amounts of data or offer real-time responses. Furthermore, they often use unnecessarily large amounts of energy, leading to sustainability concerns and problems when deploying them on battery-powered devices. In our work, we show how to train small models for such efficiency-critical applications. As opposed to many off-the-shelf NLP pipelines, our models use modern training techniques such as distillation, and offer support for low-resource languages. We call our models TiME (Tiny Monolingual Encoders) and comprehensively evaluate them on a range of common NLP tasks, observing an improved trade-off between benchmark performance on one hand, and throughput, latency and energy consumption on the other. Along the way, we show that distilling monolingual models from multilingual teachers is possible, and likewise distilling models with absolute positional embeddings from teachers with relative positional embeddings.
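The abstract does not spell out the distillation recipe, so the following is only an illustrative sketch of a common logit-distillation objective: a temperature-scaled KL term against the teacher combined with the student's own masked-language-modeling loss. The function name `distillation_loss`, its parameters, and the assumption that teacher and student share a vocabulary are ours, not the paper's; the monolingual-from-multilingual setting described above would additionally require mapping between teacher and student vocabularies, which is not shown here.

```python
# Hypothetical sketch of logit distillation for a masked-language-model student.
# This is NOT the TiME training recipe; it only illustrates the general technique.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL loss against the teacher with the usual MLM cross-entropy.

    student_logits, teacher_logits: (batch, seq_len, vocab) over a shared vocabulary.
    labels: (batch, seq_len), with -100 at non-masked positions (ignored by cross-entropy).
    """
    # Soft targets: match the student's distribution to the teacher's,
    # softened by the temperature and rescaled by T^2 as is standard.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: standard masked-language-modeling loss on the masked tokens.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * kl + (1.0 - alpha) * ce
```

In such a setup the teacher runs in inference mode only, so the small student carries the full cost of deployment, which is what gives the throughput, latency, and energy savings the abstract reports.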
Similar Papers
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Computation and Language
Makes computer translation faster and uses less memory.
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Computation and Language
Makes smart computer programs cheaper and faster.
Regional Tiny Stories: Using Small Models to Compare Language Learning and Tokenizer Performance
Computation and Language
Helps small computers understand Indian languages.