MiniLingua: A Small Open-Source LLM for European Languages
By: Anna Aksenova, Boris Zverkov, Nicola Dainese, and more
Potential Business Impact:
Makes AI understand many languages on your phone.
Large language models are powerful but often limited by high computational cost, privacy concerns, and English-centric training. Recent progress demonstrates that small, efficient models with around one billion parameters can deliver strong results and enable on-device use. This paper introduces MiniLingua, an open-source multilingual LLM with one billion parameters, trained from scratch on 13 European languages and designed to balance language coverage with instruction-following capability. In evaluations, the instruction-tuned version of MiniLingua outperforms EuroLLM, a model with a similar training approach but a larger training budget, on summarization, classification, and both open- and closed-book question answering. It also remains competitive with more advanced state-of-the-art models on open-ended generation tasks. The authors release the model weights, the tokenizer, and the source code used for data processing and model training.
Similar Papers
EuroLLM-9B: Technical Report
Computation and Language
Helps computers understand many European languages.
Fine-tuning of lightweight large language models for sentiment classification on heterogeneous financial textual data
Computation and Language
Small AI models understand money news well.
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
Computation and Language
Helps computers understand many more languages.