Fine-Tuning Methods for Low-Resource Languages
By: Tim Bakkenes, Daniel Wang, Anton Johansson
Potential Business Impact:
Helps AI understand and use other languages better.
The rise of Large Language Models has not been inclusive of all cultures. Most models are trained predominantly on English text and Anglophone cultural context, which makes them underperform in other languages and cultural settings. By developing a generalizable method for preparing culturally relevant datasets and post-training Gemma 2 on them, this project aimed to improve Gemma 2's performance in an underrepresented language and to show how others can do the same: unlocking the power of Generative AI in their own country and helping preserve their cultural heritage.
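As a rough illustration of what such a post-training step might look like, the sketch below fine-tunes a Gemma 2 checkpoint with LoRA adapters using the Hugging Face transformers, peft, and datasets libraries. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the model ID, dataset file, adapter settings, and hyperparameters are all illustrative placeholders.

```python
# Minimal LoRA post-training sketch for Gemma 2 on a low-resource-language
# corpus. Assumes transformers, peft, datasets, and accelerate are installed;
# the dataset path and all hyperparameters below are illustrative, not the
# project's actual configuration.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "google/gemma-2-2b"  # smallest Gemma 2 checkpoint (gated on the Hub)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach low-rank adapters so only a small fraction of weights are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Placeholder corpus: plain text in the target language, one example per line.
# Swap in your own culturally relevant dataset here.
dataset = load_dataset("text", data_files={"train": "target_language_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gemma2-low-resource",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # Causal LM collator (mlm=False) pads batches and sets labels = input_ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("gemma2-low-resource-adapter")
```

Because only the adapter weights are saved, the result is a small artifact that can be shared and merged into the base model, which keeps the approach practical for groups with limited compute.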
Similar Papers
Fine-Tuning LLMs for Low-Resource Dialect Translation: The Case of Lebanese
Computation and Language
Teaches computers to translate the Lebanese dialect of Arabic better.
Overcoming Data Scarcity in Generative Language Modelling for Low-Resource Languages: A Systematic Review
Computation and Language
Helps computers talk in less common languages.
Empowering Smaller Models: Tuning LLaMA and Gemma with Chain-of-Thought for Ukrainian Exam Tasks
Computation and Language
Helps computers solve hard Ukrainian exam tasks.