Comparing Approaches to Automatic Summarization in Less-Resourced Languages
By: Chester Palen-Michel, Constantine Lignos
Automatic text summarization has achieved high performance in high-resourced languages like English, but comparatively little attention has been given to summarization in less-resourced languages. This work compares a variety of approaches to summarization, ranging from zero-shot prompting of LLMs both large and small, to fine-tuning smaller models like mT5 with and without three data augmentation approaches and multilingual transfer. We also explore an LLM translation pipeline approach: translating from the source language to English, summarizing, and translating back. Evaluating with five different metrics, we find that LLM performance varies even among models of similar parameter sizes, that our multilingual fine-tuned mT5 baseline outperforms most other approaches, including zero-shot LLMs, on most metrics, and that LLM-as-judge evaluation may be less reliable for less-resourced languages.
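The translation pipeline approach mentioned above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: the Hugging Face checkpoints (facebook/nllb-200-distilled-600M, facebook/bart-large-cnn) and the Hausa language code "hau_Latn" are assumed stand-ins chosen only to make the example runnable.

```python
from transformers import pipeline

# Minimal sketch of translate -> summarize in English -> translate back.
# Checkpoints and the source-language code are illustrative assumptions,
# not the models or languages reported in the paper.
SRC_LANG = "hau_Latn"  # example less-resourced source language (NLLB code)
ENG_LANG = "eng_Latn"

to_english = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang=SRC_LANG,
    tgt_lang=ENG_LANG,
)
summarize_en = pipeline("summarization", model="facebook/bart-large-cnn")
from_english = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang=ENG_LANG,
    tgt_lang=SRC_LANG,
)


def pipeline_summarize(document: str) -> str:
    """Translate the document to English, summarize it, and translate the summary back."""
    english_doc = to_english(document, max_length=1024)[0]["translation_text"]
    english_summary = summarize_en(english_doc, max_length=128, min_length=30)[0]["summary_text"]
    return from_english(english_summary, max_length=256)[0]["translation_text"]


# Usage (hypothetical): summary = pipeline_summarize(source_language_article)
```

Because every step introduces its own errors, summaries produced this way can drift from the source document, which is one reason the paper compares this pipeline against direct fine-tuning and zero-shot prompting in the source language.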
Similar Papers
Large Language Models for the Summarization of Czech Documents: From History to the Present
Computation and Language
Makes computers understand old Czech writings.
Evaluating the Effectiveness of Large Language Models in Automated News Article Summarization
Artificial Intelligence
Helps companies quickly understand news about their suppliers.
Leveraging Large Language Models for Zero-shot Lay Summarisation in Biomedicine and Beyond
Computation and Language
Makes complex science easy for everyone to understand.