A First Context-Free Grammar Applied to Nawatl Corpora Augmentation
By: Juan-José Guzmán-Landa, Juan-Manuel Torres-Moreno, Miguel Figueroa-Saavedra, et al.
Potential Business Impact:
Helps computers learn a rare, low-resource language better.
In this article we introduce a context-free grammar (CFG) for the Nawatl language. Nawatl (or Nahuatl) is an Amerindian language of the $\pi$-language type, i.e. a language with few digital resources, for which the corpora available for machine learning are virtually non-existent. The objective is to generate a large number of grammatically correct artificial sentences in order to enlarge the corpora available for language-model training. We show that a grammar enables us to significantly expand a Nawatl corpus which we call $\pi$-YALLI. The corpus thus enriched allows us to train algorithms such as FastText and to evaluate them on sentence-level semantic tasks. Preliminary results show that, by using the grammar, improvements are achieved over some LLMs. However, more substantial gains will require grammars that model the Nawatl language more faithfully.
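The augmentation idea described above, generating grammatically correct artificial sentences by expanding a CFG, can be sketched as follows. This is a minimal illustration with a hypothetical toy grammar: the paper's actual Nawatl CFG is not reproduced here, and the sample words (kalli "house", atl "water", kochi "sleeps", kwa "eats", in, a determiner) are illustrative assumptions only.

```python
import random

# Hypothetical toy grammar: nonterminals map to lists of alternative
# right-hand sides; any symbol absent from the table is a terminal.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["N"], ["Det", "N"]],
    "VP":  [["V"], ["V", "NP"]],
    "Det": [["in"]],
    "N":   [["kalli"], ["atl"]],
    "V":   [["kochi"], ["kwa"]],
}

def generate(symbol="S", rng=random):
    """Recursively expand a symbol by choosing random productions."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit the word itself
    words = []
    for sym in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(sym, rng))
    return words

# Sample many derivations and deduplicate to build an artificial corpus.
rng = random.Random(0)
corpus = {" ".join(generate(rng=rng)) for _ in range(200)}
```

Each call to `generate` performs one random derivation from the start symbol, so every sentence in `corpus` is grammatical by construction; a realistic grammar would add many more rules (and, for Nawatl, morphological handling) but the expansion loop stays the same.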
Similar Papers
Two CFG Nahuatl for automatic corpora expansion
Computation and Language
Helps computers learn a rare language by expanding its corpora.
A symbolic Perl algorithm for the unification of Nahuatl word spellings
Computation and Language
Makes old Nawatl writings easier to read.
Awal -- Community-Powered Language Technology for Tamazight
Computation and Language
Helps computers understand a rare language.