LatPhon: Lightweight Multilingual G2P for Romance Languages and English
By: Luis Felipe Chary, Miguel Arjona Ramirez
Potential Business Impact:
Lets computers speak and understand six languages.
Grapheme-to-phoneme (G2P) conversion is a key front-end for text-to-speech (TTS), automatic speech recognition (ASR), speech-to-speech translation (S2ST), and alignment systems, especially across multiple Latin-script languages. We present LatPhon, a 7.5M-parameter Transformer jointly trained on six such languages: English, Spanish, French, Italian, Portuguese, and Romanian. On the public ipa-dict corpus, it attains a mean phoneme error rate (PER) of 3.5%, outperforming the byte-level ByT5 baseline (5.4%) and approaching language-specific WFSTs (3.2%) while occupying 30 MB of memory, which makes on-device deployment feasible when needed. These results indicate that compact multilingual G2P can serve as a universal front-end for Latin-language speech pipelines.
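For readers unfamiliar with the metric, the following is a minimal sketch of how a phoneme error rate is commonly computed: the Levenshtein (edit) distance between predicted and reference phoneme sequences, divided by the number of reference phonemes. The abstract does not specify the authors' exact scoring or averaging procedure, so the corpus-level pooling here is an assumption, and the function names (`edit_distance`, `phoneme_error_rate`, `param_memory_mb`) and the toy data are hypothetical. The last helper checks the reported footprint under the further assumption of 32-bit weights, which the abstract does not state.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(
    refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]
) -> float:
    """Corpus-level PER: total edits divided by total reference phonemes.

    Assumption: errors are pooled over all words; the paper may instead
    average per word or per language.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference set contains no phonemes")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_ref


def param_memory_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate weight storage in MB (10**6 bytes), assuming fixed-width weights."""
    return num_params * bytes_per_param / 1e6


# Toy data (hypothetical, not taken from ipa-dict).
refs = [["k", "æ", "t"], ["d", "ɔ", "g"]]
hyps = [["k", "æ", "t"], ["d", "o", "g"]]
per = phoneme_error_rate(refs, hyps)      # one substitution over six phonemes
weights_mb = param_memory_mb(7_500_000)   # 7.5M parameters at 4 bytes each
```

Under the 32-bit assumption, 7.5M parameters come to 30 MB, which matches the memory figure quoted in the abstract.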