Efficient Multilingual Name Type Classification Using Convolutional Networks
By: Davor Lauc
Potential Business Impact:
Classifies names by language and entity type very quickly on ordinary CPU hardware.
We present a convolutional neural network approach for classifying proper names by language and entity type. Our model, Onomas-CNN X, combines parallel convolution branches with depthwise-separable operations and hierarchical classification to process names efficiently on CPU hardware. We evaluate the architecture on a large multilingual dataset covering 104 languages and four entity types (person, organization, location, other). Onomas-CNN X achieves 92.1% accuracy while processing 2,813 names per second on a single CPU core, 46 times faster than fine-tuned XLM-RoBERTa at comparable accuracy. The model also reduces energy consumption by a factor of 46 compared to transformer baselines. Our experiments demonstrate that specialized CNN architectures remain competitive with large pre-trained models for focused NLP tasks when sufficient training data exists.
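To illustrate why depthwise-separable operations help on CPU hardware, the sketch below counts weights for a standard 1D convolution and for its depthwise-separable counterpart. It is not the Onomas-CNN X implementation: the channel counts and the kernel sizes of the parallel branches are hypothetical values chosen for illustration, since the abstract does not state them, and the function names are made up for this example.

```python
def standard_conv1d_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weight count of a standard 1D convolution (bias omitted)."""
    return c_in * c_out * kernel_size


def depthwise_separable_conv1d_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weight count of a depthwise conv followed by a pointwise (1x1) conv (bias omitted)."""
    depthwise = c_in * kernel_size  # one filter of width kernel_size per input channel
    pointwise = c_in * c_out        # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Assumed sizes, not taken from the paper.
C_IN, C_OUT = 128, 128
KERNEL_SIZES = (2, 3, 5)  # one kernel width per hypothetical parallel branch

for k in KERNEL_SIZES:
    std = standard_conv1d_params(C_IN, C_OUT, k)
    sep = depthwise_separable_conv1d_params(C_IN, C_OUT, k)
    print(f"kernel={k}: standard={std}, separable={sep}, ratio={std / sep:.2f}")
```

Under these assumed sizes the saving grows with kernel width, because the separable form pays the kernel-width cost once per input channel rather than once per input-output channel pair.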
Similar Papers
What Matters When Building Universal Multilingual Named Entity Recognition Models?
Computation and Language
Helps computers understand names in over 100 languages.
Advancing Text Classification with Large Language Models and Neural Attention Mechanisms
Computation and Language
Helps computers understand and sort text better.
Enhancing Neural Spoken Language Recognition: An Exploration with Multilingual Datasets
Sound
Lets computers recognize many spoken languages.