Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition
By: João Lucas Luz Lima Sarcinelli, Diego Furtado Silva
Potential Business Impact:
Helps computers find names in text better.
Large Language Models (LLMs) excel in many Natural Language Processing (NLP) tasks through in-context learning but often underperform in Named Entity Recognition (NER), especially for lower-resource languages like Portuguese. While open-weight LLMs enable local deployment, no single model dominates all tasks, which motivates ensemble approaches. However, existing LLM ensembles focus on text generation or classification, leaving NER under-explored. This work proposes a novel three-step ensemble pipeline for zero-shot NER using similarly capable, locally run LLMs. Our method outperforms individual LLMs on four out of five Portuguese NER datasets by using a heuristic that selects the best model combinations from minimal annotated data. Moreover, we show that ensembles selected on different source datasets generally outperform individual LLMs in cross-dataset configurations, potentially eliminating the need for annotated data for the target task. Our work advances scalable, low-resource, and zero-shot NER by effectively combining multiple small LLMs without fine-tuning. Code is available at https://github.com/Joao-Luz/local-llm-ner-ensemble.
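The abstract does not spell out the three pipeline steps or the selection heuristic, so the following is only a minimal sketch of the general idea of combining NER outputs from several models, not the authors' method. It assumes each model returns a set of (text, label) pairs and keeps an entity when enough models agree. The function name `vote_entities`, the `min_votes` threshold, and the example predictions are all hypothetical.

```python
from collections import Counter

# Hypothetical representation: an entity is a (surface text, label) pair.
Entity = tuple[str, str]


def vote_entities(predictions: list[set[Entity]], min_votes: int) -> set[Entity]:
    """Keep every entity that at least `min_votes` models predicted.

    Illustrative agreement-based voting only; the paper's actual
    three-step pipeline is not described in the abstract.
    """
    # Each model contributes at most one vote per entity because its output is a set.
    counts = Counter(entity for model_output in predictions for entity in model_output)
    return {entity for entity, votes in counts.items() if votes >= min_votes}


# Made-up outputs from three hypothetical local models for one sentence.
model_outputs = [
    {("Lisboa", "LOC"), ("Fernando Pessoa", "PER")},
    {("Lisboa", "LOC"), ("Pessoa", "PER")},
    {("Lisboa", "LOC"), ("Fernando Pessoa", "PER"), ("Tejo", "ORG")},
]

ensemble = vote_entities(model_outputs, min_votes=2)
```

With `min_votes=2`, the entities predicted by only one model are dropped, which shows how agreement among similarly capable models can filter out individual errors.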