
ESNERA: Empirical and semantic named entity alignment for named entity dataset merging

Published: August 9, 2025 | arXiv ID: 2508.06877v1

By: Xiaobo Zhang, Congqing He, Ying He and more

Potential Business Impact:

Combines text data to improve computer understanding.

Named Entity Recognition (NER) is a fundamental task in natural language processing. It remains a research hotspot due to its wide applicability across domains. Although recent advances in deep learning have significantly improved NER performance, they rely heavily on large, high-quality annotated datasets. However, building these datasets is expensive and time-consuming, posing a major bottleneck for further research. Current dataset merging approaches mainly focus on strategies like manual label mapping or constructing label graphs, which lack interpretability and scalability. To address this, we propose an automatic label alignment method based on label similarity. The method combines empirical and semantic similarities, using a greedy pairwise merging strategy to unify label spaces across different datasets. Experiments are conducted in two stages: first, merging three existing NER datasets into a unified corpus with minimal impact on NER performance; second, integrating this corpus with a small-scale, self-built dataset in the financial domain. The results show that our method enables effective dataset merging and enhances NER performance in the low-resource financial domain. This study presents an efficient, interpretable, and scalable solution for integrating multi-source NER corpora.
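The greedy pairwise merging idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the empirical and semantic similarities are computed or combined, so the weighted average (`alpha`), the stopping `threshold`, the average-linkage rule between groups, and the label names and scores below are all assumptions chosen for illustration.

```python
from itertools import combinations


def combined_similarity(a, b, empirical, semantic, alpha=0.5):
    """Weighted mix of two precomputed scores; alpha is an assumed weight."""
    key = frozenset((a, b))
    return alpha * empirical.get(key, 0.0) + (1 - alpha) * semantic.get(key, 0.0)


def greedy_merge(labels, empirical, semantic, alpha=0.5, threshold=0.7):
    """Repeatedly merge the most similar pair of label groups.

    Stops when no pair reaches the (assumed) threshold.
    """
    groups = [frozenset([label]) for label in labels]

    def group_sim(g1, g2):
        # Average linkage over member labels (an assumption, not from the paper).
        scores = [
            combined_similarity(a, b, empirical, semantic, alpha)
            for a in g1
            for b in g2
        ]
        return sum(scores) / len(scores)

    while len(groups) > 1:
        best = max(combinations(groups, 2), key=lambda pair: group_sim(*pair))
        if group_sim(*best) < threshold:
            break
        # Replace the two merged groups with their union.
        groups = [g for g in groups if g not in best] + [best[0] | best[1]]
    return groups


# Hypothetical labels and scores, made up for this example.
labels = ["PER", "PERSON", "ORG", "LOC"]
empirical = {frozenset(("PER", "PERSON")): 0.9}
semantic = {
    frozenset(("PER", "PERSON")): 0.8,
    frozenset(("ORG", "LOC")): 0.3,
}

groups = greedy_merge(labels, empirical, semantic)
print(groups)
```

With these made-up scores, only `PER` and `PERSON` are similar enough to be merged into one group, while `ORG` and `LOC` stay separate. In practice the two similarity tables would be derived from the datasets being merged rather than written by hand.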



Country of Origin

🇲🇾 Malaysia

Page Count

30 pages
