Database Entity Recognition with Data Augmentation and Deep Learning

Published: August 26, 2025 | arXiv ID: 2508.19372v1

By: Zikun Fu, Chen Yang, Kourosh Davoudi and more

Potential Business Impact:

Helps computers understand questions about data.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

This paper addresses the challenge of Database Entity Recognition (DB-ER) in Natural Language Queries (NLQs). We present several key contributions to advance this field: (1) a human-annotated benchmark for the DB-ER task, derived from popular text-to-SQL benchmarks; (2) a novel data augmentation procedure that automatically annotates NLQs based on the corresponding SQL queries, which are available in popular text-to-SQL benchmarks; and (3) a specialized language-model-based entity recognition model that uses T5 as a backbone with two downstream DB-ER tasks, sequence tagging and token classification, used for fine-tuning the backbone and performing DB-ER, respectively. We compared our DB-ER tagger with two state-of-the-art NER taggers and observed better precision and recall for our model. The ablation evaluation shows that data augmentation boosts precision and recall by over 10%, while fine-tuning the T5 backbone boosts these metrics by 5-10%.
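
The abstract does not spell out how NLQs are annotated from their SQL queries, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes exact, case-insensitive string matching between NLQ tokens and the tables, columns, and string literals used in the SQL. The function names, the `TABLE`/`COLUMN`/`VALUE` labels, the BIO tagging scheme, and the example schema are all hypothetical.

```python
import re


def extract_sql_entities(sql: str, schema: dict[str, list[str]]) -> dict[str, str]:
    """Map lowercased entity strings used in the SQL to a hypothetical label."""
    entities: dict[str, str] = {}
    # Quoted string literals are treated as VALUE mentions.
    for value in re.findall(r"'([^']*)'", sql):
        entities[value.lower()] = "VALUE"
    # Remove literals so their contents are not mistaken for identifiers.
    sql_no_literals = re.sub(r"'[^']*'", " ", sql)
    identifiers = set(re.findall(r"[a-z_][a-z_0-9]*", sql_no_literals.lower()))
    for table, columns in schema.items():
        if table.lower() in identifiers:
            entities[table.lower().replace("_", " ")] = "TABLE"
        for column in columns:
            if column.lower() in identifiers:
                entities[column.lower().replace("_", " ")] = "COLUMN"
    return entities


def annotate(nlq: str, sql: str, schema: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Assign BIO labels to NLQ tokens that exactly match entities in the SQL."""
    tokens = nlq.split()
    normalized = [t.lower().strip("?.,!") for t in tokens]
    labels = ["O"] * len(tokens)
    entities = extract_sql_entities(sql, schema)
    # Try longer entity strings first so multi-word mentions take priority.
    for text, label in sorted(entities.items(), key=lambda kv: -len(kv[0].split())):
        words = text.split()
        n = len(words)
        if n == 0:
            continue  # skip empty literals such as ''
        for i in range(len(tokens) - n + 1):
            span_free = all(lab == "O" for lab in labels[i:i + n])
            if normalized[i:i + n] == words and span_free:
                labels[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{label}"
    return list(zip(tokens, labels))


example = annotate(
    "List the name of every singer from France",
    "SELECT name FROM singer WHERE country = 'France'",
    {"singer": ["name", "age", "country"]},
)
```

In this example, `name`, `singer`, and `France` receive entity labels, while `country` goes unlabeled because it never appears in the question. Exact matching also misses inflected or paraphrased mentions (for instance "singers"), which is one reason a human-annotated benchmark and a learned tagger remain useful alongside automatic annotation.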

SEDA: A Self-Adapted Entity-Centric Data Augmentation for Boosting Gird-based Discontinuous NER Models

Computation and Language

Helps computers find tricky, broken-up words.

25 Nov 2025 1

88%

A Unified Biomedical Named Entity Recognition Framework with Large Language Models

Computation and Language

Helps doctors find important words in medical texts.

10 Oct 2025 2

88%

DeKeyNLU: Enhancing Natural Language to SQL Generation through Task Decomposition and Keyword Extraction

Artificial Intelligence

Lets anyone ask computers questions using normal words.

18 Sep 2025 1

Country of Origin

🇺🇸 🇨🇦 United States, Canada

Page Count

6 pages
