Harnessing Large Language Models for Biomedical Named Entity Recognition
By: Jian Chen, Leilei Su, Cong Sun
Potential Business Impact:
Helps language models recognize medical terms in text more accurately, supporting applications such as drug discovery and clinical trial matching.
Background and Objective: Biomedical Named Entity Recognition (BioNER) is a foundational task in medical informatics, crucial for downstream applications like drug discovery and clinical trial matching. However, adapting general-domain Large Language Models (LLMs) to this task is often hampered by their lack of domain-specific knowledge and the performance degradation caused by low-quality training data. To address these challenges, we introduce BioSelectTune, a highly efficient, data-centric framework for fine-tuning LLMs that prioritizes data quality over quantity.

Methods and Results: BioSelectTune reformulates BioNER as a structured JSON generation task and leverages our novel Hybrid Superfiltering strategy, a weak-to-strong data curation method that uses a homologous weak model to distill a compact, high-impact training dataset.

Conclusions: Through extensive experiments, we demonstrate that BioSelectTune achieves state-of-the-art (SOTA) performance across multiple BioNER benchmarks. Notably, our model, trained on only 50% of the curated positive data, not only surpasses the fully-trained baseline but also outperforms powerful domain-specialized models like BioMedBERT.
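The abstract names two concrete mechanisms: casting BioNER as structured JSON generation, and Superfiltering-style weak-to-strong data curation with a homologous weak model. The sketch below is a minimal illustration of both ideas, not the authors' implementation: the JSON schema ("entities", "text", "type"), the instruction wording, the 50% keep fraction, and the `weak_score` ranking criterion are all assumptions, since the abstract does not specify them. A real pipeline would replace the stand-in scorer with a score computed by the weak LLM (e.g., a perplexity- or difficulty-based measure).

```python
import json
from typing import Callable

# --- Step 1: reformulate a BioNER example as a JSON-generation task ---
# The LLM is prompted with a sentence and asked to emit structured JSON
# listing the entities. Field names here are illustrative; the paper's
# exact schema is not given in the abstract.
def to_json_target(sentence: str, entities: list[tuple[str, str]]) -> dict:
    """Build an instruction-tuning record whose target output is a JSON string."""
    target = json.dumps(
        {"entities": [{"text": t, "type": ty} for t, ty in entities]}
    )
    return {
        "instruction": "Extract all biomedical named entities from the "
                       "sentence and return them as JSON.",
        "input": sentence,
        "output": target,
    }

# --- Step 2: weak-to-strong data curation ---
# Superfiltering-style selection scores each record with a small,
# homologous "weak" model and keeps only the highest-impact fraction.
# The ranking criterion is a placeholder for whatever hybrid score the
# weak model produces.
def select_top_fraction(
    records: list[dict],
    weak_score: Callable[[dict], float],
    keep_fraction: float = 0.5,
) -> list[dict]:
    """Keep the top `keep_fraction` of records ranked by the weak model's score."""
    ranked = sorted(records, key=weak_score, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]

if __name__ == "__main__":
    data = [
        to_json_target(
            "Aspirin inhibits cyclooxygenase.",
            [("Aspirin", "Chemical"), ("cyclooxygenase", "Protein")],
        ),
        to_json_target("The patient was stable overnight.", []),
    ]
    # Stand-in scorer for demonstration only; a real pipeline would
    # query the weak LLM here instead of using output length.
    dummy_score = lambda rec: float(len(rec["output"]))
    curated = select_top_fraction(data, dummy_score, keep_fraction=0.5)
    print(json.dumps(curated, indent=2))
```

Under this reading, the fine-tuned model never tags tokens directly; it generates the JSON string in `output`, which makes the task format uniform across benchmarks and lets the curation step rank whole instruction-response records.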
Similar Papers
A Unified Biomedical Named Entity Recognition Framework with Large Language Models
Computation and Language
Helps identify key biomedical entities in medical texts.
MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation
Computation and Language
Improves answers to difficult medical and biological questions.
GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition
Computation and Language
Recognizes previously unseen biomedical entities automatically.