ReProCon: Scalable and Resource-Efficient Few-Shot Biomedical Named Entity Recognition
By: Jeongkyun Yoo, Nela Riddle, Andrew Hoblitzell
Potential Business Impact:
Helps computers understand rare medical words.
Named Entity Recognition (NER) in biomedical domains faces challenges due to data scarcity and imbalanced label distributions, especially with fine-grained entity types. We propose ReProCon, a novel few-shot NER framework that combines multi-prototype modeling, cosine-contrastive learning, and Reptile meta-learning to tackle these issues. By representing each category with multiple prototypes, ReProCon captures semantic variability, such as synonyms and contextual differences, while a cosine-contrastive objective enforces strong interclass separation. Reptile meta-updates enable quick adaptation with little data. Using a lightweight fastText + BiLSTM encoder with much lower memory usage, ReProCon achieves a macro-$F_1$ score close to that of BERT-based baselines (around 99 percent of BERT performance). The model remains stable with a label budget of 30 percent and drops only 7.8 percent in $F_1$ when expanding from 19 to 50 categories, outperforming baselines such as SpanProto and CONTaiNER, which see 10 to 32 percent degradation on Few-NERD. Ablation studies highlight the importance of multi-prototype modeling and contrastive learning in managing class imbalance. Despite difficulties with label ambiguity, ReProCon demonstrates state-of-the-art performance in resource-limited settings, making it suitable for biomedical applications.
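To make two of the ingredients named in the abstract concrete, here is a minimal NumPy sketch of (a) scoring a token embedding against several prototypes per category by cosine similarity and (b) a generic Reptile meta-update. It is not the authors' implementation. The function names (`predict_multi_prototype`, `reptile_step`), the max-over-prototypes scoring rule, the array shapes, and all hyperparameter values are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length (guarding against zero norm)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def predict_multi_prototype(tokens, prototypes):
    """Assign each token to the category whose closest prototype is most similar.

    tokens:     (N, D) token embeddings.
    prototypes: (C, K, D) array with K prototypes for each of C categories.
    Assumption: a category's score is the maximum cosine similarity over its
    K prototypes; other pooling rules are possible.
    """
    t = l2_normalize(tokens)
    p = l2_normalize(prototypes)
    sims = np.einsum("nd,ckd->nck", t, p)  # cosine similarity, shape (N, C, K)
    class_scores = sims.max(axis=2)        # best-matching prototype per category
    return class_scores.argmax(axis=1)


def reptile_step(theta, task_grad_fn, inner_lr=0.1, inner_steps=5, meta_lr=0.5):
    """One Reptile meta-update on a flat parameter vector.

    task_grad_fn(phi) returns the gradient of the sampled task's loss at phi.
    The parameters are adapted with a few SGD steps, then theta is moved a
    fraction meta_lr of the way toward the adapted parameters.
    """
    phi = theta.copy()
    for _ in range(inner_steps):
        phi -= inner_lr * task_grad_fn(phi)
    return theta + meta_lr * (phi - theta)
```

With several prototypes per category, a token only needs to be close to one of them, which is one way a synonym or an unusual context can still land in the right category. In a full system the prototypes and encoder would be trained with the contrastive objective, which this sketch leaves out.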
Similar Papers
RePro: Training Language Models to Faithfully Recycle the Web for Pretraining
Computation and Language
Makes AI smarter with less data.
RePro: Leveraging Large Language Models for Semi-Automated Reproduction of Networking Research Results
Networking and Internet Architecture
Helps computers rebuild network programs from papers.
Effective Multi-Task Learning for Biomedical Named Entity Recognition
Computation and Language
Helps computers find medical words in text.