Score: 1

Local Obfuscation by GLINER for Impartial Context Aware Lineage: Development and evaluation of PII Removal system

Published: October 22, 2025 | arXiv ID: 2510.19346v1

By: Prakrithi Shivaprakash, Lekhansh Shukla, Animesh Mukherjee, and more

Potential Business Impact:

Cleans patient notes for safe research.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Removing Personally Identifiable Information (PII) from clinical notes in Electronic Health Records (EHRs) is essential for research and AI development. While Large Language Models (LLMs) are powerful, their high computational costs and the data privacy risks of API-based services limit their use, especially in low-resource settings. To address this, we developed LOGICAL (Local Obfuscation by GLINER for Impartial Context-Aware Lineage), an efficient, locally deployable PII removal system built on a fine-tuned Generalist and Lightweight Named Entity Recognition (GLiNER) model. We used 1515 clinical documents from a psychiatric hospital's EHR system and defined nine PII categories for removal. A modern-gliner-bi-large-v1.0 model was fine-tuned on 2849 text instances and evaluated on a test set of 376 instances using character-level precision, recall, and F1-score. We compared its performance against Microsoft Azure NER, Microsoft Presidio, and zero-shot prompting with Gemini-Pro-2.5 and Llama-3.3-70B-Instruct. The fine-tuned GLiNER model achieved superior performance, with an overall micro-average F1-score of 0.980, significantly outperforming Gemini-Pro-2.5 (F1-score: 0.845). LOGICAL completely and correctly sanitised 95% of documents, compared to 64% for the next-best solution. The model operated efficiently on a standard laptop without a dedicated GPU. However, a 2% entity-level false negative rate underscores the need for human-in-the-loop validation across all tested systems. Fine-tuned, specialised transformer models like GLiNER offer an accurate, computationally efficient, and secure solution for PII removal from clinical notes. This "sanitisation at the source" approach is a practical alternative to resource-intensive LLMs, enabling the creation of de-identified datasets for research and AI development while preserving data privacy, particularly in resource-constrained environments.
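
To make the local, GPU-free deployment concrete, here is a minimal sketch of span detection and redaction with the open-source gliner Python package. The HuggingFace repo id, the label names, and the confidence threshold are assumptions for illustration; the paper's fine-tuned checkpoint and its exact nine PII categories are not reproduced here.

```python
# Minimal sketch: local PII redaction with a GLiNER checkpoint (CPU-friendly).
from gliner import GLiNER

# Assumed repo id standing in for the paper's fine-tuned model.
model = GLiNER.from_pretrained("knowledgator/modern-gliner-bi-large-v1.0")

# Illustrative PII categories; the paper defines nine of its own.
PII_LABELS = [
    "person name", "date", "phone number", "email", "address",
    "identification number", "hospital name", "age", "occupation",
]

def redact(text: str, threshold: float = 0.5) -> str:
    """Replace each detected PII span with its category tag, e.g. [PERSON NAME]."""
    entities = model.predict_entities(text, PII_LABELS, threshold=threshold)
    # Replace from the end of the string so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['label'].upper()}]" + text[ent["end"] :]
    return text

print(redact("Pt. John Doe, seen on 12/03/2024, contact 98765 43210."))
```

Because detection happens entirely on the local machine, no note text leaves the hospital environment, which is the "sanitisation at the source" property the abstract emphasises.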
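The abstract evaluates systems with character-level precision, recall, and F1. The sketch below shows one common way to compute these from gold and predicted character-offset spans; the function name, variable names, and example spans are illustrative, not taken from the paper.

```python
# Sketch: character-level precision / recall / F1 over (start, end) offset spans.
def char_prf(gold_spans, pred_spans):
    gold, pred = set(), set()
    for start, end in gold_spans:
        gold.update(range(start, end))   # every gold PII character
    for start, end in pred_spans:
        pred.update(range(start, end))   # every predicted PII character
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: gold PII at characters 4-12, prediction covers only 4-10,
# so precision is 1.0 but recall drops to 0.75.
print(char_prf([(4, 12)], [(4, 10)]))
```

A micro-average over a corpus would pool the character sets (or the TP/FP/FN counts) across all documents before computing the ratios, rather than averaging per-document scores.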

Repos / Data Links

Page Count
30 pages

Category
Computer Science:
Computation and Language