Automated Privacy Information Annotation in Large Language Model Interactions
By: Hang Zeng , Xiangyu Liu , Yong Hu and more
Potential Business Impact:
Warns you when you share private info online.
Users interacting with large language models (LLMs) under their real identifiers often unknowingly risk disclosing private information. Automatically notifying users whether their queries leak privacy and which phrases leak what private information has therefore become a practical need. Existing privacy detection methods, however, were designed for different objectives and application domains, typically tagging personally identifiable information (PII) in anonymous content, which is insufficient in real-name interaction scenarios with LLMs. In this work, to support the development and evaluation of privacy detection models for LLM interactions that are deployable on local user devices, we construct a large-scale multilingual dataset with 249K user queries and 154K annotated privacy phrases. In particular, we build an automated privacy annotation pipeline with strong LLMs to automatically extract privacy phrases from dialogue datasets and annotate leaked information. We also design evaluation metrics at the levels of privacy leakage, extracted privacy phrase, and privacy information. We further establish baseline methods using light-weight LLMs with both tuning-free and tuning-based methods, and report a comprehensive evaluation of their performance. Evaluation results reveal a gap between current performance and the requirements of real-world LLM applications, motivating future research into more effective local privacy detection methods grounded in our dataset.
Similar Papers
Large Language Model Empowered Privacy-Protected Framework for PHI Annotation in Clinical Notes
Cryptography and Security
Keeps patient secrets safe in doctor notes.
User Behavior Analysis in Privacy Protection with Large Language Models: A Study on Privacy Preferences with Limited Data
Cryptography and Security
Protects your online secrets with less data.
A Survey on Privacy Risks and Protection in Large Language Models
Cryptography and Security
Keeps your secrets safe from smart computer programs.