Probabilistic Reasoning with LLMs for k-anonymity Estimation
By: Jonathan Zheng, Sauvik Das, Alan Ritter, and more
Potential Business Impact:
Helps computers guess how private your writing is.
Probabilistic reasoning is a key aspect of both human and artificial intelligence that allows for handling uncertainty and ambiguity in decision-making. In this paper, we introduce a new numerical reasoning task under uncertainty for large language models, focusing on estimating the privacy risk of user-generated documents containing privacy-sensitive information. We propose BRANCH, a new LLM methodology that estimates the k-privacy value of a text: the size of the population matching the given information. BRANCH factorizes a joint probability distribution of personal information as random variables. The probability of each factor in a population is estimated separately using a Bayesian network, and these factors are combined to compute the final k-value. Our experiments show that this method successfully estimates the k-value 73% of the time, a 13% increase compared to o3-mini with chain-of-thought reasoning. We also find that LLM uncertainty is a good indicator of accuracy, as high-variance predictions are 37.47% less accurate on average.
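The factorization idea behind BRANCH can be illustrated with a minimal sketch. In the paper, each factor's probability is estimated by an LLM via a Bayesian network; here, as a simplification, we assume independent attributes with hypothetical hand-picked probabilities, and combine them into a k-value (the expected number of people in a reference population matching all attributes). All names, attributes, and probabilities below are illustrative assumptions, not the paper's actual data or method.

```python
# Illustrative sketch of factorized k-anonymity estimation.
# Assumption: attributes are independent, so the joint probability is the
# product of per-attribute probabilities. BRANCH instead uses a Bayesian
# network with LLM-estimated factors to capture dependencies.

POPULATION = 330_000_000  # assumed reference population size

# Hypothetical per-attribute match probabilities P(attribute = value)
attribute_probs = {
    "lives_in_atlanta": 0.0015,
    "age_30_to_35": 0.08,
    "works_in_tech": 0.04,
}

def estimate_k(probs: dict, population: int) -> float:
    """Combine factor probabilities into a k-value: the expected number
    of people in the population matching all given attributes."""
    joint = 1.0
    for p in probs.values():
        joint *= p  # independence assumption; real dependencies would need conditionals
    return population * joint

k = estimate_k(attribute_probs, POPULATION)
print(k)  # small k means few matching people, hence higher re-identification risk
```

A lower k indicates a more identifying document, which is what makes the estimate useful as a privacy-risk signal.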
Similar Papers
Reasoning Under Uncertainty: Exploring Probabilistic Reasoning Capabilities of LLMs
Computation and Language
Helps computers understand and use probability better.
Don't Miss the Forest for the Trees: In-Depth Confidence Estimation for LLMs via Reasoning over the Answer Space
Computation and Language
Helps AI know how sure it is about answers.
Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs
Artificial Intelligence
Makes AI understand "maybe" better for trust.