Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models
By: Rabimba Karanjai, Yang Lu, Ranjith Chodavarapu, and more
Potential Business Impact:
AI language models can't reliably produce random numbers yet.
The rapid advancement of large language model (LLM) technology has led to diverse applications, many of which inherently require randomness, such as stochastic decision-making, gaming, scheduling, AI agents, and cryptography-related tasks. However, the capabilities of LLMs in handling randomness, particularly in generating and using random numbers effectively, remain unclear. This paper investigates the capacity of LLMs to handle tasks involving randomness through a series of experiments. We designed a set of experiments that consider factors that can influence an LLM's performance on such tasks, including access to external tools, task type, model state (fresh vs. non-fresh), and prompting strategy. The experiments cover a range of tasks: generating random numbers, generating random strings such as passwords, shuffling items, and evaluating the quality of randomness using entropy and the NIST randomness test suite. Our findings reveal that while LLMs can generate outputs that exhibit some degree of randomness, their performance is inconsistent and often deviates significantly from the expected behavior. The analysis of the experimental results highlights key limitations and areas where improvement is needed for LLMs to handle tasks involving randomness effectively.
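The entropy-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. The function below computes the Shannon entropy of the empirical symbol distribution of a sequence; this is a generic illustration of the measure, not the paper's actual evaluation code, and the function name and example sequences are ours:

```python
import math
from collections import Counter

def shannon_entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of a sequence.

    A uniform distribution over k symbols yields log2(k) bits; a constant
    sequence yields 0 bits. Lower-than-expected entropy in model outputs
    suggests the generated values are not behaving like true random draws.
    """
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A sequence that is perfectly uniform over 8 symbols reaches the
# maximum entropy of log2(8) = 3 bits; a constant sequence reaches 0.
uniform = list(range(8)) * 100    # each of 8 symbols appears equally often
constant = [7] * 800              # a single repeated symbol
print(shannon_entropy(uniform))   # 3.0
print(shannon_entropy(constant))  # 0.0
```

In an evaluation like the one described, the same measurement would be applied to numbers or strings sampled from an LLM and compared against the entropy expected from a uniform source of the same alphabet size.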
Similar Papers
Exploiting the Randomness of Large Language Models (LLM) in Text Classification Tasks: Locating Privileged Documents in Legal Matters
Information Retrieval
Makes AI better at finding secret legal papers.
Failure to Mix: Large language models struggle to answer according to desired probability distributions
Machine Learning (CS)
AI models can't follow simple chance rules.
Bayesian Evaluation of Large Language Model Behavior
Computation and Language
Measures AI honesty and safety more accurately.