
A systematic assessment of Large Language Models for constructing two-level fractional factorial designs

Published: December 18, 2025 | arXiv ID: 2512.17113v1

By: Alan R. Vazquez, Kilian M. Rother, Marco V. Charles-Gonzalez

Two-level fractional factorial designs permit the study of multiple factors using a limited number of runs. Traditionally, these designs are obtained from catalogs available in standard textbooks or statistical software. Modern Large Language Models (LLMs) can now produce two-level fractional factorial designs, but the quality of these designs has not previously been assessed. In this paper, we systematically evaluate two popular classes of LLMs, namely GPT and Gemini models, on the task of constructing two-level fractional factorial designs with 8, 16, and 32 runs, and 4 to 26 factors. To this end, we use prompting techniques to develop a high-quality set of design construction tasks for the LLMs. We compare the designs obtained by the LLMs with the best-known designs in terms of the resolution and minimum aberration criteria. We show that the LLMs can effectively construct optimal 8-, 16-, and 32-run designs with up to eight factors.
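To make the evaluation criteria concrete, the sketch below builds a regular two-level fractional factorial design from generators and computes its word length pattern and resolution. It is a minimal illustration under stated assumptions, not the authors' code or prompts: the function names (`fractional_factorial`, `word_length_pattern`, `resolution`) and the choice to write generators as tuples of basic-factor indices are hypothetical. Minimum aberration compares word length patterns entry by entry, starting from the shortest words and preferring fewer of them.

```python
from itertools import combinations, product


def fractional_factorial(n_basic, generators):
    """Build a regular two-level design coded as -1/+1.

    n_basic    -- number of basic factors (a full factorial is run in these)
    generators -- one tuple of basic-factor indices per added factor; the
                  added column is the product of the listed basic columns
    """
    runs = []
    for base in product((-1, 1), repeat=n_basic):
        row = list(base)
        for gen in generators:
            value = 1
            for idx in gen:
                value *= base[idx]
            row.append(value)
        runs.append(tuple(row))
    return runs


def word_length_pattern(n_basic, generators):
    """Return counts[k] = number of defining words of length k."""
    n_factors = n_basic + len(generators)
    # Each generator yields one defining word: its basic factors plus
    # the index of the added factor it defines.
    words = [frozenset(gen) | {n_basic + j} for j, gen in enumerate(generators)]
    counts = [0] * (n_factors + 1)
    for r in range(1, len(words) + 1):
        for subset in combinations(words, r):
            word = frozenset()
            for w in subset:
                # Multiplying words cancels squared factors, which is the
                # symmetric difference of the index sets.
                word = word ^ w
            counts[len(word)] += 1
    return counts


def resolution(n_basic, generators):
    """Length of the shortest word in the defining relation."""
    wlp = word_length_pattern(n_basic, generators)
    return next(k for k, count in enumerate(wlp) if k > 0 and count > 0)


if __name__ == "__main__":
    # 8-run design with 4 factors, generator D = ABC.
    gens = [(0, 1, 2)]
    for run in fractional_factorial(3, gens):
        print(run)
    print("word length pattern:", word_length_pattern(3, gens))
    print("resolution:", resolution(3, gens))
```

For the 8-run, four-factor case with D = ABC, the only defining word is ABCD, so the design has resolution IV. Replacing the generators with D = AB and E = AC gives an 8-run, five-factor design whose shortest words have length three, so its resolution drops to III.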

Delving Into the Psychology of Machines: Exploring the Structure of Self-Regulated Learning via LLM-Generated Survey Responses

Artificial Intelligence

Computers can pretend to be students learning.

16 Jun 2025 0

88%

Large Language Models for Fault Localization: An Empirical Study

Software Engineering

Finds bugs in computer code faster.

23 Oct 2025 0

88%

Prompt perturbation and fraction facilitation sometimes strengthen Large Language Model scores

Digital Libraries

Helps computers judge research quality better.

1 Dec 2025 1

