The Intricate Dance of Prompt Complexity, Quality, Diversity, and Consistency in T2I Models
By: Xiaofeng Zhang, Aaron Courville, Michal Drozdzal, and more
Potential Business Impact:
Makes computer art better by changing word instructions.
Text-to-image (T2I) models offer great potential for creating virtually limitless synthetic data, a valuable resource compared to fixed and finite real datasets. Prior work evaluates the utility of synthetic data from T2I models along three key desiderata: quality, diversity, and consistency. While prompt engineering is the primary means of interacting with T2I models, the systematic impact of prompt complexity on these critical utility axes remains underexplored. In this paper, we first conduct synthetic experiments to illustrate the difficulty of generalizing with respect to prompt complexity and explain the observed difficulty with theoretical derivations. We then introduce a new evaluation framework that compares the utility of real and synthetic data, and present a comprehensive analysis of how prompt complexity influences the utility of synthetic data generated by commonly used T2I models. We conduct our study across diverse datasets, including CC12M, ImageNet-1k, and DCI, and evaluate different inference-time intervention methods. Our synthetic experiments show that generalizing to more general conditions is harder than the reverse, since the former requires an estimated likelihood that diffusion models do not learn. Our large-scale empirical experiments reveal that increasing prompt complexity lowers conditional diversity and prompt consistency while reducing the synthetic-to-real distribution shift, consistent with the synthetic experiments. Moreover, current inference-time interventions can augment the diversity of the generations at the expense of moving outside the support of real data. Among these interventions, prompt expansion, which deliberately uses a pre-trained language model as a likelihood estimator, consistently achieves the highest performance in both image diversity and aesthetics, even exceeding that of real data.
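The prompt-expansion intervention mentioned above can be illustrated with a minimal sketch: a pre-trained language model rewrites a short prompt into a more detailed one before it is passed to the T2I model. This is not the paper's exact pipeline; the model choices (GPT-2, Stable Diffusion v1.5), the expansion instruction, and the `expand_prompt` helper are assumptions made for the example.

```python
# Minimal sketch of prompt expansion before T2I generation (illustrative only).
# Model names and the expansion instruction are assumptions, not the paper's setup.
import torch
from transformers import pipeline as hf_pipeline
from diffusers import StableDiffusionPipeline

# A pre-trained language model fills in plausible details missing from a simple
# prompt, acting as a rough likelihood estimator over richer descriptions.
expander = hf_pipeline("text-generation", model="gpt2")

def expand_prompt(prompt: str, max_new_tokens: int = 40) -> str:
    """Rewrite a short prompt into a more detailed one using the language model."""
    seed_text = f"A detailed description of an image of {prompt}:"
    out = expander(seed_text, max_new_tokens=max_new_tokens, do_sample=True)
    generated = out[0]["generated_text"]
    # Keep only the continuation the LM added after the seed text.
    continuation = generated[len(seed_text):].strip()
    return continuation or prompt

# Generate an image from the expanded prompt with an off-the-shelf T2I model.
t2i = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

simple_prompt = "a dog in a park"
expanded_prompt = expand_prompt(simple_prompt)
image = t2i(expanded_prompt).images[0]
image.save("expanded_prompt_sample.png")
```

In this sketch, sampling from the language model (rather than using a fixed template) is what introduces additional conditional diversity across generations for the same simple prompt.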
Similar Papers
Iterative Refinement Improves Compositional Image Generation
CV and Pattern Recognition
Makes AI draw pictures that match tricky instructions.
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
CV and Pattern Recognition
Tests how well AI makes pictures from words.
Exploring Language Patterns of Prompts in Text-to-Image Generation and Their Impact on Visual Diversity
Human-Computer Interaction
Makes AI art more creative and less repetitive.