Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation

Published: December 9, 2025 | arXiv ID: 2512.08645v1

By: Young Kyung Kim, Oded Schlesinger, Yuzhou Zhao and more

While state-of-the-art image generation models achieve remarkable visual quality, their internal generative processes remain a "black box." This opacity limits human observation and intervention, and poses a barrier to ensuring model reliability, safety, and control. Furthermore, their non-human-like workflows make them difficult for human observers to interpret. To address this, we introduce the Chain-of-Image Generation (CoIG) framework, which reframes image generation as a sequential, semantic process analogous to how humans create art. Just as Chain-of-Thought (CoT) improved monitorability and performance in large language models (LLMs), CoIG can produce equivalent benefits in text-to-image generation. CoIG utilizes an LLM to decompose a complex prompt into a sequence of simple, step-by-step instructions. The image generation model then executes this plan by progressively generating and editing the image. Each step focuses on a single semantic entity, enabling direct monitoring. We formally assess this property using two novel metrics: CoIG Readability, which evaluates the clarity of each intermediate step via its corresponding output; and Causal Relevance, which quantifies the impact of each procedural step on the final generated image. We further show that our framework mitigates entity collapse by decomposing the complex generation task into simple subproblems, analogous to the procedural reasoning employed by CoT. Our experimental results indicate that CoIG substantially enhances quantitative monitorability while achieving competitive compositional robustness compared to established baseline models. The framework is model-agnostic and can be integrated with any image generation model.
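
The plan-then-edit loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_planner` stands in for the LLM that decomposes the prompt, `toy_editor` stands in for the image generation model, and the "image" is represented as a plain list of entity names. All of these names and the comma-based decomposition are assumptions made purely for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for an image: the list of entities drawn so far.
Canvas = List[str]
# One monitorable record per step: (instruction, canvas after the step).
Trace = List[Tuple[str, Canvas]]


def toy_planner(prompt: str) -> List[str]:
    """Assumed placeholder for the LLM planner: one instruction per entity.

    A real planner would call an LLM; here the prompt is split on commas.
    """
    entities = [part.strip() for part in prompt.split(",") if part.strip()]
    return [f"add {entity}" for entity in entities]


def toy_editor(canvas: Canvas, instruction: str) -> Canvas:
    """Assumed placeholder for the image model: returns an edited copy."""
    entity = instruction.split(" ", 1)[1]  # drop the leading "add"
    return canvas + [entity]


def run_chain(
    prompt: str,
    planner: Callable[[str], List[str]],
    editor: Callable[[Canvas, str], Canvas],
) -> Tuple[Canvas, Trace]:
    """Execute the plan step by step, keeping every intermediate result."""
    canvas: Canvas = []
    trace: Trace = []
    for instruction in planner(prompt):
        canvas = editor(canvas, instruction)
        # Snapshot the canvas so a human (or a metric) can inspect each step.
        trace.append((instruction, list(canvas)))
    return canvas, trace


if __name__ == "__main__":
    final, trace = run_chain(
        "a red cube, a blue sphere, a green cone", toy_planner, toy_editor
    )
    for step, (instruction, snapshot) in enumerate(trace, start=1):
        print(step, instruction, snapshot)
```

Because `run_chain` only depends on the two callables, any planner or editor with the same signatures could be swapped in, which mirrors the model-agnostic claim in spirit. The recorded trace is the kind of per-step artifact that monitorability metrics would inspect, although the sketch does not implement the paper's metrics.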

Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation

CV and Pattern Recognition

Makes AI draw better pictures from words.

25 Aug 2025

91%

Generating Storytelling Images with Rich Chains-of-Reasoning

CV and Pattern Recognition

AI creates pictures that tell a whole story.

8 Dec 2025
