LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
By: Yuyao Zhang, Jinghao Li, Yu-Wing Tai
Potential Business Impact:
Makes AI create and edit pictures with more control.
Text-to-image (T2I) generation has made remarkable progress, yet existing systems still lack intuitive control over spatial composition, object consistency, and multi-step editing. We present LayerCraft, a modular framework that uses large language models (LLMs) as autonomous agents to orchestrate structured, layered image generation and editing. LayerCraft supports two key capabilities: (1) structured generation from simple prompts via chain-of-thought (CoT) reasoning, enabling it to decompose scenes, reason about object placement, and guide composition in a controllable, interpretable manner; and (2) layered object integration, allowing users to insert and customize objects -- such as characters or props -- across diverse images or scenes while preserving identity, context, and style. The system comprises a coordinator agent, the ChainArchitect for CoT-driven layout planning, and the Object Integration Network (OIN) for seamless image editing using off-the-shelf T2I models without retraining. Through applications like batch collage editing and narrative scene generation, LayerCraft empowers non-experts to iteratively design, customize, and refine visual content with minimal manual effort. Code will be released at https://github.com/PeterYYZhang/LayerCraft.
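To make the abstract's pipeline concrete, here is a minimal sketch of the CoT-driven layout planning and back-to-front layer compositing it describes. All names (Layer, plan_layout, compositing_order) and the hard-coded decomposition are illustrative assumptions, not the paper's actual ChainArchitect or OIN interfaces; in the real system an LLM would perform the decomposition.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """One planned object layer: what to draw, where, and stacking order."""
    name: str
    bbox: tuple  # (x0, y0, x1, y1) in normalized [0, 1] image coordinates
    z: int       # lower z is composited first (background)

def plan_layout(prompt: str) -> list[Layer]:
    """Stand-in for the CoT planning step: the paper's ChainArchitect would
    use an LLM to decompose the scene and reason about object placement.
    The decomposition below is hard-coded purely for illustration."""
    if "cat on a sofa" in prompt:
        return [
            Layer("living room background", (0.0, 0.0, 1.0, 1.0), z=0),
            Layer("sofa", (0.1, 0.4, 0.9, 0.95), z=1),
            Layer("cat", (0.35, 0.3, 0.65, 0.6), z=2),
        ]
    return [Layer("background", (0.0, 0.0, 1.0, 1.0), z=0)]

def compositing_order(layers: list[Layer]) -> list[str]:
    """Layers are generated or edited independently, then composited back to
    front, so one object can be swapped without regenerating the scene."""
    return [layer.name for layer in sorted(layers, key=lambda l: l.z)]

layers = plan_layout("a cat on a sofa in a cozy living room")
print(compositing_order(layers))
# ['living room background', 'sofa', 'cat']
```

The point of the layered representation is the second capability in the abstract: because each object lives on its own layer with an explicit region and depth, inserting or customizing one object (the OIN step) only touches that layer, leaving identity, context, and style of the rest of the image intact.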
Similar Papers
LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas
CV and Pattern Recognition
Lets you easily put many things into one picture.
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
CV and Pattern Recognition
Makes AI draw better pictures from descriptions.
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
CV and Pattern Recognition
Makes AI create better pictures from words.