PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
By: Runze He, Bo Cheng, Yuhang Ma, and more
Potential Business Impact:
Creates pictures from layout plans and text.
In this paper, we propose a unified layout planning and image generation model, PlanGen, which can pre-plan spatial layout conditions before generating images. Unlike previous diffusion-based approaches that treat layout planning and layout-to-image generation as two separate models, PlanGen jointly models the two tasks in a single autoregressive transformer using only next-token prediction. PlanGen integrates layout conditions into the model as context without requiring specialized encoding of local captions and bounding box coordinates, which provides significant advantages over previous embed-and-pool operations on layout conditions, particularly when dealing with complex layouts. Unified prompting allows PlanGen to perform multitask training on layout-related tasks, including layout planning, layout-to-image generation, and image layout understanding. In addition, PlanGen can be seamlessly extended to layout-guided image manipulation thanks to its well-designed modeling, using a teacher-forcing content manipulation policy and negative layout guidance. Extensive experiments verify the effectiveness of PlanGen on multiple layout-related tasks, showing its great potential. Code is available at: https://360cvgroup.github.io/PlanGen.
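The abstract's key idea is that layout conditions (local captions plus bounding boxes) are fed to the autoregressive model as ordinary text context rather than through a separate embed-and-pool encoder. Below is a minimal, hypothetical Python sketch of how such a layout might be serialized into a single prompt string for next-token prediction; the tag names, coordinate quantization, and prompt structure are illustrative assumptions, not the paper's exact format.

```python
# Hypothetical sketch: flatten layout conditions into plain-text context
# for an autoregressive transformer, in the spirit of "layout as context".
# Tag names (<global>, <region>, <box>) and the 1000-bin quantization are
# assumptions for illustration only.

from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]


def serialize_layout(global_prompt: str,
                     regions: List[Tuple[str, Box]],
                     bins: int = 1000) -> str:
    """Serialize a global caption and (local caption, bounding box) pairs
    into one string that can be tokenized like any other prompt."""
    parts = [f"<global> {global_prompt}"]
    for caption, (x0, y0, x1, y1) in regions:
        # Quantize normalized coordinates to integer bins so they become ordinary tokens.
        coords = " ".join(str(round(v * (bins - 1))) for v in (x0, y0, x1, y1))
        parts.append(f"<region> {caption} <box> {coords}")
    return " ".join(parts)


if __name__ == "__main__":
    layout = [
        ("a red teapot", (0.10, 0.55, 0.45, 0.90)),
        ("a porcelain cup", (0.55, 0.60, 0.80, 0.85)),
    ]
    print(serialize_layout("a tea set on a wooden table", layout))
```

Because the layout is just text, the same model can either generate it (layout planning) or consume it before emitting image tokens (layout-to-image), which is what enables the unified multitask training described above.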
Similar Papers
Image Generation as a Visual Planner for Robotic Manipulation
CV and Pattern Recognition
Lets computers plan robot actions by watching videos.
CogniPlan: Uncertainty-Guided Path Planning with Conditional Generative Layout Prediction
Robotics
Helps robots explore and move in new places.
Nexus-Gen: Unified Image Understanding, Generation, and Editing via Prefilled Autoregression in Shared Embedding Space
CV and Pattern Recognition
Makes computers understand, create, and change pictures.