
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis

Published: April 2, 2025 | arXiv ID: 2504.01515v2

By: Zixuan Wang, Duo Peng, Feng Chen, and others

Potential Business Impact:

Makes pictures from words, shapes, and moving things.

Business Areas:

Image Recognition, Data and Analytics, Software

Conditional image synthesis is a crucial task with broad applications, such as artistic creation and virtual reality. However, current generative methods are often task-oriented with a narrow scope, handling a restricted condition with constrained applicability. In this paper, we propose a novel approach that treats conditional image synthesis as the modular combination of diverse fundamental condition units. Specifically, we divide conditions into three primary units: text, layout, and drag. To enable effective control over these conditions, we design a dedicated alignment module for each. For the text condition, we introduce a Dense Concept Alignment (DCA) module, which achieves dense visual-text alignment by drawing on diverse textual concepts. For the layout condition, we propose a Dense Geometry Alignment (DGA) module to enforce comprehensive geometric constraints that preserve the spatial configuration. For the drag condition, we introduce a Dense Motion Alignment (DMA) module to apply multi-level motion regularization, ensuring that each pixel follows its desired trajectory without visual artifacts. By flexibly inserting and combining these alignment modules, our framework enhances the model's adaptability to diverse conditional generation tasks and greatly expands its application range. Extensive experiments demonstrate the superior performance of our framework across a variety of conditions, including textual description, segmentation mask (bounding box), drag manipulation, and their combinations. Code is available at https://github.com/ZixuanWang0525/DADG.
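The abstract frames guidance as plug-in alignment modules that are combined at sampling time without retraining. Below is a minimal sketch of that modular idea. It assumes a generic gradient-based (energy) guidance scheme: each condition contributes a differentiable loss on the latent, and the sampler steps down the weighted sum of whichever losses are active. The energy functions (`text_energy`, `layout_energy`, `drag_energy`), their weights, and the step size are hypothetical toy placeholders, not the DCA, DGA, or DMA objectives from the paper, and the loop leaves out the denoising network entirely.

```python
import numpy as np

# Hypothetical toy "alignment energies". Each returns (loss, gradient w.r.t. x).
# They are simple quadratic stand-ins, NOT the paper's DCA/DGA/DMA objectives.

def text_energy(x, target):
    """Pull the whole latent toward a 'concept' target (hypothetical)."""
    diff = x - target
    return 0.5 * float(np.sum(diff ** 2)), diff


def layout_energy(x, mask, value):
    """Pull only the masked region toward `value` (binary mask assumed)."""
    diff = mask * (x - value)
    return 0.5 * float(np.sum(diff ** 2)), diff


def drag_energy(x, ref, src, dst):
    """Make position `dst` take the frozen reference content at `src` (toy 1-D drag)."""
    grad = np.zeros_like(x)
    residual = x[dst] - ref[src]
    grad[dst] = residual
    return 0.5 * float(residual ** 2), grad


def guided_step(x, energies, weights, step_size):
    """One guidance update: descend the weighted sum of the active energies."""
    total_loss = 0.0
    total_grad = np.zeros_like(x)
    for name, energy in energies.items():
        loss, grad = energy(x)
        w = weights.get(name, 1.0)
        total_loss += w * loss
        total_grad += w * grad
    return x - step_size * total_grad, total_loss


def run_guidance(x0, energies, weights=None, step_size=0.1, n_steps=100):
    """Repeat guided_step; return the final latent and the per-step loss history."""
    weights = weights or {}
    x = x0.copy()
    history = []
    for _ in range(n_steps):
        x, loss = guided_step(x, energies, weights, step_size)
        history.append(loss)
    return x, history


# Usage: a module is "inserted" simply by adding its energy to the dictionary.
rng = np.random.default_rng(0)
x0 = rng.normal(size=8)
ref = x0.copy()  # frozen reference for the drag term
mask = np.zeros(8)
mask[:3] = 1.0

all_modules = {
    "text": lambda z: text_energy(z, np.full(8, 0.5)),
    "layout": lambda z: layout_energy(z, mask, 1.0),
    "drag": lambda z: drag_energy(z, ref, src=6, dst=7),
}
x_all, hist_all = run_guidance(x0, all_modules)

layout_only = {"layout": all_modules["layout"]}
x_layout, _ = run_guidance(x0, layout_only, n_steps=200)
```

Because each module is only an entry in the `energies` dictionary, adding or dropping a condition leaves the others untouched. Where objectives overlap (the masked entries, or the dragged position), the update settles on a weighted compromise rather than satisfying either term exactly.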


Country of Origin

🇦🇺 🇨🇳 🇸🇬 Singapore, China, Australia

Repos / Data Links

https://github.com/ZixuanWang0525/DADG

Page Count

11 pages
