PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment
By: Dingbang Huang, Wenbo Li, Yifei Zhao, and more
Potential Business Impact:
Creates layered pictures with real-looking shadows.
Diffusion models have made remarkable advances in generating high-quality images from textual descriptions. Recent works such as LayerDiffuse have extended the single-layer, unified image generation paradigm to transparent image layer generation. However, existing multi-layer generation methods fail to handle the interactions among multiple layers, such as a rational global layout, physically plausible contacts, and visual effects like shadows and reflections, while maintaining high alpha quality. To address this problem, we propose PSDiffusion, a unified diffusion framework for simultaneous multi-layer text-to-image generation. Our model automatically generates multi-layer images with one RGB background and multiple RGBA foregrounds in a single feed-forward pass. Unlike existing methods that combine multiple tools for post-hoc decomposition or generate layers sequentially and separately, our method introduces a global-layer interactive mechanism that generates layered images concurrently and collaboratively, ensuring not only high quality and completeness for each layer, but also spatial and visual interactions among layers for global coherence.
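The output format described above, one RGB background plus multiple RGBA foregrounds, flattens into a single image with standard back-to-front alpha compositing. The sketch below is a minimal illustration of that compositing step only; the function name, array shapes, and back-to-front layer ordering are assumptions for illustration, not part of the PSDiffusion implementation.

```python
import numpy as np

def composite_layers(background_rgb, foreground_rgba_layers):
    """Flatten one RGB background and a list of RGBA foreground layers
    into a single RGB image via the standard "over" operator.

    background_rgb: float array of shape (H, W, 3) in [0, 1]
    foreground_rgba_layers: list of float arrays of shape (H, W, 4) in [0, 1],
        ordered back-most to front-most (an assumed convention).
    """
    out = background_rgb.astype(np.float64).copy()
    for layer in foreground_rgba_layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        # "over" compositing: new = fg * a + old * (1 - a)
        out = rgb * alpha + out * (1.0 - alpha)
    return np.clip(out, 0.0, 1.0)

if __name__ == "__main__":
    h, w = 64, 64
    bg = np.ones((h, w, 3)) * 0.8               # light gray background
    fg = np.zeros((h, w, 4))
    fg[16:48, 16:48, :3] = [1.0, 0.2, 0.2]      # red square foreground
    fg[16:48, 16:48, 3] = 0.9                   # mostly opaque alpha
    flat = composite_layers(bg, [fg])
    print(flat.shape, flat.min(), flat.max())
```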
Similar Papers
DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model
CV and Pattern Recognition
Creates realistic pictures from text, layer by layer.
OmniPSD: Layered PSD Generation with Diffusion Transformer
CV and Pattern Recognition
Turns flat pictures into editable layered designs.
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
CV and Pattern Recognition
Creates pictures from words for designs.