Local Prompt Adaptation for Style-Consistent Multi-Object Generation in Diffusion Models

Published: July 27, 2025 | arXiv ID: 2507.20094v1

By: Ankit Sanjyal

Potential Business Impact:

Improves how well AI-generated images match both the requested objects and the requested visual style.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Diffusion models have become a powerful backbone for text-to-image generation, enabling users to synthesize high-quality visuals from natural language prompts. However, they often struggle with complex prompts involving multiple objects and global or local style specifications. In such cases, the generated scenes tend to lack style uniformity and spatial coherence, limiting their utility in creative and controllable content generation. In this paper, we propose a simple, training-free architectural method called Local Prompt Adaptation (LPA). Our method decomposes the prompt into content and style tokens, and injects them selectively into the U-Net's attention layers at different stages. By conditioning object tokens early and style tokens later in the generation process, LPA enhances both layout control and stylistic consistency. We evaluate our method on a custom benchmark of 50 style-rich prompts across five categories and compare against strong baselines including Composer, MultiDiffusion, Attend-and-Excite, LoRA, and SDXL. Our approach outperforms prior work on both CLIP score and style consistency metrics, offering a new direction for controllable, expressive diffusion-based generation.
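The core idea of LPA, as the abstract describes it, is a two-stage conditioning schedule: the prompt is split into content and style tokens, and early denoising steps are conditioned only on content (object) tokens to fix the layout, while later steps add style tokens for consistent appearance. A minimal sketch of that scheduling logic is below; the `split_prompt` and `tokens_for_step` helpers and the style lexicon are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of LPA-style staged token injection (not the
# paper's code). Content tokens condition early denoising steps to fix
# layout; style tokens are added in later steps for appearance.

STYLE_WORDS = {"watercolor", "cyberpunk", "oil", "painting", "style"}  # toy lexicon

def split_prompt(prompt: str):
    """Naively split a prompt into content and style token lists."""
    tokens = prompt.lower().replace(",", "").split()
    style = [t for t in tokens if t in STYLE_WORDS]
    content = [t for t in tokens if t not in STYLE_WORDS]
    return content, style

def tokens_for_step(step: int, total_steps: int, content, style, switch=0.4):
    """Before the switch fraction of steps, condition on content tokens
    only; afterwards, condition on content plus style tokens."""
    if step < switch * total_steps:
        return content
    return content + style

content, style = split_prompt("a cat and a dog, watercolor style")
print(tokens_for_step(2, 50, content, style))   # early step: content only
print(tokens_for_step(40, 50, content, style))  # late step: content + style
```

In a real diffusion pipeline, the returned token subset would be encoded and injected into the U-Net's cross-attention layers at each step; the switch point (0.4 here) is an assumed hyperparameter.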

Country of Origin
πŸ‡ΊπŸ‡Έ United States

Repos / Data Links

Page Count
10 pages

Category
Computer Science:
Computer Vision and Pattern Recognition