Local Prompt Adaptation for Style-Consistent Multi-Object Generation in Diffusion Models
By: Ankit Sanjyal
Potential Business Impact:
Makes AI images keep one consistent style across many objects.
Diffusion models have become a powerful backbone for text-to-image generation, enabling users to synthesize high-quality visuals from natural language prompts. However, they often struggle with complex prompts involving multiple objects and global or local style specifications. In such cases, the generated scenes tend to lack style uniformity and spatial coherence, limiting their utility in creative and controllable content generation. In this paper, we propose a simple, training-free architectural method called Local Prompt Adaptation (LPA). Our method decomposes the prompt into content and style tokens, and injects them selectively into the U-Net's attention layers at different stages. By conditioning object tokens early and style tokens later in the generation process, LPA enhances both layout control and stylistic consistency. We evaluate our method on a custom benchmark of 50 style-rich prompts across five categories and compare against strong baselines including Composer, MultiDiffusion, Attend-and-Excite, LoRA, and SDXL. Our approach outperforms prior work on both CLIP score and style consistency metrics, offering a new direction for controllable, expressive diffusion-based generation.
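To make the staged-injection idea concrete, the sketch below shows one plausible reading of it in PyTorch. The prompt-splitting heuristic, the `switch_frac` parameter, and all function names here are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of staged prompt injection: condition early denoising
# steps on content (object) tokens only, and later steps on content +
# style tokens. All names and the 0.4 switch point are assumptions.
import torch

def split_prompt(prompt: str) -> tuple[str, str]:
    # Hypothetical splitter: assume style descriptors follow
    # the phrase "in the style of".
    if "in the style of" in prompt:
        content, style = prompt.split("in the style of", 1)
        return content.strip(), style.strip()
    return prompt, ""

def select_conditioning(step: int, total_steps: int,
                        content_emb: torch.Tensor,
                        style_emb: torch.Tensor,
                        switch_frac: float = 0.4) -> torch.Tensor:
    """Early steps see content tokens (layout is decided first);
    later steps also see style tokens (appearance is refined last)."""
    if step < switch_frac * total_steps:
        return content_emb
    # Embeddings are (batch, tokens, dim); concatenate along the token axis.
    return torch.cat([content_emb, style_emb], dim=1)

# Schematic usage inside a denoising loop (not runnable as-is):
# for step, t in enumerate(timesteps):
#     cond = select_conditioning(step, total_steps, content_emb, style_emb)
#     noise_pred = unet(latents, t, encoder_hidden_states=cond)
```

A fixed switch point is the simplest choice; one could also blend the two conditionings gradually, or apply them per attention layer rather than per timestep, which is closer in spirit to the layer-wise injection the abstract describes.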
Similar Papers
Style Composition within Distinct LoRA modules for Traditional Art
CV and Pattern Recognition
Mixes different art styles in one picture.
PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net
CV and Pattern Recognition
Makes dark pictures look good, not just bright.