Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models
By: Dasol Jeong, Donggoo Kang, Jiwon Park and more
Potential Business Impact:
Changes pictures using words and other pictures.
We propose a diffusion-based framework for zero-shot image editing that unifies text-guided and reference-guided approaches without requiring fine-tuning. Our method leverages diffusion inversion and timestep-specific null-text embeddings to preserve the structural integrity of the source image. By introducing a stage-wise latent injection strategy (shape injection in early steps and attribute injection in later steps), we enable precise, fine-grained modifications while maintaining global consistency. Cross-attention with reference latents facilitates semantic alignment between the source and reference. Extensive experiments across expression transfer, texture transformation, and style infusion demonstrate state-of-the-art performance, confirming the method's scalability and adaptability to diverse image editing scenarios.
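To make the stage-wise idea concrete, here is a minimal sketch of how a denoising loop might switch from shape injection to attribute injection partway through sampling. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say where the switch happens or how either injection is computed, so the `shape_fraction` boundary, the function names, and the `denoise`, `inject_shape`, and `inject_attribute` callables are all hypothetical placeholders.

```python
from typing import Callable, List, Tuple

Latent = List[float]
StepFn = Callable[[Latent, int], Latent]


def stage_for_step(step: int, num_steps: int, shape_fraction: float = 0.5) -> str:
    """Return the injection stage for a step (step 0 is the first, noisiest step).

    Assumption: a single hard boundary at shape_fraction of the schedule.
    The abstract only says "early" and "later" steps.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step must satisfy 0 <= step < num_steps")
    boundary = int(num_steps * shape_fraction)
    return "shape" if step < boundary else "attribute"


def run_stagewise(
    latent: Latent,
    num_steps: int,
    denoise: StepFn,
    inject_shape: StepFn,
    inject_attribute: StepFn,
    shape_fraction: float = 0.5,
) -> Tuple[Latent, List[str]]:
    """Apply the stage-appropriate injection before each denoising step.

    All three callables are hypothetical stand-ins for model-specific code.
    Returns the final latent and the list of stages that were used.
    """
    stages: List[str] = []
    for step in range(num_steps):
        stage = stage_for_step(step, num_steps, shape_fraction)
        inject = inject_shape if stage == "shape" else inject_attribute
        latent = denoise(inject(latent, step), step)
        stages.append(stage)
    return latent, stages


if __name__ == "__main__":
    # Toy callables on a one-element "latent", only to show the dispatch order.
    final, used = run_stagewise(
        [0.0],
        num_steps=4,
        denoise=lambda z, t: z,
        inject_shape=lambda z, t: [v + 1.0 for v in z],
        inject_attribute=lambda z, t: [v * 2.0 for v in z],
    )
    print(final, used)
```

In a real pipeline the latents would be tensors and the callables would wrap the diffusion model, the inverted source latents, and the reference latents. The sketch only shows the control flow in which early steps receive one kind of injection and later steps another.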
Similar Papers
Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models
CV and Pattern Recognition
Changes pictures to match your exact ideas.
Contrastive Learning Guided Latent Diffusion Model for Image-to-Image Translation
CV and Pattern Recognition
Changes pictures using words without retraining.
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model
CV and Pattern Recognition
Makes editing pictures from words better.