Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models

Published: April 22, 2025 | arXiv ID: 2504.15723v2

By: Dasol Jeong, Donggoo Kang, Jiwon Park and more

Potential Business Impact:

Changes pictures using words and other pictures.

Business Areas:
Photo Editing, Content and Publishing, Media and Entertainment

We propose a diffusion-based framework for zero-shot image editing that unifies text-guided and reference-guided approaches without requiring fine-tuning. Our method leverages diffusion inversion and timestep-specific null-text embeddings to preserve the structural integrity of the source image. By introducing a stage-wise latent injection strategy (shape injection in early steps and attribute injection in later steps), we enable precise, fine-grained modifications while maintaining global consistency. Cross-attention with reference latents facilitates semantic alignment between the source and reference. Extensive experiments across expression transfer, texture transformation, and style infusion demonstrate state-of-the-art performance, confirming the method's scalability and adaptability to diverse image editing scenarios.
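
The stage-wise idea in the abstract (shape injection early, attribute injection late) can be illustrated with a toy schedule. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `injection_stage`, `blend`, and `run_schedule`, the `shape_fraction` split point, the linear blend, and the `weight` parameter are all hypothetical, and latents are plain Python lists rather than real diffusion tensors. The abstract does not specify where the stage boundary lies or how latents are combined.

```python
from typing import List, Tuple


def injection_stage(step: int, num_steps: int, shape_fraction: float = 0.5) -> str:
    """Pick the injection stage for a denoising step (0 is the first, noisiest step).

    Assumption: a single hard boundary at shape_fraction of the schedule.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step out of range")
    boundary = int(num_steps * shape_fraction)
    return "shape" if step < boundary else "attribute"


def blend(latent: List[float], injected: List[float], weight: float) -> List[float]:
    """Linearly mix an injected latent into the current one (hypothetical rule)."""
    return [(1.0 - weight) * a + weight * b for a, b in zip(latent, injected)]


def run_schedule(
    latent: List[float],
    shape_latent: List[float],
    attribute_latent: List[float],
    num_steps: int = 10,
    shape_fraction: float = 0.5,
    weight: float = 0.5,
) -> Tuple[List[float], List[str]]:
    """Walk the schedule, injecting the stage-appropriate latent at each step."""
    stages = []
    for step in range(num_steps):
        stage = injection_stage(step, num_steps, shape_fraction)
        source = shape_latent if stage == "shape" else attribute_latent
        latent = blend(latent, source, weight)
        # A real pipeline would also run one denoising step of the model here.
        stages.append(stage)
    return latent, stages


if __name__ == "__main__":
    final, stages = run_schedule([0.0], [1.0], [0.0], num_steps=4)
    print(stages)
    print(final)
```

With `num_steps=4` and the default `shape_fraction=0.5`, the first two steps fall in the shape stage and the last two in the attribute stage; a real system would choose the boundary and blending rule empirically.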

Page Count
10 pages

Category
Computer Science:
CV and Pattern Recognition