Score: 1

Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models

Published: April 22, 2025 | arXiv ID: 2504.15723v2

By: Dasol Jeong, Donggoo Kang, Jiwon Park and more

Potential Business Impact:

Changes pictures using words and other pictures.

Business Areas:

Photo Editing Content and Publishing, Media and Entertainment

We propose a diffusion-based framework for zero-shot image editing that unifies text-guided and reference-guided approaches without requiring fine-tuning. Our method leverages diffusion inversion and timestep-specific null-text embeddings to preserve the structural integrity of the source image. By introducing a stage-wise latent injection strategy (shape injection in early steps and attribute injection in later steps), we enable precise, fine-grained modifications while maintaining global consistency. Cross-attention with reference latents facilitates semantic alignment between the source and reference. Extensive experiments across expression transfer, texture transformation, and style infusion demonstrate state-of-the-art performance, confirming the method's scalability and adaptability to diverse image editing scenarios.
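The stage-wise schedule described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the names `injection_stage`, `blend`, `edit_latent`, the `shape_fraction` and `weight` parameters, and the use of a linear blend as the injection operator are all assumptions made for illustration, and the actual denoising step of a diffusion model is omitted.

```python
from typing import List, Tuple


def injection_stage(step: int, num_steps: int, shape_fraction: float = 0.5) -> str:
    """Return which injection is active at a given denoising step.

    shape_fraction is a hypothetical hyperparameter: the share of early
    steps that use shape injection; all later steps use attribute injection.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step must satisfy 0 <= step < num_steps")
    boundary = int(num_steps * shape_fraction)
    return "shape" if step < boundary else "attribute"


def blend(latent: List[float], injected: List[float], weight: float) -> List[float]:
    """Linear blend standing in for latent injection (an assumption)."""
    return [(1.0 - weight) * a + weight * b for a, b in zip(latent, injected)]


def edit_latent(
    start: List[float],
    shape_latent: List[float],
    attribute_latent: List[float],
    num_steps: int = 10,
    shape_fraction: float = 0.5,
    weight: float = 0.5,
) -> Tuple[List[float], List[str]]:
    """Apply shape injection in early steps and attribute injection in later ones."""
    latent = list(start)
    schedule = []
    for step in range(num_steps):
        stage = injection_stage(step, num_steps, shape_fraction)
        injected = shape_latent if stage == "shape" else attribute_latent
        latent = blend(latent, injected, weight)
        # A real pipeline would run one denoising step of the model here.
        schedule.append(stage)
    return latent, schedule


if __name__ == "__main__":
    final, stages = edit_latent([0.0], [1.0], [0.0], num_steps=4)
    print(stages)
    print(final)
```

Keeping the stage decision in its own function makes the boundary between the two phases a single tunable value, which is convenient when experimenting with how early the switch from shape to attribute injection should happen.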

Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models

CV and Pattern Recognition

Changes pictures to match your exact ideas.

6 Mar 2025 1

88%

Contrastive Learning Guided Latent Diffusion Model for Image-to-Image Translation

CV and Pattern Recognition

Changes pictures using words without retraining.

26 Mar 2025 0

88%

Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model

CV and Pattern Recognition

Makes editing pictures from words better.

8 Apr 2025 3


Page Count

10 pages
