
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model

Published: April 8, 2025 | arXiv ID: 2504.05594v1

By: Qi Mao, Lan Chen, Yuchao Gu and more

Potential Business Impact:

Makes editing pictures from words better.

Business Areas:

Photo Editing, Content and Publishing, Media and Entertainment

Balancing fidelity and editability is essential in text-based image editing (TIE), where failures commonly lead to over- or under-editing issues. Existing methods typically rely on attention injections for structure preservation and leverage the inherent text alignment capabilities of pre-trained text-to-image (T2I) models for editability, but they lack explicit and unified mechanisms to properly balance these two objectives. In this work, we introduce UnifyEdit, a tuning-free method that performs diffusion latent optimization to enable a balanced integration of fidelity and editability within a unified framework. Unlike direct attention injections, we develop two attention-based constraints: a self-attention (SA) preservation constraint for structural fidelity, and a cross-attention (CA) alignment constraint to enhance text alignment for improved editability. However, simultaneously applying both constraints can lead to gradient conflicts, where the dominance of one constraint results in over- or under-editing. To address this challenge, we introduce an adaptive time-step scheduler that dynamically adjusts the influence of these constraints, guiding the diffusion latent toward an optimal balance. Extensive quantitative and qualitative experiments validate the effectiveness of our approach, demonstrating its superiority in achieving a robust balance between structure preservation and text alignment across various editing tasks, outperforming other state-of-the-art methods. The source code will be available at https://github.com/CUC-MIPG/UnifyEdit.
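The balancing idea in the abstract can be illustrated with a toy sketch. This is not the UnifyEdit implementation: the real method computes its constraints on the self-attention and cross-attention maps of a pre-trained T2I diffusion model, whereas the code below substitutes simple quadratic losses on a small vector. The function names (`sa_loss_and_grad`, `ca_loss_and_grad`, `balance_weight`, `optimize_latent`) and the linear schedule are assumptions made purely for illustration; the paper's actual adaptive scheduler is not specified in the abstract.

```python
import numpy as np

def sa_loss_and_grad(z, z_src):
    # Stand-in for the SA preservation constraint (assumption):
    # penalize distance from the source latent to keep structure.
    diff = z - z_src
    return 0.5 * float(diff @ diff), diff

def ca_loss_and_grad(z, z_tgt):
    # Stand-in for the CA alignment constraint (assumption):
    # penalize distance from a latent that matches the target prompt.
    diff = z - z_tgt
    return 0.5 * float(diff @ diff), diff

def balance_weight(step, num_steps):
    # Hypothetical schedule: favor editability early, fidelity late.
    # Returns the weight on the fidelity gradient, in [0, 1].
    return step / max(num_steps - 1, 1)

def optimize_latent(z_src, z_tgt, num_steps=50, lr=0.1):
    # Gradient descent on the latent with a step-dependent mix of the
    # two constraint gradients, so neither dominates throughout.
    z = z_src.copy()
    for step in range(num_steps):
        w = balance_weight(step, num_steps)
        _, g_sa = sa_loss_and_grad(z, z_src)
        _, g_ca = ca_loss_and_grad(z, z_tgt)
        z = z - lr * (w * g_sa + (1.0 - w) * g_ca)
    return z

if __name__ == "__main__":
    z_src = np.zeros(4)   # toy "source image" latent
    z_tgt = np.ones(4)    # toy "fully edited" latent
    z_out = optimize_latent(z_src, z_tgt)
    print("source-side loss:", sa_loss_and_grad(z_out, z_src)[0])
    print("target-side loss:", ca_loss_and_grad(z_out, z_tgt)[0])
```

In this toy setting the optimized latent ends up strictly between the source and target latents, which mirrors the stated goal of avoiding both under-editing (staying at the source) and over-editing (collapsing onto the target). A fixed weight instead of a schedule would pin the result to a single trade-off point chosen in advance.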

Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models

CV and Pattern Recognition

Changes pictures using words and other pictures.

22 Apr 2025 1

88%

LatentEdit: Adaptive Latent Control for Consistent Semantic Editing

Graphics

Changes pictures while keeping the background the same.

30 Aug 2025 1

88%

Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models

CV and Pattern Recognition

Changes pictures to match your exact ideas.

6 Mar 2025 1


Country of Origin

🇨🇳 🇸🇬 🇺🇸 Singapore, China, United States

Repos / Data Links

github.com

Page Count

16 pages
