LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization
By: Etai Sella, Yoav Baron, Hadar Averbuch-Elor, et al.
Potential Business Impact:
Lets you cut and paste an object into a new photo so it blends in naturally.
Recent diffusion-based image editing methods commonly rely on text or high-level instructions to guide the generation process, offering intuitive but coarse control. In contrast, we focus on explicit, prompt-free editing, where the user directly specifies the modification by cropping and pasting an object or sub-object into a chosen location within an image. This operation affords precise spatial and visual control, yet it introduces a fundamental challenge: preserving the identity of the pasted object while harmonizing it with its new context. We observe that attention maps in diffusion-based editing models inherently govern whether image regions are preserved or adapted for coherence. Building on this insight, we introduce LooseRoPE, a saliency-guided modulation of rotational positional encoding (RoPE) that loosens the positional constraints to continuously control the attention field of view. By relaxing RoPE in this manner, our method smoothly steers the model's focus between faithful preservation of the input image and coherent harmonization of the inserted object, enabling a balanced trade-off between identity retention and contextual blending. Our approach provides a flexible and intuitive framework for image editing, achieving seamless compositional results without textual descriptions or complex user input.
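The abstract describes loosening RoPE's positional constraints, guided by saliency, to widen the attention field of view over the pasted region. The paper's exact formulation is not given here, so the sketch below is only an illustrative assumption: a per-token `looseness` factor in [0, 1] (which would come from a saliency map) scales down each token's rotation angle, interpolating between standard RoPE (0) and a fully position-agnostic encoding (1). The function name and signature are hypothetical.

```python
import numpy as np

def rope_rotate(x, positions, looseness, base=10000.0):
    """Rotary positional encoding with per-token relaxation (illustrative sketch).

    x:         (seq, dim) query/key vectors, dim even.
    positions: (seq,) token positions.
    looseness: (seq,) values in [0, 1]; 0 = standard RoPE,
               1 = rotation fully suppressed (position-agnostic).
    """
    seq, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)  # (half,) per-pair frequencies
    # Hypothetical relaxation: scale each token's rotation angle by
    # (1 - looseness). A salient (pasted) region would get a larger
    # looseness value, weakening its positional bias in attention.
    angles = (positions * (1.0 - looseness))[:, None] * inv_freq  # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard RoPE pairwise rotation of (x1, x2) by the (relaxed) angles.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because the relaxation only rescales rotation angles, it varies continuously with the saliency value, matching the abstract's claim of continuous control over the attention field of view; at `looseness = 1` the rotation vanishes and the vectors pass through unchanged.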
Similar Papers
FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing
CV and Pattern Recognition
Makes AI pictures change better.
DoPE: Denoising Rotary Position Embedding
Computation and Language
Makes AI understand longer texts better.
Rope to Nope and Back Again: A New Hybrid Attention Strategy
Computation and Language
Makes computers understand longer stories better and faster.