AlignVTOFF: Texture-Spatial Feature Alignment for High-Fidelity Virtual Try-Off
By: Yihan Zhu, Mengying Ge
Potential Business Impact:
Turns photos of clothes worn on people into clean, product-style flat-lay images for online stores.
Virtual Try-Off (VTOFF) is a challenging multimodal image generation task that aims to synthesize high-fidelity flat-lay garment images under complex geometric deformation and rich high-frequency textures. Existing methods often rely on lightweight modules for fast feature extraction, which struggle to preserve structured patterns and fine-grained details, leading to texture attenuation during generation. To address these issues, we propose AlignVTOFF, a novel parallel U-Net framework built upon a Reference U-Net and Texture-Spatial Feature Alignment (TSFA). The Reference U-Net performs multi-scale feature extraction and enhances geometric fidelity, enabling robust modeling of deformation while retaining complex structured patterns. TSFA then injects the reference garment features into a frozen denoising U-Net via a hybrid attention design consisting of a trainable cross-attention module and a frozen self-attention module. This design explicitly aligns texture and spatial cues and alleviates the loss of high-frequency information during denoising. Extensive experiments across multiple settings demonstrate that AlignVTOFF consistently outperforms state-of-the-art methods, producing flat-lay garment results with improved structural realism and high-frequency detail fidelity.
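The hybrid attention design described above can be pictured with a small sketch. The block below is a minimal, hypothetical PyTorch-style illustration of the idea (frozen self-attention over denoising features plus a trainable cross-attention that injects Reference U-Net features); the class name TSFABlock, the token shapes, and the use of nn.MultiheadAttention are assumptions rather than the paper's released implementation, where the frozen self-attention would presumably inherit weights from the pretrained denoising U-Net.

```python
# Minimal sketch of the TSFA hybrid-attention idea (illustrative, not the paper's code).
import torch
import torch.nn as nn


class TSFABlock(nn.Module):
    """Frozen self-attention over denoising tokens + trainable cross-attention
    that injects reference garment features (hypothetical structure)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Frozen self-attention: stands in for weights taken from the pretrained denoising U-Net.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False
        # Trainable cross-attention: aligns texture/spatial cues from the Reference U-Net.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        # x:   denoising U-Net tokens,  shape (B, N, dim)
        # ref: Reference U-Net tokens,  shape (B, M, dim)
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]        # frozen path
        h = self.norm2(x)
        x = x + self.cross_attn(h, ref, ref, need_weights=False)[0]   # trainable path
        return x


if __name__ == "__main__":
    block = TSFABlock(dim=320)
    x = torch.randn(2, 1024, 320)    # latent tokens from the denoising U-Net
    ref = torch.randn(2, 1024, 320)  # reference garment features
    print(block(x, ref).shape)       # torch.Size([2, 1024, 320])
```

Only the cross-attention parameters receive gradients in this sketch, which mirrors the abstract's claim that the denoising U-Net stays frozen while the alignment module is trained.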
Similar Papers
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
CV and Pattern Recognition
Makes online clothes look perfect for selling.
MGT: Extending Virtual Try-Off to Multi-Garment Scenarios
CV and Pattern Recognition
Extracts several garments at once from a single photo of a person.
Two-Way Garment Transfer: Unified Diffusion Framework for Dressing and Undressing Synthesis
CV and Pattern Recognition
Lets you both dress and undress people in photos.