Rethinking Garment Conditioning in Diffusion-based Virtual Try-On
By: Kihyun Na, Jinyoung Choi, Injung Kim
Potential Business Impact:
Lets you try on clothes virtually using less computing power.
Virtual Try-On (VTON) is the task of synthesizing an image of a person wearing a target garment, conditioned on a person image and a garment image. While diffusion-based VTON models featuring a Dual UNet architecture demonstrate superior fidelity compared to single UNet models, they incur substantial computational and memory overhead due to their heavy structure. In this study, through visualization and theoretical analyses, we derive three hypotheses regarding the learning of context features that condition the denoising process. Based on these hypotheses, we develop Re-CatVTON, an efficient single UNet model that achieves high performance. We further enhance the model by introducing a modified classifier-free guidance strategy tailored to VTON's spatial concatenation conditioning, and by directly injecting the ground-truth garment latent, derived from the clean garment latent, to prevent the accumulation of prediction error. The proposed Re-CatVTON significantly improves performance over its predecessor (CatVTON) while requiring less computation and memory than the high-performance Dual UNet model, Leffa. Our results demonstrate improved FID, KID, and LPIPS scores, with only a marginal decrease in SSIM, establishing a new efficiency-performance trade-off for single UNet VTON models.
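To make the abstract's three ideas concrete, here is a minimal NumPy sketch of a single UNet denoising loop with spatial concatenation conditioning, a modified classifier-free guidance pass, and per-step injection of the clean garment latent. All function names, the toy UNet, and the simplified update rule are illustrative assumptions, not the authors' actual implementation or scheduler.

```python
import numpy as np

def toy_unet(z, t, scale=0.1):
    # Hypothetical stand-in for the single UNet's noise prediction.
    return scale * z

def denoise_step(z, t, w=2.0):
    # Modified classifier-free guidance for spatial concatenation:
    # the unconditional branch zeros out the garment half of the latent
    # rather than dropping a text prompt (an assumption based on the abstract).
    z_uncond = z.copy()
    z_uncond[..., z.shape[-1] // 2:] = 0.0
    eps_cond = toy_unet(z, t)
    eps_uncond = toy_unet(z_uncond, t)
    eps = eps_uncond + w * (eps_cond - eps_uncond)
    return z - eps  # simplified update, not a real diffusion scheduler step

def try_on(person_latent, garment_latent, steps=4):
    # Spatial concatenation conditioning: person and garment latents are
    # concatenated along width and denoised jointly by one UNet.
    z = np.concatenate([person_latent, garment_latent], axis=-1)
    half = person_latent.shape[-1]
    for t in reversed(range(steps)):
        z = denoise_step(z, t)
        # Direct injection: overwrite the garment region with the clean
        # garment latent each step, so prediction error cannot accumulate
        # in the conditioning half.
        z[..., half:] = garment_latent
    return z[..., :half]  # return only the person region
```

The key contrast with a Dual UNet design is that the garment is conditioned purely by sharing the spatial extent of one latent, so no second network or cross-attention bridge is needed.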
Similar Papers
Training-free Clothing Region of Interest Self-correction for Virtual Try-On
CV and Pattern Recognition
Lets you try on clothes virtually, perfectly.
Undress to Redress: A Training-Free Framework for Virtual Try-On
CV and Pattern Recognition
Lets you try on clothes virtually, even short sleeves.
MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization
CV and Pattern Recognition
Lets you try on clothes virtually, perfectly.