Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank
By: Shaofeng Zhang, Xuanqi Chen, Ning Liao and more
The dominance of denoising generative models (e.g., diffusion and flow matching) in visual synthesis is tempered by their substantial training costs and inefficient representation learning. Injecting discriminative representations via auxiliary alignment has proven effective, but this approach still has a key limitation: its reliance on external, pre-trained encoders introduces computational overhead and domain shift. A dispersion-based strategy, which encourages strong separation among in-batch latent representations, alleviates this specific dependency, but the number of negatives it can use is bounded by the mini-batch size. To assess the effect of the number of negative samples in generative modeling, we propose Repulsor, a plug-and-play training framework that requires no external encoders. Our method integrates a memory bank mechanism that maintains a large, dynamically updated queue of negative samples across training iterations. This decouples the number of negatives from the mini-batch size, providing abundant, high-quality negatives for a contrastive objective without a multiplicative increase in computational cost. A low-dimensional projection head further minimizes memory and bandwidth overhead. Repulsor offers three principal advantages: (1) it is self-contained, eliminating dependence on pretrained vision foundation models and their associated forward-pass overhead; (2) it introduces no additional parameters or computational cost during inference; and (3) it converges substantially faster, reaching superior generative quality more efficiently. On ImageNet-256, Repulsor achieves a state-of-the-art FID of 2.40 within 400k training steps, significantly outperforming comparable methods.
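The abstract names three mechanisms: a queue of negatives, a contrastive objective, and a low-dimensional projection head. It does not spell out the loss. The following is a minimal, framework-free sketch under stated assumptions, and it is not the authors' implementation. It assumes the projection head is a single linear map followed by L2 normalization and that the bank is a fixed-capacity FIFO queue. It also assumes the objective is a repulsion-only, InfoNCE-style log-mean-exp over cosine similarities, in the spirit of dispersive losses. All names (`MemoryBank`, `project`, `repulsion_loss`, `tau`, `capacity`) are hypothetical. The sketch computes loss values only. A real version would run in an autodiff framework, with bank entries detached from the graph.

```python
import math
import random
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def project(h, weight):
    """Hypothetical low-dimensional projection head: a linear map given by
    the rows of `weight`, followed by L2 normalization."""
    z = [sum(w * x for w, x in zip(row, h)) for row in weight]
    return l2_normalize(z)


class MemoryBank:
    """Hypothetical fixed-capacity FIFO queue of past projected latents used as negatives."""

    def __init__(self, capacity):
        # deque(maxlen=...) evicts the oldest entries automatically.
        self.queue = deque(maxlen=capacity)

    def enqueue(self, batch):
        # Store copies, standing in for detaching entries from the autograd graph.
        for z in batch:
            self.queue.append(list(z))

    def negatives(self):
        return list(self.queue)

    def __len__(self):
        return len(self.queue)


def log_mean_exp(values):
    """Numerically stable log(mean(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values)) - math.log(len(values))


def repulsion_loss(batch, bank, tau=0.5):
    """Assumed repulsion-only contrastive loss: for each query, the log-mean-exp
    of its similarities (divided by tau) to the other in-batch latents and to
    every bank entry. Inputs are assumed unit-norm, so the dot product is the
    cosine similarity. Lower values mean the query is farther from its negatives."""
    bank_negs = bank.negatives()
    losses = []
    for i, z in enumerate(batch):
        negs = [n for j, n in enumerate(batch) if j != i] + bank_negs
        if not negs:
            raise ValueError("need at least one negative")
        sims = [sum(a * b for a, b in zip(z, n)) / tau for n in negs]
        losses.append(log_mean_exp(sims))
    return sum(losses) / len(losses)


# Usage: random stand-ins for model hidden states over a few steps.
random.seed(0)
hidden_dim, proj_dim, batch_size = 16, 4, 8
weight = [[random.gauss(0.0, 1.0) for _ in range(hidden_dim)] for _ in range(proj_dim)]
bank = MemoryBank(capacity=64)
for _ in range(10):
    hidden = [[random.gauss(0.0, 1.0) for _ in range(hidden_dim)] for _ in range(batch_size)]
    batch = [project(h, weight) for h in hidden]
    loss = repulsion_loss(batch, bank)  # 7 in-batch negatives + up to 64 from the bank
    bank.enqueue(batch)  # enqueue only after this step's loss is computed
print(len(bank))  # 64
```

In this sketch each query is contrasted against `batch_size - 1` in-batch latents plus up to `capacity` stored ones. The negative count therefore grows with the bank rather than the mini-batch. Bank entries are reused rather than recomputed, so they need no extra forward passes. Storing only low-dimensional projections keeps the bank's memory footprint small.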
Similar Papers
Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution
CV and Pattern Recognition
Finds fake pictures made by AI.
Prototype-Guided Diffusion: Visual Conditioning without External Memory
Machine Learning (CS)
Makes AI create pictures faster and cheaper.