Finetuning-Free Personalization of Text to Image Generation via Hypernetworks
By: Sagar Shrestha, Gopal Sharma, Luowei Zhou, and more
Potential Business Impact:
Lets AI generate new pictures of a specific person, pet, or object from example photos, without retraining the model for each new subject.
Personalizing text-to-image diffusion models has traditionally relied on subject-specific fine-tuning approaches such as DreamBooth (Ruiz et al., 2023), which are computationally expensive and slow at inference time because every new subject requires its own optimization run. Recent adapter- and encoder-based methods attempt to reduce this overhead but still depend on additional fine-tuning or large backbone models for satisfactory results. In this work, we revisit an orthogonal direction: fine-tuning-free personalization via hypernetworks that predict LoRA-adapted weights directly from subject images. Prior hypernetwork-based approaches, however, suffer either from costly data generation or from unstable attempts to mimic the base model's optimization trajectories. We address these limitations with an end-to-end training objective, stabilized by a simple output regularization, yielding reliable and effective hypernetworks. Our method removes the need for per-subject optimization at test time while preserving both subject fidelity and prompt alignment. To further enhance compositional generalization at inference time, we introduce Hybrid-Model Classifier-Free Guidance (HM-CFG), which combines the compositional strengths of the base diffusion model with the subject fidelity of personalized models during sampling. Extensive experiments on CelebA-HQ, AFHQ-v2, and DreamBench demonstrate that our approach achieves strong personalization performance and highlight the promise of hypernetworks as a scalable and effective direction for open-category personalization.
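The abstract names three mechanisms without giving their formulas: a hypernetwork that predicts LoRA weights from a subject image, an end-to-end loss with an output regularizer, and a guidance rule that mixes base and personalized predictions. The NumPy sketch below illustrates these ideas for a single linear layer under loudly stated assumptions. All names (`LoRAHypernetwork`, `output_regularizer`, `training_loss`, `hm_cfg`, `lam`, `w_text`, `w_subj`) are hypothetical. The linear "heads" stand in for whatever architecture the paper actually uses. The mean-squared penalty on the predicted update is a guess at the "simple output regularization". The `hm_cfg` mixing rule is one plausible extension of standard classifier-free guidance, not the authors' HM-CFG definition.

```python
import numpy as np

rng = np.random.default_rng(0)


class LoRAHypernetwork:
    """Hypothetical hypernetwork: maps a subject-image embedding to LoRA
    factors (A, B) for one frozen linear layer W of shape (d_out, d_in)."""

    def __init__(self, d_embed, d_in, d_out, rank, alpha=1.0):
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.scale = alpha / rank  # common LoRA scaling alpha / r
        # Linear "heads" that emit the flattened LoRA factors (assumed design).
        self.head_a = rng.normal(0.0, 0.02, (rank * d_in, d_embed))
        self.head_b = rng.normal(0.0, 0.02, (d_out * rank, d_embed))

    def __call__(self, embedding):
        a = (self.head_a @ embedding).reshape(self.rank, self.d_in)
        b = (self.head_b @ embedding).reshape(self.d_out, self.rank)
        return a, b

    def delta_w(self, embedding):
        """Low-rank update Delta W = scale * B @ A, shape (d_out, d_in)."""
        a, b = self(embedding)
        return self.scale * (b @ a)


def personalized_forward(w_base, delta_w, x):
    """Frozen base weight plus the predicted LoRA update, applied to x."""
    return (w_base + delta_w) @ x


def output_regularizer(delta_w):
    """Assumed regularizer: mean squared entry of the predicted update,
    which discourages the hypernetwork from emitting large weight changes."""
    return float(np.mean(delta_w ** 2))


def training_loss(pred_noise, true_noise, delta_w, lam=1e-2):
    """End-to-end objective sketch: the personalized model's denoising MSE
    plus lam times the output regularizer (lam is an illustrative value)."""
    mse = float(np.mean((pred_noise - true_noise) ** 2))
    return mse + lam * output_regularizer(delta_w)


def hm_cfg(eps_base_uncond, eps_base_cond, eps_pers_cond, w_text=7.5, w_subj=1.0):
    """One plausible hybrid-model guidance mix (NOT the paper's formula):
    text guidance computed with the base model, plus a subject term that
    moves from the base prediction toward the personalized prediction."""
    return (eps_base_uncond
            + w_text * (eps_base_cond - eps_base_uncond)
            + w_subj * (eps_pers_cond - eps_base_cond))


if __name__ == "__main__":
    d_embed, d_in, d_out, rank = 16, 8, 8, 4
    hyper = LoRAHypernetwork(d_embed, d_in, d_out, rank)
    subject_embedding = rng.normal(size=d_embed)  # stand-in for an image encoder
    dw = hyper.delta_w(subject_embedding)
    w_base = rng.normal(size=(d_out, d_in))
    y = personalized_forward(w_base, dw, rng.normal(size=d_in))
    print(dw.shape, bool(np.linalg.matrix_rank(dw) <= rank))  # (8, 8) True
```

With `w_subj=0` the mix reduces to ordinary classifier-free guidance on the base model. With `w_text=1` and `w_subj=1` it returns the personalized model's conditional prediction unchanged. Intermediate values therefore trade the base model's prompt compositionality against subject fidelity, which is the balance the abstract attributes to HM-CFG.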