Stencil: Subject-Driven Generation with Context Guidance
By: Gordon Chen, Ziqi Huang, Cheston Tan and more
Potential Business Impact:
Makes AI draw the same thing in new pictures.
Recent text-to-image diffusion models can generate striking visuals from text prompts, but they often fail to maintain subject consistency across generations and contexts. A major limitation of current fine-tuning approaches is the inherent trade-off between quality and efficiency: fine-tuning large models improves fidelity but is computationally expensive, while fine-tuning lightweight models improves efficiency but compromises image fidelity. Moreover, fine-tuning a pre-trained model on a small set of subject images can damage its existing priors and degrade output quality. To this end, we present Stencil, a novel framework that jointly employs two diffusion models during inference. Stencil efficiently fine-tunes a lightweight model on images of the subject, while a large frozen pre-trained model provides contextual guidance at inference time, injecting rich priors that enhance generation with minimal overhead. Stencil generates high-fidelity, novel renditions of the subject in under a minute, delivering state-of-the-art performance and setting a new benchmark in subject-driven generation.
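To make the dual-model idea concrete, here is a minimal sketch of how a subject-tuned lightweight denoiser and a frozen pre-trained prior could be combined at each sampling step. This is an illustrative assumption about the fusion, not the paper's actual method or API: the names `light_model`, `frozen_prior`, and `guidance_scale`, the linear blend of noise predictions, and the toy DDPM update are all placeholders.

```python
# Hypothetical sketch: blending a subject-tuned denoiser with a frozen
# pre-trained prior during sampling. Not the authors' implementation.
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Stand-in for a diffusion denoiser: predicts noise from (x_t, t)."""

    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x_t, t):
        t_embed = t.float().view(-1, 1).expand(x_t.shape[0], 1)
        return self.net(torch.cat([x_t, t_embed], dim=-1))


@torch.no_grad()
def guided_denoise_step(x_t, t, light_model, frozen_prior, guidance_scale=0.3):
    """One reverse-diffusion step that mixes the subject-tuned prediction
    with contextual guidance from the frozen prior (a simple linear blend,
    assumed here purely for illustration)."""
    eps_subject = light_model(x_t, t)    # lightweight model fine-tuned on subject images
    eps_context = frozen_prior(x_t, t)   # large frozen model supplying general priors
    eps = (1 - guidance_scale) * eps_subject + guidance_scale * eps_context

    # Simplified DDPM-style update with placeholder schedule constants.
    alpha, alpha_bar = 0.99, 0.95
    mean = (x_t - (1 - alpha) / (1 - alpha_bar) ** 0.5 * eps) / alpha ** 0.5
    noise = torch.randn_like(x_t) if t.item() > 0 else torch.zeros_like(x_t)
    return mean + (1 - alpha) ** 0.5 * noise


# Usage: run the guided step from t = T-1 down to 0.
light_model, frozen_prior = TinyDenoiser(), TinyDenoiser()
frozen_prior.requires_grad_(False)  # the prior stays frozen
x = torch.randn(4, 16)
for step in reversed(range(10)):
    x = guided_denoise_step(x, torch.tensor([step]), light_model, frozen_prior)
```

The design point the sketch tries to capture is that only the lightweight model needs gradient updates on the subject images, while the frozen model contributes its priors at inference time, so the per-subject fine-tuning cost stays small.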
Similar Papers
IMAGE-ALCHEMY: Advancing subject fidelity in personalised text-to-image generation
CV and Pattern Recognition
Makes AI draw any person or thing from a few pictures.
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
CV and Pattern Recognition
Creates pictures from words much faster.
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
CV and Pattern Recognition
Makes computers create many pictures of the same thing.