CoAR: Concept Injection into Autoregressive Models for Personalized Text-to-Image Generation
By: Fangtai Wu, Mushui Liu, Weijie He, and more
Potential Business Impact:
Teaches AI your specific subject so it can draw it in new scenes and styles.
Unified autoregressive (AR) models excel at multimodal understanding and generation, but their potential for customized image generation remains underexplored. Existing customized generation methods rely on full fine-tuning or adapters, making them costly and prone to overfitting or catastrophic forgetting. In this paper, we propose CoAR, a novel framework for injecting subject concepts into unified AR models while keeping all pre-trained parameters completely frozen. Using a Layerwise Multimodal Context Learning strategy, CoAR learns effective, subject-specific representations with only a minimal number of parameters. To address overfitting and language drift, we further introduce regularization that preserves the pre-trained distribution and anchors context tokens, improving subject fidelity and re-contextualization. Additionally, CoAR supports training-free subject customization in a user-provided style. Experiments demonstrate that CoAR achieves superior performance on both subject-driven and style personalization, while delivering significant gains in computational and memory efficiency. Notably, CoAR tunes less than 0.05% of the parameters while achieving competitive performance compared to the recent Proxy-Tuning. Code: https://github.com/KZF-kzf/CoAR
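The abstract does not spell out the mechanism, but layerwise context learning over a frozen backbone is commonly realized as small banks of learnable tokens injected at each transformer layer. The sketch below illustrates that general pattern in PyTorch; every name here (LayerwiseContextInjector, ctx_len, the TransformerEncoderLayer stand-in for the AR blocks) is an illustrative assumption, not CoAR's actual code.

```python
# Minimal sketch: learnable per-layer context tokens injected into a frozen
# autoregressive backbone. Assumed structure, not the paper's implementation.
import torch
import torch.nn as nn

class LayerwiseContextInjector(nn.Module):
    """Per-layer subject-context tokens; the backbone stays frozen."""

    def __init__(self, backbone_layers: nn.ModuleList,
                 ctx_len: int = 8, d_model: int = 768):
        super().__init__()
        self.layers = backbone_layers
        for p in self.layers.parameters():   # freeze every pre-trained weight
            p.requires_grad_(False)
        # One small bank of context tokens per layer: the only trainable state.
        self.ctx = nn.ParameterList(
            [nn.Parameter(torch.randn(ctx_len, d_model) * 0.02)
             for _ in backbone_layers]
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model) hidden states of the multimodal sequence
        batch = h.size(0)
        for layer, ctx in zip(self.layers, self.ctx):
            c = ctx.unsqueeze(0).expand(batch, -1, -1)  # broadcast to batch
            h = layer(torch.cat([c, h], dim=1))         # inject, run the layer
            h = h[:, c.size(1):]                        # drop the context slots
        return h

# Toy usage: four generic transformer blocks standing in for the AR model.
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
     for _ in range(4)]
)
model = LayerwiseContextInjector(layers)
out = model(torch.randn(2, 16, 768))  # only the ctx parameters get gradients
```

Dropping the context slots after each layer keeps the sequence length fixed, and the trainable state is a few thousand values per layer, which is consistent in spirit with the paper's claim of tuning under 0.05% of the parameters.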
Similar Papers
Semantic Context Matters: Improving Conditioning for Autoregressive Models
CV and Pattern Recognition
Makes AI better at changing pictures with words.
Context-Aware Autoregressive Models for Multi-Conditional Image Generation
CV and Pattern Recognition
Makes pictures from many different instructions.
TokenAR: Multiple Subject Generation via Autoregressive Token-level Enhancement
CV and Pattern Recognition
Makes AI create pictures with matching people.