Score: 2

Less-to-More Generalization: Unlocking More Controllability by In-Context Generation

Published: April 2, 2025 | arXiv ID: 2504.02160v1

By: Shaojin Wu, Mengqi Huang, Wenxu Wu and more

BigTech Affiliations: ByteDance

Potential Business Impact:

Makes computers create many pictures of the same thing.

Business Areas:

Semantic Search, Internet Services

Although subject-driven generation has been extensively explored in image generation due to its wide applications, it still faces challenges in data scalability and subject expansibility. For the first challenge, moving from curating single-subject datasets to multi-subject ones and scaling them is particularly difficult. For the second, most recent methods center on single-subject generation, which makes them hard to apply to multi-subject scenarios. In this study, we propose a highly consistent data synthesis pipeline to tackle this challenge. The pipeline harnesses the intrinsic in-context generation capabilities of diffusion transformers to generate high-consistency multi-subject paired data. Additionally, we introduce UNO, a multi-image conditioned subject-to-image model iteratively trained from a text-to-image model, which consists of progressive cross-modal alignment and universal rotary position embedding. Extensive experiments show that our method achieves high consistency while ensuring controllability in both single-subject and multi-subject driven generation.

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

CV and Pattern Recognition

Makes one AI create many kinds of pictures.

10 Apr 2025 0

88%

Stencil: Subject-Driven Generation with Context Guidance

CV and Pattern Recognition

Makes AI draw the same thing in new pictures.

21 Sep 2025 1

87%

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

CV and Pattern Recognition

Makes AI draw people better, even in crowds.

9 Dec 2025 1

Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

21 pages
