Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward

Published: November 25, 2025 | arXiv ID: 2511.20561v1

By: Yuwei Niu, Weiyang Jin, Jiaqi Liao and more

Potential Business Impact:

Makes AI understand and create better.

Business Areas:

Semantic Web, Internet Services

Recent years have witnessed significant progress in Unified Multimodal Models, yet a fundamental question remains: Does understanding truly inform generation? To investigate this, we introduce UniSandbox, a decoupled evaluation framework paired with controlled, synthetic datasets to avoid data leakage and enable detailed analysis. Our findings reveal a significant understanding-generation gap, which is mainly reflected in two key dimensions: reasoning generation and knowledge transfer. Specifically, for reasoning generation tasks, we observe that explicit Chain-of-Thought (CoT) in the understanding module effectively bridges the gap, and further demonstrate that a self-training approach can successfully internalize this ability, enabling implicit reasoning during generation. Additionally, for knowledge transfer tasks, we find that CoT assists the generative process by helping retrieve newly learned knowledge, and also discover that query-based architectures inherently exhibit latent CoT-like properties that affect this transfer. UniSandbox provides preliminary insights for designing future unified architectures and training strategies that truly bridge the gap between understanding and generation. Code and data are available at https://github.com/PKU-YuanGroup/UniSandBox

Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation

CV and Pattern Recognition

Makes AI pictures better by thinking while drawing.

23 Sep 2025 1

89%

Can Understanding and Generation Truly Benefit Together -- or Just Coexist?

CV and Pattern Recognition

Makes computers draw pictures from descriptions.

11 Sep 2025 1

89%

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

CV and Pattern Recognition

Lets computers understand and create images together.

5 May 2025 1

Country of Origin

🇨🇳 China

Page Count

19 pages
