MEPG: Multi-Expert Planning and Generation for Compositionally-Rich Image Generation

Published: September 4, 2025 | arXiv ID: 2509.04126v2

By: Yuan Zhao, Lin Liu

Potential Business Impact:

Makes AI draw better pictures with more details.

Business Areas:

Guides Media and Entertainment

Text-to-image diffusion models have achieved remarkable image quality, but they still struggle with complex, multielement prompts, and limited stylistic diversity. To address these limitations, we propose a Multi-Expert Planning and Generation Framework (MEPG) that synergistically integrates position- and style-aware large language models (LLMs) with spatial-semantic expert modules. The framework comprises two core components: (1) a Position-Style-Aware (PSA) module that utilizes a supervised fine-tuned LLM to decompose input prompts into precise spatial coordinates and style-encoded semantic instructions; and (2) a Multi-Expert Diffusion (MED) module that implements cross-region generation through dynamic expert routing across both local regions and global areas. During the generation process for each local region, specialized models (e.g., realism experts, stylization specialists) are selectively activated for each spatial partition via attention-based gating mechanisms. The architecture supports lightweight integration and replacement of expert models, providing strong extensibility. Additionally, an interactive interface enables real-time spatial layout editing and per-region style selection from a portfolio of experts. Experiments show that MEPG significantly outperforms baseline models with the same backbone in both image quality and style diversity.
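
The two-stage flow described in the abstract (a layout plan with per-region style instructions, followed by gated routing to experts) can be illustrated with a toy sketch. This is not the authors' implementation: the `RegionPlan` structure, the two-dimensional style vectors, the expert names, and the scaled dot-product gate are all assumptions made for illustration, and the hand-written plan stands in for what a fine-tuned LLM would produce.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RegionPlan:
    """Hypothetical per-region output of a planning step (not the paper's schema)."""
    box: Tuple[float, float, float, float]  # normalized (x0, y0, x1, y1)
    prompt: str                             # what to draw in this region
    style: str                              # style tag for this region


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_region(query: List[float], expert_keys: Dict[str, List[float]]) -> Dict[str, float]:
    """Gate experts with scaled dot-product scores (an assumed gating form)."""
    names = list(expert_keys)
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, expert_keys[name])) / scale
        for name in names
    ]
    return dict(zip(names, softmax(scores)))


def pick_expert(weights: Dict[str, float]) -> str:
    """Select the expert with the largest gate weight."""
    return max(weights, key=weights.get)


# Toy embeddings, invented for this example only.
STYLE_QUERIES = {"photorealistic": [1.0, 0.0], "watercolor": [0.0, 1.0]}
EXPERT_KEYS = {"realism": [2.0, 0.0], "stylization": [0.0, 2.0]}

# Hand-written stand-in for a planner's decomposition of a prompt.
plan = [
    RegionPlan((0.0, 0.0, 0.5, 1.0), "a red fox sitting", "photorealistic"),
    RegionPlan((0.5, 0.0, 1.0, 1.0), "a misty forest", "watercolor"),
]

assignments = {
    region.prompt: pick_expert(route_region(STYLE_QUERIES[region.style], EXPERT_KEYS))
    for region in plan
}
print(assignments)
```

Because the experts are held in a plain dictionary, adding or swapping one only changes `EXPERT_KEYS`, which loosely mirrors the extensibility the abstract claims. A real system would compute the gate from learned attention features rather than fixed vectors.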

CV and Pattern Recognition

Page Count

10 pages