MUSE: Manipulating Unified Framework for Synthesizing Emotions in Images via Test-Time Optimization

Published: November 26, 2025 | arXiv ID: 2511.21051v1

By: Yingjie Xia, Xi Wang, Jinglei Shi and more

Potential Business Impact:

Generates and edits images so that they evoke a chosen emotion.

Business Areas:

Semantic Search, Internet Services

Images evoke emotions that profoundly influence perception, and this emotional response is often prioritized over content. Current Image Emotional Synthesis (IES) approaches artificially separate generation and editing tasks, creating inefficiencies and limiting applications where these tasks naturally intertwine, such as therapeutic interventions or storytelling. In this work, we introduce MUSE, the first unified framework capable of both emotional generation and editing. By adopting a strategy conceptually aligned with Test-Time Scaling (TTS), which is widely used in both the LLM and diffusion model communities, it avoids the need to further update the diffusion model or to rely on specialized emotional synthesis datasets. More specifically, MUSE addresses three key questions in emotional synthesis: (1) HOW to stably guide synthesis by leveraging an off-the-shelf emotion classifier with gradient-based optimization of emotional tokens; (2) WHEN to introduce emotional guidance by identifying the optimal timing using semantic similarity as a supervisory signal; and (3) WHICH emotion should guide synthesis, through a multi-emotion loss that reduces interference from inherent and similar emotions. Experimental results show that MUSE performs favorably against all compared methods for both generation and editing, improving emotional accuracy and semantic diversity while maintaining an optimal balance between desired content, adherence to text prompts, and realistic emotional expression. It establishes a new paradigm for emotion synthesis.
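The three questions in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the linear "classifier", the two-dimensional "emotional token", the similarity scores, the threshold, the loss form, and every function name below are assumptions chosen only to show the general idea of optimizing a token against a frozen classifier at test time, starting guidance once a similarity signal is high enough, and penalizing a similar rival emotion.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(weights, token):
    """Frozen toy classifier (hypothetical): linear logits, then softmax."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in weights]
    return softmax(logits)

def multi_emotion_loss(probs, target, rivals, rival_weight):
    """Assumed loss: cross-entropy on the target plus a penalty on rival emotions."""
    return -math.log(probs[target]) + rival_weight * sum(probs[r] for r in rivals)

def token_gradient(weights, probs, target, rivals, rival_weight):
    """Analytic gradient of multi_emotion_loss with respect to the token."""
    rival_mass = sum(probs[r] for r in rivals)
    grad_logits = []
    for j, p in enumerate(probs):
        g = p - (1.0 if j == target else 0.0)  # cross-entropy term
        g += rival_weight * ((p if j in rivals else 0.0) - p * rival_mass)  # rival penalty
        grad_logits.append(g)
    dim = len(weights[0])
    return [sum(grad_logits[k] * weights[k][d] for k in range(len(weights)))
            for d in range(dim)]

def guidance_start(similarities, threshold):
    """WHEN: first step whose similarity reaches the threshold (assumed rule)."""
    for step, score in enumerate(similarities):
        if score >= threshold:
            return step
    return len(similarities)

def optimize_token(weights, token, target, rivals, similarities,
                   threshold=0.5, lr=0.2, rival_weight=1.0):
    """HOW: gradient descent on the token only; classifier weights stay frozen."""
    token = list(token)
    start = guidance_start(similarities, threshold)
    for step in range(start, len(similarities)):
        probs = classify(weights, token)
        grad = token_gradient(weights, probs, target, rivals, rival_weight)
        token = [x - lr * g for x, g in zip(token, grad)]
    return token

# Made-up values for illustration only.
EMOTIONS = ["amusement", "contentment", "sadness"]
WEIGHTS = [[1.0, 0.0], [0.8, 0.3], [-1.0, 0.5]]
SIMS = [0.1, 0.3, 0.6, 0.7, 0.8, 0.9, 0.9, 0.9]

start_token = [0.0, 0.0]
# WHICH: push toward "amusement" while suppressing the similar "contentment".
tuned_token = optimize_token(WEIGHTS, start_token, target=0, rivals=[1],
                             similarities=SIMS)
before = classify(WEIGHTS, start_token)
after = classify(WEIGHTS, tuned_token)
print("target probability before:", before[0], "after:", after[0])
```

In this sketch only the token changes, which mirrors the abstract's point that the generative model itself is not updated; a real system would presumably backpropagate through an image classifier and a diffusion sampler rather than a hand-written linear model.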

MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion

CV and Pattern Recognition

Puts many things in pictures exactly where you want.

20 Aug 2025 1

89%

MUSE: A Simple Yet Effective Multimodal Search-Based Framework for Lifelong User Interest Modeling

Information Retrieval

Helps online stores show you better ads.

8 Dec 2025 1

89%

FlexMUSE: Multimodal Unification and Semantics Enhancement Framework with Flexible interaction for Creative Writing

CV and Pattern Recognition

Makes stories with pictures that make sense.

22 Aug 2025 0

Page Count

11 pages
