CritiFusion: Semantic Critique and Spectral Alignment for Faithful Text-to-Image Generation

Published: December 27, 2025 | arXiv ID: 2512.22681v1

By: ZhenQi Chen, TsaiChing Ni, YuanFu Yang

Potential Business Impact:

Makes AI pictures match words better.

Business Areas:

Semantic Search, Internet Services

Recent text-to-image diffusion models have achieved remarkable visual fidelity but often struggle with semantic alignment to complex prompts. We introduce CritiFusion, a novel inference-time framework that integrates a multimodal semantic critique mechanism with frequency-domain refinement to improve text-to-image consistency and detail. The proposed CritiCore module leverages a vision-language model and multiple large language models to enrich the prompt context and produce high-level semantic feedback, guiding the diffusion process to better align generated content with the prompt's intent. Additionally, SpecFusion merges intermediate generation states in the spectral domain, injecting coarse structural information while preserving high-frequency details. No additional model training is required. CritiFusion serves as a plug-in refinement stage compatible with existing diffusion backbones. Experiments on standard benchmarks show that our method notably improves human-aligned metrics of text-to-image correspondence and visual quality. CritiFusion consistently boosts performance on human preference scores and aesthetic evaluations, achieving results on par with state-of-the-art reward optimization approaches. Qualitative results further demonstrate superior detail, realism, and prompt fidelity, indicating the effectiveness of our semantic critique and spectral alignment strategy.
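The abstract describes SpecFusion only at a high level: intermediate states are merged in the spectral domain so that coarse structure is injected while high-frequency detail is preserved. As a rough illustration of that general idea, and not the authors' implementation, the sketch below blends two arrays with a radial low-pass mask in the 2D Fourier domain using NumPy. The function names `low_pass_mask` and `spectral_merge`, the hard (binary) mask, and the `cutoff` value are all assumptions made for this example; the paper may use a different filter, schedule, or latent representation.

```python
import numpy as np


def low_pass_mask(height, width, cutoff):
    """Boolean mask, True where the radial frequency (cycles/pixel) is <= cutoff."""
    fy = np.fft.fftfreq(height)[:, None]  # vertical frequencies, shape (H, 1)
    fx = np.fft.fftfreq(width)[None, :]   # horizontal frequencies, shape (1, W)
    return np.sqrt(fy ** 2 + fx ** 2) <= cutoff


def spectral_merge(structure, detail, cutoff=0.1):
    """Combine low frequencies of `structure` with high frequencies of `detail`.

    Hypothetical helper: both inputs are real arrays of identical shape (..., H, W).
    """
    structure = np.asarray(structure, dtype=float)
    detail = np.asarray(detail, dtype=float)
    if structure.shape != detail.shape:
        raise ValueError("inputs must have the same shape")

    mask = low_pass_mask(structure.shape[-2], structure.shape[-1], cutoff)
    spec_structure = np.fft.fft2(structure)  # fft2 acts on the last two axes
    spec_detail = np.fft.fft2(detail)

    # Inside the mask keep the coarse-structure spectrum, outside keep the detail.
    merged = np.where(mask, spec_structure, spec_detail)

    # The mask is symmetric in frequency, so the inverse is real up to round-off.
    return np.fft.ifft2(merged).real


# Example: stand-ins for an earlier (coarse) and a later (detailed) state.
rng = np.random.default_rng(0)
early_state = rng.normal(size=(4, 64, 64))
late_state = rng.normal(size=(4, 64, 64))
fused = spectral_merge(early_state, late_state, cutoff=0.1)
```

A hard mask keeps the example short; a smooth transition (for instance a Gaussian weight over frequency) would reduce ringing and is a natural variation under the same assumptions.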

FUSE: Unifying Spectral and Semantic Cues for Robust AI-Generated Image Detection

CV and Pattern Recognition

Finds fake pictures made by computers.

25 Dec 2025

89%

High Fidelity Text to Image Generation with Contrastive Alignment and Structural Guidance

CV and Pattern Recognition

Makes pictures match words perfectly.

14 Aug 2025

88%

Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach

CV and Pattern Recognition

Combines pictures using words to make better images.

8 Dec 2025

Country of Origin

🇹🇼 Taiwan, Province of China

Page Count

20 pages
