
IAR2: Improving Autoregressive Visual Generation with Semantic-Detail Associated Token Prediction

Published: October 8, 2025 | arXiv ID: 2510.06928v1

By: Ran Yi, Teng Hu, Zihan Su and more

Potential Business Impact:

Makes AI draw pictures with more detail.

Business Areas:

Augmented Reality Hardware, Software

Autoregressive models have emerged as a powerful paradigm for visual content creation, but they often overlook the intrinsic structural properties of visual data. Our prior work, IAR, initiated a direction to address this by reorganizing the visual codebook based on embedding similarity, thereby improving generation robustness. However, it is constrained by the rigidity of pre-trained codebooks and the inaccuracies of hard, uniform clustering. To overcome these limitations, we propose IAR2, an advanced autoregressive framework that enables a hierarchical semantic-detail synthesis process. At the core of IAR2 is a novel Semantic-Detail Associated Dual Codebook, which decouples image representations into a semantic codebook for global semantic information and a detail codebook for fine-grained refinements. This dual codebook expands the quantization capacity from a linear to a polynomial scale, significantly enhancing expressiveness. To accommodate this dual representation, we propose a Semantic-Detail Autoregressive Prediction scheme coupled with a Local-Context Enhanced Autoregressive Head, which performs hierarchical prediction (first the semantic token, then the detail token) while leveraging a local context window to enhance spatial coherence. Furthermore, for conditional generation, we introduce a Progressive Attention-Guided Adaptive CFG mechanism that dynamically modulates the guidance scale for each token based on its relevance to the condition and its temporal position in the generation sequence, improving conditional alignment without sacrificing realism. Extensive experiments demonstrate that IAR2 sets a new state-of-the-art for autoregressive image generation, achieving an FID of 1.50 on ImageNet. Our model not only surpasses previous methods in performance but also demonstrates superior computational efficiency, highlighting the effectiveness of our structured, coarse-to-fine generation strategy.
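The dual-codebook idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the codebook sizes, and the toy probability tables are all assumptions made for illustration. It reads "linear to polynomial" as the gap between the number of stored codebook entries (a sum) and the number of distinct semantic-detail pairs (a product), and it shows the two-step order of prediction, with the semantic token sampled first and the detail token sampled conditioned on it.

```python
import random


def dual_codebook_capacity(num_semantic: int, num_detail: int) -> tuple:
    """Return (stored_entries, distinct_pairs) for a two-codebook quantizer.

    Assumption: each image token is a (semantic, detail) pair, so storage
    grows with the sum of the codebook sizes while the number of
    representable pairs grows with their product.
    """
    stored_entries = num_semantic + num_detail
    distinct_pairs = num_semantic * num_detail
    return stored_entries, distinct_pairs


def pair_to_index(semantic_id: int, detail_id: int, num_detail: int) -> int:
    """Flatten a (semantic, detail) pair into a single combined token index."""
    return semantic_id * num_detail + detail_id


def sample_index(weights, rng: random.Random) -> int:
    """Sample one index in proportion to the given non-negative weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def predict_token(semantic_probs, detail_probs_given_semantic, rng: random.Random):
    """Hierarchical prediction: semantic token first, then the detail token.

    detail_probs_given_semantic[s] is a hypothetical table holding the
    detail-token distribution conditioned on semantic token s; a real model
    would produce it with a network head rather than a lookup.
    """
    semantic_id = sample_index(semantic_probs, rng)
    detail_id = sample_index(detail_probs_given_semantic[semantic_id], rng)
    return semantic_id, detail_id


if __name__ == "__main__":
    stored, pairs = dual_codebook_capacity(1024, 1024)
    print(f"stored entries: {stored}, representable pairs: {pairs}")

    rng = random.Random(0)
    semantic_probs = [0.7, 0.3]
    detail_table = [[0.5, 0.5, 0.0], [0.1, 0.1, 0.8]]
    s, d = predict_token(semantic_probs, detail_table, rng)
    print("sampled pair:", (s, d), "combined index:", pair_to_index(s, d, 3))
```

With two codebooks of 1024 entries each, this sketch stores 2048 embeddings but can address 1,048,576 distinct pairs, which is one way to picture the expanded quantization capacity the abstract describes.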

SpectralAR: Spectral Autoregressive Visual Generation

CV and Pattern Recognition

Makes pictures by predicting their frequency components.

12 Jun 2025

89%

REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization

CV and Pattern Recognition

Makes AI create better pictures from words.

6 Oct 2025

89%

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

CV and Pattern Recognition

Makes AI better at understanding and creating pictures.

18 Sep 2025


Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

22 pages
