Spanning Tree Autoregressive Visual Generation
By: Sangkyu Lee, Changho Lee, Janghoon Han and more
Potential Business Impact:
Lets computers create and edit pictures better.
We present Spanning Tree Autoregressive (STAR) modeling, which incorporates prior knowledge of images, such as center bias and locality, to maintain sampling performance while providing sequence orders flexible enough to accommodate image editing at inference. Existing approaches in visual generation expose conventional autoregressive (AR) models to randomly permuted sequence orders to obtain bidirectional context, but they either suffer a decline in performance or compromise flexibility in choosing the sequence order at inference. Instead, STAR uses traversal orders of uniform spanning trees sampled on a lattice defined by the positions of image patches. The traversal orders are obtained by breadth-first search, and rejection sampling lets us efficiently construct a spanning tree whose traversal order places a connected partial observation of the image as a prefix of the sequence. With this tailored yet structured randomization, in contrast to random permutation, STAR preserves the capability of postfix completion while maintaining sampling performance, without significant changes to the model architecture widely adopted in language AR modeling.
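The ordering procedure described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' implementation: we sample the uniform spanning tree with Wilson's algorithm (one standard exact sampler; the abstract does not say which sampler STAR uses), use a 4-connected patch lattice, break ties among sibling nodes at random during breadth-first search, and accept a tree only if the first |S| positions of its BFS order are exactly the observed patch set S. All function names are hypothetical.

```python
import random
from collections import deque


def grid_neighbors(node, height, width):
    """4-connected neighbors of patch position (row, col) in an H x W lattice (assumed lattice)."""
    r, c = node
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < height and 0 <= nc < width:
            yield (nr, nc)


def wilson_uniform_spanning_tree(height, width, root, rng):
    """Sample a uniform spanning tree of the grid graph with Wilson's algorithm.

    Returns an adjacency dict {node: [tree neighbors]}.
    """
    nodes = [(r, c) for r in range(height) for c in range(width)]
    in_tree = {root}
    parent = {}
    for start in nodes:
        # Random walk from `start` until it hits the tree; overwriting nxt[u]
        # on revisits keeps only the last exit, which erases loops.
        nxt = {}
        u = start
        while u not in in_tree:
            nxt[u] = rng.choice(list(grid_neighbors(u, height, width)))
            u = nxt[u]
        # Retrace the loop-erased path and attach it to the tree.
        u = start
        while u not in in_tree:
            parent[u] = nxt[u]
            in_tree.add(u)
            u = nxt[u]
    adj = {n: [] for n in nodes}
    for child, par in parent.items():
        adj[child].append(par)
        adj[par].append(child)
    return adj


def bfs_order(adj, root, rng=None):
    """Breadth-first traversal order of the tree from `root` (random sibling order if rng given)."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        children = [v for v in adj[u] if v not in seen]
        if rng is not None:
            rng.shuffle(children)  # hypothetical tie-breaking choice
        for v in children:
            seen.add(v)
            queue.append(v)
    return order


def sample_order_with_prefix(height, width, observed, root, rng, max_tries=10_000):
    """Rejection sampling: resample trees until the observed patches form the BFS prefix."""
    observed = set(observed)
    if root not in observed:
        raise ValueError("root must be one of the observed patches")
    for _ in range(max_tries):
        adj = wilson_uniform_spanning_tree(height, width, root, rng)
        order = bfs_order(adj, root, rng)
        if set(order[: len(observed)]) == observed:
            return order
    raise RuntimeError("no accepted spanning tree within max_tries")


if __name__ == "__main__":
    rng = random.Random(0)
    # Observed (kept) patches for editing: the top-left 1x2 region of a 4x4 lattice.
    order = sample_order_with_prefix(4, 4, observed={(0, 0), (0, 1)}, root=(0, 0), rng=rng)
    print(order[:2])  # → [(0, 0), (0, 1)]
```

Because the BFS order starts at the root and the acceptance test forces the first |S| entries to equal S, the remaining positions form the postfix an AR model would generate. For unconditional generation, one plausible way to reflect center bias is to root the traversal at the center patch; that choice is our reading, not a detail stated in the abstract. Acceptance rates fall as the observed region grows, so a practical system would likely need a smarter proposal than naive resampling.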
Similar Papers
Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
CV and Pattern Recognition
Makes AI better at understanding and creating pictures.
STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits
CV and Pattern Recognition
Makes talking videos from a picture and voice.
InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation
CV and Pattern Recognition
Creates realistic videos from text, faster than before.