Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution
By: Hexin Zhang, Dong Li, Jie Huang and more
Potential Business Impact:
Makes blurry pictures sharp and clear.
Diffusion models have become a leading paradigm for image super-resolution (SR), but existing methods struggle to guarantee both the high-frequency perceptual quality and the low-frequency structural fidelity of generated images. Although inference-time scaling can theoretically improve this trade-off by allocating more computation, existing strategies remain suboptimal: reward-driven particle optimization often causes perceptual over-smoothing, while optimal-path search tends to lose structural consistency. To overcome these difficulties, we propose Iterative Diffusion Inference-Time Scaling with Adaptive Frequency Steering (IAFS), a training-free framework that jointly leverages iterative refinement and frequency-aware particle fusion. IAFS addresses the challenge of balancing perceptual quality and structural fidelity by progressively refining the generated image through iterative correction of structural deviations. Simultaneously, it ensures effective frequency fusion by adaptively integrating high-frequency perceptual cues with low-frequency structural information, allowing for a more accurate and balanced reconstruction across different image details. Extensive experiments across multiple diffusion-based SR models show that IAFS effectively resolves the perception-fidelity conflict, yielding consistently improved perceptual detail and structural accuracy, and outperforming existing inference-time scaling methods.
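The abstract does not spell out how the frequency fusion is computed, so the following is only a minimal sketch of the general idea of combining the low-frequency content of one candidate image with the high-frequency content of another. It is not the authors' implementation. The function names (`radial_lowpass_mask`, `fuse_frequencies`), the hard radial mask, and the fixed `cutoff` value are all assumptions made for illustration; in particular, IAFS is described as adaptive, whereas this sketch uses a constant cutoff.

```python
import numpy as np


def radial_lowpass_mask(shape, cutoff):
    """Return a float mask that is 1 where the normalized frequency radius <= cutoff.

    Hypothetical helper: frequencies are in cycles per pixel, as returned by
    np.fft.fftfreq, so the largest possible radius is sqrt(0.5).
    """
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    return (radius <= cutoff).astype(np.float64)


def fuse_frequencies(structure_img, detail_img, cutoff=0.1):
    """Keep low frequencies from structure_img and high frequencies from detail_img.

    Both inputs are 2D grayscale arrays of the same shape. The fixed cutoff is
    an assumption; an adaptive scheme would choose it per image or per step.
    """
    if structure_img.shape != detail_img.shape:
        raise ValueError("both images must have the same shape")
    mask = radial_lowpass_mask(structure_img.shape, cutoff)
    spec_structure = np.fft.fft2(structure_img)
    spec_detail = np.fft.fft2(detail_img)
    fused_spec = mask * spec_structure + (1.0 - mask) * spec_detail
    # The mask is symmetric in frequency, so the result is real up to round-off.
    return np.real(np.fft.ifft2(fused_spec))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    structure = rng.random((32, 32))  # stand-in for a structurally faithful candidate
    detail = rng.random((32, 32))     # stand-in for a perceptually sharp candidate
    fused = fuse_frequencies(structure, detail, cutoff=0.1)
    print(fused.shape)
```

With a hard mask like this, the fused image inherits its mean intensity and coarse layout from the first input and its fine texture from the second. A smooth mask (for example, a Gaussian roll-off) would reduce ringing at the cutoff, at the cost of a less strict separation between the two bands.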
Similar Papers
FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution
CV and Pattern Recognition
Makes blurry pictures sharp and clear.
Inference-time Scaling for Diffusion-based Audio Super-resolution
Sound
Makes bad audio sound much better.
Frequency-Integrated Transformer for Arbitrary-Scale Super-Resolution
Machine Learning (CS)
Makes blurry pictures sharp and clear.