Score: 1

OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion

Published: January 28, 2026 | arXiv ID: 2601.20308v1

By: Shuoyan Wei , Feng Li , Chen Zhou and more

Potential Business Impact:

Makes blurry, slow videos clear and fast.

Business Areas:

Smart Cities Real Estate

Diffusion models (DMs) have demonstrated exceptional success in video super-resolution (VSR), showcasing a powerful capacity for generating fine-grained details. However, their potential for space-time video super-resolution (STVSR), which necessitates not only recovering realistic visual content from low-resolution to high-resolution but also improving the frame rate with coherent temporal dynamics, remains largely underexplored. Moreover, existing STVSR methods predominantly address spatiotemporal upsampling under simplified degradation assumptions, which often struggle in real-world scenarios with complex unknown degradations. Such a high demand for reconstruction fidelity and temporal consistency makes the development of a robust STVSR framework particularly non-trivial. To address these challenges, we propose OSDEnhancer, a novel framework that, to the best of our knowledge, represents the first method to achieve real-world STVSR through an efficient one-step diffusion process. OSDEnhancer initializes essential spatiotemporal structures through a linear pre-interpolation strategy and pivots on training temporal refinement and spatial enhancement mixture of experts (TR-SE MoE), which allows distinct expert pathways to progressively learn robust, specialized representations for temporal coherence and spatial detail, further collaboratively reinforcing each other during inference. A bidirectional deformable variational autoencoder (VAE) decoder is further introduced to perform recurrent spatiotemporal aggregation and propagation, enhancing cross-frame reconstruction fidelity. Experiments demonstrate that the proposed method achieves state-of-the-art performance while maintaining superior generalization capability in real-world scenarios.

OS-DiffVSR: Towards One-step Latent Diffusion Model for High-detailed Real-world Video Super-Resolution

CV and Pattern Recognition

Makes blurry videos clear, fast.

20 Sep 2025 0

92%

UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space

CV and Pattern Recognition

Makes blurry videos sharp and smooth instantly.

26 May 2025 2

92%

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations

CV and Pattern Recognition

Fixes blurry videos, making them clear and smooth.

17 Jan 2025 0

View PDF Login to Bookmark

Country of Origin

🇨🇳 China

Page Count

17 pages

OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion

Makes blurry, slow videos clear and fast.

Technical Abstract

OS-DiffVSR: Towards One-step Latent Diffusion Model for High-detailed Real-world Video Super-Resolution

UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations