Semantic and Temporal Integration in Latent Diffusion Space for High-Fidelity Video Super-Resolution

Published: August 1, 2025 | arXiv ID: 2508.00471v1

By: Yiwen Wang, Xinning Chai, Yuhong Zhang, and more

BigTech Affiliations: Tencent

Potential Business Impact:

Makes blurry videos look sharp and smooth.

Recent advancements in video super-resolution (VSR) models have demonstrated impressive results in enhancing low-resolution videos. However, due to limitations in adequately controlling the generation process, achieving high-fidelity alignment with the low-resolution input while maintaining temporal consistency across frames remains a significant challenge. In this work, we propose Semantic and Temporal Guided Video Super-Resolution (SeTe-VSR), a novel approach that incorporates both semantic and spatio-temporal guidance in the latent diffusion space to address these challenges. By injecting high-level semantic information and integrating spatial and temporal cues, our approach strikes a balance between recovering intricate details and ensuring temporal coherence. Our method not only produces highly realistic visual content but also significantly enhances fidelity to the input. Extensive experiments demonstrate that SeTe-VSR outperforms existing methods in detail recovery and perceptual quality, highlighting its effectiveness for complex video super-resolution tasks.
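The abstract describes two guidance signals applied during latent diffusion denoising: high-level semantic conditioning and a temporal term tying each frame to its neighbors. The paper does not publish its update rule here, so the sketch below is only a toy illustration of that general idea, using classifier-free-style semantic guidance plus a simple temporal blend toward the previous frame's latent. All function names, weights, and shapes are assumptions for illustration, not the authors' method.

```python
import numpy as np

def guided_denoise_step(latent, semantic_emb, prev_latent,
                        noise_fn, guidance_scale=3.0, temporal_weight=0.3):
    """One toy denoising step with semantic and temporal guidance.

    Hypothetical sketch: `noise_fn`, `guidance_scale`, and
    `temporal_weight` are illustrative, not from the paper.
    """
    # Classifier-free-style semantic guidance: combine conditional and
    # unconditional noise estimates.
    eps_cond = noise_fn(latent, semantic_emb)
    eps_uncond = noise_fn(latent, None)
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    denoised = latent - eps
    # Temporal guidance: blend toward the previous frame's latent to
    # encourage frame-to-frame consistency.
    return (1.0 - temporal_weight) * denoised + temporal_weight * prev_latent

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.standard_normal((4, 8, 8))   # toy latent for one frame
    prev = rng.standard_normal((4, 8, 8))     # previous frame's latent
    sem = rng.standard_normal(16)             # toy semantic embedding

    def noise_fn(z, cond):
        # Stand-in noise predictor: a real VSR model would use a
        # conditioned diffusion backbone here.
        return 0.1 * z if cond is None else 0.2 * z

    out = guided_denoise_step(latent, sem, prev, noise_fn)
    print(out.shape)  # (4, 8, 8)
```

In practice the temporal term in such systems usually operates through attention across frames rather than a fixed linear blend; the blend here only conveys the trade-off between per-frame detail and cross-frame coherence that the abstract highlights.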

Country of Origin
🇨🇳 China

Page Count
7 pages

Category
Computer Science:
Computer Vision and Pattern Recognition