Trajectory-aware Shifted State Space Models for Online Video Super-Resolution
By: Qiang Zhu, Xiandong Meng, Yuxian Jiang and more
Potential Business Impact:
Makes blurry videos sharp using past frames.
Online video super-resolution (VSR) is an important technique for many real-world video processing applications. It aims to restore the current high-resolution video frame using only temporally previous frames. Most existing online VSR methods use a single neighboring previous frame for temporal alignment, which limits long-range temporal modeling. Recently proposed state space models (SSMs) offer linear computational complexity and a global receptive field, which significantly improves both computational efficiency and performance. In this context, this paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba), which combines long-term trajectory modeling with low-complexity Mamba to achieve efficient spatio-temporal information aggregation. Specifically, TS-Mamba first constructs trajectories within a video to select the most similar tokens from the previous frames. Then, a Trajectory-aware Shifted Mamba Aggregation (TSMA) module, consisting of the proposed shifted SSM blocks, aggregates the selected tokens. The shifted SSM blocks are built on Hilbert scanning and corresponding shift operations, which compensate for scanning losses and strengthen the spatial continuity of Mamba. Additionally, we propose a trajectory-aware loss function to supervise trajectory generation, ensuring accurate token selection during training. Extensive experiments on three widely used VSR test datasets demonstrate that, compared with six online VSR benchmark models, TS-Mamba achieves state-of-the-art performance in most cases while reducing complexity (in MACs) by more than 22.7%. The source code for TS-Mamba will be available at https://github.com.
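To make the Hilbert-scanning idea concrete, here is a minimal sketch of how a 2D token grid can be flattened along a Hilbert curve so that neighbors in the 1D sequence stay neighbors in the image. It is not the authors' implementation. The function names (`hilbert_d2xy`, `shifted_hilbert_scan`) are hypothetical, and the cyclic shift applied before scanning is an assumption about what a "shift operation" might look like; the abstract does not specify it.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current sub-square so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def shifted_hilbert_scan(grid, order, shift=0):
    """Flatten grid[y][x] along a Hilbert curve.

    `shift` cyclically offsets the grid before scanning (an assumed,
    illustrative form of shifting, not taken from the paper).
    """
    n = 1 << order
    seq = []
    for d in range(n * n):
        x, y = hilbert_d2xy(order, d)
        seq.append(grid[(y + shift) % n][(x + shift) % n])
    return seq


if __name__ == "__main__":
    order = 2
    n = 1 << order
    # Each token is labeled with its row-major index.
    grid = [[y * n + x for x in range(n)] for y in range(n)]
    print(shifted_hilbert_scan(grid, order))
    print(shifted_hilbert_scan(grid, order, shift=1))
```

Unlike a row-by-row raster scan, which jumps across the image at the end of every row, consecutive tokens in the Hilbert order are always spatially adjacent. Scanning the same grid again with a different offset gives a second sequence whose curve boundaries fall in different places, which is one plausible reading of how shifting could offset the locality lost in a single scan.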
Similar Papers
MambaVSR: Content-Aware Scanning State Space Model for Video Super-Resolution
CV and Pattern Recognition
Makes blurry videos sharp and clear.
Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution
CV and Pattern Recognition
Makes blurry videos clear without weird glitches.
First-order State Space Model for Lightweight Image Super-resolution
CV and Pattern Recognition
Makes pictures clearer with smarter computer math.