INT-DTT+: Low-Complexity Data-Dependent Transforms for Video Coding
By: Samuel Fernández-Menduiña, Eduardo Pavez, Antonio Ortega and more
Potential Business Impact:
Makes video compression faster and better.
Discrete trigonometric transforms (DTTs), such as the DCT-2 and the DST-7, are widely used in video codecs for their balance between coding performance and computational efficiency. In contrast, data-dependent transforms, such as the Karhunen-Loève transform (KLT) and graph-based separable transforms (GBSTs), offer better energy compaction but lack symmetries that can be exploited to reduce computational complexity. This paper bridges this gap by introducing a general framework to design low-complexity data-dependent transforms. Our approach builds on DTT+, a family of GBSTs derived from rank-one updates of the DTT graphs, which can adapt to signal statistics while retaining a structure amenable to fast computation. We first propose a graph learning algorithm for DTT+ that estimates the rank-one updates for the row and column graphs jointly, capturing the statistical properties of the overall block. Then, we exploit the progressive structure of DTT+ to decompose the kernel into a base DTT and a structured Cauchy matrix. By leveraging low-complexity integer DTTs and sparsifying the Cauchy matrix, we construct an integer approximation to DTT+, termed INT-DTT+. This approximation significantly reduces both computational and memory complexities with respect to the separable KLT with minimal performance loss. We validate our approach in the context of mode-dependent transforms for the VVC standard, following a rate-distortion optimized transform (RDOT) design approach. Integrated into the explicit multiple transform selection (MTS) framework of VVC in a rate-distortion optimization setup, INT-DTT+ achieves more than 3% BD-rate savings over the VVC MTS baseline, with complexity comparable to the integer DCT-2 once the base DTT coefficients are available.
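The decomposition into a base DTT and a Cauchy-structured matrix can be illustrated numerically. The sketch below is not the authors' implementation. It assumes the base graph is an unweighted path graph (whose Laplacian eigenvectors are the DCT-2 basis) and that the rank-one update is a single self-loop of weight `alpha` at one node. The function names, `alpha`, and `node` are hypothetical choices made for illustration. Under those assumptions, the change of basis between the DCT-2 and the updated transform has entries proportional to 1 / (mu_j - lambda_i), where lambda_i and mu_j are the eigenvalues before and after the update.

```python
import numpy as np


def path_laplacian(n):
    """Combinatorial Laplacian of an unweighted path graph with n nodes."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0  # end nodes have a single neighbour
    return L


def dct2_matrix(n):
    """Orthonormal DCT-2 matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * k * (m + 0.5) / n)
    D[0, :] = np.sqrt(1.0 / n)
    return D


def rank_one_update_demo(n=8, alpha=0.5, node=0):
    """Compare the change of basis U^T V with its Cauchy-like closed form.

    alpha and node are illustrative assumptions (self-loop weight and
    position); they are not values taken from the paper.
    """
    L = path_laplacian(n)
    lam, U = np.linalg.eigh(L)  # columns of U: DCT-2 basis up to sign

    e = np.zeros(n)
    e[node] = 1.0
    mu, V = np.linalg.eigh(L + alpha * np.outer(e, e))  # updated graph

    # Direct change of basis from base-transform to updated-transform vectors.
    Q = U.T @ V

    # Closed form: Q[i, j] = alpha * (u_i . e) * (e . v_j) / (mu_j - lam_i).
    cauchy = 1.0 / (mu[None, :] - lam[:, None])
    Q_struct = alpha * (U.T @ e)[:, None] * cauchy * (V.T @ e)[None, :]
    return U, V, Q, Q_struct


if __name__ == "__main__":
    U, V, Q, Q_struct = rank_one_update_demo()
    print("max |Q - Q_struct| =", np.max(np.abs(Q - Q_struct)))
    print("max |U Q - V|      =", np.max(np.abs(U @ Q - V)))
```

Because `V = U @ Q`, applying the updated transform amounts to taking base DTT coefficients and multiplying them by `Q.T`. The sketch keeps `Q` dense and in floating point; the sparsification and integer approximation described in the abstract are not reproduced here.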
Similar Papers
DVD-Quant: Data-free Video Diffusion Transformers Quantization
CV and Pattern Recognition
Makes video creation faster without losing quality.
IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition
CV and Pattern Recognition
Makes pictures look the same from different angles.