TT-Edge: A Hardware-Software Co-Design for Energy-Efficient Tensor-Train Decomposition on Edge AI
By: Hyunseok Kwak, Kyeongwon Lee, Kyeongpil Min, and more
Potential Business Impact:
Makes AI models smaller and faster on phones.
The growing demands of distributed learning on resource-constrained edge devices underscore the importance of efficient on-device model compression. Tensor-Train Decomposition (TTD) offers high compression ratios with minimal accuracy loss, yet its repeated singular value decompositions (SVDs) and matrix multiplications can impose significant latency and energy costs on low-power processors. In this work, we present TT-Edge, a hardware-software co-designed framework aimed at overcoming these challenges. By splitting SVD into two phases, bidiagonalization and diagonalization, TT-Edge offloads the most compute-intensive tasks to a specialized TTD Engine. This engine integrates tightly with an existing GEMM accelerator, thereby curtailing the frequent matrix-vector transfers that often undermine system performance and energy efficiency. Implemented on a RISC-V-based edge AI processor, TT-Edge achieves a 1.7x speedup over a GEMM-only baseline when compressing a ResNet-32 model via TTD, while reducing overall energy usage by 40.2 percent. These gains come with only a 4 percent increase in total power and minimal hardware overhead, enabled by a lightweight design that reuses GEMM resources and employs a shared floating-point unit. Our experimental results on both FPGA prototypes and post-synthesis power analysis at 45 nm demonstrate that TT-Edge effectively addresses the latency and energy bottlenecks of TTD-based compression in edge environments.
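To make the compression pipeline described in the abstract concrete, the sketch below shows a standard TT-SVD procedure in NumPy: a weight tensor is turned into a chain of small TT cores by repeated truncated SVDs, which are the operations whose latency and energy the abstract identifies as the bottleneck. This is an illustrative software-level sketch only; the function name `tt_svd`, the rank cap, and the test shapes are assumptions, and TT-Edge's hardware split of SVD into bidiagonalization and diagonalization phases is not modeled here.

```python
# Minimal software-level sketch of TT-SVD (illustrative; not the TT-Edge hardware flow).
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a tensor into TT cores via repeated truncated SVDs (assumed helper)."""
    dims = tensor.shape
    cores = []
    r_prev = 1
    mat = np.asarray(tensor, dtype=float)
    for k in range(len(dims) - 1):
        # Unfold so rows combine the previous rank with the current mode.
        mat = mat.reshape(r_prev * dims[k], -1)
        # Truncated SVD of the unfolding: the repeated, compute-intensive step
        # that a TTD accelerator would offload.
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(r_prev, dims[k], r))
        # Fold the singular values into the remainder for the next core.
        mat = np.diag(s[:r]) @ vt[:r]
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

# Usage example: compress a small 4-way tensor and check the reconstruction error.
w = np.random.randn(4, 8, 8, 4)
cores = tt_svd(w, max_rank=6)
approx = cores[0]
for core in cores[1:]:
    approx = np.tensordot(approx, core, axes=([-1], [0]))
approx = approx.reshape(w.shape)
print("relative error:", np.linalg.norm(approx - w) / np.linalg.norm(w))
```

Tighter rank caps yield smaller cores (higher compression) at the cost of a larger reconstruction error, which is the accuracy/compression trade-off the abstract refers to.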
Similar Papers
Comprehensive Design Space Exploration for Tensorized Neural Network Hardware Accelerators
Hardware Architecture
Makes AI run much faster on small devices.
Efficient Edge Test-Time Adaptation via Latent Feature Coordinate Correction
Machine Learning (CS)
Makes smart devices learn faster with less power.