
Pagoda: An Energy and Time Roofline Study for DNN Workloads on Edge Accelerators

Published: September 24, 2025 | arXiv ID: 2509.20189v1

By: Prashanthi S. K., Kunal Kumar Sahoo, Amartya Ranjan Saikia and more

Potential Business Impact:

Makes AI run faster and use less power.

Business Areas:

Power Grid Energy

Edge accelerators such as Nvidia Jetsons are becoming an integral part of the computing continuum and are often used for DNN inferencing and training. Nvidia Jetson edge devices have 2000+ CUDA cores within a 70 W power envelope and offer thousands of power modes to customize CPU, GPU and memory frequencies. Their widely varying power-performance trade-offs can be exploited for energy- and power-constrained deployments. While data-driven methods exist to predict the power and latency of DNN workloads on edge devices, there is a lack of principled study to understand why edge accelerators and their power modes perform the way they do. We develop a time roofline and a novel energy roofline model for the Jetson Orin AGX across diverse power modes, and couple them with an analytical model of the compute (FLOP) and memory access (bytes) of DNN inference workloads to analyze them from first principles. These models reveal unique, sometimes counter-intuitive, insights into the power and performance behavior of DNN workloads on edge accelerators, e.g., the default power mode, MAXN, is not the most energy efficient, and time efficiency implies energy efficiency for all power modes. We also extend our analytical roofline models to DNN training. Finally, we apply these methods to tune the power mode (and hence the roofline) of the edge device to optimize the latency and energy of DNN inference, with up to 15% lower energy and minimal degradation in inference time.
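
To make the roofline idea in the abstract concrete, here is a minimal sketch in Python. It is not the paper's model: it assumes the textbook time roofline (execution time is bounded by the slower of compute and memory traffic) and a simple energy roofline in which energy is per-FLOP energy plus per-byte energy plus a constant static power integrated over the run time. The `PowerMode` class, its fields, the function names and every numeric value below are hypothetical placeholders, not measurements of any Jetson device.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PowerMode:
    """Hypothetical description of one power mode (placeholder values only)."""
    name: str
    peak_flops: float       # peak compute throughput, FLOP/s
    mem_bandwidth: float    # peak memory bandwidth, bytes/s
    energy_per_flop: float  # assumed dynamic energy per FLOP, joules
    energy_per_byte: float  # assumed dynamic energy per byte moved, joules
    static_power: float     # assumed constant power while running, watts


def roofline_time(mode: PowerMode, flops: float, bytes_moved: float) -> float:
    """Time roofline: the workload is bound by compute or by memory traffic."""
    compute_time = flops / mode.peak_flops
    memory_time = bytes_moved / mode.mem_bandwidth
    return max(compute_time, memory_time)


def roofline_energy(mode: PowerMode, flops: float, bytes_moved: float) -> float:
    """Simple energy roofline: dynamic energy plus static power times run time."""
    dynamic = flops * mode.energy_per_flop + bytes_moved * mode.energy_per_byte
    static = mode.static_power * roofline_time(mode, flops, bytes_moved)
    return dynamic + static


def best_mode(modes: Iterable[PowerMode], flops: float, bytes_moved: float,
              max_time: Optional[float] = None) -> PowerMode:
    """Pick the lowest-energy mode, optionally subject to a latency budget."""
    feasible = [m for m in modes
                if max_time is None
                or roofline_time(m, flops, bytes_moved) <= max_time]
    if not feasible:
        raise ValueError("no power mode meets the latency budget")
    return min(feasible, key=lambda m: roofline_energy(m, flops, bytes_moved))


if __name__ == "__main__":
    # Two made-up modes: a fast, power-hungry one and a slower, leaner one.
    modes = [
        PowerMode("fast", 4e12, 2e11, 1e-11, 1e-10, 10.0),
        PowerMode("lean", 2e12, 2e11, 1e-11, 1e-10, 4.0),
    ]
    # Made-up workload: 8 GFLOP of compute and 100 MB of memory traffic.
    for m in modes:
        t = roofline_time(m, 8e9, 1e8)
        e = roofline_energy(m, 8e9, 1e8)
        print(f"{m.name}: time={t:.4f} s, energy={e:.4f} J")
    print("chosen:", best_mode(modes, 8e9, 1e8).name)
```

Under these made-up numbers the slower mode can use less energy because it pays less static power, which is the kind of trade-off a per-mode roofline is meant to expose. Passing `max_time` to `best_mode` restricts the choice to modes that meet a latency budget.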

Fulcrum: Optimizing Concurrent DNN Training and Inferencing on Edge Accelerators

Distributed, Parallel, and Cluster Computing

Lets smart devices run two jobs at once.

24 Sep 2025 0

88%

Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models

Distributed, Parallel, and Cluster Computing

Trains smart computer programs on small gadgets.

24 Sep 2025 1

87%

Understanding the Performance and Power of LLM Inferencing on Edge Accelerators

Distributed, Parallel, and Cluster Computing

Runs smart AI on small computers, not just big ones.

11 Jun 2025 2


Country of Origin

🇮🇳 India

Repos / Data Links

github.com

Page Count

36 pages
