Evaluating Multi-Instance DNN Inferencing on Multiple Accelerators of an Edge Device
By: Mumuksh Tayal, Yogesh Simmhan
Potential Business Impact:
Makes smart devices run faster using all their parts.
Edge devices like Nvidia Jetson platforms now offer several on-board accelerators -- including GPU CUDA cores, Tensor Cores, and Deep Learning Accelerators (DLAs) -- which can be exploited concurrently to boost deep neural network (DNN) inferencing. In this paper, we extend previous work by evaluating the performance impact of running multiple instances of the ResNet50 model concurrently across these heterogeneous components. We detail the effects of varying batch sizes and hardware combinations on throughput and latency. Our expanded analysis highlights not only the benefits of combining CUDA and Tensor Cores, but also the performance degradation caused by resource contention when DLAs are added to the mix. These findings, together with insights on precision constraints and workload allocation challenges, motivate further exploration of intelligent scheduling mechanisms to optimize resource utilization on edge platforms.
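To make the setup concrete, the following is a minimal sketch (not taken from the paper) of how such engines are typically built with NVIDIA's TensorRT Python API on a Jetson device: one engine targeting the GPU (CUDA/Tensor Cores) and one targeting a DLA core, which instances can then execute concurrently. The file name resnet50.onnx is a placeholder, and the exact configuration the authors used is an assumption.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, use_dla=False, dla_core=0):
    """Build a serialized TensorRT engine for the GPU or a DLA core."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    # DLAs only support reduced precision, hence the FP16 flag --
    # one of the precision constraints the abstract mentions.
    config.set_flag(trt.BuilderFlag.FP16)
    if use_dla:
        config.default_device_type = trt.DeviceType.DLA
        config.DLA_core = dla_core
        # Layers the DLA cannot run fall back to the GPU; this
        # fallback is one source of the resource contention noted above.
        config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
    return builder.build_serialized_network(network, config)

# One engine per accelerator; multiple instances of each can then be
# run concurrently from separate threads or CUDA streams.
gpu_engine = build_engine("resnet50.onnx", use_dla=False)
dla_engine = build_engine("resnet50.onnx", use_dla=True, dla_core=0)

The same targeting is available from the trtexec command-line tool via --useDLACore and --allowGPUFallback, which is a common way to reproduce this kind of per-accelerator benchmark without writing builder code.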
Similar Papers
Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models
Distributed, Parallel, and Cluster Computing
Trains smart computer programs on small gadgets.
Fulcrum: Optimizing Concurrent DNN Training and Inferencing on Edge Accelerators
Distributed, Parallel, and Cluster Computing
Lets smart devices run two jobs at once.
Scheduling Techniques of AI Models on Modern Heterogeneous Edge GPU -- A Critical Review
Distributed, Parallel, and Cluster Computing
Makes smart gadgets run AI faster and better.