Puzzle: Scheduling Multiple Deep Learning Models on Mobile Device with Heterogeneous Processors
By: Duseok Kang, Yunseong Lee, Junghoon Kim
Potential Business Impact:
Lets phones run many AI tasks faster.
As deep learning models are increasingly deployed on mobile devices, modern devices incorporate deep-learning-specific accelerators to handle the growing computational demands, which increases their hardware heterogeneity. However, existing work on scheduling deep learning workloads across these processors has significant limitations: most studies focus on single-model scenarios rather than realistic multi-model scenarios, overlook performance variations across different hardware/software configurations, and struggle with accurate execution time estimation. To address these challenges, we propose a novel genetic algorithm-based methodology for scheduling multiple deep learning networks on heterogeneous processors by partitioning the networks into multiple subgraphs. Our approach incorporates three different types of chromosomes for partition/mapping/priority exploration, and leverages device-in-the-loop profiling and evaluation for accurate execution time estimation. Based on this methodology, our system, Puzzle, demonstrates superior performance in extensive evaluations with randomly generated scenarios involving nine state-of-the-art networks. The results demonstrate that Puzzle can support 3.7 and 2.2 times higher request frequency on average than the two heuristic baselines, NPU Only and Best Mapping, respectively, while satisfying equivalent real-time requirements.
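The three-chromosome idea described above can be sketched in miniature: a partition chromosome cuts each network chain into subgraphs, a mapping chromosome assigns each subgraph to a processor, and a priority chromosome orders ready subgraphs during list scheduling. Everything below is an illustrative assumption, not the paper's actual encoding: the two toy networks, the per-processor layer costs, and the simple mutation-plus-elitism evolutionary loop (the real system uses device-in-the-loop profiling rather than a cost table).

```python
import random

# Toy instance (illustrative numbers, NOT from the paper): two chain-structured
# networks whose layers have different costs (ms) on CPU/GPU/NPU.
COSTS = {  # layer -> [cost on CPU, cost on GPU, cost on NPU]
    "A0": [8, 4, 2], "A1": [6, 3, 2], "A2": [10, 5, 3],
    "B0": [7, 4, 2], "B1": [9, 5, 3],
}
NETS = [["A0", "A1", "A2"], ["B0", "B1"]]
N_PROCS = 3
N_EDGES = sum(len(net) - 1 for net in NETS)   # candidate partition points
MAX_SG = sum(len(net) for net in NETS)        # upper bound on subgraph count

def decode(partition):
    """Partition chromosome: one cut/no-cut bit per intra-network edge.
    Returns the subgraphs plus each subgraph's chain predecessor (-1 if none)."""
    subgraphs, preds, bit = [], [], 0
    for net in NETS:
        prev, sg = -1, [net[0]]
        for layer in net[1:]:
            if partition[bit]:
                subgraphs.append(sg); preds.append(prev)
                prev, sg = len(subgraphs) - 1, []
            bit += 1
            sg.append(layer)
        subgraphs.append(sg); preds.append(prev)
    return subgraphs, preds

def makespan(partition, mapping, priority):
    """List-schedule subgraphs: among ready subgraphs, start the one the
    priority chromosome ranks first, on its mapped processor."""
    subgraphs, preds = decode(partition)
    n = len(subgraphs)
    proc_free, finish = [0.0] * N_PROCS, [None] * n
    for _ in range(n):
        ready = [i for i in range(n) if finish[i] is None
                 and (preds[i] < 0 or finish[preds[i]] is not None)]
        i = min(ready, key=lambda j: priority[j])
        p = mapping[i]
        start = max(proc_free[p], finish[preds[i]] if preds[i] >= 0 else 0.0)
        finish[i] = start + sum(COSTS[layer][p] for layer in subgraphs[i])
        proc_free[p] = finish[i]
    return max(finish)

def random_individual(rng):
    prio = list(range(MAX_SG)); rng.shuffle(prio)
    return ([rng.random() < 0.5 for _ in range(N_EDGES)],
            [rng.randrange(N_PROCS) for _ in range(MAX_SG)], prio)

def mutate(ind, rng):
    part, mapping, prio = (list(c) for c in ind)
    kind = rng.randrange(3)  # perturb one chromosome type at a time
    if kind == 0:
        i = rng.randrange(N_EDGES); part[i] = not part[i]
    elif kind == 1:
        mapping[rng.randrange(MAX_SG)] = rng.randrange(N_PROCS)
    else:
        i, j = rng.randrange(MAX_SG), rng.randrange(MAX_SG)
        prio[i], prio[j] = prio[j], prio[i]
    return part, mapping, prio

def evolve(pop_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.append(mutate(min(pop, key=lambda c: makespan(*c)), rng))
        pop = sorted(pop, key=lambda c: makespan(*c))[:pop_size]
    return pop[0], makespan(*pop[0])

best, cost = evolve()
print(f"best makespan: {cost} ms")
```

For comparison, the NPU-Only baseline corresponds to the chromosome that leaves every network uncut and maps every subgraph to the NPU, e.g. `makespan([False] * N_EDGES, [2] * MAX_SG, list(range(MAX_SG)))`, which runs the two toy networks back-to-back on one processor; the evolved schedule exploits processor-level parallelism to beat it.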
Similar Papers
Scheduling Techniques of AI Models on Modern Heterogeneous Edge GPU -- A Critical Review
Distributed, Parallel, and Cluster Computing
Makes smart gadgets run AI faster and better.
Resource Heterogeneity-Aware and Utilization-Enhanced Scheduling for Deep Learning Clusters
Distributed, Parallel, and Cluster Computing
Makes computer learning faster and better.
Accelerating Mobile Inference through Fine-Grained CPU-GPU Co-Execution
Machine Learning (CS)
Lets phones run smart programs much faster.