Where to Split? A Pareto-Front Analysis of DNN Partitioning for Edge Inference
By: Adiba Masud, Nicholas Foley, Pragathi Durga Rajarajan, and more
The deployment of deep neural networks (DNNs) on resource-constrained edge devices is frequently hindered by their significant computational and memory requirements. While partitioning and distributing a DNN across multiple devices is a well-established strategy to mitigate this challenge, prior research has largely focused on single-objective optimization, such as minimizing latency or maximizing throughput. This paper challenges that practice by reframing DNN partitioning as a multi-objective optimization problem. We argue that in real-world deployments, latency and throughput stand in a complex trade-off that is further complicated by network variability. To address this, we introduce ParetoPipe, an open-source framework that leverages Pareto-front analysis to systematically identify partitioning strategies that balance these competing objectives. Our contributions are threefold: we benchmark pipeline-partitioned inference on a heterogeneous testbed of Raspberry Pis and a GPU-equipped edge server; we identify Pareto-optimal points to analyze the latency-throughput trade-off under varying network conditions; and we release a flexible, open-source framework to facilitate distributed inference and benchmarking. The toolchain offers two communication backends, PyTorch RPC and a custom lightweight implementation, to minimize overhead and support broad experimentation.
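To make the Pareto-front selection step concrete, the Python sketch below filters a set of candidate split points down to the non-dominated ones, treating latency as a minimization objective and throughput as a maximization objective. The function name, the candidate tuple layout, and the measurement values are illustrative assumptions for this sketch, not part of the ParetoPipe API or the paper's reported results.

    # A minimal sketch of Pareto-front selection over candidate DNN split
    # points. Each candidate is (split layer index, latency in ms,
    # throughput in inferences/s). All values below are hypothetical
    # placeholders, not measurements from the paper.

    def pareto_front(candidates):
        """Return candidates not dominated by any other candidate.

        A point dominates another if it has lower-or-equal latency AND
        higher-or-equal throughput, with at least one strict inequality.
        """
        front = []
        for split, lat, thr in candidates:
            dominated = any(
                (l <= lat and t >= thr) and (l < lat or t > thr)
                for _, l, t in candidates
            )
            if not dominated:
                front.append((split, lat, thr))
        return front

    # Hypothetical per-split measurements for a single model/testbed pair.
    measurements = [
        (4, 120.0, 8.1),
        (8, 95.0, 7.4),
        (12, 140.0, 9.6),
        (16, 130.0, 7.0),   # dominated by the split at layer 8
    ]

    for split, lat, thr in pareto_front(measurements):
        print(f"split after layer {split}: {lat:.1f} ms, {thr:.1f} inf/s")

Running this prints the three non-dominated split points (layers 4, 8, and 12); the split at layer 16 is dropped because layer 8 achieves both lower latency and higher throughput. The quadratic scan is fine at this scale, since the number of candidate split points is bounded by the model's layer count.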