Performance Isolation for Inference Processes in Edge GPU Systems
By: Juan José Martín, José Flich, Carles Hernández
Potential Business Impact:
Makes AI run reliably in important jobs.
This work analyzes the main isolation mechanisms available in modern NVIDIA GPUs, namely MPS (Multi-Process Service), MIG (Multi-Instance GPU), and the recent Green Contexts, with the goal of ensuring predictable inference times in safety-critical applications that use deep learning models. The experimental methodology includes performance tests, an evaluation of the impact of partitioning, and an analysis of temporal isolation between processes, on both the NVIDIA A100 and Jetson Orin platforms. MIG is observed to provide a high level of isolation, while Green Contexts represent a promising alternative for edge devices: they enable fine-grained allocation of streaming multiprocessors (SMs) with low overhead, albeit without memory isolation. The study also identifies current limitations and outlines potential research directions to improve temporal predictability in shared GPUs.
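As a rough illustration of what an analysis of temporal isolation can look like, the sketch below compares inference latencies measured when a process runs alone against latencies measured while another process shares the GPU. It is not the authors' methodology: the function names, the chosen metrics (mean slowdown, 99th-percentile slowdown, worst case), and the sample values are all assumptions made for this example.

```python
import math
import statistics


def percentile(samples, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def isolation_report(solo_ms, shared_ms):
    """Summarize how much co-running work perturbs inference latency.

    solo_ms:   latencies (ms) measured with the process alone on the GPU
    shared_ms: latencies (ms) measured while another process shares the GPU
    A slowdown close to 1.0 suggests good temporal isolation.
    """
    solo_mean = statistics.mean(solo_ms)
    shared_mean = statistics.mean(shared_ms)
    return {
        "mean_slowdown": shared_mean / solo_mean,
        "p99_slowdown": percentile(shared_ms, 99) / percentile(solo_ms, 99),
        "worst_case_ms": max(shared_ms),
    }


if __name__ == "__main__":
    # Hypothetical measurements, not results from the paper.
    solo = [10.0, 10.0, 10.0, 10.0]
    shared = [10.0, 12.0, 14.0, 20.0]
    print(isolation_report(solo, shared))
```

For safety-critical use, the tail metrics (the 99th percentile and the worst case) usually matter more than the mean, since a mechanism can keep average latency stable while still allowing occasional long delays.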
Similar Papers
On the Partitioning of GPU Power among Multi-Instances
Distributed, Parallel, and Cluster Computing
Tracks computer chip power use per task.
An Online Fragmentation-Aware Scheduler for Managing GPU-Sharing Workloads on Multi-Instance GPUs
Distributed, Parallel, and Cluster Computing
Lets many computer jobs share one graphics chip.
An Online Fragmentation-Aware GPU Scheduler for Multi-Tenant MIG-based Clouds
Distributed, Parallel, and Cluster Computing
Makes more AI programs run on shared computer chips.