Flex-MIG: Enabling Distributed Execution on MIG
By: Myeongsu Kim, Ikjun Yeom, Younghoon Kim
Potential Business Impact:
Lets a single job run across several isolated slices of a shared graphics chip, so less GPU capacity sits idle.
GPU clusters in multi-tenant settings often suffer from underutilization, making GPU-sharing technologies essential for efficient resource use. Among them, NVIDIA Multi-Instance GPU (MIG) has gained traction for providing hardware-level isolation that enables concurrent workloads without interference. However, MIG's hardware rigidity and the conventional one-to-one allocation model (one job bound to one MIG instance) jointly lead to severe fragmentation and cluster-wide underutilization. We present Flex-MIG, a software-only framework that replaces the one-to-one model with a one-to-many allocation model, in which a job can span multiple MIG instances, and enables host-shared-memory collectives across those instances without hardware modification. Flex-MIG eliminates drain-required reconfiguration, reduces fragmentation, and improves makespan by up to 17% across diverse traces, showing that rethinking MIG's operational model as a software-coordinated layer substantially improves cluster efficiency.
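To make the fragmentation argument concrete, the following is a minimal toy sketch contrasting the two allocation models. It is not Flex-MIG's scheduler: the function names, the greedy policy, and the representation of free capacity as lists of slice counts are all assumptions made for illustration, and the sketch assumes every listed GPU sits on the same host so that instances could communicate through host shared memory.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model: each GPU name maps to the sizes (in compute
# slices) of its currently free MIG instances. All GPUs are assumed to be
# on one host.
FreeInstances = Dict[str, List[int]]
Allocation = List[Tuple[str, int]]  # (gpu name, instance size)


def allocate_one_to_one(free: FreeInstances, demand: int) -> Optional[Allocation]:
    """One job, one instance: pick the smallest free instance that fits."""
    candidates = [
        (size, gpu)
        for gpu, sizes in free.items()
        for size in sizes
        if size >= demand
    ]
    if not candidates:
        return None  # fragmented: enough slices may exist, but not in one instance
    size, gpu = min(candidates)
    return [(gpu, size)]


def allocate_one_to_many(free: FreeInstances, demand: int) -> Optional[Allocation]:
    """One job, many instances: greedily combine free instances, largest first."""
    pool = sorted(
        ((size, gpu) for gpu, sizes in free.items() for size in sizes),
        reverse=True,
    )
    chosen: Allocation = []
    total = 0
    for size, gpu in pool:
        if total >= demand:
            break
        chosen.append((gpu, size))
        total += size
    return chosen if total >= demand else None


if __name__ == "__main__":
    # Four free slices in total, but the largest single instance has two.
    free = {"gpu0": [1, 1], "gpu1": [2]}
    print(allocate_one_to_one(free, 3))
    print(allocate_one_to_many(free, 3))
```

In this toy setting a three-slice request fails under one-to-one allocation even though four slices are free, while the one-to-many variant satisfies it by combining instances. A real system would also have to account for the communication cost between instances, which this sketch ignores.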