Score: 2

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach

Published: March 12, 2025 | arXiv ID: 2503.09357v1

By: Ruifeng She, Bowen Pang, Kai Li, and more

BigTech Affiliations: Huawei

Potential Business Impact:

Makes big AI models train faster by automatically planning how their work is split across many chips.

Business Areas:
Machine Learning, Artificial Intelligence, Data and Analytics, Software

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies (data, model, sequence, and pipeline) have been successfully implemented for popular neural networks on mainstream hardware, optimizing the distributed deployment schedule still requires extensive expertise and manual effort. Furthermore, while existing frameworks handle simple chain-like structures well, they struggle with complex non-linear architectures: mixture-of-experts and multi-modal models feature intricate multi-input multi-output (MIMO) and branch-rich topologies that require fine-grained operator-level parallelization beyond the capabilities of existing frameworks. We formulate parallelism planning as a scheduling optimization problem using mixed-integer programming, and propose a bi-level solution framework that balances optimality with computational efficiency, automatically generating effective distributed plans that capture both the heterogeneous structure of modern neural networks and the underlying hardware constraints. In experiments against expert-designed strategies such as DeepSeek's DualPipe, our framework achieves comparable or superior performance, reducing computational bubbles by half under the same memory constraints. The framework's versatility extends beyond throughput optimization to hardware-utilization maximization, memory-capacity constraints, and other objectives and strategies. These capabilities position our solution as both a research tool for exploring optimal parallelization strategies and a practical industrial solution for large-scale AI deployment.
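To make the scheduling-as-MIP idea concrete, here is a minimal sketch of how operator placement and start times can be encoded as a mixed-integer program. This is not the paper's formulation: the toy operator graph, timings, communication cost, and the PuLP-based encoding below are all illustrative assumptions.

```python
# A minimal sketch, assuming a tiny branch-rich operator graph and two
# devices. Placement and ordering are binary variables; start times are
# continuous; the objective minimizes makespan. All data here is hypothetical.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

ops = ["embed", "expert_a", "expert_b", "combine"]            # operator nodes
deps = [("embed", "expert_a"), ("embed", "expert_b"),         # dataflow edges
        ("expert_a", "combine"), ("expert_b", "combine")]
devices = [0, 1]
t = {"embed": 2, "expert_a": 4, "expert_b": 4, "combine": 1}  # compute times
comm = 1                      # transfer cost when an edge crosses devices
M = sum(t.values()) + comm * len(deps)                        # big-M horizon

prob = LpProblem("operator_schedule", LpMinimize)
x = {(o, d): LpVariable(f"x_{o}_{d}", cat=LpBinary)           # placement vars
     for o in ops for d in devices}
s = {o: LpVariable(f"s_{o}", lowBound=0) for o in ops}        # start times
cross = {e: LpVariable(f"cross_{e[0]}_{e[1]}", cat=LpBinary) for e in deps}
mk = LpVariable("makespan", lowBound=0)
prob += mk                                                    # objective

for o in ops:
    prob += lpSum(x[o, d] for d in devices) == 1   # each op on one device
    prob += mk >= s[o] + t[o]                      # makespan covers every op

for (u, v) in deps:
    for d in devices:  # cross = 1 whenever u and v sit on different devices
        prob += cross[(u, v)] >= x[u, d] - x[v, d]
    prob += s[v] >= s[u] + t[u] + comm * cross[(u, v)]  # precedence + comm

# Ops sharing a device must not overlap (disjunctive big-M constraints).
pairs = [(a, b) for i, a in enumerate(ops) for b in ops[i + 1:]]
order = {p: LpVariable(f"ord_{p[0]}_{p[1]}", cat=LpBinary) for p in pairs}
for (a, b) in pairs:
    for d in devices:  # active only when order is fixed and both ops are on d
        prob += s[b] >= s[a] + t[a] - M * (3 - order[(a, b)] - x[a, d] - x[b, d])
        prob += s[a] >= s[b] + t[b] - M * (2 + order[(a, b)] - x[a, d] - x[b, d])

prob.solve()
print("makespan:", value(mk))
for o in ops:
    d = next(d for d in devices if value(x[o, d]) > 0.5)
    print(f"{o}: device {d}, start {value(s[o])}")
```

A flat encoding like this grows combinatorially with the number of operators and devices, which is the motivation for the bi-level decomposition the paper proposes to balance optimality against solve time.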

Country of Origin
🇨🇳 China

Page Count
10 pages

Category
Computer Science:
Machine Learning (CS)