CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training
By: Tiancheng Chen, Ales Kubicek, Langwen Huang, and more
Potential Business Impact:
Enables faster training of large language models across geographically distributed datacenters.
Training large language models (LLMs) now requires resources that exceed a single datacenter, making cross-datacenter strategies increasingly crucial. We present CrossPipe, a framework designed to optimize model training across geographically distributed datacenters by explicitly modeling and mitigating the impact of network latency and limited bandwidth. It enables unified analysis and optimization incorporating both pipeline parallelism (PP) and opportunities for overlapping data parallelism (DP) communication. CrossPipe generates optimized pipeline schedules using either solver-based optimal or fast near-optimal greedy algorithms, built upon a flexible execution engine that separates scheduling logic from communication details. Our evaluation shows that CrossPipe reduces training time by up to 33.6% compared to traditional pipeline schedules under identical memory constraints. When memory constraints are relaxed, CrossPipe maintains strong performance despite communication delays, approaching the efficiency of idealized schedules without delays. CrossPipe offers improved scalability and resource utilization, particularly in environments with high network latency or limited bandwidth.
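The abstract does not give algorithmic details, so the following is only a rough, hypothetical sketch of the kind of cost model a cross-datacenter pipeline scheduler might reason about: a GPipe-style schedule (all forward passes, then all backward passes) where each inter-stage transfer is charged latency plus size divided by bandwidth. All names, parameters, and numbers below are assumptions for illustration, not CrossPipe's API or its actual scheduling algorithm.

```python
# Minimal sketch, assuming a latency + size/bandwidth link model.
# Hypothetical names; not CrossPipe's implementation.
from dataclasses import dataclass


@dataclass
class LinkModel:
    latency_s: float       # one-way latency between adjacent pipeline stages
    bandwidth_Bps: float   # effective bandwidth in bytes per second

    def transfer_time(self, nbytes: float) -> float:
        # Simple WAN cost model: fixed latency plus serialization time.
        return self.latency_s + nbytes / self.bandwidth_Bps


def greedy_pipeline_makespan(num_stages, num_microbatches, fwd_s, bwd_s,
                             act_bytes, link: LinkModel) -> float:
    """Estimate iteration time for a GPipe-style schedule.

    Each stage runs a pass as soon as (a) the upstream activation or
    downstream gradient has arrived and (b) the stage itself is idle.
    """
    stage_free = [0.0] * num_stages   # when each stage is next idle
    fwd_done = {}                     # (stage, microbatch) -> finish time

    # Forward passes flow from stage 0 to stage N-1.
    for mb in range(num_microbatches):
        ready = 0.0
        for s in range(num_stages):
            start = max(ready, stage_free[s])
            finish = start + fwd_s
            stage_free[s] = finish
            fwd_done[(s, mb)] = finish
            ready = finish + link.transfer_time(act_bytes)  # ship activation downstream

    # Backward passes flow from stage N-1 back to stage 0.
    makespan = 0.0
    for mb in range(num_microbatches):
        ready = fwd_done[(num_stages - 1, mb)]
        for s in reversed(range(num_stages)):
            start = max(ready, stage_free[s])
            finish = start + bwd_s
            stage_free[s] = finish
            ready = finish + link.transfer_time(act_bytes)  # ship gradient upstream
            makespan = max(makespan, finish)
    return makespan


if __name__ == "__main__":
    # Hypothetical cross-datacenter link: ~50 ms latency, 10 Gb/s bandwidth.
    wan = LinkModel(latency_s=0.05, bandwidth_Bps=1.25e9)
    t = greedy_pipeline_makespan(num_stages=4, num_microbatches=8,
                                 fwd_s=0.02, bwd_s=0.04,
                                 act_bytes=64e6, link=wan)
    print(f"estimated iteration time: {t:.3f} s")
```

This toy model only captures compute/communication dependencies for one fixed schedule; per the abstract, CrossPipe instead searches over schedules (solver-based optimal or near-optimal greedy) and additionally exploits opportunities to overlap DP communication, which a fixed GPipe-style ordering like the one above cannot express.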
Similar Papers
A Flexible Programmable Pipeline Parallelism Framework for Efficient DNN Training
Distributed, Parallel, and Cluster Computing
Speeds up DNN training through flexible, programmable pipeline parallelism.
CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks
Systems and Control
Enables efficient collaborative LLM training on heterogeneous edge devices.