CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks
By: Jiewei Chen, Xiumei Deng, Zehui Xiong, and more
Potential Business Impact:
Makes smartphones learn faster and use less power.
The increasing demand for intelligent mobile applications has made multi-agent collaboration with Transformer-based large language models (LLMs) essential in mobile edge computing (MEC) networks. However, training LLMs in such environments remains challenging due to heavy computation, high end-to-end latency, and limited model generalization. We introduce CollaPipe, a hybrid distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation to support self-evolving intelligent networks. In CollaPipe, the encoder is adaptively partitioned into variable-sized segments and deployed across mobile devices for pipeline-parallel training, while the decoder is deployed on edge servers to handle generative tasks; the global model is then updated via federated aggregation. To enhance training efficiency, we formulate a joint optimization problem that adaptively allocates model segments, micro-batches, bandwidth, and transmission power. We derive a closed-form convergence bound and use it to design a Dynamic Segment Scheduling and Resource Allocation (DSSDA) algorithm based on Lyapunov optimization, ensuring system stability under long-term constraints. Extensive experiments on downstream tasks with Transformer and BERT models show that CollaPipe improves computation efficiency by up to 15.09%, reduces end-to-end latency by at least 48.98%, and cuts single-device memory usage by more than half, enabling online learning in heterogeneous and dynamic communication environments.
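To make the workflow concrete, below is a minimal Python sketch of the two mechanisms the abstract describes: capacity-proportional partitioning of encoder layers into per-device segments, and FedAvg-style global aggregation. The function names, the device_flops capacity proxy, and the largest-remainder rounding are illustrative assumptions, not the paper's implementation; CollaPipe's actual DSSDA scheduler jointly optimizes segments, micro-batches, bandwidth, and transmission power via Lyapunov optimization, which this sketch omits.

```python
import numpy as np

def partition_encoder(num_layers, device_flops):
    """Split encoder layers into contiguous segments, one per device,
    sized roughly in proportion to each device's compute capacity
    (an assumed proxy for the paper's adaptive segment allocation)."""
    shares = np.asarray(device_flops, dtype=float)
    raw = shares / shares.sum() * num_layers
    sizes = np.floor(raw).astype(int)
    # Largest-remainder rounding so the segment sizes sum to num_layers.
    for i in np.argsort(raw - sizes)[::-1][: num_layers - sizes.sum()]:
        sizes[i] += 1
    bounds = np.cumsum(sizes)
    return [list(range(start, end)) for start, end in zip([0, *bounds[:-1]], bounds)]

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: average each parameter tensor,
    weighted by the client's local dataset size."""
    total = sum(client_sizes)
    return {
        name: sum(w[name] * (n / total) for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# Example: 12 encoder layers split across three devices with 4:2:1 capacity.
print(partition_encoder(12, device_flops=[4.0, 2.0, 1.0]))
# -> [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9], [10, 11]]
```

Contiguous segments keep inter-device communication to a single activation hand-off per pipeline stage, which is what makes variable-sized partitions attractive over heterogeneous wireless links; in the full system each device would run micro-batched forward/backward passes on its segment before the federated aggregation step.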
Similar Papers
FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters
Distributed, Parallel, and Cluster Computing
Makes AI models run faster and cheaper.
TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference
Distributed, Parallel, and Cluster Computing
Makes AI answer questions much faster.
A Flexible Programmable Pipeline Parallelism Framework for Efficient DNN Training
Distributed, Parallel, and Cluster Computing
Makes AI models train much faster.