FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters
By: Yanying Lin, Shijie Peng, Chengzhi Lu, and more
Potential Business Impact:
Makes AI models run faster and cheaper.
Serving Large Language Models (LLMs) in production faces significant challenges from highly variable request patterns and severe resource fragmentation in serverless clusters. Current systems rely on static pipeline configurations that struggle to adapt to dynamic workload conditions, leading to substantial inefficiencies. We present FlexPipe, a novel system that dynamically reconfigures pipeline architectures during runtime to address these fundamental limitations. FlexPipe decomposes models into fine-grained stages and intelligently adjusts pipeline granularity based on real-time request pattern analysis, implementing three key innovations: fine-grained model partitioning with preserved computational graph constraints, inflight pipeline refactoring with consistent cache transitions, and topology-aware resource allocation that navigates GPU fragmentation. Comprehensive evaluation on an 82-GPU cluster demonstrates that FlexPipe achieves up to 8.5x better resource efficiency and 38.3% lower latency than state-of-the-art systems, reducing GPU reservation requirements from 75% to 30% of peak capacity.
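The abstract's central idea, adjusting pipeline granularity to the observed request rate, can be sketched in a few lines of Python. This is a minimal illustration under assumed names and thresholds (`StageGroup`, `plan_pipeline`, the "one extra stage per ~50 req/s" policy are all hypothetical), not FlexPipe's actual partitioning algorithm, which also preserves computational-graph constraints and migrates caches inflight.

```python
from dataclasses import dataclass

@dataclass
class StageGroup:
    """A contiguous run of model layers assigned to one GPU worker."""
    start_layer: int
    end_layer: int  # exclusive

def plan_pipeline(num_layers: int, requests_per_sec: float,
                  max_stages: int = 8) -> list[StageGroup]:
    """Pick a pipeline granularity from the observed request rate.

    Low load -> few coarse stages (less inter-stage traffic);
    high load -> many fine-grained stages, whose smaller per-GPU
    footprints can also slot into fragmented free capacity.
    """
    # Hypothetical policy: one extra stage per ~50 req/s, capped.
    num_stages = max(1, min(max_stages, 1 + int(requests_per_sec // 50)))
    layers_per_stage = -(-num_layers // num_stages)  # ceiling division
    return [
        StageGroup(start, min(start + layers_per_stage, num_layers))
        for start in range(0, num_layers, layers_per_stage)
    ]

if __name__ == "__main__":
    # A 32-layer model under rising load: the plan refines from one
    # coarse stage to several fine-grained ones.
    for rps in (10.0, 120.0, 400.0):
        plan = plan_pipeline(num_layers=32, requests_per_sec=rps)
        print(f"{rps:>6.1f} req/s -> {len(plan)} stage(s): {plan}")
```

In the paper's setting, each replanning step would additionally trigger the inflight refactoring and cache-transition machinery the abstract describes; the sketch only shows the granularity decision itself.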
Similar Papers
A Flexible Programmable Pipeline Parallelism Framework for Efficient DNN Training
Distributed, Parallel, and Cluster Computing
Makes AI models train much faster.
CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks
Systems and Control
Makes smartphones learn faster and use less power.