PruneX: A Hierarchical Communication-Efficient System for Distributed CNN Training with Structured Pruning

Published: December 16, 2025 | arXiv ID: 2512.14628v1

By: Alireza Olama, Andreas Lundell, Izzat El Hajj, and more

Inter-node communication bandwidth increasingly constrains distributed training at scale on multi-node GPU clusters. While compact models are the ultimate deployment target, conventional pruning-aware distributed training systems typically fail to reduce communication overhead because unstructured sparsity cannot be efficiently exploited by highly optimized dense collective primitives. We present PruneX, a distributed data-parallel training system that co-designs pruning algorithms with the cluster hierarchy to reduce inter-node bandwidth usage. PruneX introduces the Hierarchical Structured ADMM (H-SADMM) algorithm, which enforces node-level structured sparsity before inter-node synchronization, enabling dynamic buffer compaction that eliminates both zero-valued transmissions and indexing overhead. The system adopts a leader-follower execution model with separated intra-node and inter-node process groups, performing dense collectives on compacted tensors over bandwidth-limited links while confining full synchronization to high-bandwidth intra-node interconnects. Evaluation on ResNet architectures across 64 GPUs demonstrates that PruneX reduces inter-node communication volume by approximately 60% and achieves a 6.75x strong-scaling speedup, outperforming the dense baseline (5.81x) and Top-K gradient compression (3.71x) on the Puhti supercomputer at CSC - IT Center for Science (Finland).
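The key communication idea in the abstract is that if every rank enforces the same structured (e.g., channel-level) sparsity before the inter-node exchange, the send buffer can be compacted into a smaller dense tensor and handed to ordinary dense collectives, with no per-element index metadata. The paper's actual implementation is not reproduced here; the following is a minimal PyTorch-style sketch of that compaction step under stated assumptions, where `compact_channels`, `expand_channels`, `keep_mask`, and `inter_node_group` are illustrative names rather than PruneX's API.

```python
import torch


def compact_channels(grad: torch.Tensor, keep_mask: torch.Tensor) -> torch.Tensor:
    """Keep only unpruned output channels so the collective moves a smaller dense tensor."""
    return grad[keep_mask].contiguous()


def expand_channels(compact: torch.Tensor, keep_mask: torch.Tensor) -> torch.Tensor:
    """Scatter the reduced values back into a full-size buffer; pruned channels stay zero."""
    full = torch.zeros((keep_mask.numel(), *compact.shape[1:]), dtype=compact.dtype)
    full[keep_mask] = compact
    return full


# Toy example: a conv layer gradient with 8 output channels, half of them pruned
# by the node-level structured-sparsity step, so all ranks agree on the same mask.
grad = torch.randn(8, 16, 3, 3)
keep_mask = torch.tensor([True, False, True, True, False, False, True, False])

send_buf = compact_channels(grad, keep_mask)  # shape (4, 16, 3, 3): ~50% less traffic
# On a real cluster, only the node leaders would run the inter-node collective, e.g.:
# torch.distributed.all_reduce(send_buf, group=inter_node_group)  # hypothetical group
recovered = expand_channels(send_buf, keep_mask)
assert torch.allclose(recovered[keep_mask], grad[keep_mask])
```

Because every rank applies the same mask, the compacted tensors line up across nodes and a standard dense all-reduce can be used unchanged, which is what avoids the indexing overhead that unstructured Top-K compression must ship alongside its values.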

Category
Computer Science:
Distributed, Parallel, and Cluster Computing