Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure
By: Myoungsoo Jung
Potential Business Impact:
Builds faster, more scalable AI systems by connecting memory, processors, and accelerators more efficiently.
Modern AI workloads such as large language models (LLMs) and retrieval-augmented generation (RAG) impose severe demands on memory capacity, communication bandwidth, and resource flexibility. Traditional GPU-centric architectures struggle to scale because inter-GPU communication overheads grow with cluster size. This report introduces key AI concepts and explains how Transformers revolutionized data representation in LLMs. We analyze large-scale AI hardware and data center designs, identifying scalability bottlenecks in hierarchical systems. To address these, we propose a modular data center architecture based on Compute Express Link (CXL) that enables disaggregated scaling of memory, compute, and accelerators. We further explore accelerator-optimized interconnects, collectively termed XLink (e.g., UALink, NVLink, NVLink Fusion), and introduce a hybrid CXL-over-XLink design that reduces long-distance data transfers while preserving memory coherence. We also propose a hierarchical memory model that combines local and pooled memory, and we evaluate lightweight CXL implementations, HBM, and silicon photonics for efficient scaling. Our evaluations demonstrate improved scalability, throughput, and flexibility in AI infrastructure.
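The abstract's core claim, that the "communication tax" of moving data across fabric tiers dominates at scale, can be made concrete with a toy cost model. The Python sketch below is a minimal illustration only: the link names, bandwidth and latency figures, and the placement_cost_us helper are assumptions chosen for plausibility, not parameters or results from the paper.

# Toy cost model for the "communication tax": time to move a tensor
# across different fabric tiers. All link names and figures below are
# illustrative assumptions for this sketch, not measurements from the paper.

from dataclasses import dataclass

@dataclass
class Link:
    name: str
    bandwidth_gb_s: float  # assumed usable bandwidth, GB/s
    latency_us: float      # assumed one-way latency, microseconds

    def transfer_time_us(self, nbytes: int) -> float:
        # 1 GB/s moves 1e3 bytes per microsecond, so nbytes / (GB/s * 1e3) is in us.
        return self.latency_us + nbytes / (self.bandwidth_gb_s * 1e3)

# Hypothetical tiers of a CXL-over-XLink hierarchy.
XLINK_LOCAL = Link("XLink (intra-pod accelerator fabric)", 400.0, 1.0)
CXL_POOL = Link("CXL pooled memory (cross-pod)", 64.0, 2.5)

def placement_cost_us(nbytes: int, refetches: int, link: Link) -> float:
    # Total time spent re-fetching the same data `refetches` times.
    return refetches * link.transfer_time_us(nbytes)

if __name__ == "__main__":
    kv_shard = 256 * 1024 * 1024  # a 256 MiB KV-cache shard
    for link in (XLINK_LOCAL, CXL_POOL):
        ms = placement_cost_us(kv_shard, 8, link) / 1e3
        print(f"{link.name}: {ms:.1f} ms for 8 re-fetches")

With these assumed numbers, eight re-fetches cost about 5.4 ms over the local accelerator fabric versus roughly 33.6 ms from the pooled tier, which captures the intuition behind the paper's hierarchical memory model: keep hot data in local memory and spill colder data to the CXL pool.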
Similar Papers
AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies
Hardware Architecture
Finds the best computer chips for AI tasks.
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models
Hardware Architecture
Builds faster, cheaper computer centers for giant AI.
Characterizing Communication Patterns in Distributed Large Language Model Inference
Distributed, Parallel, and Cluster Computing
Makes AI talk faster by fixing how computers share info.