CXLAimPod: CXL Memory is all you need in AI era

Published: August 21, 2025 | arXiv ID: 2508.15980v1

By: Yiwei Yang, Yusheng Zheng, Yiqi Chen, and more

Potential Business Impact:

Improves memory performance for servers running workloads with mixed read-write access patterns.

Business Areas:
Cloud Computing, Internet Services, Software

The proliferation of data-intensive applications, ranging from large language models to key-value stores, increasingly stresses memory systems with mixed read-write access patterns. Traditional half-duplex architectures such as DDR5 are ill-suited for such workloads, suffering bus turnaround penalties that reduce their effective bandwidth under mixed read-write patterns. Compute Express Link (CXL) offers a breakthrough with its full-duplex channels, yet this architectural potential remains untapped as existing software stacks are oblivious to this capability. This paper introduces CXLAimPod, an adaptive scheduling framework designed to bridge this software-hardware gap through system support, including cgroup-based hints for application-aware optimization. Our characterization quantifies the opportunity, revealing that CXL systems achieve 55-61% bandwidth improvement at balanced read-write ratios compared to flat DDR5 performance, demonstrating the benefits of full-duplex architecture. To realize this potential, the CXLAimPod framework integrates multiple scheduling strategies with a cgroup-based hint mechanism to navigate the trade-offs between throughput, latency, and overhead. Implemented efficiently within the Linux kernel via eBPF, CXLAimPod delivers significant performance improvements over default CXL configurations. Evaluation on diverse workloads shows 7.4% average improvement for Redis (with up to 150% for specific sequential patterns), 71.6% improvement for LLM text generation, and 9.1% for vector databases, demonstrating that duplex-aware scheduling can effectively exploit CXL's architectural advantages.
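The intuition behind the half-duplex vs. full-duplex gap can be sketched with a toy bandwidth model. This is not from the paper: the peak bandwidths, the turnaround-penalty factor, and the switch-probability model below are all illustrative assumptions, chosen only to show why a half-duplex bus loses the most bandwidth at a balanced read-write mix while a full-duplex link gains the most there.

```python
# Toy model (illustrative, not from the paper): effective bandwidth of a
# half-duplex bus vs. a full-duplex link under a mixed read-write workload.
# Peak bandwidths and the turnaround penalty are assumed parameters.

def half_duplex_bw(read_frac, peak=64.0, turnaround_penalty=0.35):
    """Half-duplex (DDR5-like): one shared bus; switching the bus direction
    costs dead time. If accesses are independent, the chance that two
    consecutive accesses go in opposite directions is 2*r*(1-r), which is
    maximal at a balanced 50/50 mix."""
    switch_prob = 2.0 * read_frac * (1.0 - read_frac)
    return peak * (1.0 - turnaround_penalty * switch_prob)

def full_duplex_bw(read_frac, per_dir_peak=32.0):
    """Full-duplex (CXL-like): independent read and write channels, each
    with its own peak. Runtime is set by the busier direction, so a
    balanced mix keeps both channels running at once."""
    busier = max(read_frac, 1.0 - read_frac)
    return per_dir_peak / busier

for r in (1.0, 0.75, 0.5):
    hd, fd = half_duplex_bw(r), full_duplex_bw(r)
    print(f"read fraction {r:.2f}: half-duplex {hd:5.1f} GB/s, "
          f"full-duplex {fd:5.1f} GB/s")
```

Under these assumed numbers the half-duplex curve dips at the 50/50 point exactly where the full-duplex curve peaks, which is the shape of the opportunity the abstract's 55-61% measurement quantifies on real hardware.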

Page Count
12 pages

Category
Computer Science:
Operating Systems