A4: Microarchitecture-Aware LLC Management for Datacenter Servers with Emerging I/O Devices
By: Haneul Park, Jiaqi Lou, Sangjin Lee, and more
Potential Business Impact:
Makes datacenter servers run faster by reducing contention in shared cache memory.
In modern server CPUs, the Last-Level Cache (LLC) serves not only as a victim cache for higher-level private caches but also as a buffer for low-latency DMA transfers between CPU cores and I/O devices through Direct Cache Access (DCA). However, prior work has shown that high-bandwidth network-I/O devices can rapidly flood the LLC with packets, often causing significant contention with co-running workloads. Going a step further, this work explores hidden microarchitectural properties of Intel Xeon CPUs and uncovers two previously unrecognized sources of LLC contention triggered by emerging high-bandwidth I/O devices. Specifically, (C1) DMA-written cache lines in LLC ways designated for DCA (referred to as DCA ways) are migrated to certain other LLC ways (denoted inclusive ways) when accessed by CPU cores, unexpectedly contending with non-I/O cache lines within the inclusive ways. In addition, (C2) high-bandwidth storage-I/O devices, which are increasingly common in datacenter servers, benefit little from DCA while contending with latency-sensitive network-I/O devices within the DCA ways. To address both problems, we present A4, a runtime LLC management framework designed to alleviate both (C1) and (C2) among diverse co-running workloads, using a hidden knob and other hardware features implemented in those CPUs. We also demonstrate that A4 can alleviate other previously known network-I/O-driven LLC contentions. Overall, it improves the performance of latency-sensitive, high-priority workloads by 51% without notably compromising that of low-priority workloads.
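The underlying mechanism A4 builds on, way-based LLC partitioning, can be illustrated with Linux's resctrl interface for Intel Cache Allocation Technology. The sketch below is hypothetical and not the A4 framework itself: the group names, way bitmasks, and PIDs are assumptions, it requires root on a CAT-capable CPU, and it does not touch the hidden DCA-way knob the paper uses.

# Illustrative sketch: give a latency-sensitive workload and a best-effort
# workload disjoint LLC ways via Linux resctrl (Intel CAT).
import subprocess
from pathlib import Path

RESCTRL = Path("/sys/fs/resctrl")

def mount_resctrl():
    # Mount the resctrl filesystem if it is not already mounted.
    if not (RESCTRL / "schemata").exists():
        subprocess.run(["mount", "-t", "resctrl", "resctrl", str(RESCTRL)],
                       check=True)

def make_group(name, l3_mask, pids, cache_id=0):
    # Create a resctrl group whose tasks may only allocate into the LLC ways
    # set in l3_mask (a contiguous bitmask) on the given cache domain.
    group = RESCTRL / name
    group.mkdir(exist_ok=True)
    (group / "schemata").write_text(f"L3:{cache_id}={l3_mask:x}\n")
    for pid in pids:
        (group / "tasks").write_text(f"{pid}\n")

if __name__ == "__main__":
    mount_resctrl()
    # Assumed 11 core-visible ways: 8 ways for the high-priority workload,
    # 3 for the low-priority one. PIDs are placeholders.
    make_group("high_prio", 0x7f8, pids=[1234])  # ways 3-10
    make_group("low_prio", 0x007, pids=[5678])   # ways 0-2

In practice a runtime framework like A4 would adjust such masks dynamically as it observes contention, rather than setting them once at startup.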
Similar Papers
Optimization and Benchmarking of Monolithically Stackable Gain Cell Memory for Last-Level Cache
Emerging Technologies
Makes computer memory smaller and faster.
DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management
Hardware Architecture
Makes AI faster by sharing computer memory.
Optimizing CPU Cache Utilization in Cloud VMs with Accurate Cache Abstraction
Distributed, Parallel, and Cluster Computing
Makes cloud computers run faster by managing memory better.