PIM or CXL-PIM? Understanding Architectural Trade-offs Through Large-Scale Benchmarking
By: I-Ting Lee, Bao-Kai Wang, Liang-Chi Chen, and more
Potential Business Impact:
Makes computers faster by moving work closer to memory.
Processing-in-memory (PIM) reduces data movement by executing near memory, but our large-scale characterization on real PIM hardware shows that end-to-end performance is often limited by disjoint host and device address spaces that force explicit staging transfers. In contrast, CXL-PIM provides a unified address space and cache-coherent access at the cost of higher access latency. These opposing interface models create workload-dependent trade-offs that are not captured by small-scale studies. This work presents a side-by-side, large-scale comparison of PIM and CXL-PIM using measurements from real PIM hardware and trace-driven CXL modeling. We identify when unified-address access amortizes link latency enough to overcome transfer bottlenecks, and when tightly coupled PIM remains preferable. Our results reveal phase- and dataset-size regimes in which the relative ranking between the two architectures reverses, offering practical guidance for future near-memory system design.
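The trade-off the abstract describes can be sketched as a first-order cost model: classic PIM pays a one-time staging transfer over the host link, while CXL-PIM avoids staging but exposes per-access link latency. The sketch below is illustrative only; all parameter values (`staging_bw_gbps`, `link_latency_s`, the `overlap` factor) are assumptions, not measurements from the paper.

```python
# Hypothetical first-order cost model for the PIM vs. CXL-PIM trade-off.
# All parameters are illustrative assumptions, not measured values.

def pim_time(dataset_bytes, compute_time_s, staging_bw_gbps=16):
    """Classic PIM: disjoint host/device address spaces force an
    explicit staging copy of the dataset before compute can start."""
    staging_s = dataset_bytes / (staging_bw_gbps * 1e9)
    return staging_s + compute_time_s

def cxl_pim_time(n_accesses, compute_time_s,
                 link_latency_s=300e-9, overlap=0.9):
    """CXL-PIM: a unified, cache-coherent address space removes the
    staging copy, but each remote access pays extra link latency;
    a (hypothetical) fraction `overlap` hides behind compute."""
    exposed_latency_s = n_accesses * link_latency_s * (1.0 - overlap)
    return compute_time_s + exposed_latency_s

# Transfer-bound phase: 1 GiB dataset, short compute, few accesses.
# Staging dominates, so the unified address space wins.
small = (pim_time(1 << 30, 0.01), cxl_pim_time(1_000_000, 0.01))

# Access-heavy phase on already-resident data: exposed link latency
# accumulates and the ranking between the two architectures reverses.
heavy = (pim_time(1 << 30, 1.0), cxl_pim_time(500_000_000, 1.0))

print(small[0] > small[1])  # True: staging bottleneck favors CXL-PIM
print(heavy[0] < heavy[1])  # True: exposed latency favors tight PIM
```

Under these assumed numbers the model reproduces the regime reversal the paper reports: which architecture is faster depends on whether a phase is dominated by staging transfers or by remote-access latency.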
Similar Papers
DL-PIM: Improving Data Locality in Processing-in-Memory Systems
Hardware Architecture
Moves computer data closer for faster work.
Modeling and Simulation Frameworks for Processing-in-Memory Architectures
Hardware Architecture
Makes computers faster by doing math inside memory.