PIM or CXL-PIM? Understanding Architectural Trade-offs Through Large-Scale Benchmarking
By: I-Ting Lee, Bao-Kai Wang, Liang-Chi Chen, and more
Potential Business Impact:
Makes computers faster by moving work closer to memory.
Processing-in-memory (PIM) reduces data movement by executing near memory, but our large-scale characterization on real PIM hardware shows that end-to-end performance is often limited by disjoint host and device address spaces that force explicit staging transfers. In contrast, CXL-PIM provides a unified address space and cache-coherent access at the cost of higher access latency. These opposing interface models create workload-dependent trade-offs that are not captured by small-scale studies. This work presents a side-by-side, large-scale comparison of PIM and CXL-PIM using measurements from real PIM hardware and trace-driven CXL modeling. We identify when unified-address access amortizes link latency enough to overcome transfer bottlenecks, and when tightly coupled PIM remains preferable. Our results reveal phase- and dataset-size regimes in which the relative ranking between the two architectures reverses, offering practical guidance for future near-memory system design.
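The trade-off the abstract describes can be sketched as a first-order cost model: classic PIM pays a one-time staging transfer over the host link, while CXL-PIM avoids staging but exposes per-access link latency. The sketch below is illustrative only; all parameter values (`staging_bw_gbps`, `link_latency_s`, the `overlap` factor) are assumptions, not measurements from the paper.

```python
# Hypothetical first-order cost model for the PIM vs. CXL-PIM trade-off.
# All parameters are illustrative assumptions, not measured values.

def pim_time(dataset_bytes, compute_time_s, staging_bw_gbps=16):
    """Classic PIM: disjoint host/device address spaces force an
    explicit staging copy of the dataset before compute can start."""
    staging_s = dataset_bytes / (staging_bw_gbps * 1e9)
    return staging_s + compute_time_s

def cxl_pim_time(n_accesses, compute_time_s,
                 link_latency_s=300e-9, overlap=0.9):
    """CXL-PIM: a unified, cache-coherent address space removes the
    staging copy, but each remote access pays extra link latency;
    a (hypothetical) fraction `overlap` hides behind compute."""
    exposed_latency_s = n_accesses * link_latency_s * (1.0 - overlap)
    return compute_time_s + exposed_latency_s

# Transfer-bound phase: 1 GiB dataset, short compute, few accesses.
# Staging dominates, so the unified address space wins.
small = (pim_time(1 << 30, 0.01), cxl_pim_time(1_000_000, 0.01))

# Access-heavy phase on already-resident data: exposed link latency
# accumulates and the ranking between the two architectures reverses.
heavy = (pim_time(1 << 30, 1.0), cxl_pim_time(500_000_000, 1.0))

print(small[0] > small[1])  # True: staging bottleneck favors CXL-PIM
print(heavy[0] < heavy[1])  # True: exposed latency favors tight PIM
```

Under these assumed numbers the model reproduces the regime reversal the paper reports: which architecture is faster depends on whether a phase is dominated by staging transfers or by remote-access latency.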
Similar Papers
DL-PIM: Improving Data Locality in Processing-in-Memory Systems
Hardware Architecture
Moves computer data closer for faster work.
Modeling and Simulation Frameworks for Processing-in-Memory Architectures
Hardware Architecture
Makes computers faster by doing math inside memory.