New Tools, Programming Models, and System Support for Processing-in-Memory Architectures
By: Geraldo F. Oliveira
Potential Business Impact:
Makes computer chips work faster inside memory.
Our goal in this dissertation is to provide tools, programming models, and system support for PIM architectures (with a focus on DRAM-based solutions), to ease the adoption of PIM in current and future systems. To this end, we make at least four new major contributions. First, we introduce DAMOV, the first rigorous methodology to characterize memory-related data movement bottlenecks in modern workloads, and the first data movement benchmark suite. Second, we introduce MIMDRAM, a new hardware/software co-designed substrate that addresses the major current programmability and flexibility limitations of the bulk bitwise execution model of processing-using-DRAM (PUD) architectures. MIMDRAM enables the allocation and control of only the needed computing resources inside DRAM for PUD computing. Third, we introduce Proteus, the first hardware framework that addresses the high execution latency of bulk bitwise PUD operations in state-of-the-art PUD architectures by implementing a data-aware runtime engine for PUD. Proteus reduces the latency of PUD operations in three different ways: (i) Proteus concurrently executes independent in-DRAM primitives belong to a single PUD operation across DRAM arrays. (ii) Proteus dynamically reduces the bit-precision (and consequentially the latency and energy consumption) of PUD operations by exploiting narrow values (i.e., values with many leading zeros or ones). (iii) Proteus chooses and uses the most appropriate data representation and arithmetic algorithm implementation for a given PUD instruction transparently to the programmer. Fourth, we introduce DaPPA (data-parallel processing-in-memory architecture), a new programming framework that eases programmability for general-purpose PNM architectures by allowing the programmer to write efficient PIM-friendly code without the need to manage hardware resources explicitly.
Similar Papers
Modeling and Simulation Frameworks for Processing-in-Memory Architectures
Hardware Architecture
Makes computers faster by doing math inside memory.
DL-PIM: Improving Data Locality in Processing-in-Memory Systems
Hardware Architecture
Moves computer data closer for faster work.
UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM
Distributed, Parallel, and Cluster Computing
Makes AI on phones run much faster.