Re-thinking Memory-Bound Limitations in CGRAs
By: Xiangfeng Liu, Zhe Jiang, Anzhen Zhu, and more
Potential Business Impact:
Makes computers run complex tasks much faster.
Coarse-Grained Reconfigurable Arrays (CGRAs) are specialized accelerators commonly used to boost performance in workloads with iterative structures. Existing research typically focuses on compiler or architecture optimizations that improve CGRA performance, energy efficiency, flexibility, and area utilization, under the idealized assumption that kernels can access all of their data from Scratchpad Memory (SPM). However, certain complex workloads, particularly in graph analytics, irregular database operations, and specialized forms of high-performance computing (e.g., unstructured mesh simulations), exhibit irregular memory access patterns that hinder CGRA utilization. In some cases utilization drops below 1.5%, making the CGRA memory-bound. To address this challenge, we analyze the underlying causes of the performance degradation in detail, then propose a redesigned memory subsystem and a refined memory model. By combining microarchitectural and theoretical optimizations, our solution effectively manages irregular memory accesses through a CGRA-specific runahead execution mechanism and cache reconfiguration techniques. Our results show that we achieve performance comparable to the original SPM-only system while requiring only 1.27% of its storage size. The runahead execution mechanism achieves an average speedup of 3.04x (up to 6.91x), and cache reconfiguration provides an additional 6.02% improvement, significantly enhancing CGRA performance for irregular memory access patterns.
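To make the runahead idea more concrete, here is a minimal Python sketch of generic runahead-style prefetching on a toy cycle model. It is not the authors' CGRA mechanism, and every name and parameter in it is a hypothetical assumption:

* a blocking direct-mapped cache (`DirectMappedCache`, 64 lines of 8 words);
* a fixed 100-cycle miss penalty;
* a gather kernel `acc += data[idx[i]]` whose index stream `idx` is always available on-chip;
* a fixed `runahead_depth` whose prefetches fully overlap with the stall.

On each miss, `gather_cycles` pre-executes the address computation of the next few iterations and fills their lines. As a result, later irregular accesses hit instead of stalling.

```python
import random

# Hypothetical toy parameters (assumptions, not values from the paper).
HIT_CYCLES = 1      # cost of a cache hit
MISS_CYCLES = 100   # blocking stall on a miss to main memory
LINE_WORDS = 8      # words per cache line
NUM_LINES = 64      # number of sets in a direct-mapped cache


class DirectMappedCache:
    """Tracks which memory line occupies each cache set (tags only, no data)."""

    def __init__(self, num_lines: int) -> None:
        self.tags = [None] * num_lines

    def lookup(self, addr: int) -> bool:
        line = addr // LINE_WORDS
        return self.tags[line % len(self.tags)] == line

    def fill(self, addr: int) -> None:
        line = addr // LINE_WORDS
        self.tags[line % len(self.tags)] = line


def gather_cycles(idx: list[int], runahead_depth: int) -> int:
    """Cycles for `for i: acc += data[idx[i]]` on the toy model.

    runahead_depth == 0 models a plain blocking cache. With depth d > 0,
    each miss triggers a runahead pass that pre-executes the address
    computation of the next d iterations and prefetches their lines.
    Assumption: the prefetches overlap fully with the stall and arrive
    before they are needed.
    """
    cache = DirectMappedCache(NUM_LINES)
    cycles = 0
    for i, addr in enumerate(idx):
        if cache.lookup(addr):
            cycles += HIT_CYCLES
            continue
        cycles += MISS_CYCLES
        cache.fill(addr)
        # Runahead: while stalled on this miss, walk ahead and prefetch.
        for future in idx[i + 1 : i + 1 + runahead_depth]:
            cache.fill(future)
    return cycles


# Usage: random (irregular) gathers over a 1M-word array.
rng = random.Random(0)
idx = [rng.randrange(1 << 20) for _ in range(10_000)]
base = gather_cycles(idx, runahead_depth=0)
ra = gather_cycles(idx, runahead_depth=8)
print(f"blocking: {base} cycles, runahead: {ra} cycles, speedup {base / ra:.2f}x")
```

Raising `runahead_depth` turns more future misses into hits, until conflict evictions in the small direct-mapped cache begin to limit the benefit. The absolute numbers from this toy model say nothing about the speedups reported above.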
Similar Papers
Monomorphism-based CGRA Mapping via Space and Time Decoupling
Hardware Architecture
Makes computer chips faster and use less power.
An MLIR-based Compilation Framework for Control Flow Management on CGRAs
Software Engineering
Makes flexible chips run complex code faster.