An MLIR-Based Compilation Framework for Control Flow Management on Coarse Grained Reconfigurable Arrays
By: Yuxuan Wang, Cristian Tirelli, Giovanni Ansaloni and more
Potential Business Impact:
Makes programs with complex control flow run up to 2.1X faster on reconfigurable accelerator chips, through compiler optimizations alone.
Coarse Grained Reconfigurable Arrays (CGRAs) present both high flexibility and efficiency, making them well-suited for the acceleration of intensive workloads. Nevertheless, a key barrier towards their widespread adoption is posed by CGRA compilation, which must cope with a multi-dimensional space spanning both the spatial and the temporal domains. Indeed, state-of-the-art compilers are limited in scope as they mostly deal with the data flow of applications, while having little or no support for control flow. Hence, they mostly target the mapping of single loops and/or delegate the management of control flow divergences to ad-hoc hardware units. Conversely, in this paper we show that control flow can be effectively managed and optimized at the compilation level, allowing for a broad set of applications to be targeted while being hardware-agnostic and achieving high performance. We embody our methodology in a modular compilation framework consisting of transformation and optimization passes, enabling support for applications with arbitrary control flows running on abstract CGRA meshes. We also introduce a novel mapping methodology that acts as a compilation back-end, addressing the limitations in available CGRA hardware resources and guaranteeing a feasible solution in the compilation process. Our framework achieves up to 2.1X speedups over state-of-the-art approaches, purely through compilation optimizations.