LEGO: Layout Expression for Generating One-to-one Mapping
By: Amir Mohammad Tavakkoli, Cosmin Oancea, Mary Hall
Potential Business Impact:
Makes computer programs run much faster.
We describe LEGO, a new approach to optimizing data movement in which code is expressed as a layout-independent computation and composed with layouts for data and computation. This code-generator organization derives the complex indexing expressions associated with hierarchical parallel code and data movement for GPUs. LEGO maps layout specifications to indexing expressions and can be integrated into existing compilers and code templates. It facilitates the exploration of data layouts in combination with other optimizations. We demonstrate LEGO's integration with the MLIR and Triton compilers and with CUDA templates. We show that LEGO achieves performance competitive with Triton, and its integrations with MLIR and CUDA demonstrate its broad applicability.
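The core idea of composing a layout-independent computation with separate layouts can be illustrated with a toy model. The sketch below is our own hypothetical illustration, not LEGO's interface: the names `RowMajor`, `Tiled`, `is_one_to_one` and `transpose` are invented here. Each layout is a one-to-one mapping from a logical 2-D index to a flat offset. The computation (a transpose) touches memory only through those mappings, so switching a row-major output for a tiled one changes the indexing but not the computation. LEGO derives such indexing expressions symbolically for generated code; this sketch simply evaluates them numerically.

```python
from itertools import product


class RowMajor:
    """Hypothetical layout: logical index tuple -> flat offset in row-major order."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def apply(self, idx):
        off = 0
        for i, d in zip(idx, self.shape):
            off = off * d + i  # Horner-style row-major linearization
        return off

    def inverse(self, off):
        idx = []
        for d in reversed(self.shape):
            idx.append(off % d)
            off //= d
        return tuple(reversed(idx))


class Tiled:
    """Hypothetical 2-D tiled layout: tiles in row-major order, row-major inside each tile."""

    def __init__(self, shape, tile):
        (m, n), (tm, tn) = shape, tile
        assert m % tm == 0 and n % tn == 0, "tile must evenly divide the shape"
        self.shape = (m, n)
        self.tile = (tm, tn)
        self.outer = RowMajor((m // tm, n // tn))  # which tile
        self.inner = RowMajor((tm, tn))            # position within the tile

    def apply(self, idx):
        i, j = idx
        tm, tn = self.tile
        t = self.outer.apply((i // tm, j // tn))
        return t * (tm * tn) + self.inner.apply((i % tm, j % tn))

    def inverse(self, off):
        tm, tn = self.tile
        ti, tj = self.outer.inverse(off // (tm * tn))
        ii, jj = self.inner.inverse(off % (tm * tn))
        return (ti * tm + ii, tj * tn + jj)


def is_one_to_one(layout):
    """Check that the layout is a bijection between logical indices and offsets 0..M*N-1."""
    m, n = layout.shape
    offsets = {layout.apply((i, j)) for i, j in product(range(m), range(n))}
    round_trip = all(layout.apply(layout.inverse(o)) == o for o in range(m * n))
    return offsets == set(range(m * n)) and round_trip


def transpose(src, src_layout, dst_layout):
    """Layout-independent computation: dst[j, i] = src[i, j], addressed only via layouts."""
    m, n = src_layout.shape
    dst = [None] * (m * n)
    for i, j in product(range(m), range(n)):
        dst[dst_layout.apply((j, i))] = src[src_layout.apply((i, j))]
    return dst
```

For example, `transpose(list(range(24)), RowMajor((4, 6)), Tiled((6, 4), (2, 2)))` writes a row-major 4×6 matrix into a 6×4 matrix stored as 2×2 tiles. The composed tiled index for `(i, j)` is `((i // 2) * 2 + j // 2) * 4 + (i % 2) * 2 + j % 2`, which is the kind of expression a code generator would emit directly into generated code.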