Learned Cost Model for Placement on Reconfigurable Dataflow Hardware
By: Etash Guha, Tianxiao Jiang, Andrew Deng, and more
Potential Business Impact:
Makes AI models run faster on special computer chips.
Mapping the dataflow graph of an ML model onto a reconfigurable system is difficult: different mappings achieve different throughputs and consume resources differently. A model that evaluates the throughput of candidate mappings is therefore necessary, since measuring throughput directly for every mapping is expensive. Many compilers use a hand-designed analytical model that relies on proxy features or intuition, which introduces error. We provide a learned approach that predicts throughput 31%-52% more accurately across a variety of graphs. In addition, our approach shows no accuracy degradation after performance annotations are removed. We show that using this approach results in 5.6% faster compiled graphs.
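To make the idea concrete, here is a minimal sketch of a learned cost model, not the paper's actual architecture or feature set: hand-chosen placement features (the names `pcu_used`, `pmu_used`, and `on_chip_hops` are illustrative assumptions) are regressed against measured throughput, and the trained model is then used to rank candidate mappings without compiling and measuring each one.

```python
# Illustrative sketch only: a regressor trained on (mapping features, measured
# throughput) pairs, used to rank candidate placements of a dataflow graph.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

def mapping_features(num_nodes, pcu_used, pmu_used, on_chip_hops):
    """Toy feature vector for one candidate mapping.
    A real cost model would extract many more graph- and resource-level features."""
    return np.array([num_nodes, pcu_used, pmu_used, on_chip_hops], dtype=float)

# Synthetic training data standing in for measured throughputs of past mappings.
X = rng.uniform(low=[8, 1, 1, 0], high=[256, 64, 64, 512], size=(500, 4))
# Hypothetical signal: throughput grows with compute units used and shrinks
# with on-chip communication hops, plus measurement noise.
y = 1e3 * X[:, 1] / (1.0 + 0.05 * X[:, 3]) + rng.normal(0, 10, size=500)

cost_model = GradientBoostingRegressor().fit(X, y)

# Rank two candidate mappings of the same graph by predicted throughput
# instead of measuring each one on hardware.
candidates = np.stack([
    mapping_features(num_nodes=128, pcu_used=32, pmu_used=24, on_chip_hops=300),
    mapping_features(num_nodes=128, pcu_used=28, pmu_used=20, on_chip_hops=120),
])
pred = cost_model.predict(candidates)
print(f"predicted throughputs: {pred}, choose mapping {int(np.argmax(pred))}")
```

In practice the compiler would query such a model for many candidate placements during its search and keep the highest-scoring one, which is how a more accurate predictor translates into faster compiled graphs.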
Similar Papers
A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers
Distributed, Parallel, and Cluster Computing
Makes computer storage cheaper and faster.
Learning-Augmented Performance Model for Tensor Product Factorization in High-Order FEM
Distributed, Parallel, and Cluster Computing
Helps supercomputers run math problems faster.
Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models
Distributed, Parallel, and Cluster Computing
Makes big AI models run on many computers.