PerfDojo: Automated ML Library Generation for Heterogeneous Architectures
By: Andrei Ivanov, Siyuan Shen, Gioele Gottardo, and more
Potential Business Impact:
Automatically tunes machine learning code so it runs faster on many different kinds of processors.
The increasing complexity of machine learning models and the proliferation of diverse hardware architectures (CPUs, GPUs, accelerators) make achieving optimal performance a significant challenge. Heterogeneity in instruction sets, specialized kernel requirements for different data types and model features (e.g., sparsity, quantization), and architecture-specific optimizations complicate performance tuning. Manual optimization is resource-intensive, while existing automatic approaches often rely on complex hardware-specific heuristics and uninterpretable intermediate representations, hindering performance portability. We introduce PerfLLM, a novel automatic optimization methodology leveraging Large Language Models (LLMs) and Reinforcement Learning (RL). Central to this approach is PerfDojo, an environment that frames optimization as an RL game using a human-readable, mathematically inspired code representation whose transformations guarantee semantic validity. This allows effective optimization without prior hardware knowledge, facilitating both human analysis and RL agent training. We demonstrate PerfLLM's ability to achieve significant performance gains across diverse CPU (x86, Arm, RISC-V) and GPU architectures.
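To make the "optimization as an RL game" framing concrete, here is a minimal sketch in Python. It is purely illustrative and assumes nothing about PerfDojo's actual API: the environment class, action names, program representation, and reward shaping below are hypothetical stand-ins for the paper's human-readable code representation, its set of validity-preserving transformations, and a measured-speedup reward.

```python
# A minimal, hypothetical sketch of "optimization as an RL game".
# PerfDojo's real representation, transformation set, and reward are
# not specified in this abstract; this toy models a program as a list
# of operations, actions as rewrites, and reward as speedup.
import random


class ToyPerfEnv:
    """Toy RL environment: states are programs, actions are rewrites."""

    # Hypothetical transformation names; a real system would expose
    # architecture-agnostic rewrites (tiling, fusion, vectorization, ...).
    ACTIONS = ["fuse_adjacent", "reorder_ops", "no_op"]

    def __init__(self, program):
        self.initial = list(program)
        self.reset()

    def reset(self):
        self.program = list(self.initial)
        self.baseline = self._measure(self.program)
        return tuple(self.program)

    def step(self, action):
        # Only rewrites that preserve semantics by construction are
        # applied, mirroring the abstract's "semantic validity" property.
        if action == "fuse_adjacent" and len(self.program) > 1:
            a, b = self.program[0], self.program[1]
            self.program = [f"fused({a},{b})"] + self.program[2:]
        elif action == "reorder_ops" and len(self.program) > 1:
            # Toy simplification: pretend all remaining ops are independent.
            random.shuffle(self.program)
        runtime = self._measure(self.program)
        reward = self.baseline / runtime - 1.0  # speedup over baseline
        done = len(self.program) == 1
        return tuple(self.program), reward, done

    def _measure(self, program):
        # Stand-in for compiling and timing the candidate on real hardware;
        # here fewer ops == faster, plus noise imitating measurement jitter.
        return len(program) * (1.0 + random.uniform(0.0, 0.05))


# Random-agent rollout; PerfLLM would instead use an LLM policy
# trained with RL to choose transformations.
env = ToyPerfEnv(["load", "mul", "add", "store"])
state = env.reset()
for _ in range(10):
    state, reward, done = env.step(random.choice(ToyPerfEnv.ACTIONS))
    print(f"reward={reward:+.3f} program={state}")
    if done:
        break
```

The property this sketch mirrors is that every action is a rewrite that keeps the program valid by construction, so an agent can explore freely without ever producing broken code; the paper's contribution is pairing such an environment with an LLM-based policy rather than the random policy used here.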