MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Published: January 9, 2026 | arXiv ID: 2601.05475v1

By: Jiefu Ou, Sapana Chaudhary, Kaj Bostrom, and more

Potential Business Impact:

Automatically rewrites code so it runs much faster.

Business Areas:
Machine Learning, Artificial Intelligence, Data and Analytics, Software

Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but face two key challenges when optimizing code: (i) writing optimized code (such as performant CUDA kernels and competition-level CPU code) requires expertise in systems, algorithms, and specific languages, and (ii) evaluating it requires interpreting performance metrics like timing and device utilization, beyond binary correctness. In this work, we explore inference-time search algorithms that guide the LLM to discover better solutions through iterative refinement based on execution feedback. Our approach, called MaxCode, unifies existing search methods under a max-reward reinforcement learning framework, making the observation and action-value functions modular and easy to modify. To enrich the observation space, we integrate a natural language critique model that converts raw execution feedback into diagnostic insights about errors and performance bottlenecks; together with the best discounted reward seen so far, these provide richer input to the code proposal function. To improve exploration during search, we train a generative reward-to-go model on action values from rollouts and use it to rerank candidate solutions. Testing on the KernelBench (CUDA) and PIE (C++) optimization benchmarks shows that MaxCode outperforms baselines, achieving 20.3% and 10.1% relative improvements in absolute speedup value and relative speedup ranking, respectively.
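In a max-reward formulation, the quantity being optimized is the best single reward encountered along a search trajectory, roughly V = E[max_t gamma^t * r_t], rather than the usual discounted sum; the search keeps whichever candidate achieved the highest discounted reward so far. Below is a minimal Python sketch of how such a refinement loop could be organized, assuming the abstract's components map onto a proposal function, an execute-and-measure step, a critique model, and a reward-to-go reranker. All function names, signatures, and the discounting scheme here are illustrative assumptions, not the paper's actual interface.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        code: str
        reward: float   # e.g., measured speedup over the reference program
        feedback: str   # raw execution feedback (errors, timings, utilization)

    def max_reward_search(reference_code, propose, run_and_measure,
                          critique, reward_to_go, steps=8, width=4, gamma=0.95):
        """Iterative refinement that tracks the best discounted reward seen
        so far (a max-reward objective) instead of a cumulative return."""
        current = Candidate(code=reference_code, reward=1.0, feedback="")
        best, best_discounted = current, current.reward
        for t in range(steps):
            # Observation: a natural-language diagnosis of the latest
            # execution feedback, plus the best discounted reward so far.
            observation = {
                "code": current.code,
                "diagnosis": critique(current.code, current.feedback),
                "best_discounted_reward": best_discounted,
            }
            # Propose several refinements, then rerank them with the learned
            # reward-to-go model before paying for execution.
            proposals = propose(observation, n=width)
            proposals.sort(key=lambda c: reward_to_go(observation, c),
                           reverse=True)
            # Execute only the top-ranked proposal and score it.
            reward, feedback = run_and_measure(proposals[0])
            current = Candidate(code=proposals[0], reward=reward,
                                feedback=feedback)
            # Max-reward update: keep the single best discounted reward.
            if (gamma ** t) * reward > best_discounted:
                best_discounted = (gamma ** t) * reward
                best = current
        return best

The point this sketch illustrates is the modularity the abstract describes: the critique model and the reward-to-go reranker are swappable pieces of the observation and action-value functions, so richer diagnostics improve what the proposer sees while reranking spends the execution budget on more promising candidates.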

Page Count
22 pages

Category
Computer Science:
Machine Learning (CS)