Small Language Models as Compiler Experts: Auto-Parallelization for Heterogeneous Systems
By: Prathamesh Devadiga
Traditional auto-parallelizing compilers, reliant on rigid heuristics, struggle with the complexity of modern heterogeneous systems. This paper presents a comprehensive evaluation of compiler auto-parallelization driven by small (approximately 1B-parameter) language models. We evaluate three models, gemma3, llama3.2, and qwen2.5, using six reasoning strategies across 11 real-world kernels drawn from scientific computing, graph algorithms, and machine learning. Our system is benchmarked against strong compiler baselines, including LLVM Polly, TVM, and Triton. Across 376 total evaluations, the proposed approach achieves an average speedup of 6.81x and a peak speedup of 43.25x on convolution operations. We analyze scalability, verify correctness with multiple sanitizers, and confirm robustness across diverse compilers and hardware platforms. Our results demonstrate that small, efficient language models can serve as powerful reasoning engines for complex compiler optimization tasks.
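To make the pipeline the abstract describes concrete (prompt a small model for a parallelized kernel candidate, compile it, screen it with a sanitizer, and time it), the following is a minimal sketch. The Ollama endpoint, model tags, prompt wording, and helper names are illustrative assumptions, not the authors' implementation; parallelism is expressed here as OpenMP pragmas and data races are screened with GCC's ThreadSanitizer.

```python
"""Minimal sketch of an LLM-driven auto-parallelization loop.

Assumptions (not from the paper): models are served locally via
Ollama's HTTP API, parallelization is expressed as OpenMP pragmas,
and correctness is screened with ThreadSanitizer.
"""
import json
import subprocess
import tempfile
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint


def strip_fences(text: str) -> str:
    """Crudely drop any markdown code fences the model wraps around code."""
    lines = [l for l in text.strip().splitlines() if not l.strip().startswith("```")]
    return "\n".join(lines)


def ask_model(model: str, kernel_src: str) -> str:
    """Ask a small model to rewrite a C kernel with OpenMP pragmas."""
    prompt = (
        "Rewrite this C kernel with OpenMP pragmas for safe parallel "
        "execution. Return only code.\n\n" + kernel_src
    )
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return strip_fences(json.loads(resp.read())["response"])


def compile_and_check(src: str) -> str | None:
    """Compile with ThreadSanitizer instrumentation; return binary path or None."""
    with tempfile.NamedTemporaryFile(suffix=".c", delete=False, mode="w") as f:
        f.write(src)
        c_path = f.name
    bin_path = c_path + ".out"
    result = subprocess.run(
        ["gcc", "-O2", "-fopenmp", "-fsanitize=thread", c_path, "-o", bin_path],
        capture_output=True,
    )
    return bin_path if result.returncode == 0 else None


def run_timed(bin_path: str) -> float | None:
    """Run the instrumented binary; a nonzero exit status (e.g. a
    sanitizer-reported data race) disqualifies the candidate."""
    start = time.perf_counter()
    result = subprocess.run([bin_path], capture_output=True)
    elapsed = time.perf_counter() - start
    return elapsed if result.returncode == 0 else None


if __name__ == "__main__":
    kernel = open("kernel.c").read()  # one of the benchmark kernels
    for model in ("gemma3:1b", "llama3.2:1b", "qwen2.5:1.5b"):
        candidate = ask_model(model, kernel)
        binary = compile_and_check(candidate)
        if binary and (t := run_timed(binary)) is not None:
            print(f"{model}: candidate passed sanitizer, ran in {t:.3f}s")
```

In the paper's setting, sanitizer-clean candidates would then be timed against the LLVM Polly, TVM, and Triton baselines on each kernel to produce the reported speedups.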