Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning
By: Rem Yang, Julian Dai, Nikos Vasilakis, and more
Potential Business Impact:
Helps computers understand and write computer code better.
We assess how the code reasoning abilities of large language models (LLMs) generalize to different kinds of programs. We present techniques for obtaining in- and out-of-distribution programs with different characteristics: code sampled from a domain-specific language, code automatically generated by an LLM, code collected from competitive programming contests, and mutated versions of these programs. We also present an experimental methodology for evaluating LLM generalization by comparing their performance on these programs. We perform an extensive evaluation across 10 state-of-the-art models from the past year, obtaining insights into their generalization capabilities over time and across different classes of programs. Our results highlight that while earlier models exhibit behavior consistent with pattern matching, the latest models show strong generalization abilities on code reasoning.
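The abstract does not spell out the evaluation procedure, but a common setup for this kind of code-reasoning benchmark is output prediction: the model is shown a program and an input, asked what the program returns, and its answer is compared against the result of actually executing the program. The sketch below illustrates that idea only; the function names, prompt wording, and `query_model` interface are hypothetical assumptions, not details from the paper.

```python
# Hypothetical sketch of an output-prediction check for code reasoning.
# Assumptions (not from the paper): query_model() wraps some LLM API and
# returns its raw text answer; each program is a small self-contained
# Python snippet that defines an entry-point function `f`.

def ground_truth_output(program_src: str, test_input):
    """Execute the program to obtain the reference output."""
    namespace = {}
    exec(program_src, namespace)        # run the candidate program
    return namespace["f"](test_input)   # call its entry point

def model_predicts_correctly(query_model, program_src: str, test_input) -> bool:
    """Ask the model to reason about the program and compare to execution."""
    prompt = (
        "Given the following Python program, what does f return "
        f"for the input {test_input!r}?\n\n{program_src}\n\n"
        "Answer with the value only."
    )
    prediction = query_model(prompt).strip()
    return prediction == repr(ground_truth_output(program_src, test_input))

# Example usage with a trivial program; accuracy over a dataset of
# (program, input) pairs would then be the fraction of correct predictions.
example_program = "def f(x):\n    return sum(i * i for i in range(x))"
```

Under this framing, "generalization" can be probed by computing the same accuracy on the different program sources the paper describes (DSL-sampled, LLM-generated, contest, and mutated programs) and comparing how much it degrades away from the training-like distribution.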
Similar Papers
Cross-Task Benchmarking and Evaluation of General-Purpose and Code-Specific Large Language Models
Software Engineering
Makes computers better at understanding language and code.
How Does LLM Reasoning Work for Code? A Survey and a Call to Action
Software Engineering
Helps computers fix and write computer code.
Evaluating Intermediate Reasoning of Code-Assisted Large Language Models for Mathematics
Computation and Language
Helps computers solve math problems more logically.