A Multi-Language Perspective on the Robustness of LLM Code Generation
By: Fazle Rabbi, Zishuo Ding, Jinqiu Yang
Potential Business Impact:
Tests AI code writers to make them better.
Large language models (LLMs) have gained significant traction in recent years and are now widely used for code generation tasks. While this field has attracted considerable attention, testing and evaluating the robustness of code generation models remains an open problem. Previous studies have focused primarily on code generation models for Python, overlooking other widely used programming languages. In this research, we conduct a comprehensive comparative analysis of the robustness of several prominent code generation models and investigate how their performance varies across programming languages. To accomplish this, we introduce perturbations in four key areas of the prompt: docstring, function name, syntax, and format, and we compile and release a dedicated dataset for this purpose. This work presents our experimental findings, shedding light on how code generation models perform under these perturbation scenarios.
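A minimal sketch of the kind of prompt perturbations the abstract describes, applied to a HumanEval-style Python prompt (a function signature plus docstring). This is not the authors' released dataset or code; the function names and the specific transformations below are illustrative assumptions.

```python
import re


def perturb_function_name(prompt: str) -> str:
    """Rename the function in the signature, e.g. snake_case -> camelCase."""
    match = re.search(r"def\s+(\w+)\s*\(", prompt)
    if not match:
        return prompt
    original = match.group(1)
    parts = original.split("_")
    camel = parts[0] + "".join(p.capitalize() for p in parts[1:])
    return prompt.replace(original, camel)


def perturb_docstring(prompt: str) -> str:
    """Apply a simple lexical change inside the docstring (synonym swap)."""
    return prompt.replace("Return", "Give back")


def perturb_format(prompt: str) -> str:
    """Change surface formatting (indentation width) without changing meaning."""
    return prompt.replace("    ", "\t")


if __name__ == "__main__":
    # Example HumanEval-style prompt: signature + docstring only.
    prompt = (
        "def add_two_numbers(a, b):\n"
        '    """Return the sum of a and b."""\n'
    )
    for perturb in (perturb_function_name, perturb_docstring, perturb_format):
        print(perturb(prompt))
        print("---")
```

Each perturbation preserves the task's intent, so a robust model should still generate a correct solution from the perturbed prompt; comparing pass rates on original versus perturbed prompts is one way such robustness is typically measured.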
Similar Papers
Large Language Models for Code Generation: The Practitioners Perspective
Software Engineering
Tests AI code to help programmers build better software.
Evaluating Programming Language Confusion
Software Engineering
Fixes computer programs that accidentally switch languages.
Adversarial Attack Classification and Robustness Testing for Large Language Models for Code
Software Engineering
Makes computer code safer from tricky words.