A Multi-Language Perspective on the Robustness of LLM Code Generation
By: Fazle Rabbi, Zishuo Ding, Jinqiu Yang
Potential Business Impact:
Tests AI code writers to make them better.
Large language models (LLMs) have gained significant traction in recent years and are now widely used for code generation tasks. While this field has attracted considerable attention, testing and evaluating the robustness of code generation models remains an open problem. Previous studies have focused primarily on code generation models for Python, overlooking other widely used programming languages. In this research, we conduct a comprehensive comparative analysis of the robustness of several prominent code generation models and investigate how their performance varies across programming languages. To accomplish this, we introduce perturbations in four key areas of the prompt: docstring, function name, syntax, and format, and we compile and release a dedicated dataset for this purpose. This work presents our experimental findings, shedding light on how code generation models perform under these perturbation scenarios.
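A minimal sketch of the kind of prompt perturbations the abstract describes, applied to a HumanEval-style Python prompt (a function signature plus docstring). This is not the authors' released dataset or code; the function names and the specific transformations below are illustrative assumptions.

```python
import re


def perturb_function_name(prompt: str) -> str:
    """Rename the function in the signature, e.g. snake_case -> camelCase."""
    match = re.search(r"def\s+(\w+)\s*\(", prompt)
    if not match:
        return prompt
    original = match.group(1)
    parts = original.split("_")
    camel = parts[0] + "".join(p.capitalize() for p in parts[1:])
    return prompt.replace(original, camel)


def perturb_docstring(prompt: str) -> str:
    """Apply a simple lexical change inside the docstring (synonym swap)."""
    return prompt.replace("Return", "Give back")


def perturb_format(prompt: str) -> str:
    """Change surface formatting (indentation width) without changing meaning."""
    return prompt.replace("    ", "\t")


if __name__ == "__main__":
    # Example HumanEval-style prompt: signature + docstring only.
    prompt = (
        "def add_two_numbers(a, b):\n"
        '    """Return the sum of a and b."""\n'
    )
    for perturb in (perturb_function_name, perturb_docstring, perturb_format):
        print(perturb(prompt))
        print("---")
```

Each perturbation preserves the task's intent, so a robust model should still generate a correct solution from the perturbed prompt; comparing pass rates on original versus perturbed prompts is one way such robustness is typically measured.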
Similar Papers
Large Language Models for Code Generation: The Practitioners Perspective
Software Engineering
Tests AI code to help programmers build better software.
Evaluating Programming Language Confusion
Software Engineering
Fixes computer programs that accidentally switch languages.
Adversarial Attack Classification and Robustness Testing for Large Language Models for Code
Software Engineering
Makes computer code safer from tricky words.