On the Evaluation of Large Language Models in Multilingual Vulnerability Repair
By: Dong wang , Junji Yu , Honglin Shu and more
Potential Business Impact:
Fixes computer code bugs in many languages.
Various Deep Learning-based approaches with pre-trained language models have been proposed for automatically repairing software vulnerabilities. However, these approaches are limited to a specific programming language (C/C++). Recent advances in large language models (LLMs) offer language-agnostic capabilities and strong semantic understanding, exhibiting potential to overcome multilingual vulnerability limitations. Although some work has begun to explore LLMs' repair performance, their effectiveness is unsatisfactory. To address these limitations, we conducted a large-scale empirical study to investigate the performance of automated vulnerability repair approaches and state-of-the-art LLMs across seven programming languages. Results show GPT-4o, instruction-tuned with few-shot prompting, performs competitively against the leading approach, VulMaster. Additionally, the LLM-based approach shows superior performance in repairing unique vulnerabilities and is more likely to repair the most dangerous vulnerabilities. Instruction-tuned GPT-4o demonstrates strong generalization on vulnerabilities in previously unseen language, outperforming existing approaches. Analysis shows Go consistently achieves the highest effectiveness across all model types, while C/C++ performs the worst. Based on findings, we discuss the promise of LLM on multilingual vulnerability repair and the reasons behind LLM's failed cases. This work takes the first look at repair approaches and LLMs across multiple languages, highlighting the promising future of adopting LLMs for multilingual vulnerability repair.
Similar Papers
Large Language Models for Multilingual Vulnerability Detection: How Far Are We?
Software Engineering
Finds hidden computer bugs in many languages.
A Preliminary Study of Large Language Models for Multilingual Vulnerability Detection
Software Engineering
Finds computer bugs in many languages.
Evaluating LLMs for One-Shot Patching of Real and Artificial Vulnerabilities
Cryptography and Security
Fixes computer bugs automatically, better on real ones.