Prompt Variability Effects On LLM Code Generation
By: Andrei Paleyes, Radzim Sendyka, Diana Robinson, and more
Potential Business Impact:
Helps computers write better code for different people.
Code generation is one of the most active application areas of Large Language Models (LLMs). While LLMs lower barriers to writing code and accelerate the development process, the overall quality of generated programs depends on the quality of the prompts they are given. Specifically, the functionality and quality of generated code can be sensitive to the user's background and familiarity with software development. It is therefore important to quantify an LLM's sensitivity to variations in the input. To this end we propose a synthetic evaluation pipeline for code generation with LLMs, as well as a systematic persona-based evaluation approach that exposes qualitative differences in LLM responses depending on the prospective user's background. Both proposed methods are completely independent of specific programming tasks and LLMs, and are thus widely applicable. We provide experimental evidence illustrating the utility of our methods and share our code for the benefit of the community.
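As a rough illustration of what such a persona-based evaluation might look like in practice, the sketch below phrases the same coding task through several user personas and checks whether each generated solution passes a small functional test. The personas, task, test checks, and the names `build_prompt`, `passes_tests`, `evaluate`, and `generate` are hypothetical placeholders for this illustration, not the paper's released pipeline.

```python
# Minimal sketch of persona-based prompt variation for code-generation
# evaluation. Personas, task, and checks are illustrative placeholders;
# `generate` stands in for whatever LLM call the reader wires up.
from typing import Callable, Dict

PERSONAS: Dict[str, str] = {
    "novice": "I am new to programming and only know basic Python.",
    "data_scientist": "I write Python daily for analysis, but not production code.",
    "senior_engineer": "I am a senior software engineer who values tests and type hints.",
}

TASK = "Write a Python function `median(xs)` that returns the median of a list of numbers."


def build_prompt(persona: str, task: str) -> str:
    """Prepend a persona description so the same task is phrased by different users."""
    return f"{persona}\n\n{task}"


def passes_tests(code: str) -> bool:
    """Rough functional check: exec the generated code and probe `median`."""
    scope: dict = {}
    try:
        exec(code, scope)  # caution: only run in a sandboxed experiment
        fn = scope["median"]
        return fn([1, 3, 2]) == 2 and fn([1, 2, 3, 4]) == 2.5
    except Exception:
        return False


def evaluate(generate: Callable[[str], str]) -> Dict[str, bool]:
    """Run the same task under every persona and record whether the output works."""
    return {
        name: passes_tests(generate(build_prompt(desc, TASK)))
        for name, desc in PERSONAS.items()
    }


if __name__ == "__main__":
    # Dummy generator so the sketch runs end to end; replace with a real LLM call.
    canned = (
        "def median(xs):\n"
        "    s = sorted(xs)\n"
        "    n = len(s)\n"
        "    m = n // 2\n"
        "    return s[m] if n % 2 else (s[m - 1] + s[m]) / 2\n"
    )
    print(evaluate(lambda prompt: canned))
```

In an actual study, the dummy generator would be replaced by calls to the LLM under evaluation and the single functional check by each task's own test suite, so that differences across personas can be measured systematically.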
Similar Papers
Experimental Analysis of Productive Interaction Strategy with ChatGPT: User Study on Function and Project-level Code Generation Tasks
Software Engineering
Helps computers write better code, faster.
Evaluating Large Language Models for Code Translation: Effects of Prompt Language and Prompt Design
Software Engineering
Helps computers rewrite code between languages.
Prompt engineering and framework: implementation to increase code reliability based guideline for LLMs
Software Engineering
Makes computers write better, faster, and cheaper code.