Diffusion is a code repair operator and generator
By: Mukul Singh , Gust Verbruggen , Vu Le and more
Potential Business Impact:
Fixes broken computer code automatically.
Code diffusion models generate code by iteratively removing noise from the latent representation of a code snippet. During later steps of the diffusion process, when the code snippet has almost converged, differences between discrete representations of these snippets look like last-mile repairs applied to broken or incomplete code. We evaluate the extent to which this resemblance can be exploited to leverage pre-trained code diffusion models for the problem of last-mile repair by considering two applications with significant potential. First, we can leverage the diffusion model for last-mile repair by adding noise to a broken code snippet and resuming the diffusion process. Second, we can leverage the diffusion model to generate arbitrary amount of training data for last-mile repair tasks (that are computationally more efficient) by sampling an intermediate program (input) and the final program (output) from the diffusion process. We perform experiments on 3 domains (Python, Excel and PowerShell) to evaluate applications, as well as analyze properties.
Similar Papers
Understanding Diffusion Models via Code Execution
CV and Pattern Recognition
Shows how computer art programs actually work.
The Principles of Diffusion Models
Machine Learning (CS)
Creates new pictures and sounds from noise.
TreeDiff: AST-Guided Code Generation with Diffusion LLMs
Computation and Language
Helps computers write correct computer code.