Correctness-Guaranteed Code Generation via Constrained Decoding
By: Lingxiao Li, Salar Rahili, Yiwei Zhao
Potential Business Impact:
Generates code that is guaranteed correct the first time.
Language models (LMs) are increasingly used for code generation, but ensuring the correctness of generated programs remains a significant challenge. Although imperfect code may be acceptable during software development under human oversight, domains such as video games and robotics demand one-shot correctness for runtime-critical components. We present a constrained decoding algorithm for generating semantically correct programs. The algorithm incorporates a context-sensitive parser that, at each step, outputs a regular expression satisfying a critical non-extensible property; this regular expression guides the generation of the next token sequence so that it can still be continued to a correct program. To build such a context-sensitive parser, we propose a framework that maintains a dynamic tree of parsers (ToP) during parsing, where each parser corresponds to a modular context-free grammar enriched with contextual information such as variable scopes and type constraints, and tree branches represent ambiguity in the future code segment. We demonstrate our approach with sLua, a strongly typed variant of Lua, showing that our method can generate semantically correct programs conforming to any prescribed scripting API. We further show that, with careful design, these semantic guarantees extend to runtime correctness, as validated by generating game mechanics for a roguelike video game.
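To make the decoding loop concrete, the sketch below shows how a parser-supplied regular expression can mask token choices at each step. It is an illustrative reading of the abstract, not the authors' implementation: the parser interface (next_segment_regex, advance, is_complete), the score_token stand-in for the language model, the string-valued vocab, and the greedy token selection are all assumptions, and the tree-of-parsers bookkeeping is hidden behind parser.advance.

```python
# Minimal sketch of regex-masked constrained decoding (illustrative only; the
# parser interface and scoring function are hypothetical stand-ins, not the
# paper's implementation). Requires the third-party `regex` package for
# partial (prefix) matching: pip install regex
import regex


def constrained_decode(score_token, parser, vocab, max_steps=256):
    """Greedily decode a program, masking each token by a parser-supplied regex.

    Assumed (hypothetical) parser interface:
      parser.next_segment_regex(prefix) -> regex string for the next code segment
      parser.advance(segment)           -> consume the segment, update scopes/types
      parser.is_complete(program)       -> True once the program can be accepted
    """
    program = ""
    for _ in range(max_steps):
        pattern = regex.compile(parser.next_segment_regex(program))
        segment = ""
        while True:
            # Keep only tokens whose addition is a full match or a prefix
            # (partial match) of the segment regex.
            allowed = [
                tok for tok in vocab
                if pattern.fullmatch(segment + tok, partial=True) is not None
            ]
            if not allowed:
                raise RuntimeError("no token can continue toward a correct program")
            best = max(allowed, key=lambda tok: score_token(program + segment, tok))
            segment += best
            # The non-extensible property lets us commit as soon as the segment
            # fully matches: no longer string could also match this regex.
            if pattern.fullmatch(segment):
                break
        program += segment
        parser.advance(segment)
        if parser.is_complete(program):
            break
    return program
```

In the full system described in the abstract, the step hidden behind parser.advance would also maintain the dynamic tree of parsers, pruning branches whose grammars can no longer match the committed segment.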
Similar Papers
TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation
Machine Learning (CS)
Makes computer code write itself correctly.
Type-Constrained Code Generation with Language Models
Machine Learning (CS)
Makes computer code write itself correctly.
ChopChop: a Programmable Framework for Semantically Constraining the Output of Language Models
Programming Languages
Makes computer code work correctly every time.