A Matter of Representation: Towards Graph-Based Abstract Code Generation
By: Nyx Iskandar, Hisham Bedri, Andy Tsen
Potential Business Impact:
Lets computers build programs visually, not just as text.
Most large language models (LLMs) today excel at generating raw, sequential code with minimal abstractions and custom structures. However, there has been little work on graph-based abstract code generation, where significant logic is encapsulated in predefined nodes and execution flow is determined by edges. This setting is relevant for visual programming languages and for cases where raw source code is inaccessible to users and absent from LLM training sets. In this work, we propose and evaluate JSON representations of graphs to enable high-accuracy graph-based abstract code generation. We evaluate these representations on ScratchTest, a mini-benchmark based on our custom Python re-implementation of Scratch, which tests the LLM in code graph space. Our findings demonstrate that, given the right graph representations, LLMs can indeed perform this generation task in a single pass without relying on specialized or complex pipelines. We also show that different representations induce significantly different accuracies, highlighting the instrumental role representations play in this task. All in all, this work establishes the first steps towards representation learning for graph-based abstract code generation.
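To make the setting concrete, the sketch below shows one plausible JSON graph representation in the spirit described by the abstract: logic is encapsulated in predefined nodes, and edges determine execution flow. The node types, field names, and the tiny interpreter are illustrative assumptions, not the paper's actual ScratchTest schema.

```python
import json
import math

# Hypothetical JSON program graph: nodes carry predefined logic,
# edges define the order of execution. Schema is assumed, not the
# paper's real format.
program = json.loads("""
{
  "nodes": [
    {"id": "start", "type": "event_start"},
    {"id": "mv",    "type": "move_steps",   "args": {"steps": 10}},
    {"id": "turn",  "type": "turn_degrees", "args": {"degrees": 90}}
  ],
  "edges": [
    {"from": "start", "to": "mv"},
    {"from": "mv",    "to": "turn"}
  ]
}
""")

def run(graph):
    """Walk the edge list from the start node, applying each node's
    predefined logic to a simple sprite state (x, y, heading)."""
    state = {"x": 0.0, "y": 0.0, "heading": 0.0}
    nodes = {n["id"]: n for n in graph["nodes"]}
    nxt = {e["from"]: e["to"] for e in graph["edges"]}
    cur = next(n["id"] for n in graph["nodes"] if n["type"] == "event_start")
    while cur is not None:
        node = nodes[cur]
        if node["type"] == "move_steps":
            rad = math.radians(state["heading"])
            state["x"] += node["args"]["steps"] * math.cos(rad)
            state["y"] += node["args"]["steps"] * math.sin(rad)
        elif node["type"] == "turn_degrees":
            state["heading"] = (state["heading"] + node["args"]["degrees"]) % 360
        cur = nxt.get(cur)  # follow the outgoing edge, if any
    return state

print(run(program))  # → {'x': 10.0, 'y': 0.0, 'heading': 90.0}
```

Under this framing, the LLM's task is to emit the JSON itself; the abstract's point is that how such a graph is serialized (e.g., how nodes and edges are laid out in the JSON) can substantially change generation accuracy.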
Similar Papers
On Code-Induced Reasoning in LLMs
Computation and Language
Code's structure helps computers think better than its meaning.
Code Evolution Graphs: Understanding Large Language Model Driven Design of Algorithms
Neural and Evolutionary Computing
Helps computers understand how they write code.
Accurate and Consistent Graph Model Generation from Text with Large Language Models
Software Engineering
Makes computer drawings follow rules better.