TreeDiff: AST-Guided Code Generation with Diffusion LLMs
By: Yiming Zeng, Jinghan Cao, Zexin Li, and more
Potential Business Impact:
Helps computers write correct computer code.
Recent advances in diffusion-based language models have opened new possibilities for controllable and bidirectional sequence generation. These models provide an alternative to traditional autoregressive approaches by framing text generation as an iterative denoising process. However, applying diffusion models to structured domains such as source code remains a significant challenge. Programming languages differ from natural language in that they follow strict syntactic and semantic rules, with hierarchical organization that must be preserved for correctness. Standard token-level corruption techniques used during training often ignore this structure, which may hinder the model's ability to learn meaningful representations of code. To address this limitation, we propose a syntax-aware diffusion framework that incorporates structural priors from Abstract Syntax Trees (ASTs) into the denoising process. Instead of masking individual tokens at random, we selectively corrupt syntactically meaningful code spans derived from AST subtrees. This enables the model to reconstruct programs in a way that respects grammatical boundaries and captures long-range dependencies. Experimental results demonstrate that syntax-aware corruption significantly improves syntactic correctness, reconstruction accuracy, and generalization to unseen code patterns. These findings highlight the potential of incorporating structural information into diffusion-based training and suggest that syntax-guided denoising is a promising direction for advancing diffusion-based language models in code generation tasks.
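The core idea of the abstract, corrupting syntactically meaningful spans derived from AST subtrees rather than masking individual tokens at random, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `ast_span_corrupt`, the `<MASK>` token, and the choice of candidate nodes are assumptions made for the example; it uses Python's standard `ast` module to pick a subtree and mask the exact source span it covers.

```python
import ast
import random

def ast_span_corrupt(source: str, mask_token: str = "<MASK>", seed: int = 0) -> str:
    """Mask the source span covered by a randomly chosen AST subtree.

    Hypothetical sketch of syntax-aware corruption: instead of masking
    random tokens, we corrupt one grammatical unit (statement or
    expression) so the denoiser must reconstruct it whole.
    """
    rng = random.Random(seed)
    tree = ast.parse(source)
    # Statements and expressions carry precise source positions,
    # so their subtrees map cleanly back to character spans.
    candidates = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.stmt, ast.expr))
    ]
    node = rng.choice(candidates)
    span = ast.get_source_segment(source, node)
    # Replace the chosen span with a single mask token; a diffusion
    # model would be trained to denoise (reconstruct) this span.
    return source.replace(span, mask_token, 1)

code = "def add(a, b):\n    return a + b\n"
corrupted = ast_span_corrupt(code)
```

Because the corrupted region always aligns with a grammatical boundary (e.g. a whole `return` statement or expression), the training signal respects the hierarchical structure the abstract emphasizes, rather than cutting through the middle of a syntactic unit as random token masking can.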
Similar Papers
Syntax-Guided Diffusion Language Models with User-Integrated Personalization
Computation and Language
Writes stories that sound like you wrote them.
Beyond Autoregression: An Empirical Study of Diffusion Large Language Models for Code Generation
Software Engineering
Makes computers write code much faster and better.
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model
Computation and Language
Fixes computer mistakes when writing stories.