Score: 1

Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design

Published: July 1, 2025 | arXiv ID: 2507.00445v2

By: Xingyu Su, Xiner Li, Masatoshi Uehara, and more

Potential Business Impact:

Enables faster design of proteins, small-molecule drugs, and regulatory DNA sequences.

Business Areas:
A/B Testing, Data and Analytics

We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design. While diffusion models have proven highly effective at modeling complex, high-dimensional data distributions, real-world applications often demand more than high-fidelity generation, requiring optimization with respect to potentially non-differentiable reward functions such as physics-based simulations or rewards derived from scientific knowledge. Although reinforcement learning (RL) methods have been explored to fine-tune diffusion models for such objectives, they often suffer from instability, low sample efficiency, and mode collapse due to their on-policy nature. In this work, we propose an iterative distillation-based fine-tuning framework that enables diffusion models to optimize for arbitrary reward functions. Our method casts the problem as policy distillation: it collects off-policy data during the roll-in phase, simulates reward-based soft-optimal policies during roll-out, and updates the model by minimizing the KL divergence between the simulated soft-optimal policy and the current model policy. Our off-policy formulation, combined with KL divergence minimization, enhances training stability and sample efficiency compared to existing RL-based methods. Empirical results demonstrate the effectiveness and superior reward optimization of our approach across diverse tasks in protein, small molecule, and regulatory DNA design.
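To make the roll-in / roll-out / distillation loop described above concrete, here is a minimal sketch, not the paper's implementation: a toy 1-D Gaussian generator stands in for the diffusion model, `reward_fn`, the temperature `alpha`, and all sample sizes are illustrative assumptions, and the soft-optimal policy is approximated by exponentially tilting rolled-in samples with weights proportional to exp(reward / alpha) before a weighted likelihood (KL-minimizing) update.

```python
# Illustrative sketch of reward-guided iterative distillation (NOT the paper's code).
# Assumptions: a 1-D Gaussian "generator" replaces the diffusion model; the reward
# is treated as a black box (no gradients flow through it); distillation is a
# weighted negative log-likelihood on off-policy samples.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyGenerator(nn.Module):
    """Stand-in for a diffusion model: a learnable Gaussian over samples."""
    def __init__(self):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(1))
        self.log_sigma = nn.Parameter(torch.zeros(1))

    def sample(self, n):
        eps = torch.randn(n, 1)
        return self.mu + eps * self.log_sigma.exp()

    def log_prob(self, x):
        sigma = self.log_sigma.exp()
        return -0.5 * ((x - self.mu) / sigma) ** 2 - self.log_sigma - 0.5 * math.log(2 * math.pi)

def reward_fn(x):
    # Black-box reward peaked at x = 3.0 (illustrative; e.g. a physics score).
    return -(x - 3.0).abs().squeeze(-1)

model = ToyGenerator()
opt = torch.optim.Adam(model.parameters(), lr=5e-2)
alpha = 0.5  # temperature of the soft-optimal policy (illustrative value)

for it in range(200):
    # Roll-in: collect samples from the current model; kept off-policy via no_grad.
    with torch.no_grad():
        x = model.sample(256)
        r = reward_fn(x)
        # Roll-out / soft-optimal policy: reweight samples by exp(r / alpha).
        w = torch.softmax(r / alpha, dim=0)
    # Distillation: minimizing KL(soft-optimal || model) over the sampled support
    # reduces to a reward-weighted negative log-likelihood on the collected data.
    loss = -(w * model.log_prob(x).squeeze(-1)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"learned mean ~ {model.mu.item():.2f} (reward peak at 3.0)")
```

Because the reward only enters through the sample weights, this update never needs reward gradients, which is the property that lets the framework handle non-differentiable objectives such as simulation-based scores.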

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
22 pages

Category
Computer Science:
Machine Learning (CS)