g-DPO: Scalable Preference Optimization for Protein Language Models
By: Constance Ferragu, Jonathan D. Ziegler, Nicolas Deutschmann, and others
Potential Business Impact:
Makes protein-design models train faster.
Direct Preference Optimization (DPO) is an effective approach for aligning protein language models with experimental design goals. However, DPO faces a scalability bottleneck: the number of possible training pairs grows quadratically with the number of labeled sequences, leading to prohibitive training times even for modestly sized datasets. We introduce g-DPO, a framework that (i) uses sequence space clustering to prune redundant pairs while preserving training signal, and (ii) amortizes likelihood computations with group-based approximations. Across three protein engineering tasks, g-DPO maintains in-silico and in-vitro performance that is statistically indistinguishable from standard DPO, while converging 1.8 to 3.7 times faster, with greater gains expected as the size of the dataset increases.
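The bottleneck the abstract describes is that every pair of labeled sequences with different scores is a candidate DPO training pair, so the pair set grows quadratically. Below is a minimal sketch of how cluster-based pruning of that pair set could look, assuming a k-mer featurization, scikit-learn KMeans clustering, and one representative pair per pair of clusters; the helper names (kmer_features, cluster_sequences, build_pairs) and these specific choices are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: prune the quadratic all-pairs set of DPO
# preference pairs by clustering sequences and keeping one pair per
# pair of clusters. Featurization and clustering choices are assumptions.
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans


def kmer_features(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cluster_sequences(seqs, n_clusters: int = 10, k: int = 3) -> np.ndarray:
    """Cluster sequences in a simple k-mer count space (illustrative choice)."""
    vocab = sorted({km for s in seqs for km in kmer_features(s, k)})
    index = {km: j for j, km in enumerate(vocab)}
    X = np.zeros((len(seqs), len(vocab)))
    for i, s in enumerate(seqs):
        for km, count in kmer_features(s, k).items():
            X[i, index[km]] = count
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)


def build_pairs(seqs, scores, labels):
    """Keep one preference pair per pair of clusters: the top-scoring member
    of each cluster, ordered so the higher-scoring sequence is preferred.
    This reduces the O(n^2) all-pairs set to O(n_clusters^2) pairs."""
    best = {}  # cluster id -> index of its top-scoring sequence
    for i, c in enumerate(labels):
        if c not in best or scores[i] > scores[best[c]]:
            best[c] = i
    pairs = []
    for a, b in combinations(best.values(), 2):
        hi, lo = (a, b) if scores[a] >= scores[b] else (b, a)
        pairs.append((seqs[hi], seqs[lo]))  # (preferred, dispreferred)
    return pairs


# Usage: with 1,000 labeled sequences and 10 clusters, the all-pairs set has
# ~500,000 candidates, while this pruning keeps only 45 representative pairs.
# labels = cluster_sequences(seqs, n_clusters=10)
# pairs = build_pairs(seqs, scores, labels)
```

The group-based likelihood amortization mentioned in the abstract would additionally reuse per-cluster likelihood computations across the pairs that share a cluster representative, rather than recomputing them pair by pair; that step is not shown here.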
Similar Papers
SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment
Machine Learning (CS)
Makes AI understand what you like better.
Lightweight Robust Direct Preference Optimization
Machine Learning (CS)
Makes AI learn better from messy human feedback.
Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
Machine Learning (CS)
Teaches AI to write better by learning from mistakes.