All You Need Is Synthetic Task Augmentation
By: Guillaume Godin
Potential Business Impact:
Teaches computers to predict molecule traits better.
Injecting rule-based models such as Random Forests into differentiable neural network frameworks remains an open challenge in machine learning. Recent work has demonstrated that pretrained models can generate efficient molecular embeddings, but these approaches often require extensive pretraining and additional techniques, such as incorporating posterior probabilities, to boost performance. In our study, we propose a novel strategy that jointly trains a single Graph Transformer neural network on sparse experimental targets for multiple molecular properties together with synthetic targets derived from XGBoost models trained on Osmordred molecular descriptors. These synthetic tasks serve as independent auxiliary tasks. Our results show consistent and significant performance improvements across all 19 molecular property prediction tasks, and for 16 of the 19 targets the multitask Graph Transformer outperforms the XGBoost single-task learner. This demonstrates that synthetic task augmentation is an effective method for enhancing neural model performance in multitask molecular property prediction without the need for feature injection or pretraining.
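The recipe described in the abstract can be sketched compactly: fit single-task XGBoost teachers on molecular descriptors, use their dense predictions as synthetic auxiliary targets, and train one multitask network jointly on the sparse experimental labels and the synthetic labels. The sketch below is a minimal illustration of that idea, assuming random tabular descriptor features as a stand-in for the paper's Graph Transformer over molecular graphs; all names, shapes, the masked-MSE loss, and the equal task weighting are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of synthetic task augmentation (illustrative assumptions
# throughout: random data stands in for Osmordred descriptors and assay
# labels, and an MLP stands in for the paper's Graph Transformer).
import numpy as np
import torch
import torch.nn as nn
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n_mols, n_feats, n_real, n_synth = 512, 64, 19, 19

X = rng.normal(size=(n_mols, n_feats)).astype(np.float32)       # descriptors
Y_real = rng.normal(size=(n_mols, n_real)).astype(np.float32)   # experimental targets
mask = (rng.random((n_mols, n_real)) < 0.3).astype(np.float32)  # sparse-label mask

# Step 1: one single-task XGBoost teacher per property; its predictions
# become dense synthetic auxiliary targets for every molecule.
Y_synth = np.stack(
    [XGBRegressor(n_estimators=50).fit(X, Y_real[:, t]).predict(X)
     for t in range(n_synth)],
    axis=1,
).astype(np.float32)

# Step 2: one shared trunk with two heads, for real and synthetic tasks.
class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_feats, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.head_real = nn.Linear(128, n_real)
        self.head_synth = nn.Linear(128, n_synth)

    def forward(self, x):
        h = self.trunk(x)
        return self.head_real(h), self.head_synth(h)

model = MultiTaskNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y_r, y_s, m = map(torch.from_numpy, (X, Y_real, Y_synth, mask))

for epoch in range(20):
    opt.zero_grad()
    pred_r, pred_s = model(x)
    # Masked MSE on sparse experimental labels; plain MSE on dense
    # synthetic labels. Equal weighting is an assumption, not the paper's.
    loss_real = (((pred_r - y_r) ** 2) * m).sum() / m.sum().clamp(min=1.0)
    loss_synth = ((pred_s - y_s) ** 2).mean()
    loss = loss_real + loss_synth
    loss.backward()
    opt.step()
```

The key point the sketch captures is that the synthetic heads give the shared trunk a dense training signal on every molecule, even where experimental labels are missing; no feature injection into the network and no pretraining stage are involved.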
Similar Papers
Multitask finetuning and acceleration of chemical pretrained models for small molecule drug property prediction
Machine Learning (CS)
Finds new medicines faster.
Data Fusion of Deep Learned Molecular Embeddings for Property Prediction
Machine Learning (CS)
Improves computer predictions with less data.
Template-Free Retrosynthesis with Graph-Prior Augmented Transformers
Machine Learning (CS)
Helps chemists invent new medicines faster.