Efficient Data Selection for Training Genomic Perturbation Models
By: George Panagopoulos, Johannes F. Lutzeyer, Sofiane Ennadir and more
Potential Business Impact:
Finds best gene changes faster, more reliably.
Genomic studies, including CRISPR-based Perturb-seq analyses, face a vast hypothesis space, while gene perturbations remain costly and time-consuming. Gene perturbation models based on graph neural networks are trained to predict the outcomes of gene perturbations and thereby guide such experiments. Because genomic experiments are expensive, active learning is often employed to train these models, alternating between wet-lab experiments and model updates. However, the operational constraints of the wet lab and the iterative nature of active learning significantly increase the total training time. Furthermore, active learning is sensitive to model initialization, which can lead to markedly different sets of selected gene perturbations across runs and undermines the reproducibility, interpretability, and reusability of the method. To address these issues, we propose a graph-based data filtering method that, unlike active learning, selects the gene perturbations in one shot and in a model-free manner. The method optimizes a criterion that maximizes the supervision signal available to the graph neural network in order to enhance generalization. The criterion is defined over the input graph and is optimized with submodular maximization. We compare the method empirically to active learning, and the results demonstrate that, in addition to yielding months of acceleration, it improves the stability of the selected perturbation experiments while achieving comparable test error.
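To make the idea of one-shot, model-free selection more concrete, the sketch below runs greedy submodular maximization over a small gene graph. The abstract does not spell out the paper's criterion, so this example assumes a simple stand-in: neighborhood coverage, where a selected gene "covers" itself and its graph neighbors. The names `coverage`, `greedy_select`, and `toy_graph` are hypothetical and chosen for illustration only. Because the procedure uses only the graph and breaks ties deterministically, repeated runs return the same selection, which is the stability property the abstract emphasizes.

```python
from typing import Dict, List, Set

# Adjacency-set representation of an undirected gene graph (assumed format).
Graph = Dict[str, Set[str]]


def coverage(graph: Graph, selected: List[str]) -> int:
    """Stand-in criterion: number of genes that are selected or adjacent
    to a selected gene. This function is monotone and submodular."""
    covered: Set[str] = set()
    for gene in selected:
        covered.add(gene)
        covered |= graph.get(gene, set())
    return len(covered)


def greedy_select(graph: Graph, budget: int) -> List[str]:
    """Greedily pick up to `budget` genes, each time adding the gene with
    the largest marginal gain in coverage. No model is trained or queried."""
    selected: List[str] = []
    covered: Set[str] = set()
    candidates = set(graph)
    for _ in range(min(budget, len(graph))):
        best, best_gain = None, 0
        # Sorted iteration gives deterministic tie-breaking across runs.
        for gene in sorted(candidates):
            gain = len(({gene} | graph[gene]) - covered)
            if gain > best_gain:
                best, best_gain = gene, gain
        if best is None:  # nothing left adds coverage
            break
        selected.append(best)
        covered |= {best} | graph[best]
        candidates.remove(best)
    return selected


# Hypothetical toy graph; gene names are placeholders, not real data.
toy_graph: Graph = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}

chosen = greedy_select(toy_graph, budget=2)
print(chosen, coverage(toy_graph, chosen))
```

For monotone submodular objectives under a cardinality constraint, this greedy scheme is the standard choice because it carries a (1 - 1/e) approximation guarantee; whether the paper uses plain greedy or a variant is not stated in the abstract.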
Similar Papers
GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype
Quantitative Methods
Finds important genes before experiments start.
Contextualizing biological perturbation experiments through language
Artificial Intelligence
Helps scientists understand how cells change.
Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations
Genomics
Predicts how genes change to find new medicines.