Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations
By: Kalyan Ramakrishnan, Jonathan G. Hedley, Sisi Qu and more
Potential Business Impact:
Predicts how genes change to find new medicines.
We train a neural network to predict distributional responses in gene expression following genetic perturbations. This is an essential task in early-stage drug discovery, where such responses can offer insights into gene function and inform target identification. Existing methods predict only changes in mean expression, overlooking the stochasticity inherent in single-cell data. In contrast, we offer a more realistic view of cellular responses by modeling full expression distributions. Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics, such as variance, skewness, and kurtosis, at a fraction of the training cost. To generalize to unseen perturbations, we incorporate prior knowledge via gene embeddings from large language models (LLMs). Although it models a richer output space, the method remains competitive in predicting mean expression changes. This work offers a practical step towards more expressive and biologically informative models of perturbation effects.
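To make the link between a gene-level histogram and the higher-order statistics named above concrete, here is a minimal sketch of how mean, variance, skewness, and excess kurtosis could be read off a binned distribution. It is not the authors' implementation: the function name `histogram_moments`, the bin layout, and the choice to place each bin's mass at its center are all assumptions made for illustration.

```python
import numpy as np


def histogram_moments(probs, edges):
    """Summary statistics of a binned distribution (illustrative helper).

    probs: non-negative weight per bin (renormalized to sum to 1 here).
    edges: bin edges, one more entry than probs.
    Assumption: each bin's mass sits at the bin center.
    """
    probs = np.asarray(probs, dtype=float)
    edges = np.asarray(edges, dtype=float)
    if edges.size != probs.size + 1:
        raise ValueError("edges must have exactly one more entry than probs")
    total = probs.sum()
    if total <= 0:
        raise ValueError("probs must have positive total mass")
    probs = probs / total

    centers = 0.5 * (edges[:-1] + edges[1:])
    mean = float(np.sum(probs * centers))
    dev = centers - mean
    var = float(np.sum(probs * dev**2))

    if var == 0.0:
        # All mass in one bin: standardized moments are undefined.
        skew = float("nan")
        kurt = float("nan")
    else:
        skew = float(np.sum(probs * dev**3) / var**1.5)
        kurt = float(np.sum(probs * dev**4) / var**2 - 3.0)  # excess kurtosis

    return {"mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt}


# Hypothetical predicted histogram for one gene under one perturbation.
stats = histogram_moments([0.25, 0.5, 0.25], [0.0, 1.0, 2.0, 3.0])
print(stats)
```

Comparing such statistics between a predicted histogram and the empirical histogram of held-out cells is one plausible way to evaluate a distributional model beyond the mean; the evaluation protocol actually used in the paper is not described in this abstract.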
Similar Papers
Representation Learning for Distributional Perturbation Extrapolation
Machine Learning (Stat)
Predicts how cells change with new treatments.
Departures: Distributional Transport for Single-Cell Perturbation Prediction with Neural Schrödinger Bridges
Machine Learning (CS)
Predicts how cells change when medicine is used.
Efficient Data Selection for Training Genomic Perturbation Models
Quantitative Methods
Finds best gene changes faster, more reliably.