Causal integration of chemical structures improves representations of microscopy images for morphological profiling
By: Yemin Yu, Neil Tenenholtz, Lester Mackey and more
Potential Business Impact:
Helps find drug effects by linking cell pictures and drug shapes.
Recent advances in self-supervised deep learning have improved our ability to quantify cellular morphological changes in high-throughput microscopy screens, a process known as morphological profiling. However, most current methods learn only from images, even though many screens are inherently multimodal: they involve both a chemical or genetic perturbation and an image-based readout. We hypothesized that incorporating chemical compound structure during self-supervised pre-training could improve learned representations of images in high-throughput microscopy screens. We introduce a representation learning framework, MICON (Molecular-Image Contrastive Learning), that models chemical compounds as treatments that induce counterfactual transformations of cell phenotypes. MICON significantly outperforms classical hand-crafted features such as CellProfiler and existing deep-learning-based representation learning methods in challenging evaluation settings where models must identify reproducible effects of drugs across independent replicates and data-generating centers. We demonstrate that incorporating chemical compound information into the learning process provides consistent improvements in our evaluation setting and that modeling compounds specifically as treatments in a causal framework outperforms approaches that directly align images and compounds in a single representation space. Our findings point to a new direction for representation learning in morphological profiling, suggesting that methods should explicitly account for the multimodal nature of microscopy screening data.
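The abstract does not specify MICON's architecture or loss, so the following is only a minimal sketch of the general idea of treating a compound as a transformation of a control-cell embedding and scoring the result contrastively against treated-cell embeddings. Everything in it is an assumption made for illustration: the additive linear treatment operator, the InfoNCE-style loss, the temperature value, and the names `apply_treatment`, `info_nce`, and `weights` are hypothetical and are not taken from the paper.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def apply_treatment(control_emb, compound_emb, weights):
    """Hypothetical treatment operator (an assumption, not the paper's model):
    shift the control-image embedding by a linear map of the compound embedding.
    `weights` has one row per image-embedding dimension."""
    shift = [dot(row, compound_emb) for row in weights]
    return [c + s for c, s in zip(control_emb, shift)]


def info_nce(preds, targets, temperature=0.1):
    """InfoNCE-style contrastive loss: preds[i] should be closest to targets[i],
    with the other targets in the batch acting as negatives."""
    preds = [normalize(p) for p in preds]
    targets = [normalize(t) for t in targets]
    total = 0.0
    for i, p in enumerate(preds):
        logits = [dot(p, t) / temperature for t in targets]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(preds)


# Toy usage with random stand-ins for encoder outputs.
random.seed(0)
img_dim, cmpd_dim, batch = 4, 6, 8
controls = [[random.gauss(0, 1) for _ in range(img_dim)] for _ in range(batch)]
compounds = [[random.gauss(0, 1) for _ in range(cmpd_dim)] for _ in range(batch)]
treated = [[random.gauss(0, 1) for _ in range(img_dim)] for _ in range(batch)]
weights = [[random.gauss(0, 0.1) for _ in range(cmpd_dim)] for _ in range(img_dim)]

predicted = [apply_treatment(c, m, weights) for c, m in zip(controls, compounds)]
loss = info_nce(predicted, treated)
print(f"contrastive loss on random embeddings: {loss:.3f}")
```

In this sketch the compound never shares an embedding space with the image; it only parameterizes the shift from control to treated phenotype. That is one way to read the abstract's contrast between "compounds as treatments" and direct image-compound alignment, though the paper's actual formulation may differ.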
Similar Papers
Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling
Machine Learning (CS)
Predicts drug effects better by looking at cells.
Combining GCN Structural Learning with LLM Chemical Knowledge for Enhanced Virtual Screening
Machine Learning (CS)
Finds new medicines faster by understanding molecule shapes.
Curiosity Driven Exploration to Optimize Structure-Property Learning in Microscopy
Materials Science
Finds new materials faster by guessing where to look.