Language-Enhanced Representation Learning for Single-Cell Transcriptomics
By: Yaorui Shi, Jiaqi Yang, Changhao Nai, and others
Potential Business Impact:
Helps researchers understand cells by combining gene-expression data with text.
Single-cell RNA sequencing (scRNA-seq) offers detailed insights into cellular heterogeneity. Recent advancements leverage single-cell large language models (scLLMs) for effective representation learning. However, these models focus exclusively on transcriptomic data, neglecting the complementary biological knowledge available in textual descriptions. To overcome this limitation, we propose scMMGPT, a novel multimodal framework designed for language-enhanced representation learning in single-cell transcriptomics. Unlike existing methods, scMMGPT employs robust cell representation extraction that preserves quantitative gene expression data, and introduces an innovative two-stage pre-training strategy combining discriminative precision with generative flexibility. Extensive experiments demonstrate that scMMGPT significantly outperforms unimodal and multimodal baselines across key downstream tasks, including cell annotation and clustering, and exhibits superior generalization in out-of-distribution scenarios.
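The abstract does not give implementation details, but the discriminative half of a two-stage scheme like the one described is typically a contrastive (InfoNCE-style) objective that aligns paired cell and text embeddings. The sketch below is a minimal NumPy illustration of that idea; the function names, embedding shapes, and temperature value are assumptions for exposition, not scMMGPT's actual method.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosines."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_loss(cell_emb, text_emb, temperature=0.07):
    """Discriminative-stage sketch: InfoNCE loss where row i of cell_emb
    is paired with row i of text_emb; all other rows are negatives."""
    c = l2_normalize(cell_emb)
    t = l2_normalize(text_emb)
    logits = c @ t.T / temperature                    # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matched cell-text pairs sit on the diagonal.
    return -np.mean(np.diag(log_softmax))

rng = np.random.default_rng(0)
cells = rng.normal(size=(8, 32))                  # hypothetical cell embeddings
texts = cells + 0.1 * rng.normal(size=(8, 32))    # nearly aligned text embeddings
loss_aligned = contrastive_loss(cells, texts)
loss_random = contrastive_loss(cells, rng.normal(size=(8, 32)))
print(loss_aligned < loss_random)  # aligned pairs yield a lower loss
```

A generative second stage would then condition a language model on the learned cell embeddings to produce descriptions, complementing the contrastive alignment shown here.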
Similar Papers
Contrastive Learning Enhances Language Model Based Cell Embeddings for Low-Sample Single Cell Transcriptomics
Genomics
Finds rare cell types for disease research.
Bridging Large Language Models and Single-Cell Transcriptomics in Dissecting Selective Motor Neuron Vulnerability
Genomics
Helps scientists understand what cells are doing.
Cell2Text: Multimodal LLM for Generating Single-Cell Descriptions from RNA-Seq Data
Machine Learning (CS)
Explains what cells are doing in plain English.