Language-Enhanced Representation Learning for Single-Cell Transcriptomics

Published: March 12, 2025 | arXiv ID: 2503.09427v4

By: Yaorui Shi, Jiaqi Yang, Changhao Nai and more

Potential Business Impact:

Helps understand cells by combining gene data and text.

Business Areas:
Bioinformatics, Biotechnology, Data and Analytics, Science and Engineering

Single-cell RNA sequencing (scRNA-seq) offers detailed insights into cellular heterogeneity. Recent advancements leverage single-cell large language models (scLLMs) for effective representation learning. These models focus exclusively on transcriptomic data, neglecting complementary biological knowledge from textual descriptions. To overcome this limitation, we propose scMMGPT, a novel multimodal framework designed for language-enhanced representation learning in single-cell transcriptomics. Unlike existing methods, scMMGPT employs robust cell representation extraction, preserving quantitative gene expression data, and introduces an innovative two-stage pre-training strategy combining discriminative precision with generative flexibility. Extensive experiments demonstrate that scMMGPT significantly outperforms unimodal and multimodal baselines across key downstream tasks, including cell annotation and clustering, and exhibits superior generalization in out-of-distribution scenarios.

Country of Origin
🇸🇬 Singapore

Page Count
30 pages

Category
Computer Science:
Machine Learning (CS)