Bridging Large Language Models and Single-Cell Transcriptomics in Dissecting Selective Motor Neuron Vulnerability
By: Douglas Jiang, Zilin Dai, Luxuan Zhang, and more
Potential Business Impact:
Helps scientists interpret what individual cells are doing from their gene activity.
Understanding cell identity and function from single-cell sequencing data remains a key challenge in computational biology. We present a novel framework that leverages gene-specific textual annotations from the NCBI Gene database to generate biologically contextualized cell embeddings. For each cell in a single-cell RNA sequencing (scRNA-seq) dataset, we rank genes by expression level, retrieve their NCBI Gene descriptions, and transform these descriptions into vector embeddings using large language models (LLMs). The models used include OpenAI text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large (Jan 2024), as well as the domain-specific models BioBERT and SciBERT. Each cell's embedding is computed as an expression-weighted average over the top N most highly expressed genes in that cell, providing a compact, semantically rich representation. This multimodal strategy bridges structured biological data with state-of-the-art language modeling, enabling more interpretable downstream applications such as cell-type clustering, cell vulnerability dissection, and trajectory inference.
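The core step described above (rank genes by expression, then take an expression-weighted average of their description embeddings) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `cell_embedding`, the dictionary inputs, and the assumption that gene-description embeddings were precomputed by some LLM embedding model are all hypothetical.

```python
import numpy as np

def cell_embedding(expression, gene_embeddings, top_n=50):
    """Expression-weighted average of gene-description embeddings for one cell.

    expression: dict mapping gene symbol -> expression value in this cell
    gene_embeddings: dict mapping gene symbol -> 1-D np.ndarray
        (assumed precomputed from NCBI Gene descriptions via an
        embedding model such as text-embedding-3-small)
    top_n: number of most highly expressed genes to average over
    """
    # Rank genes by expression and keep the top N that have an embedding
    top = sorted(expression, key=expression.get, reverse=True)[:top_n]
    top = [g for g in top if g in gene_embeddings]
    if not top:
        raise ValueError("no top gene has a description embedding")

    # Normalize expression values of the selected genes into weights
    weights = np.array([expression[g] for g in top], dtype=float)
    weights /= weights.sum()

    # Weighted average of the per-gene embedding vectors
    vectors = np.stack([gene_embeddings[g] for g in top])
    return weights @ vectors
```

With toy 2-D embeddings, a cell expressing gene A three times as strongly as gene B yields a vector three-quarters of the way toward A's embedding, which is what makes the representation track each cell's dominant transcriptional program.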
Similar Papers
Contrastive Learning Enhances Language Model Based Cell Embeddings for Low-Sample Single Cell Transcriptomics
Genomics
Finds rare cell types for disease research.
Cell2Text: Multimodal LLM for Generating Single-Cell Descriptions from RNA-Seq Data
Machine Learning (CS)
Explains what cells are doing in plain English.
Language-Enhanced Representation Learning for Single-Cell Transcriptomics
Machine Learning (CS)
Helps understand cells by combining gene data and text.