ChromFound: Towards A Universal Foundation Model for Single-Cell Chromatin Accessibility Data

Published: May 19, 2025 | arXiv ID: 2505.12638v2

By: Yifeng Jiao, Yuchen Liu, Yu Zhang and more

Potential Business Impact:

Finds hidden gene links to understand diseases.

Business Areas:
Bioinformatics, Biotechnology, Data and Analytics, Science and Engineering

The advent of single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) offers an innovative perspective for deciphering regulatory mechanisms by assembling a vast repository of single-cell chromatin accessibility data. While foundation models have achieved significant success in single-cell transcriptomics, there is currently no foundation model for scATAC-seq that supports zero-shot high-quality cell identification and comprehensive multi-omics analysis simultaneously. Key challenges lie in the high dimensionality and sparsity of scATAC-seq data, as well as the lack of a standardized schema for representing open chromatin regions (OCRs). Here, we present ChromFound, a foundation model tailored for scATAC-seq. ChromFound utilizes a hybrid architecture and genome-aware tokenization to effectively capture genome-wide long contexts and regulatory signals from dynamic chromatin landscapes. Pretrained on 1.97 million cells from 30 tissues and 6 disease conditions, ChromFound demonstrates broad applicability across 6 diverse tasks. Notably, it achieves robust zero-shot performance in generating universal cell representations and exhibits excellent transferability in cell type annotation and cross-omics prediction. By uncovering enhancer-gene links undetected by existing computational methods, ChromFound offers a promising framework for understanding disease risk variants in the noncoding genome.

Country of Origin
🇨🇳 China

Page Count
28 pages

Category
Quantitative Biology: Genomics