SAE-RNA: A Sparse Autoencoder Model for Interpreting RNA Language Model Representations
By: Taehan Kim, Sangdae Nam
Potential Business Impact:
Reveals hidden patterns in RNA representations that can guide new biological discoveries.
Deep learning, particularly with the advancement of Large Language Models, has transformed biomolecular modeling, with protein advances (e.g., ESM) inspiring emerging RNA language models such as RiNALMo. Yet how and what these RNA language models internally encode about messenger RNA (mRNA) or non-coding RNA (ncRNA) families remains unclear. We present SAE-RNA, an interpretability model that analyzes RiNALMo representations and maps them to known, human-level biological features. Our work frames RNA interpretability as concept discovery in pretrained embeddings, without end-to-end retraining, and provides practical tools to probe what RNA LMs may encode about ncRNA families. The model can be extended to support close comparisons between RNA groups and to aid hypothesis generation about previously unrecognized relationships.
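To make the approach concrete, here is a minimal sketch of the core idea: train a sparse autoencoder on frozen RNA-LM embeddings so that sparse hidden features can later be inspected as candidate biological concepts. This is an illustration under stated assumptions, not the authors' implementation; the dimensions, hyperparameters, and random stand-in embeddings are all placeholders (real use would feed pooled RiNALMo outputs).

```python
# Sketch: sparse autoencoder over frozen RNA-LM embeddings.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps hidden activations non-negative; the L1 penalty in the
        # loss pushes most of them to zero, yielding sparse "concept" features.
        z = torch.relu(self.encoder(x))
        x_hat = self.decoder(z)
        return x_hat, z

# Stand-in for frozen RiNALMo embeddings (replace with real LM outputs).
d_model, d_hidden = 1280, 8192            # overcomplete dictionary (assumed)
embeddings = torch.randn(4096, d_model)

sae = SparseAutoencoder(d_model, d_hidden)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3                           # sparsity strength (assumed)

for step in range(1000):
    batch = embeddings[torch.randint(0, embeddings.size(0), (256,))]
    x_hat, z = sae(batch)
    # Reconstruction error plus L1 sparsity penalty on hidden activations.
    loss = nn.functional.mse_loss(x_hat, batch) + l1_coeff * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, one would inspect which hidden units activate for sequences from each ncRNA family, mapping sparse features to known annotations; note that no end-to-end retraining of the language model is required, since the embeddings stay frozen.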
Similar Papers
A Comparative Review of RNA Language Models
Biomolecules
Helps understand RNA's function and structure better.
From Sentences to Sequences: Rethinking Languages in Biological System
Biomolecules
Helps understand how biological molecules fold by treating sequences as language.
Contrastive Learning Enhances Language Model Based Cell Embeddings for Low-Sample Single Cell Transcriptomics
Genomics
Finds rare cell types for disease research.