BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models

Published: June 10, 2025 | arXiv ID: 2506.08936v1

By: Amina Mollaysa, Artem Moskale, Pushpak Pati and more

Potential Business Impact:

Combines DNA, mRNA, and protein language models to predict molecular properties more accurately, which could support therapeutic discovery.

Business Areas:

Bioinformatics, Biotechnology, Data and Analytics, Science and Engineering

We present BioLangFusion, a simple approach for integrating pre-trained DNA, mRNA, and protein language models into unified molecular representations. Motivated by the central dogma of molecular biology (information flow from gene to transcript to protein), we align per-modality embeddings at the biologically meaningful codon level (three nucleotides encoding one amino acid) to ensure direct cross-modal correspondence. BioLangFusion studies three standard fusion techniques: (i) codon-level embedding concatenation, (ii) entropy-regularized attention pooling inspired by multiple-instance learning, and (iii) cross-modal multi-head attention, each providing a different inductive bias for combining modality-specific signals. These methods require no additional pre-training or modification of the base models, allowing straightforward integration with existing sequence-based foundation models. Across five molecular property prediction tasks, BioLangFusion outperforms strong unimodal baselines, showing that even simple fusion of pre-trained models can capture complementary multi-omic information with minimal overhead.
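To make the codon-level alignment and the first two fusion techniques concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: per-nucleotide embeddings are mean-pooled in groups of three to get one vector per codon, all modalities share one embedding width `d`, and attention weights come from a single hypothetical scoring vector `score_w`. Random arrays stand in for real language-model outputs.

```python
import numpy as np

def pool_to_codons(nt_emb):
    """Mean-pool per-nucleotide embeddings (3*L, d) into per-codon embeddings (L, d).

    Assumption: mean pooling is one simple way to reach codon resolution;
    the abstract does not say how the alignment is actually done.
    """
    n, d = nt_emb.shape
    if n % 3 != 0:
        raise ValueError("nucleotide sequence length must be a multiple of 3")
    return nt_emb.reshape(n // 3, 3, d).mean(axis=1)

def concat_fusion(dna, mrna, prot):
    """Technique (i): concatenate codon-aligned embeddings along the feature axis."""
    return np.concatenate([dna, mrna, prot], axis=-1)

def attention_pool_fusion(embs, score_w):
    """Sketch of technique (ii): weight each modality by a softmax attention score.

    embs:    list of M arrays, each (L, d), already codon-aligned
    score_w: hypothetical (d,) scoring vector (it would be learned in practice)
    Returns the fused (L, d) array, the (M,) modality weights, and their entropy.
    """
    stacked = np.stack(embs, axis=0)             # (M, L, d)
    logits = (stacked @ score_w).mean(axis=1)    # one score per modality, (M,)
    logits = logits - logits.max()               # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()
    entropy = -(weights * np.log(weights + 1e-12)).sum()
    fused = np.tensordot(weights, stacked, axes=1)  # weighted sum over modalities
    return fused, weights, entropy

# Toy stand-ins for language-model outputs: 4 codons, embedding width 8.
rng = np.random.default_rng(0)
L, d = 4, 8
dna_codon = pool_to_codons(rng.normal(size=(3 * L, d)))
mrna_codon = pool_to_codons(rng.normal(size=(3 * L, d)))
prot = rng.normal(size=(L, d))   # one embedding per amino acid

concat = concat_fusion(dna_codon, mrna_codon, prot)
fused, weights, entropy = attention_pool_fusion(
    [dna_codon, mrna_codon, prot], rng.normal(size=d)
)
print(concat.shape, fused.shape, weights.round(3), round(float(entropy), 3))
```

In a training setup, the entropy term would typically be added to the task loss as a regularizer on the modality weights; how it is weighted and in which direction it pushes are details this sketch leaves open.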