BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models
By: Amina Mollaysa, Artem Moskale, Pushpak Pati and more
Potential Business Impact:
Reads DNA, RNA, and protein together to find cures.
We present BioLangFusion, a simple approach for integrating pre-trained DNA, mRNA, and protein language models into unified molecular representations. Motivated by the central dogma of molecular biology (information flow from gene to transcript to protein), we align per-modality embeddings at the biologically meaningful codon level (three nucleotides encoding one amino acid) to ensure direct cross-modal correspondence. BioLangFusion studies three standard fusion techniques: (i) codon-level embedding concatenation, (ii) entropy-regularized attention pooling inspired by multiple-instance learning, and (iii) cross-modal multi-head attention, with each technique providing a different inductive bias for combining modality-specific signals. These methods require no additional pre-training or modification of the base models, allowing straightforward integration with existing sequence-based foundation models. Across five molecular property prediction tasks, BioLangFusion outperforms strong unimodal baselines, showing that even simple fusion of pre-trained models can capture complementary multi-omic information with minimal overhead.
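The codon-level alignment and the first two fusion strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`pool_to_codons`, `concat_fusion`, `attention_pool_modalities`), the choice of mean pooling over nucleotide triplets, the scoring vector `w`, the random projections, and all embedding sizes are assumptions made for illustration, and random arrays stand in for real language-model outputs.

```python
import numpy as np

def pool_to_codons(nuc_emb: np.ndarray) -> np.ndarray:
    """Mean-pool per-nucleotide embeddings (3L, d) into per-codon embeddings (L, d).
    Assumption: the model emits one vector per nucleotide of a coding sequence."""
    n, d = nuc_emb.shape
    if n % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return nuc_emb.reshape(n // 3, 3, d).mean(axis=1)

def concat_fusion(dna: np.ndarray, rna: np.ndarray, prot: np.ndarray) -> np.ndarray:
    """Concatenate codon-aligned embeddings along the feature axis."""
    if not (len(dna) == len(rna) == len(prot)):
        raise ValueError("modalities must share the same number of codons")
    return np.concatenate([dna, rna, prot], axis=-1)

def attention_pool_modalities(mods: np.ndarray, w: np.ndarray):
    """Weight M modalities with a softmax over per-modality scores.
    mods: (M, L, d) embeddings already projected to a shared size d.
    w:    (d,) hypothetical scoring vector (learned in a real model).
    Returns the fused (L, d) embedding, the (M,) weights, and their entropy,
    which a training loss could use as a regularization term."""
    scores = mods.mean(axis=1) @ w            # one score per modality, shape (M,)
    scores = scores - scores.max()            # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    fused = np.tensordot(weights, mods, axes=1)   # weighted sum over modalities
    entropy = -(weights * np.log(weights + 1e-12)).sum()
    return fused, weights, entropy

# Toy inputs: random stand-ins for language-model outputs (sizes are arbitrary).
rng = np.random.default_rng(0)
L = 4                                               # number of codons / amino acids
dna = pool_to_codons(rng.normal(size=(3 * L, 8)))   # (4, 8)
rna = pool_to_codons(rng.normal(size=(3 * L, 6)))   # (4, 6)
prot = rng.normal(size=(L, 5))                      # (4, 5), one vector per residue

fused_cat = concat_fusion(dna, rna, prot)           # (4, 19)

# Hypothetical random projections to a shared size before attention pooling.
shared = 4
mods = np.stack([x @ rng.normal(size=(x.shape[1], shared)) for x in (dna, rna, prot)])
fused, weights, entropy = attention_pool_modalities(mods, rng.normal(size=shared))
```

Concatenation keeps every modality's features at each codon position, so the fused width is the sum of the input widths. Attention pooling instead yields a single shared-size embedding plus modality weights that can be inspected. The third strategy, cross-modal multi-head attention, is omitted here because it needs learned query, key, and value projections.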
Similar Papers
Bidirectional Hierarchical Protein Multi-Modal Representation Learning
Machine Learning (CS)
Helps predict how proteins work by combining two views.
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
CV and Pattern Recognition
Lets computers understand pictures and words together.
CodonMoE: DNA Language Models for mRNA Analyses
Genomics
Helps DNA language models analyze mRNA sequences.