VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks
By: Francesco Madeddu, Lucia Testa, Gianluca De Carlo and more
Potential Business Impact:
Helps find new medicines by connecting biology facts.
The intrinsic complexity of human biology presents ongoing challenges to scientific understanding. Researchers collaborate across disciplines to expand our knowledge of the biological interactions that define human life. AI methodologies have emerged as powerful tools across scientific domains, particularly in computational biology, where graph data structures effectively model relationships among biological entities, as in protein-protein interaction (PPI) networks and gene functional networks. These networks serve as datasets for key network medicine tasks, such as gene-disease association prediction, drug repurposing, and polypharmacy side-effect studies. Reliable predictions from machine learning models require high-quality foundational data. In this work, we present a comprehensive multi-purpose biological knowledge graph constructed by integrating and refining multiple publicly available datasets. Building upon the Drug Repurposing Knowledge Graph (DRKG), we define a pipeline tasked with a) cleaning inconsistencies and redundancies present in DRKG, b) coalescing information from the main available public data sources, and c) enriching the graph nodes with expressive feature vectors such as molecular fingerprints and gene ontologies. Biologically and chemically relevant features improve the capacity of machine learning models to generate accurate and well-structured embedding spaces. The resulting resource represents a coherent and reliable biological knowledge graph that serves as a state-of-the-art platform to advance research in computational biology and precision medicine. Moreover, it offers the opportunity to benchmark graph-based machine learning and network medicine models on relevant tasks. We demonstrate the effectiveness of the proposed dataset by benchmarking it on drug repurposing, PPI prediction, and side-effect prediction, each modeled as a link prediction problem.
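To make step a) of the pipeline more concrete, here is a minimal sketch of what removing redundancies from a list of (head, relation, tail) triples could look like. It is not the authors' implementation: the function name `clean_triples`, the `symmetric_relations` parameter, and the sample entity and relation labels are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def clean_triples(
    triples: Iterable[Triple],
    symmetric_relations: FrozenSet[str] = frozenset(),
) -> List[Triple]:
    """Illustrative cleaning pass (an assumption, not the paper's pipeline).

    - strips stray whitespace from every field
    - skips triples with an empty field
    - stores undirected (symmetric) relations in one canonical direction
    - removes exact duplicates while preserving first-seen order
    """
    seen = set()
    cleaned: List[Triple] = []
    for head, rel, tail in triples:
        head, rel, tail = head.strip(), rel.strip(), tail.strip()
        if not head or not rel or not tail:
            continue  # incomplete triple
        if rel in symmetric_relations and tail < head:
            head, tail = tail, head  # canonical ordering for undirected edges
        triple = (head, rel, tail)
        if triple not in seen:
            seen.add(triple)
            cleaned.append(triple)
    return cleaned


# Hypothetical sample data; the labels are placeholders, not real identifiers.
raw = [
    ("Gene::1", "PPI", "Gene::2"),
    ("Gene::2", "PPI", "Gene::1"),            # same undirected edge, reversed
    ("Compound::A", "treats", "Disease::X"),
    ("Compound::A", "treats", "Disease::X"),  # exact duplicate
    (" Gene::1", "PPI", "Gene::2 "),          # whitespace variant
    ("Gene::3", "", "Gene::4"),               # missing relation
]
result = clean_triples(raw, symmetric_relations=frozenset({"PPI"}))
print(result)
```

Treating PPI edges as undirected is a modeling choice made for this sketch; a real pipeline would need to decide per relation type whether direction carries meaning.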
Similar Papers
Rewarding Explainability in Drug Repurposing with Knowledge Graphs
Artificial Intelligence
Finds new uses for old medicines.
RNA-KG v2.0: An RNA-centered Knowledge Graph with Properties
Databases
Finds new RNA connections with context.
Causal knowledge graph analysis identifies adverse drug effects
Artificial Intelligence
Finds new drug side effects by connecting medical facts.