Gaussian copula correlation network analysis with application to multi-omics data
By: Ekaterina Tomilina, Florence Jaffrézic, Gildas Mazo
Potential Business Impact:
Finds how genes work together in sickness.
Reconstructing gene regulatory networks from large-scale heterogeneous data is a key challenge in biology. In multi-omics data analysis, networks based on pairwise statistical association measures remain popular because they are easy to build and interpret. For mixed-type (discrete and continuous) data, however, choosing a suitable association measure remains an open issue. We propose a novel approach based on the Gaussian copula, whose correlation parameters represent the links of the network. We derive new properties of the model that guide the interpretation of the network. To estimate the copula parameters, we derive a semiparametric pairwise likelihood for mixed data. An extensive simulation study shows that the proposed estimation procedure accurately estimates the copula correlation matrix. The methodology is also applied to a real ICGC breast cancer dataset and is implemented in the freely available R package heterocop.
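The abstract does not spell out the estimator. What follows is a minimal Python sketch, built on our own assumptions, of one way to write a semiparametric pairwise (pseudo-)likelihood for a single Gaussian copula correlation ρ. A continuous margin is replaced by rank-based normal scores Φ⁻¹(rank/(n+1)). A discrete margin is replaced by latent thresholds Φ⁻¹ of its empirical CDF. ρ is then maximized with a coarse grid followed by golden-section refinement. The function names (`normal_scores`, `discrete_thresholds`, `copula_corr`, and so on) are hypothetical and are not the heterocop API. The sketch assumes no ties within a continuous variable and omits discrete-discrete pairs, which need bivariate normal rectangle probabilities.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def normal_scores(x):
    """Rank-based normal scores Phi^{-1}(rank / (n + 1)); assumes no ties in x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return z


def _probit(p):
    """Phi^{-1}(p), extended so that p = 0 -> -inf and p = 1 -> +inf."""
    if p <= 0.0:
        return float("-inf")
    if p >= 1.0:
        return float("inf")
    return STD_NORMAL.inv_cdf(p)


def discrete_thresholds(y):
    """Latent-normal interval (lower, upper] for each observation of a discrete variable,
    taken from the empirical CDF: upper = Phi^{-1}(F(y)), lower = Phi^{-1}(F(y-))."""
    n, y = len(y), list(y)
    below, upto, cum = {}, {}, 0
    for v in sorted(set(y)):
        below[v] = cum
        cum += y.count(v)
        upto[v] = cum
    return ([_probit(below[v] / n) for v in y], [_probit(upto[v] / n) for v in y])


def loglik_cont_cont(rho, z1, z2):
    """Sum of log bivariate Gaussian copula densities at the normal scores."""
    r2 = 1.0 - rho * rho
    return sum(-0.5 * log(r2) - (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * r2)
               for a, b in zip(z1, z2))


def loglik_cont_disc(rho, z, lower, upper):
    """Sum of log P(lower < Z_Y <= upper | Z_X = z) under latent correlation rho."""
    s = sqrt(1.0 - rho * rho)
    total = 0.0
    for zi, lo, hi in zip(z, lower, upper):
        p = STD_NORMAL.cdf((hi - rho * zi) / s) - STD_NORMAL.cdf((lo - rho * zi) / s)
        total += log(max(p, 1e-300))  # guard against log(0) from rounding
    return total


def maximize_rho(loglik, lo=-0.99, hi=0.99, grid=40, iters=60):
    """Coarse grid search, then golden-section refinement around the best grid point."""
    step = (hi - lo) / grid
    best = max((lo + k * step for k in range(grid + 1)), key=loglik)
    a, b = max(lo, best - step), min(hi, best + step)
    g = (sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if loglik(c) >= loglik(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2.0


def copula_corr(x, y, x_discrete=False, y_discrete=False):
    """Pairwise pseudo-likelihood estimate of the Gaussian copula correlation of (x, y)."""
    if x_discrete and y_discrete:
        raise NotImplementedError("discrete-discrete pairs need bivariate normal rectangles")
    if x_discrete:  # put the continuous variable first
        x, y, x_discrete, y_discrete = y, x, y_discrete, x_discrete
    z = normal_scores(x)
    if not y_discrete:
        w = normal_scores(y)
        return maximize_rho(lambda r: loglik_cont_cont(r, z, w))
    lower, upper = discrete_thresholds(y)
    return maximize_rho(lambda r: loglik_cont_disc(r, z, lower, upper))
```

Calling `copula_corr` on every pair of variables would fill a candidate copula correlation matrix whose entries play the role of the network links. This sketch does not address how heterocop itself handles ties, optimization, or positive-definiteness of the assembled matrix.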
Similar Papers
Gaussian mixture copulas for flexible dependence modelling in the body and tails of joint distributions
Methodology
Predicts pollution risks better by looking at all data.
Modeling Dependence in Omics Association Analysis via Structured Co-Expression Networks to Improve Power and Replicability
Methodology
Finds hidden links in body data to predict health.
Generalized probabilistic canonical correlation analysis for multi-modal data integration with full or partial observations
Machine Learning (Stat)
Combines different data types, even with missing parts.