A U-Statistic-based random forest approach for genetic interaction study
By: Ming Li, Ruo-Sin Peng, Changshuai Wei, and more
Potential Business Impact:
Finds hidden gene links to health problems.
Variation in complex traits is influenced by multiple genetic variants, environmental risk factors, and their interactions. Although substantial progress has been made in identifying single genetic variants associated with complex traits, detecting gene-gene and gene-environment interactions remains a major challenge. When a large number of genetic variants and environmental risk factors are involved, the search for interactions is usually limited to pairwise interactions because of the exponentially growing feature space and computational burden. Alternatively, recursive partitioning approaches, such as random forests, have gained popularity in high-dimensional genetic association studies. In this article, we propose a U-statistic-based random forest approach, referred to as the Forest U-Test, for genetic association studies with quantitative traits. Simulation studies showed that the Forest U-Test outperformed existing methods. The proposed method was also applied to study Cannabis Dependence (CD) using three independent datasets from the Study of Addiction: Genetics and Environment. A significant joint association was detected with an empirical p-value less than 0.001, and the finding was replicated in two independent datasets with p-values of 5.93e-19 and 4.70e-17, respectively.
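The abstract names two ingredients without showing them: a U-statistic used as an association test for a quantitative trait, and an empirical (permutation) p-value. Below is a minimal sketch of both under loudly stated assumptions. It is not the authors' Forest U-Test, and no random forest is grown here. The trait kernel `trait_kernel`, the subject-by-subject similarity matrix `sim`, and the function names are all hypothetical choices made for illustration. The statistic averages `sim[i][j] * trait_kernel(y_i, y_j)` over all ordered pairs of distinct subjects. Significance is assessed by shuffling the trait, which breaks any link between trait and similarity.

```python
import random
from typing import List, Sequence

# Illustrative sketch only: a generic similarity-weighted U-statistic with a
# permutation p-value. This is NOT the authors' Forest U-Test; no forest is
# built here, and the kernel and similarity matrix are hypothetical choices.


def trait_kernel(y_i: float, y_j: float) -> float:
    """Hypothetical trait-similarity kernel: larger when traits are closer."""
    return -(y_i - y_j) ** 2


def u_statistic(sim: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Average of sim[i][j] * trait_kernel(y_i, y_j) over ordered pairs i != j."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += sim[i][j] * trait_kernel(y[i], y[j])
    return total / (n * (n - 1))


def permutation_p_value(sim: Sequence[Sequence[float]], y: Sequence[float],
                        n_perm: int = 999, seed: int = 0) -> float:
    """Empirical p-value: shuffle the trait to break any link with `sim`."""
    rng = random.Random(seed)
    observed = u_statistic(sim, y)
    y_perm: List[float] = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if u_statistic(sim, y_perm) >= observed:
            exceed += 1
    # Counting the observed data as one permutation means the smallest
    # attainable p-value is 1 / (n_perm + 1).
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up toy data: subjects 0-5 and 6-11 form two "genetically similar"
    # groups, and the trait differs between the groups.
    groups = [0] * 6 + [1] * 6
    sim = [[1.0 if gi == gj else 0.0 for gj in groups] for gi in groups]
    y = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 3.0, 3.1, 2.9, 3.2, 2.8, 3.0]
    print(u_statistic(sim, y), permutation_p_value(sim, y))
```

Under this estimator, 999 permutations cannot produce a p-value below 0.001, so reporting "less than 0.001" requires more permutations. To connect the sketch to the random-forest setting, one could replace the block-diagonal toy `sim` with a forest proximity, such as the fraction of trees in which two subjects fall in the same terminal node. Whether this matches the Forest U-Test's actual construction is an assumption.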
Similar Papers
A statistical framework for comparing epidemic forests
Quantitative Methods
Compares how diseases spread to find best cures.
Detecting gene-environment interactions to guide personalized intervention: boosting distributional regression for polygenic scores
Applications
Finds who benefits most from medicine or lifestyle changes.
A Generalized Genetic Random Field Method for the Genetic Association Analysis of Sequencing Data
Methodology
Finds hidden gene links to diseases.