Signature-Informed Selection Detection: A Novel Method for Multi-Locus Temporal Population Genetic Model with Recombination
By: Ritabrata Dutta, Yuehao Xu, Sherman Khoo and more
In population genetics, there is often interest in inferring selection coefficients. This task becomes more challenging when multiple linked selected loci are considered simultaneously. For this situation, we propose a novel generalized Bayesian framework in which we compute a scoring rule posterior for the selection coefficients in multi-locus temporal population genetics models. Because our data are trajectories of allele frequencies over time, we use a signature kernel scoring rule, a kernel scoring rule defined for high-dimensional time-series data through the iterated path integrals of a path (called signatures). An unbiased estimate of the signature kernel score can be computed from model simulations, which enables us to sample asymptotically from the signature kernel scoring rule posterior of the selection coefficients using pseudo-marginal MCMC-type algorithms. In a simulation study, we show the inferential efficacy of our method compared to existing benchmark methods for scenarios with two and three selected loci under the standard Wright-Fisher model with recombination and selection. We also consider a negative frequency-dependent selection model for one- and two-locus scenarios, as well as joint inference of selection coefficients and initial haplotype frequencies under the standard Wright-Fisher model. Finally, we illustrate the application of our inferential method to two real-life data sets: a data set on yeast and data from an Evolve and Resequence (E&R) experiment on Drosophila simulans.
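The simulation-based score estimate mentioned in the abstract can be illustrated with a minimal sketch. It rests on several simplifying assumptions that are not taken from the paper: the simulator is a single-locus haploid Wright-Fisher model without recombination, and a Gaussian (RBF) kernel on the raw trajectory stands in for the signature kernel. The function names `simulate_wf`, `rbf_kernel`, and `kernel_score_estimate` are hypothetical. The estimator is the standard unbiased form for a kernel score, the average of k(x_j, x_l) over distinct simulation pairs minus twice the average of k(x_j, y).

```python
import numpy as np


def simulate_wf(s, p0=0.2, n_gen=20, pop_size=500, rng=None):
    """Simulate one allele-frequency trajectory.

    Assumption: single-locus haploid Wright-Fisher model with selection
    coefficient s, far simpler than the multi-locus model with recombination.
    """
    rng = np.random.default_rng() if rng is None else rng
    freqs = np.empty(n_gen + 1)
    freqs[0] = p0
    p = p0
    for t in range(1, n_gen + 1):
        # Deterministic selection step, capped at 1 against rounding error.
        p_sel = min(1.0, p * (1.0 + s) / (1.0 + s * p))
        # Genetic drift: binomial resampling of the next generation.
        p = rng.binomial(pop_size, p_sel) / pop_size
        freqs[t] = p
    return freqs


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on raw trajectories (a stand-in, NOT the signature kernel)."""
    d2 = np.sum((np.asarray(x) - np.asarray(y)) ** 2)
    return float(np.exp(-d2 / (2.0 * bandwidth ** 2)))


def kernel_score_estimate(sims, obs, kernel=rbf_kernel):
    """Unbiased estimate of the kernel score E[k(X, X')] - 2 E[k(X, y)]."""
    m = len(sims)
    if m < 2:
        raise ValueError("need at least two simulations")
    # Average kernel value between simulations and the observation.
    cross = sum(kernel(x, obs) for x in sims) / m
    # Average over distinct pairs only (j != l), which keeps the estimate unbiased.
    within = sum(
        kernel(sims[j], sims[l])
        for j in range(m)
        for l in range(m)
        if j != l
    ) / (m * (m - 1))
    return within - 2.0 * cross


# Usage: score two candidate selection coefficients against one "observed" path.
rng = np.random.default_rng(0)
observed = simulate_wf(0.1, rng=rng)
for s in (0.0, 0.1):
    sims = [simulate_wf(s, rng=rng) for _ in range(50)]
    print(s, kernel_score_estimate(sims, observed))
```

Lower scores indicate a better match between simulations and data. In a pseudo-marginal MCMC scheme, an estimate of this kind would take the place of the intractable log-likelihood term; that step is not shown here.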