MitoFREQ: A Novel Approach for Mitogenome Frequency Estimation from Top-level Haplogroups and Single Nucleotide Variants
By: Mikkel Meyer Andersen , Nicole Huber , Kimberly S Andreaggi and more
Lineage marker population frequencies can serve as one way to express evidential value in forensic genetics. However, for high-quality whole mitochondrial DNA genome sequences (mitogenomes), population data remain limited. In this paper, we offer a new method, MitoFREQ, for estimating the population frequencies of mitogenomes. MitoFREQ uses the mitogenome resources HelixMTdb and gnomAD, harbouring information from 195,983 and 56,406 mitogenomes, respectively. Neither HelixMTdb nor gnomAD can be queried directly for individual mitogenome frequencies, but offers single nucleotide variant (SNV) allele frequencies for each of 30 "top-level" haplogroups (TLHG). We propose using the HelixMTdb and gnomAD resources by classifying a given mitogenome within the TLHG scheme and subsequently using the frequency of its rarest SNV within that TLHG weighted by the TLHG frequency. We show that this method is guaranteed to provide a higher population frequency estimate than if a refined haplogroup and its SNV frequencies were used. Further, we show that top-level haplogrouping can be achieved by using only 227 specific positions for 99.9% of the tested mitogenomes, potentially making the method available for low-quality samples. The method was tested on two types of datasets: high-quality forensic reference datasets and a diverse collection of scrutinised mitogenomes from GenBank. This dual evaluation demonstrated that the approach is robust across both curated forensic data and broader population-level sequences. This method produced likelihood ratios in the range of 100-100,000, demonstrating its potential to strengthen the statistical evaluation of forensic mtDNA evidence. We have developed an open-source R package `mitofreq` that implements our method, including a Shiny app where custom TLHG frequencies can be supplied.
Similar Papers
Deep Generative Models of Evolution: SNP-level Population Adaptation by Genomic Linkage Incorporation
Machine Learning (CS)
Helps track how genes change over time.
Bayesian inference from time series of allele frequency data using exact simulation techniques
Populations and Evolution
Tracks how genes change over time.
Algorithms for Reconstructing B Cell Lineages in the Presence of Context-Dependent Somatic Hypermutation
Populations and Evolution
Helps map how diseases change over time.