Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows
By: Daniel Mas Montserrat, Ray Verma, Míriam Barrabés and more
Potential Business Impact:
Makes DNA analysis faster and uses less computer memory.
Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample, leading to high memory spikes, intensive disk I/O, and task failures due to out-of-memory errors. Simple static resource allocation methods struggle to handle the variability in per-chromosome RAM demands, resulting in poor resource utilization and long runtimes. In this work, we propose multiple mechanisms for adaptive, RAM-efficient parallelization of chromosome-level bioinformatics workflows. First, we develop a symbolic regression model that estimates per-chromosome memory consumption for a given task, and we introduce an interpolating bias to conservatively minimize over-allocation. Second, we present a dynamic scheduler that adaptively predicts RAM usage with a polynomial regression model, treating task packing as a knapsack problem to optimally batch jobs based on predicted memory requirements. Additionally, we present a static scheduler that optimizes chromosome processing order to minimize peak memory while preserving throughput. Our proposed methods, evaluated on simulations and real-world genomic pipelines, provide new mechanisms to reduce memory overruns and balance load across threads. We thereby achieve faster end-to-end execution, showcasing the potential to optimize large-scale genomic workflows.
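To make the knapsack framing of task packing concrete, here is a minimal sketch. It is not the authors' scheduler. It assumes predicted per-chromosome memory is already available as whole gigabytes, and the function names (`pack_batch`, `schedule_batches`) and the memory figures are hypothetical, chosen only for illustration. Each batch is chosen by a 0/1 knapsack in which an item's value equals its weight, so the batch fills the RAM budget as fully as possible without exceeding it.

```python
def pack_batch(predicted_gb, budget_gb):
    """Choose the subset of tasks whose predicted RAM best fills the budget.

    0/1 knapsack with value == weight (predicted memory in whole GB).
    Returns (total_gb, tuple_of_task_names).
    """
    # best[c] holds the fullest packing found so far for capacity c.
    best = [(0, ())] * (budget_gb + 1)
    for name, need in predicted_gb.items():
        # Walk capacities downward so each task is used at most once.
        for cap in range(budget_gb, need - 1, -1):
            total = best[cap - need][0] + need
            if total > best[cap][0]:
                best[cap] = (total, best[cap - need][1] + (name,))
    return best[budget_gb]


def schedule_batches(predicted_gb, budget_gb):
    """Repeatedly pack batches until every task has been assigned."""
    too_big = [n for n, need in predicted_gb.items() if need > budget_gb]
    if too_big:
        raise ValueError(f"tasks exceed the RAM budget on their own: {too_big}")
    remaining = dict(predicted_gb)
    batches = []
    while remaining:
        _, chosen = pack_batch(remaining, budget_gb)
        batches.append(chosen)
        for name in chosen:
            del remaining[name]
    return batches


# Hypothetical predictions (GB) for four chromosome-level tasks.
example_predictions = {"chr1": 10, "chr2": 9, "chr3": 7, "chr4": 6}
example_batches = schedule_batches(example_predictions, budget_gb=16)
```

Packing greedily one batch at a time is a simplification: it fills each batch well but does not guarantee the fewest batches overall, and a real scheduler would also need to refresh its predictions as tasks finish.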
Similar Papers
Processing-in-memory for genomics workloads
Genomics
Reads DNA faster, using less power.
Parallelizing Drug Discovery: HPC Pipelines for Alzheimer's Molecular Docking and Simulation
Distributed, Parallel, and Cluster Computing
Finds new medicines for brain diseases faster.
Optimizing the Variant Calling Pipeline Execution on Human Genomes Using GPU-Enabled Machines
Distributed, Parallel, and Cluster Computing
Speeds up finding gene differences in DNA.