Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows

Published: November 20, 2025 | arXiv ID: 2511.15977v1

By: Daniel Mas Montserrat, Ray Verma, Míriam Barrabés and more

Potential Business Impact:

Makes DNA analysis faster and uses less computer memory.

Business Areas:

Bioinformatics Biotechnology, Data and Analytics, Science and Engineering

Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample, leading to high memory spikes, intensive disk I/O, and task failures due to out-of-memory errors. Simple static resource allocation methods struggle to handle the variability in per-chromosome RAM demands, resulting in poor resource utilization and long runtimes. In this work, we propose multiple mechanisms for adaptive, RAM-efficient parallelization of chromosome-level bioinformatics workflows. First, we develop a symbolic regression model that estimates per-chromosome memory consumption for a given task and introduces an interpolating bias to conservatively minimize over-allocation. Second, we present a dynamic scheduler that adaptively predicts RAM usage with a polynomial regression model, treating task packing as a Knapsack problem to optimally batch jobs based on predicted memory requirements. Additionally, we present a static scheduler that optimizes chromosome processing order to minimize peak memory while preserving throughput. Our proposed methods, evaluated on simulations and real-world genomic pipelines, provide new mechanisms to reduce memory overruns and balance load across threads. We thereby achieve faster end-to-end execution, showcasing the potential to optimize large-scale genomic workflows.
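The knapsack-style batching mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `pack_batch` and `schedule_batches`, the integer-gigabyte granularity, and the example memory figures are all assumptions made for illustration. Each task's predicted RAM serves as both its weight and its value, so each batch fills the memory budget as fully as possible without exceeding it.

```python
from typing import Dict, List, Tuple


def pack_batch(predicted_gb: Dict[str, int], budget_gb: int) -> List[str]:
    """Choose the subset of tasks whose predicted RAM best fills the budget.

    0/1 knapsack by dynamic programming; weight == value == predicted RAM.
    Assumes predictions are positive integers (hypothetical GB units).
    """
    # best[c] = (RAM used, chosen task names) for capacity c
    best: List[Tuple[int, List[str]]] = [(0, [])] * (budget_gb + 1)
    for name, need in predicted_gb.items():
        # Iterate capacities downward so each task is used at most once.
        for cap in range(budget_gb, need - 1, -1):
            used = best[cap - need][0] + need
            if used > best[cap][0]:
                best[cap] = (used, best[cap - need][1] + [name])
    return best[budget_gb][1]


def schedule_batches(predicted_gb: Dict[str, int], budget_gb: int) -> List[List[str]]:
    """Repeatedly pack batches until every task has been scheduled."""
    pending = dict(predicted_gb)
    batches: List[List[str]] = []
    while pending:
        batch = pack_batch(pending, budget_gb)
        if not batch:
            raise ValueError("no remaining task fits within the RAM budget")
        batches.append(batch)
        for name in batch:
            del pending[name]
    return batches


if __name__ == "__main__":
    # Made-up per-chromosome predictions, for illustration only.
    demo = {"chr1": 12, "chr2": 11, "chr3": 9, "chr21": 3, "chr22": 3}
    for i, batch in enumerate(schedule_batches(demo, budget_gb=16), start=1):
        print(i, batch, sum(demo[n] for n in batch))
```

In a dynamic setting, the predictions passed to `pack_batch` would be refreshed from the regression model as tasks complete; here they are fixed inputs to keep the sketch short.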

Processing-in-memory for genomics workloads

Genomics

Reads DNA faster, using less power.

31 May 2025 1

85%

Parallelizing Drug Discovery: HPC Pipelines for Alzheimer's Molecular Docking and Simulation

Distributed, Parallel, and Cluster Computing

Finds new medicines for brain diseases faster.

31 Aug 2025 0

85%

Optimizing the Variant Calling Pipeline Execution on Human Genomes Using GPU-Enabled Machines

Distributed, Parallel, and Cluster Computing

Speeds up finding gene differences in DNA.

10 Sep 2025 2

Page Count

9 pages
