
Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data

Published: September 2, 2025 | arXiv ID: 2509.02648v1

By: John Zobolas, Anne-Marie George, Alberto López and more

Potential Business Impact:

Finds key biomarkers that help predict how long pancreatic cancer patients survive.

Business Areas:

Bioinformatics Biotechnology, Data and Analytics, Science and Engineering

Prediction of patient survival using high-dimensional multi-omics data requires systematic feature selection methods that ensure predictive performance, sparsity, and reliability for prognostic biomarker discovery. We developed a hybrid ensemble feature selection (hEFS) approach that combines data subsampling with multiple prognostic models, integrating both embedded and wrapper-based strategies for survival prediction. Omics features are ranked using a voting-theory-inspired aggregation mechanism across models and subsamples, while the optimal number of features is selected via a Pareto front, balancing predictive accuracy and model sparsity without any user-defined thresholds. When applied to multi-omics datasets from three pancreatic cancer cohorts, hEFS identifies significantly fewer and more stable biomarkers compared to the conventional, late-fusion CoxLasso models, while maintaining comparable discrimination performance. Implemented within the open-source mlr3fselect R package, hEFS offers a robust, interpretable, and clinically valuable tool for prognostic modelling and biomarker discovery in high-dimensional survival settings.
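The ranking and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the mlr3fselect implementation: the function names, the approval-style vote count, the knee-point heuristic for picking a point on the Pareto front, and the toy feature names and scores are all assumptions made for illustration.

```python
from collections import Counter


def approval_ranking(selections):
    """Rank features by how many (model, subsample) runs selected them.

    Assumption: a simple approval-style vote; the paper's aggregation
    rule may differ.
    """
    votes = Counter(f for selected in selections for f in set(selected))
    # Descending vote count, ties broken alphabetically for determinism.
    return sorted(votes, key=lambda f: (-votes[f], f))


def pareto_front(points):
    """Keep the (n_features, score) points that no other point dominates.

    A point is dominated if another point uses no more features and
    scores at least as well, with at least one strict improvement.
    """
    front = []
    best = float("-inf")
    for n, score in sorted(points, key=lambda p: (p[0], -p[1])):
        if score > best:
            front.append((n, score))
            best = score
    return front


def knee_point(front):
    """Pick the front point farthest from the chord joining its endpoints.

    Assumption: a common knee heuristic on axes rescaled to [0, 1].
    """
    if len(front) < 3:
        return front[0]
    (x0, y0), (x1, y1) = front[0], front[-1]

    def distance(p):
        u = (p[0] - x0) / (x1 - x0)
        v = (p[1] - y0) / (y1 - y0)
        return abs(v - u)

    return max(front, key=distance)


# Hypothetical feature sets chosen in four model/subsample runs.
runs = [
    ["feat_a", "feat_b", "feat_d"],
    ["feat_a", "feat_b"],
    ["feat_a", "feat_c"],
    ["feat_a", "feat_b", "feat_c"],
]
ranking = approval_ranking(runs)

# Hypothetical (number of features, discrimination score) pairs.
candidates = [(1, 0.60), (2, 0.68), (3, 0.69), (4, 0.69), (5, 0.70)]
front = pareto_front(candidates)
n_selected, _ = knee_point(front)
chosen = ranking[:n_selected]
```

In this toy setting no threshold is supplied by the user: the number of retained features comes from the shape of the accuracy-versus-sparsity trade-off, which is the idea the abstract describes.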

HeFS: Helper-Enhanced Feature Selection via Pareto-Optimized Genetic Search

Machine Learning (CS)

Finds hidden clues to make predictions better.

21 Oct 2025 1

87%

Integrated Transcriptomic-proteomic Biomarker Identification for Radiation Response Prediction in Non-small Cell Lung Cancer Cell Lines

Machine Learning (CS)

Finds lung cancer treatments that work best.

27 Nov 2025 0

87%

Improving Omics-Based Classification: The Role of Feature Selection and Synthetic Data Generation

Machine Learning (CS)

Helps doctors find diseases with less patient data.

6 May 2025 0


Page Count

52 pages
