Automated Statistical and Machine Learning Platform for Biological Research

Published: November 25, 2025 | arXiv ID: 2511.21770v1

By: Luke Rimmo Lego, Samantha Gauthier, Denver Jn. Baptiste

Potential Business Impact:

Helps scientists find new medicines faster.

Business Areas:
Machine Learning, Artificial Intelligence, Data and Analytics, Software

Research increasingly relies on computational methods to analyze experimental data and predict molecular properties. Current approaches often require researchers to switch between separate tools for statistical analysis and machine learning, creating workflow inefficiencies. We present an integrated platform that combines classical statistical methods with Random Forest classification for comprehensive data analysis in the biological sciences. The platform implements automated hyperparameter optimization, feature importance analysis, and a suite of statistical tests including t-tests, ANOVA, and Pearson correlation analysis. Our methodology addresses the gap between traditional statistical software, modern machine learning frameworks, and biology by providing a unified interface accessible to researchers without extensive programming experience. The system achieves this through automatic data preprocessing, categorical encoding, and adaptive model configuration based on dataset characteristics. Initial testing protocols are designed to evaluate classification accuracy across diverse chemical datasets with varying feature distributions. This work demonstrates that integrating statistical rigor with machine learning interpretability can accelerate biological discovery workflows while maintaining methodological soundness. The platform's modular architecture enables future extensions to additional machine learning algorithms and statistical procedures relevant to bioinformatics.
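To make the statistical suite named in the abstract more concrete, here is a minimal sketch of the three test statistics (t-test, one-way ANOVA, Pearson correlation) using only the Python standard library. It is an illustration under stated assumptions, not the platform's code: the function names and sample values are hypothetical, the abstract does not say which t-test variant is used (Welch's unequal-variance form is assumed here), and p-values are omitted because they need a distribution function from a library such as SciPy.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    # Standard error built from each group's own sample variance.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


if __name__ == "__main__":
    # Hypothetical measurements for three experimental groups.
    control, low_dose, high_dose = [1, 2, 3], [2, 3, 4], [3, 4, 5]
    print("t (control vs low dose):", welch_t(control, low_dose))
    print("F (three groups):", anova_f(control, low_dose, high_dose))
    print("r (dose vs response):", pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

With two groups of equal size, the F statistic equals the square of the t statistic, which gives a quick consistency check between the two functions.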

Country of Origin
🇺🇸 United States

Page Count
7 pages

Category
Quantitative Biology: Quantitative Methods