Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review

Published: June 14, 2025 | arXiv ID: 2506.12322v2

By: Johnny Peng, Thanh Tung Khuat, Katarzyna Musial and more

Potential Business Impact:

Helps drug makers use computers with less data.

Business Areas:
Bioinformatics Biotechnology, Data and Analytics, Science and Engineering

Data is crucial for machine learning (ML) applications, yet acquiring large datasets can be costly and time-consuming, especially in complex, resource-intensive fields like biopharmaceuticals. A key process in this industry is upstream bioprocessing, where living cells are cultivated and optimised to produce therapeutic proteins and biologics. The intricate nature of these processes, combined with high resource demands, often limits data collection, resulting in smaller datasets. This comprehensive review explores ML methods designed to address the challenges posed by small data and classifies them into a taxonomy to guide practical applications. Each method in the taxonomy is analysed in detail, with a discussion of its core concepts and an evaluation of its effectiveness in tackling small data challenges, as demonstrated by application results in upstream bioprocessing and related domains. By analysing how these methods tackle small data challenges from different perspectives, this review provides actionable insights, identifies current research gaps, and offers guidance for leveraging ML in data-constrained environments.

Country of Origin
🇦🇺 Australia

Page Count
74 pages

Category
Computer Science:
Machine Learning (CS)