Assessing the Feasibility of Early Cancer Detection Using Routine Laboratory Data: An Evaluation of Machine Learning Approaches on an Imbalanced Dataset
By: Shumin Li
Potential Business Impact:
Finds early dog cancer signs in blood tests.
The development of accessible screening tools for early cancer detection in dogs represents a significant challenge in veterinary medicine. Routine laboratory data offer a promising, low-cost source for such tools, but their utility is hampered by the non-specificity of individual biomarkers and the severe class imbalance inherent in screening populations. This study assesses the feasibility of cancer risk classification using the Golden Retriever Lifetime Study (GRLS) cohort under real-world constraints, including the grouping of diverse cancer types and the inclusion of post-diagnosis samples. A comprehensive benchmark evaluation was conducted, systematically comparing 126 analytical pipelines that comprised various machine learning models, feature selection methods, and data balancing techniques. Data were partitioned at the patient level to prevent leakage. The optimal model, a Logistic Regression classifier with class weighting and recursive feature elimination, demonstrated moderate ranking ability (AUROC = 0.815; 95% CI: 0.793-0.836) but poor clinical classification performance (F1-score = 0.25, Positive Predictive Value = 0.15). While a high Negative Predictive Value (0.98) was achieved, insufficient recall (0.79) precludes its use as a reliable rule-out test. Interpretability analysis with SHapley Additive exPlanations (SHAP) revealed that predictions were driven by non-specific features like age and markers of inflammation and anemia. It is concluded that while a statistically detectable cancer signal exists in routine lab data, it is too weak and confounded for clinically reliable discrimination from normal aging or other inflammatory conditions. This work establishes a critical performance ceiling for this data modality in isolation and underscores that meaningful progress in computational veterinary oncology will require integration of multi-modal data sources.
Similar Papers
Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction
Machine Learning (CS)
Helps doctors find mouth cancer early.
On Aligning Prediction Models with Clinical Experiential Learning: A Prostate Cancer Case Study
Machine Learning (CS)
Makes AI doctors understand patient needs better.
Deep Learning-Based Multiclass Classification of Oral Lesions with Stratified Augmentation
CV and Pattern Recognition
Helps doctors spot mouth cancer early.