BarcodeMamba+: Advancing State-Space Models for Fungal Biodiversity Research
By: Tiancheng Gao, Scott C. Lowe, Brendan Furneaux and more
Accurate taxonomic classification from DNA barcodes is a cornerstone of global biodiversity monitoring, yet fungi present extreme challenges due to sparse labelling and long-tailed taxa distributions. Conventional supervised learning methods often falter in this domain, struggling to generalize to unseen species and to capture the hierarchical nature of the data. To address these limitations, we introduce BarcodeMamba+, a foundation model for fungal barcode classification built on an efficient state-space model architecture. We employ a pretrain-and-fine-tune paradigm that makes use of partially labelled data, and we demonstrate that it is substantially more effective than traditional fully supervised methods in this data-sparse environment. During fine-tuning, we systematically integrate and evaluate a suite of enhancements (hierarchical label smoothing, a weighted loss function, and a multi-head output layer from MycoAI) that specifically target the challenges of fungal taxonomy. Our experiments show that each of these components yields significant performance gains. On a challenging fungal classification benchmark whose taxonomic distribution is distinctly shifted from that of the broad training set, our final model outperforms a range of existing methods across all taxonomic levels. Our work provides a new tool for genomics-based biodiversity research and establishes an effective and scalable training paradigm for this challenging domain. Our code is publicly available at https://github.com/bioscan-ml/BarcodeMamba.
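To make the idea of hierarchical label smoothing concrete, here is a minimal sketch in plain Python. It is not taken from the BarcodeMamba+ or MycoAI code. It assumes one simple variant in which the smoothing mass `eps` is spread only over classes that share the true class's parent taxon. The names `hierarchical_smooth_targets`, `parent_of`, and `eps`, and the toy taxa, are hypothetical and chosen for illustration only.

```python
import math


def hierarchical_smooth_targets(true_idx, parent_of, eps=0.1):
    """Build a soft target over classes (assumed variant, not the paper's code).

    parent_of[i] is the parent taxon (e.g. genus) of class i (e.g. species).
    The true class keeps 1 - eps; eps is split evenly among its siblings,
    i.e. the other classes with the same parent. Unrelated classes get 0.
    """
    n = len(parent_of)
    siblings = [
        i for i in range(n)
        if i != true_idx and parent_of[i] == parent_of[true_idx]
    ]
    target = [0.0] * n
    if not siblings:
        # No sibling to share mass with: fall back to a one-hot target.
        target[true_idx] = 1.0
        return target
    target[true_idx] = 1.0 - eps
    share = eps / len(siblings)
    for i in siblings:
        target[i] = share
    return target


def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, target))


# Toy example: three species in one genus, one species in another.
parent_of = ["Amanita", "Amanita", "Amanita", "Boletus"]
target = hierarchical_smooth_targets(0, parent_of, eps=0.1)
loss = soft_cross_entropy([2.0, 0.5, 0.5, -1.0], target)
```

Compared with uniform label smoothing, this variant penalizes a confusion between closely related species less than a confusion across genera, which is the intuition behind using the taxonomy to shape the targets.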