Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification
By: Murilo Gustineli, Anthony Miyaguchi, Adrian Cheung and more
Potential Business Impact:
Helps computers identify plants from pictures.
We describe DS@GT's second-place solution to the PlantCLEF 2025 challenge on multi-species plant identification in vegetation quadrat images. Our pipeline combines (i) a fine-tuned Vision Transformer (ViTD2PC24All) for tile-level inference, (ii) a 4x4 tiling strategy that matches the tile size to the network's 518x518 input resolution, and (iii) domain-prior adaptation through PaCMAP + K-Means visual clustering and geolocation filtering. Tile predictions are aggregated by majority vote and re-weighted with cluster-specific Bayesian priors, yielding a macro-averaged F1 of 0.348 on the private leaderboard without any additional training. All code, configuration files, and reproducibility scripts are publicly available at https://github.com/dsgt-arc/plantclef-2025.
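The tiling and aggregation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names `split_into_tiles` and `aggregate_tiles`, the `top_k` and `min_votes` parameters, and the choice to apply the prior before voting are all assumptions made for illustration. The per-tile probabilities stand in for the classifier's output.

```python
import numpy as np

def split_into_tiles(image, grid=4):
    """Split an H x W x C array into grid*grid equal tiles, in row-major order."""
    h, w = image.shape[:2]
    th, tw = h // grid, w // grid  # any remainder pixels at the edges are dropped
    return [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(grid) for c in range(grid)]

def aggregate_tiles(tile_probs, prior, top_k=1, min_votes=2):
    """Re-weight per-tile probabilities with a prior, then count votes.

    tile_probs: (n_tiles, n_species) array of classifier probabilities.
    prior:      (n_species,) array, e.g. species frequencies for one cluster.
    Returns the indices of species with at least min_votes votes, and all vote counts.
    """
    # Bayes-style re-weighting: posterior is proportional to likelihood * prior.
    weighted = tile_probs * prior
    weighted = weighted / (weighted.sum(axis=1, keepdims=True) + 1e-12)
    # Each tile votes for its top_k species (assumed voting rule).
    top = np.argsort(-weighted, axis=1)[:, :top_k]
    votes = np.bincount(top.ravel(), minlength=tile_probs.shape[1])
    return np.flatnonzero(votes >= min_votes), votes

# A 2072 x 2072 image yields sixteen 518 x 518 tiles under a 4x4 grid.
image = np.zeros((2072, 2072, 3), dtype=np.uint8)
tiles = split_into_tiles(image)

# Hypothetical probabilities for 16 tiles and 3 species.
probs = np.tile([0.6, 0.3, 0.1], (16, 1))
probs[:6] = [0.2, 0.7, 0.1]
species, votes = aggregate_tiles(probs, prior=np.ones(3) / 3)
```

With a uniform prior, the vote simply follows each tile's strongest prediction. A prior concentrated on a few species shifts votes toward them, which is the intuition behind using cluster-specific priors.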
Similar Papers
Multi-Label Plant Species Prediction with Metadata-Enhanced Multi-Head Vision Transformers
CV and Pattern Recognition
Helps computers identify many plants in one picture.
Overview of PlantCLEF 2025: Multi-Species Plant Identification in Vegetation Quadrat Images
CV and Pattern Recognition
Helps computers identify plants in nature photos.
Transfer Learning and Mixup for Fine-Grained Few-Shot Fungi Classification
CV and Pattern Recognition
Helps computers tell different mushrooms apart.