PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science
By: Syed Nazmus Sakib, Nafiul Haque, Mohammad Zabed Hossain, and more
Potential Business Impact:
Helps farmers identify plant diseases from photos.
PlantVillageVQA is a large-scale visual question answering (VQA) dataset derived from the widely used PlantVillage image corpus, designed to advance the development and evaluation of vision-language models for agricultural decision-making and analysis. The dataset comprises 193,609 high-quality question-answer (QA) pairs grounded in 55,448 images spanning 14 crop species and 38 disease conditions. Questions are organised into 3 levels of cognitive complexity and 9 distinct categories. Question templates for each category were phrased manually following expert guidance, and the QA pairs were generated via an automated two-stage pipeline: (1) template-based QA synthesis from image metadata and (2) multi-stage linguistic re-engineering. The dataset was iteratively reviewed by domain experts for scientific accuracy and relevance, and the final version was evaluated with three state-of-the-art models for quality assessment. Our objective is to provide a publicly available, standardised and expert-verified database that enhances diagnostic accuracy for plant disease identification and advances scientific research in the agricultural domain. The dataset will be open-sourced at https://huggingface.co/datasets/SyedNazmusSakib/PlantVillageVQA.
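To make stage (1) of the pipeline concrete, here is a minimal sketch of template-based QA synthesis from image metadata. The metadata fields, template wordings, and category names are hypothetical illustrations of the approach the abstract describes, not the authors' actual templates; the real dataset uses 9 expert-phrased categories across 3 cognitive levels, and stage (2) would further paraphrase these templated questions.

```python
# A minimal sketch of template-based QA synthesis (stage 1 of the pipeline).
# Metadata fields, templates, and category names below are hypothetical.
from dataclasses import dataclass


@dataclass
class ImageMetadata:
    image_id: str
    crop: str      # one of the 14 crop species, e.g. "Tomato"
    disease: str   # one of the 38 conditions, e.g. "Early blight" or "healthy"


# Hypothetical (category, question template, answer rule) triples.
TEMPLATES = [
    ("disease_identification",
     "What disease, if any, affects this {crop} leaf?",
     lambda m: m.disease),
    ("crop_identification",
     "Which crop species is shown in this image?",
     lambda m: m.crop),
    ("health_status",
     "Is this {crop} plant healthy or diseased?",
     lambda m: "healthy" if m.disease.lower() == "healthy" else "diseased"),
]


def synthesise_qa(meta: ImageMetadata) -> list[dict]:
    """Generate one QA pair per template from a single image's metadata."""
    pairs = []
    for category, question_tpl, answer_rule in TEMPLATES:
        pairs.append({
            "image_id": meta.image_id,
            "category": category,
            "question": question_tpl.format(crop=meta.crop),
            "answer": answer_rule(meta),
        })
    return pairs


# Example: one PlantVillage-style record yields three templated QA pairs.
print(synthesise_qa(ImageMetadata("tomato_0001", "Tomato", "Early blight")))
```

Scaling this over all 55,448 images with the full template set is what would produce QA pairs at the scale the abstract reports, before the linguistic re-engineering stage rewrites them for variety.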
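Since the release is slated for the Hugging Face Hub, the dataset should be loadable with the `datasets` library once published. The repository id below is taken directly from the URL above, but the split names and column schema are assumptions, as the abstract does not specify them.

```python
# Minimal loading sketch, assuming a standard Hugging Face datasets layout.
# The repo id comes from the release URL; split/column names are guesses.
from datasets import load_dataset

ds = load_dataset("SyedNazmusSakib/PlantVillageVQA")
print(ds)  # inspect the actual splits and columns before relying on names

# Hypothetical field access once the real schema is known:
# sample = ds["train"][0]
# print(sample["question"], "->", sample["answer"])
```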
Similar Papers
DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
CV and Pattern Recognition
Helps computers understand disaster damage from photos.
Visual question answering: from early developments to recent advances -- a survey
CV and Pattern Recognition
Lets computers answer questions about pictures.
VQ-VA World: Towards High-Quality Visual Question-Visual Answering
CV and Pattern Recognition
Makes computers draw pictures from questions.