AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark
By: Aruna Gauba, Irene Pi, Yunze Man, and more
Potential Business Impact:
Helps AI understand farming questions better.
We present AgMMU, a challenging real-world benchmark for evaluating and advancing vision-language models (VLMs) in the knowledge-intensive domain of agriculture. Unlike prior datasets that rely on crowdsourced prompts, AgMMU is distilled from 116,231 authentic dialogues between everyday growers and USDA-authorized Cooperative Extension experts. Through a three-stage pipeline of automated knowledge extraction, QA generation, and human verification, we construct (i) AgMMU, an evaluation set of 746 multiple-choice questions (MCQs) and 746 open-ended questions (OEQs), and (ii) AgBase, a development corpus of 57,079 multimodal facts covering five high-stakes agricultural topics: insect identification, species identification, disease categorization, symptom description, and management instruction. Benchmarking 12 leading VLMs reveals pronounced gaps in fine-grained perception and factual grounding. Open-source models trail proprietary ones by a wide margin. Simple fine-tuning on AgBase boosts open-source model performance on challenging OEQs by up to 11.6% on average, narrowing this gap and motivating future research on better strategies for knowledge extraction and distillation from AgBase. We hope AgMMU stimulates research on domain-specific knowledge integration and trustworthy decision support in agricultural AI development.
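As a rough illustration of the evaluation protocol described in the abstract, the sketch below scores a VLM on AgMMU-style MCQs. The JSON field names and the `query_vlm` helper are hypothetical placeholders, not the benchmark's actual data format or interface.

```python
import json
import random


def query_vlm(image_path: str, prompt: str) -> str:
    """Placeholder for a call to any vision-language model.
    Swap in a real API or local-model call here."""
    # Hypothetical stub: returns a random option letter.
    return random.choice(["A", "B", "C", "D"])


def evaluate_mcq(benchmark_path: str) -> float:
    """Compute MCQ accuracy over an AgMMU-style JSON file.
    Assumed (hypothetical) record format:
    {"image": ..., "question": ..., "options": [...], "answer": "A"}"""
    with open(benchmark_path) as f:
        records = json.load(f)

    correct = 0
    for rec in records:
        letters = "ABCD"[: len(rec["options"])]
        option_text = "\n".join(
            f"{letter}. {opt}" for letter, opt in zip(letters, rec["options"])
        )
        prompt = (
            f"{rec['question']}\n{option_text}\n"
            "Answer with the letter of the correct option."
        )
        prediction = query_vlm(rec["image"], prompt).strip()[:1].upper()
        correct += prediction == rec["answer"]

    return correct / len(records)


if __name__ == "__main__":
    print(f"MCQ accuracy: {evaluate_mcq('agmmu_mcq.json'):.3f}")
```

Open-ended questions would need a different scorer (e.g., an LLM-as-judge or keyword matching against the extracted facts), which is why the paper reports MCQs and OEQs separately.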
Similar Papers
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
CV and Pattern Recognition
Helps computers understand farm fields better.
A Multimodal Conversational Assistant for the Characterization of Agricultural Plots from Geospatial Open Data
Artificial Intelligence
Lets farmers ask questions about crops using normal words.
AgroBench: Vision-Language Model Benchmark in Agriculture
CV and Pattern Recognition
Helps AI tell sick plants from healthy ones.