Generalization and Feature Attribution in Machine Learning Models for Crop Yield and Anomaly Prediction in Germany

Published: December 17, 2025 | arXiv ID: 2512.15140v1

By: Roland Baatz

This study examines the generalization performance and interpretability of machine learning (ML) models used for predicting crop yield and yield anomalies in Germany's NUTS-3 regions. Using a high-quality, long-term dataset, the study systematically compares the evaluation and temporal validation behavior of ensemble tree-based models (XGBoost, Random Forest) and deep learning approaches (LSTM, TCN). While all models perform well on spatially split, conventional test sets, their performance degrades substantially on temporally independent validation years, revealing persistent limitations in generalization. Notably, models with strong test-set accuracy but weak temporal validation performance can still produce seemingly credible SHAP feature importance values. This exposes a critical vulnerability in post hoc explainability methods: interpretability may appear reliable even when the underlying model fails to generalize. These findings underscore the need for validation-aware interpretation of ML predictions in agricultural and environmental systems. Feature importance should not be accepted at face value unless models are explicitly shown to generalize to unseen temporal and spatial conditions. The study advocates for domain-aware validation, hybrid modeling strategies, and more rigorous scrutiny of explainability methods in data-driven agriculture. Ultimately, this work addresses a growing challenge in environmental data science: how can we evaluate generalization robustly enough to trust model explanations?
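The gap between spatially split test scores and temporal validation can be illustrated with a small, stdlib-only Python sketch. Everything in it is a hypothetical toy, not the study's data or its XGBoost, Random Forest, LSTM, or TCN models: yield is a regional base level plus a weather shock shared by all regions in a given year, and a simple additive region-plus-year model stands in for any learner that can pick up year-specific signals. It also assumes that a "spatial split" holds out whole NUTS-3 regions while keeping all years. Under that split, each test year's shock is already visible through the training regions; under a temporal hold-out, it is not.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy panel (not the study's data): 10 regions x 20 years.
REGIONS = list(range(10))
YEARS = list(range(2000, 2020))


def make_panel():
    """Return (region, year, yield) rows for a deterministic, noise-free toy panel."""
    rows = []
    for r in REGIONS:
        base = 6.0 + 0.1 * r                      # hypothetical regional yield level
        for y in YEARS:
            shock = 1.0 if y % 2 == 0 else -1.0   # hypothetical shock shared by all regions in year y
            rows.append((r, y, base + shock))
    return rows


def fit_additive(train):
    """Fit yield ~ global mean + region effect + year effect from marginal means
    (the exact main-effects fit for a balanced panel)."""
    g = mean(v for _, _, v in train)
    by_region, by_year = {}, {}
    for r, y, v in train:
        by_region.setdefault(r, []).append(v)
        by_year.setdefault(y, []).append(v)
    region_eff = {r: mean(vs) - g for r, vs in by_region.items()}
    year_eff = {y: mean(vs) - g for y, vs in by_year.items()}

    def predict(r, y):
        # Regions or years never seen in training contribute a zero effect.
        return g + region_eff.get(r, 0.0) + year_eff.get(y, 0.0)

    return predict


def rmse(predict, test):
    """Root-mean-square error of predict(region, year) over the test rows."""
    return sqrt(mean((predict(r, y) - v) ** 2 for r, y, v in test))


def evaluate(rows, is_test):
    """Split rows with the is_test predicate, fit on the rest, score on the test rows."""
    train = [row for row in rows if not is_test(row)]
    test = [row for row in rows if is_test(row)]
    return rmse(fit_additive(train), test)


rows = make_panel()
# Spatial split: hold out whole regions; every test year still appears in training.
spatial = evaluate(rows, lambda row: row[0] >= 8)
# Temporal hold-out: hold out whole years, so their shocks were never observed.
temporal = evaluate(rows, lambda row: row[1] >= 2016)
print(f"spatial split RMSE:     {spatial:.2f}")
print(f"temporal hold-out RMSE: {temporal:.2f}")
```

Run as written, the spatial split yields an RMSE of about 0.50, while the temporal hold-out yields 1.00, even though the temporally trained model knows every region's base level: the shocks of the held-out years were never seen in training. A noise-free panel was chosen so that the gap follows from the split design alone rather than from random sampling.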

Related papers

Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction

Artificial Intelligence | 9 Aug 2025

Helps farmers predict crop growth using different data.

Faithful and Interpretable Explanations for Complex Ensemble Time Series Forecasts using Surrogate Models and Forecastability Analysis

Machine Learning (CS) | 9 Oct 2025

Shows why computer predictions are right or wrong.

Exploring Machine Learning, Deep Learning, and Explainable AI Methods for Seasonal Precipitation Prediction in South America

Machine Learning (CS) | 15 Dec 2025

Predicts rain better using smart computer programs.