Soil Texture Prediction with Bayesian Generalized Additive Models for Spatial Compositional Data
By: Joaquín Martínez-Minaya, Lore Zumeta-Olaskoaga, Dae-Jin Lee
Potential Business Impact:
Models soil types using math and maps.
Compositional data (CoDa) play an important role in many fields, such as ecology, geology, and biology. The most widely used modeling approaches are based on the Dirichlet and the logistic-normal formulations under Aitchison geometry. Recent mathematical developments in simplex geometry make it possible to express the regression model in terms of coordinates and to estimate its coefficients. Once the model is projected into real space, it can be fitted with a multivariate Gaussian regression. However, most existing methods focus on linear models, and flexible alternatives such as additive or spatial models are lacking, especially within a Bayesian framework and with practical implementation details. In this work, we present a geoadditive regression model for CoDa from a Bayesian perspective using the brms package in R. The model applies the isometric log-ratio (ilr) transformation and uses penalized splines to incorporate nonlinear effects. We also propose two new Bayesian goodness-of-fit measures for CoDa regression, BR-CoDa-$R^2$ and BM-CoDa-$R^2$, which extend the Bayesian $R^2$ to the compositional setting. These measures, alongside WAIC, support model selection and evaluation. The methodology is validated through simulation studies and applied to predict soil texture composition in the Basque Country. Results demonstrate good performance, interpretable spatial patterns, and reliable quantification of explained variability in compositional outcomes.
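The ilr step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (they work in R with brms). It assumes one common choice of orthonormal basis (pivot coordinates), strictly positive parts, and a made-up three-part texture vector; the function names are hypothetical. It maps a composition to D-1 unconstrained real coordinates, where a Gaussian regression could be fitted, and then maps the coordinates back to the simplex.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def pivot_basis(d):
    """D x (D-1) orthonormal contrast matrix (pivot coordinates), as a list of rows.

    This is one of many valid ilr bases; the choice here is an assumption.
    """
    basis = [[0.0] * (d - 1) for _ in range(d)]
    for i in range(d - 1):
        r = d - i - 1  # number of parts that come after part i
        basis[i][i] = math.sqrt(r / (r + 1))
        for j in range(i + 1, d):
            basis[j][i] = -1.0 / math.sqrt(r * (r + 1))
    return basis


def ilr(parts):
    """Isometric log-ratio coordinates of a strictly positive composition."""
    x = closure(parts)
    d = len(x)
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / d
    clr = [value - mean_log for value in logs]  # centered log-ratio
    v = pivot_basis(d)
    return [sum(v[j][i] * clr[j] for j in range(d)) for i in range(d - 1)]


def ilr_inverse(coords):
    """Map ilr coordinates back to a composition on the simplex."""
    d = len(coords) + 1
    v = pivot_basis(d)
    clr = [sum(v[j][i] * coords[i] for i in range(d - 1)) for j in range(d)]
    return closure([math.exp(c) for c in clr])


# Hypothetical sand, silt, clay proportions (illustrative values only).
texture = [0.40, 0.35, 0.25]
coords = ilr(texture)            # two real-valued coordinates
recovered = ilr_inverse(coords)  # back on the simplex
```

Predictions made on the coordinate scale would be passed through `ilr_inverse` to obtain proportions that are positive and sum to one. Compositions containing zeros need separate handling before the logarithm is taken, which this sketch does not attempt.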
Similar Papers
The $\alpha$-regression for compositional data: a unified framework for standard, spatially-lagged, and geographically-weighted regression models
Methodology
Helps scientists understand how parts make up a whole.
Additive Density Regression
Methodology
Helps understand income differences using math.
A novel generalized additive scalar-on-function regression model for partially observed multidimensional functional data: An application to air quality classification
Methodology
Helps computers understand messy data with missing parts.