Score: 0

A latent factor approach to hyperspectral time series data for multivariate genomic prediction of grain yield in wheat

Published: January 9, 2026 | arXiv ID: 2601.05842v1

By: Jonathan F. Kunst , Killian A. C. Melsen , Willem Kruijer and more

Potential Business Impact:

Helps farmers grow better wheat faster.

Business Areas:
Predictive Analytics Artificial Intelligence, Data and Analytics, Software

High-dimensional time series phenotypic data is becoming increasingly common within plant breeding programmes. However, analysing and integrating such data for genetic analysis and genomic prediction remains difficult. Here we show how factor analysis with Procrustes rotation on the genetic correlation matrix of hyperspectral secondary phenotype data can help in extracting relevant features for within-trial prediction. We use a subset of Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT) elite yield wheat trial of 2014-2015, consisting of 1,033 genotypes. These were measured across three irrigation treatments at several timepoints during the season, using manned airplane flights with hyperspectral sensors capturing 62 bands in the spectrum of 385-850 nm. We perform multivariate genomic prediction using latent variables to improve within-trial genomic predictive ability (PA) of wheat grain yield within three distinct watering treatments. By integrating latent variables of the hyperspectral data in a multivariate genomic prediction model, we are able to achieve an absolute gain of .1 to .3 (on the correlation scale) in PA compared to univariate genomic prediction. Furthermore, we show which timepoints within a trial are important and how these relate to plant growth stages. This paper showcases how domain knowledge and data-driven approaches can be combined to increase PA and gain new insights from sensor data of high-throughput phenotyping platforms.

Page Count
20 pages

Category
Statistics:
Applications