Integration of genotypic, hyperspectral, and phenotypic data to improve biomass yield prediction in hybrid rye

ORIGINAL ARTICLE

Rodrigo José Galán1 · Angela‑Maria Bernal‑Vasquez2 · Christian Jebsen2 · Hans‑Peter Piepho3 · Patrick Thorwarth1,2 · Philipp Steffan4 · Andres Gordillo4 · Thomas Miedaner1

Received: 5 February 2020 / Accepted: 3 July 2020
© The Author(s) 2020

Abstract

Key message  Hyperspectral and genomic data are effective predictors of biomass yield in winter rye. Variable selection procedures can improve the informativeness of reflectance data.

Abstract  Integrating cutting-edge technologies is imperative to sustainably breed crops for a growing global population. To predict dry matter yield (DMY) in winter rye (Secale cereale L.), we tested single-kernel models based on genomic (GBLUP) and hyperspectral reflectance-derived (HBLUP) relationship matrices, a multi-kernel model combining both matrices, and a bivariate model fitted with plant height as a secondary trait. In total, 274 elite rye lines were genotyped using a 10k-SNP array and phenotyped as testcrosses for DMY and plant height at four locations in Germany in two years (eight environments). Spectral data consisted of 400 discrete narrow bands ranging between 410 and 993 nm, collected by an unmanned aerial vehicle (UAV) on two dates in each environment. To reduce data dimensionality, variable selection of bands was performed, with the least absolute shrinkage and selection operator (Lasso) emerging as the best method in terms of predictive ability. The mean heritability of reflectance data was moderate (h² = 0.72) and highly variable across the spectrum. Correlations between DMY and single bands were generally significant (p […]

[…] n as in GS, regularization (penalized) models have been shown to be suitable for incorporating thousands of predictors, including several that are unrelated to the trait of interest or highly intercorrelated (Ogutu et al. 2012). A similar situation may be expected when analyzing hyperspectral data collected in several environments and on several dates.
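As an illustration of how a penalized model copes with many more predictors than observations, the following is a minimal sketch of Lasso variable selection by cyclic coordinate descent. It is not the authors' implementation: the toy reflectance matrix, the choice of two "informative" bands, the penalty value, and all function names are assumptions made for this example only.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def centre_columns(X):
    """Subtract the column mean from every column of a row-major matrix."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X holds n rows (plots) and p columns (bands); y has length n.
    Columns and y are assumed to be centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a constant band carries no information
            # Covariance of band j with the partial residual (band j added back).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy data (hypothetical): 12 plots, 40 bands, so predictors outnumber observations.
random.seed(1)
n_plots, n_bands = 12, 40
X = centre_columns([[random.gauss(0, 1) for _ in range(n_bands)] for _ in range(n_plots)])
# Assumed truth for the toy example: only bands 3 and 17 influence the trait.
y_raw = [2.0 * row[3] - 1.5 * row[17] + random.gauss(0, 0.1) for row in X]
y_mean = sum(y_raw) / n_plots
y = [v - y_mean for v in y_raw]

# Smallest penalty at which every coefficient is shrunk to zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n_plots))) / n_plots
              for j in range(n_bands))
beta = lasso_coordinate_descent(X, y, lam=0.3 * lam_max)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("bands retained by the Lasso:", selected)
```

In practice the penalty would be tuned by cross-validation rather than fixed at a fraction of `lam_max`, and a tested library solver would normally replace this hand-written loop; the sketch only shows why coefficients of uninformative bands can be set to exactly zero.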
To reduce multicollinearity, increase prediction accuracy, minimize calculation time, and extract the most informative features, regularization methods such as the elastic net (Zou and Hastie 2005) or the least absolute shrinkage and selection operator (Lasso; Tibshirani 1996) are also preferred for handling high-dimensional spectral data (Liu and Li 2017). Alternatively, Krause et al. (2019) found that deriving relationship matrices from hyperspectral data was a suitable approach to integrate whole-spectrum reflectance

Theoretical and Applied Genetics

information into multi-kernel GS for predicting GY in wheat within multi-environment field trials. Multivariate models integrating correlated traits have been shown to be more precise than univariate models in GS (Jia and Jannink 2012). In wheat, for instance, the GS prediction ability for GY was significantly enhanced by fitting traits derived from hyperspectral data (Sun et al. 2019; Rutkoski et al. 2016; Crain et al. 2018). Sim