Deep Learning for Predicting Complex Traits in Spring Wheat Breeding Program.

Published on Jan 5, 2021 in Frontiers in Plant Science (4.402)
DOI: 10.3389/FPLS.2020.613325
Karansher S. Sandhu (WSU: Washington State University), Estimated H-index: 2
Dennis N. Lozada (NMSU: New Mexico State University), Estimated H-index: 1
+ 2 Authors
Arron H. Carter (WSU: Washington State University), Estimated H-index: 19
Genomic selection (GS) is transforming the field of plant breeding, and models that improve prediction accuracy for complex traits are needed. Analytical methods for complex datasets traditionally used in other disciplines represent an opportunity for improving prediction accuracy in GS. Deep learning (DL) is a branch of machine learning that trains models using densely connected artificial neural networks. The objective of this research was to evaluate the potential of DL models in the Washington State University spring wheat breeding program. We compared the performance of two DL algorithms, namely multilayer perceptron (MLP) and convolutional neural network (CNN), with ridge regression best linear unbiased predictor (rrBLUP), a commonly used GS model. The dataset consisted of 650 recombinant inbred lines from a spring wheat nested association mapping population planted in the 2014-2016 growing seasons. We predicted five different quantitative traits with varying genetic architecture using cross-validations, independent validations, and different sets of SNP markers. Hyperparameters were optimized for DL models by lowering the root mean square error in the training set, and model overfitting was avoided using dropout and regularization. DL models gave 0 to 5% higher prediction accuracy than the rrBLUP model under both cross and independent validations for all five traits used in this study. Furthermore, MLP produced 5% higher prediction accuracy than CNN for grain yield and grain protein content. Altogether, DL approaches obtained better prediction accuracy for each trait and should be incorporated into a plant breeder’s toolkit for use in large-scale breeding programs.
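The cross-validation comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the marker data are simulated, the function names (`ridge_fit_predict`, `cv_accuracy`, `simulate`) are hypothetical, and the ridge penalty `lam` is fixed by hand, whereas rrBLUP normally estimates the shrinkage from variance components. Prediction accuracy is taken here to be the Pearson correlation between predicted and observed phenotypes in the held-out fold, which is an assumption about the metric.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Ridge regression on marker codes: beta = (X'X + lam*I)^-1 X'(y - mean)."""
    mu = y_train.mean()
    col_means = X_train.mean(axis=0)
    Xc = X_train - col_means                      # center markers on training means
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - mu))
    return mu + (X_test - col_means) @ beta

def cv_accuracy(X, y, fit_predict, k=5, seed=0):
    """Mean Pearson correlation between predictions and observations over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False              # hold out the current fold
        y_hat = fit_predict(X[train_mask], y[train_mask], X[test_idx])
        accs.append(np.corrcoef(y_hat, y[test_idx])[0, 1])
    return float(np.mean(accs))

def simulate(n_lines=300, n_markers=200, n_causal=20, h2=0.7, seed=1):
    """Simulated inbred lines (assumption): homozygous markers coded -1/+1."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n_lines, n_markers))
    effects = np.zeros(n_markers)
    causal = rng.choice(n_markers, n_causal, replace=False)
    effects[causal] = rng.normal(size=n_causal)
    g = X @ effects                               # additive genetic values
    noise_sd = np.sqrt(g.var() * (1 - h2) / h2)   # noise scaled to heritability h2
    y = g + rng.normal(scale=noise_sd, size=n_lines)
    return X, y

X, y = simulate()
acc = cv_accuracy(X, y, lambda a, b, c: ridge_fit_predict(a, b, c, lam=100.0))
print(f"5-fold CV prediction accuracy (ridge): {acc:.2f}")
```

Because `cv_accuracy` only needs a callable with the signature `(X_train, y_train, X_test)`, an MLP or CNN wrapper could be passed in place of the ridge model, so all models are scored on identical folds.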
#1Jaafar Abdulridha (UF: University of Florida)H-Index: 9
#2Yiannis Ampatzidis (UF: University of Florida)H-Index: 18
Last. Sri Charan Kakarla (UF: University of Florida)H-Index: 3
view all 4 authors...
In this study hyperspectral imaging (380–1020 nm) and machine learning were utilised to develop a technique for detecting different disease development stages (asymptomatic, early, intermediate, and late disease stage) of powdery mildew (PM) in squash. Data were collected in the laboratory as well as in the field using an unmanned aerial vehicle (UAV). Radial basis function (RBF) was used to discriminate between healthy and diseased plants, and to classify the severity level (disease stage) of a...
10 Citations
#1Hai Wang (Cornell University)H-Index: 13
#2Emre Cimen (Cornell University)H-Index: 3
Last. Edward S. Buckler (Cornell University)H-Index: 100
view all 4 authors...
Our era has witnessed tremendous advances in plant genomics, characterized by an explosion of high-throughput techniques to identify multi-dimensional genome-wide molecular phenotypes at low costs. More importantly, genomics is not merely acquiring molecular phenotypes, but also leveraging powerful data mining tools to predict and explain them. In recent years, deep learning has been found extremely effective in these tasks. This review highlights two prominent questions at the intersection of g...
25 Citations
#1Rostam Abdollahi-Arpanahi (UF: University of Florida)H-Index: 11
#2Daniel Gianola (UW: University of Wisconsin-Madison)H-Index: 83
Last. Francisco Peñagaricano (UF: University of Florida)H-Index: 21
view all 3 authors...
BACKGROUND: Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement on machine-learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two en...
30 Citations
#1Laura M. Zingaretti (CSIC: Spanish National Research Council)H-Index: 5
#2Salvador A. Gezan (UF: University of Florida)H-Index: 22
Last. Miguel Pérez-Enciso (CSIC: Spanish National Research Council)H-Index: 10
view all 8 authors...
Genomic Prediction (GP) is the procedure whereby the genetic merits of untested candidates are predicted using genome wide marker information. Although numerous examples of GP exist in plants and animals, applications to polyploid organisms are still scarce, partly due to limited genome resources and the complexity of this system. Deep Learning (DL) techniques comprise a heterogeneous collection of Machine Learning algorithms that have excelled at many prediction tasks. A potential advantage of ...
21 Citations
#1José Crossa (CIMMYT: International Maize and Wheat Improvement Center)H-Index: 89
#2Johannes W. R. Martini (CIMMYT: International Maize and Wheat Improvement Center)H-Index: 5
Last. Jaime Cuevas (University of Quintana Roo)H-Index: 10
view all 8 authors...
Deep learning (DL) is a promising method for genomic-enabled prediction. However, the implementation of DL is difficult because many hyperparameters (number of hidden layers, number of neurons, learning rate, number of epochs, batch size, etc.) need to be tuned. For this reason, deep kernel methods, which only require defining the number of layers, may be an attractive alternative. Deep kernel methods emulate DL models with a large number of neurons, but are defined by relatively easily computed...
15 Citations
#1Yang Liu (MU: University of Missouri)H-Index: 10
#2Duolin Wang (MU: University of Missouri)H-Index: 6
Last. Dong Xu (MU: University of Missouri)H-Index: 74
view all 6 authors...
Genomic selection uses single-nucleotide polymorphisms (SNPs) to predict quantitative phenotypes for enhancing traits in breeding populations, and it has been widely used to increase breeding efficiency for plants and animals. Existing statistical methods rely on a prior distribution assumption of imputed genotype effects, which may not fit experimental datasets. Emerging deep learning could serve as a powerful machine learning tool to predict quantitative phenotypes without imputation and also ...
18 Citations
#1Jaime Cuevas (University of Quintana Roo)H-Index: 10
#2Osval A. Montesinos-López (University of Colima)H-Index: 18
Last. José Crossa (CIMMYT: International Maize and Wheat Improvement Center)H-Index: 89
view all 9 authors...
Kernel methods are flexible and easy to interpret and have been successfully used in genomic-enabled prediction of various plant species. Kernel methods used in genomic prediction comprise the linear genomic best linear unbiased predictor (GBLUP or GB) kernel, and the Gaussian kernel (GK). In general, these kernels have been used with two statistical models: single-environment and genomic × environment (GE) models. Recently near infrared spectroscopy (NIR) has been used as an inexpensive and non...
17 Citations
#1Dennis N. LozadaH-Index: 9
#2Arron H. CarterH-Index: 19
Incorporating secondary correlated traits collected from high-throughput phenotyping in genomic selection (GS) models for complex traits has been demonstrated to improve accuracy. The prediction ability of different single and multiple trait partial least square (PLS) regression models for grain yield were assessed for winter wheat lines evaluated in US Pacific Northwest environments. Different populations including a diversity panel, F5, and double haploid breeding lines were evaluated in Lind ...
11 Citations
#1Tahani Alkhudaydi (UEA: University of East Anglia)H-Index: 2
#2Daniel Reynolds (Norwich Research Park)H-Index: 9
Last. Beatriz de la Iglesia (UEA: University of East Anglia)H-Index: 11
view all 5 authors...
Wheat is one of the major crops in the world, with a global demand expected to reach 850 million tons by 2050 that is clearly outpacing current supply. The continual pressure to sustain wheat yield due to the world’s growing population under fluctuating climate conditions requires breeders to increase yield and yield stability across environments. We are working to integrate deep learning into field-based phenotypic analysis to assist breeders in this endeavour. We have utilised wheat images col...
8 Citations
#1Miguel Pérez-Enciso (Catalan Institution for Research and Advanced Studies)H-Index: 39
Last. Laura M. Zingaretti (CSIC: Spanish National Research Council)H-Index: 5
view all 2 authors...
Deep learning (DL) has emerged as a powerful tool to make accurate predictions from complex data such as image, text, or video. However, its ability to predict phenotypic values from molecular data is less well studied. Here, we describe the theoretical foundations of DL and provide a generic code that can be easily modified to suit specific needs. DL comprises a wide variety of algorithms which depend on numerous hyperparameters. Careful optimization of hyperparameter values is critical to avoi...
34 Citations
Cited By (9)
#1Meriem Aoun (WSU: Washington State University)H-Index: 1
#2Arron H. Carter (WSU: Washington State University)H-Index: 19
Last. Craig F. Morris (WSU: Washington State University)H-Index: 60
view all 5 authors...
End-use quality phenotyping is laborious and expensive, thus, testing may not occur until later generations in wheat breeding programs. We investigated the pattern of genotype × environment (G × E) interaction for end-use quality traits in soft white wheat (Triticum aestivum L.) and tested the effectiveness of implementing genomic selection to optimize breeding for these traits. We used a multi-environment unbalanced dataset comprised of 672 breeding lines and cultivars adapted to the Pacific No...
#1Abby Stylianou (SLU: Saint Louis University)H-Index: 1
#2Robert Pless (GW: George Washington University)H-Index: 37
Last. Todd C. MocklerH-Index: 63
view all 4 authors...
We introduce a simple approach to understanding the relationship between single nucleotide polymorphisms (SNPs), or groups of related SNPs, and the phenotypes they control. The pipeline involves training deep convolutional neural networks (CNNs) to differentiate between images of plants with reference and alternate versions of various SNPs, and then using visualization approaches to highlight what the classification networks key on. We demonstrate the capacity of deep CNNs at performing this cla...
#1Kajal Samantara (Centurion University of Technology and Management)H-Index: 1
#2Aalok Shiv (ICAR: Indian Council of Agricultural Research)H-Index: 1
Last. Sourav Ranjan Mohapatra (Forest Research Institute)H-Index: 1
view all 6 authors...
Erosion of genetic diversity due to excessive breeding applications is a major threat to crop species. Plants should be genetically diverse to cope with repercussions of changing climate. Of late, diversity made available through epigenetic changes appears to be a novel source for crop improvement. Epigenetics is a phenomenon that alters heritable gene expression without implicating any variation in the genomic DNA sequences. The mechanism of epigenetics involves three important e...
1 Citation
#2Fei MaH-Index: 9
Last. Changwen DuH-Index: 19
view all 5 authors...
#1Karansher S. Sandhu (WSU: Washington State University)H-Index: 2
#2Meriem Aoun (WSU: Washington State University)H-Index: 1
Last. Arron H. Carter (WSU: Washington State University)H-Index: 19
view all 4 authors...
Breeding for grain yield, biotic and abiotic stress resistance, and end-use quality are important goals of wheat breeding programs. Screening for end-use quality traits is usually secondary to grain yield due to high labor needs, cost of testing, and large seed requirements for phenotyping. Hence, testing is delayed until later stages in the breeding program. Delayed phenotyping results in advancement of inferior end-use quality lines into the program. Genomic selection provides an alternative t...
#1Karansher S. Sandhu (WSU: Washington State University)H-Index: 2
#2Paul D. MihalyovH-Index: 3
Last. Arron H. CarterH-Index: 19
view all 5 authors...
Grain protein content (GPC) is controlled by complex genetic systems and their interactions, and is an important quality determinant for hard spring wheat as it has a positive effect on bread and pasta quality. GPC is variable among genotypes and strongly influenced by environment. Thus, understanding the genetic control of wheat GPC and identifying genotypes with improved stability is an important breeding goal. The objectives of this research were to identify genetic backgrounds with less vari...
1 Citation
#1Merrick Lf (WSU: Washington State University)H-Index: 1
#2Arron H. CarterH-Index: 19
view all 2 authors...
Traits with a complex unknown genetic architecture are common in breeding programs. However, they pose a challenge for selection due to a combination of complex environmental and pleiotropic effects that impede the ability to create mapping populations to characterize the trait's genetic basis. One such trait, seedling emergence of wheat (Triticum aestivum L.) from deep planting, presents a unique opportunity to explore the best method to use and implement GS models to predict a complex trait. 17...
3 Citations
#1Karansher S. Sandhu (WSU: Washington State University)H-Index: 2
#2Patil SsH-Index: 1
Last. Arron H. CarterH-Index: 19
view all 4 authors...
Prediction of breeding values and phenotypes is central to plant breeding and has been revolutionized by the adoption of genomic selection (GS). Use of machine and deep learning algorithms applied to complex traits in plants can improve prediction accuracies in the context of GS. Spectral reflectance indices further provide information about various physiological parameters previously undetectable in plants. This research explores the potential of multi-trait (MT) machine and deep learning model...
2 Citations
#1Zhiwu ZhangH-Index: 37
Last. Ben J. HayesH-Index: 83
view all 10 authors...
#1Karansher S. Sandhu (WSU: Washington State University)H-Index: 2
#2Paul D. MihalyovH-Index: 3
Last. Arron H. Carter (WSU: Washington State University)H-Index: 19
view all 5 authors...
Genomics and high throughput phenomics have the potential to revolutionize the field of wheat (Triticum aestivum L.) breeding. Genomic selection (GS) has been used for predicting various quantitative traits in wheat, especially grain yield. However, there are few GS studies for grain protein content (GPC), which is a crucial quality determinant. Incorporation of secondary correlated traits in GS models has been demonstrated to improve accuracy. The objectives of this research were to compare per...
4 Citations