Genome-enabled prediction using probabilistic neural network classifiers

Background Multi-layer perceptron (MLP) and radial basis function neural networks (RBFNN) have been shown to be effective in genome-enabled prediction. Here, we evaluated and compared the classification performance of an MLP classifier versus that of a probabilistic neural network (PNN), to predict the probability of membership of one individual in a phenotypic class of interest, using genomic and phenotypic data as input variables. We used 16 maize and 17 wheat genomic and phenotypic datasets with different trait-environment combinations (sample sizes ranged from 290 to 300 individuals) with 1.4 k and 55 k SNP chips. Classifiers were tested using continuous traits that were categorized into three classes (upper, middle and lower) based on the empirical distribution of each trait, constructed on the basis of two percentiles (15–85 % and 30–70 %). We focused on the 15 and 30 % percentiles for the upper and lower classes for selecting the best individuals, as commonly done in genomic selection. Wheat datasets were also used with two classes. The criteria for assessing the predictive accuracy of the two classifiers were the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUCpr). Parameters of both classifiers were estimated by optimizing the AUC for a specific class of interest. Results The AUC and AUCpr criteria provided enough evidence to conclude that PNN was more accurate than MLP for assigning maize and wheat lines to the correct upper, middle or lower class for the complex traits analyzed. Results for the wheat datasets with continuous traits split into two and three classes showed that the performance of PNN with three classes was higher than with two classes when classifying individuals into the upper and lower (15 or 30 %) categories. 
Conclusions The PNN classifier outperformed the MLP classifier in all 33 (maize and wheat) datasets when using AUC and AUCpr for selecting individuals of a specific class. Use of PNN with Gaussian radial basis functions seems promising in genomic selection for identifying the best individuals. Categorizing continuous traits into three classes generally provided better classification than when using two classes, because classification accuracy improved when classes were balanced. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2553-1) contains supplementary material, which is available to authorized users.


Background
Complex traits of economic importance in animal and plant breeding seem to be affected by many quantitative trait loci (QTL), each having a small effect, and are greatly influenced by the environment. Predicting these complex traits using information from dense molecular markers exploits linkage disequilibrium (LD) between molecular markers and QTL. Basically, genomic selection works by capturing realized relationships between individuals and, to an extent, by capturing the effects of QTL via their linkage or LD with markers. Genomic selection (GS) regression models use all available molecular marker and phenotypic data from an observed base (training population) to predict the genetic values of yet unphenotyped candidates for selection (testing population) whose marker genotypes are known.
There is a vast literature describing statistical methods that use different functional forms on markers for predicting genetic values, e.g., [1,2], starting with the seminal work of [3], which proposed regressing phenotypes on all available markers using a Gaussian linear model with different prior distributions on marker effects. Several parametric and semi-parametric methods have been described and used thereafter for genome-enabled prediction in animals and plants [4–11].
The basic quantitative genetic model $y_i = g_i + \gamma_i$ ($i = 1, \ldots, n$ individuals) describes the $i$th response or phenotype ($y_i$), expressed as a deviation from some general mean ($\mu$), as the sum of an unknown genetic value ($g_i$) plus a model residual $\gamma_i$. The unknown genetic value can be represented as a complex function of genotypes with a large number of genes. However, since the genes affecting a trait are unknown, this complex function can be approximated by a regression of phenotype on marker genotypes, where a large number of markers $\{x_{i1}, \ldots, x_{ip}\}$ ($x_{ij}$ is the number of copies of one of the two alleles observed in the $i$th individual at the $j$th marker) may be used as regressors for predicting the genetic value of the $i$th individual. Thus, for $u(x_i) = u(x_{i1}, \ldots, x_{ip})$, the basic model becomes $y_i = u(x_i) + \gamma_i$, where $\gamma_i$ includes errors due to unspecified environmental effects, imperfect linkage disequilibrium between markers and the QTL affecting the trait, and unaccounted gene × gene and gene × environment interactions.
In several applications, $u(x_i)$ is a parametric linear regression of the form $u(x_{i1}, \ldots, x_{ip}) = \sum_{j=1}^{p} x_{ij} \beta_j$, where $\beta_j$ is the substitution effect of the allele coded as 'one' at the $j$th marker. The linear regression function becomes $y_i = \sum_{j=1}^{p} x_{ij} \beta_j + \gamma_i$. The regression function $u(x_i)$ can also be represented by semi-parametric models, such as reproducing kernel Hilbert space (RKHS) regressions, or by different types of neural networks (NN), such as the multilayer perceptron or radial basis functions [5,8,11–14]. Several penalized linear regression models and Bayesian shrinkage estimation methods have been applied to genome-enabled prediction [1]. Similarly, regularized machine learning has been used for predicting complex traits [15]. Recently, two-layer feed-forward NN with backpropagation were implemented in various forms using German Fleckvieh and Holstein-Friesian bull data, and high prediction accuracies were achieved [16]. Likewise, a multi-layer NN classifier was applied to study genetic diversity in simulated experiments [17].
Nonparametric classification models are a branch of supervised machine learning that has been successfully applied in several fields of knowledge, e.g., text mining, bioinformatics and genomics [18,19]. Particularly in applied genomic breeding programs and depending on the trait under consideration, the objective of classification is to focus on candidates for selection contained in the upper or lower classes of the prediction space. A common classification problem arises when an input marker vector $x_i \in \mathbb{R}^p$ is to be assigned to one of $S$ classes by a classifier. The classifier is trained using a set of training pairs $(x_i, c_i)$ ($i = 1, \ldots, n$ individuals), where $c_i$ is the class label $k$ ($k = 1, \ldots, S$) to which $x_i$ belongs and $S$ is the number of classes. Usually, $c_i$ is transformed into a vector $\mathbf{c}_i$ of dimension $S \times 1$, with 1 in position $k$ and 0 otherwise.
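As a minimal illustration of this encoding, the Python sketch below converts class labels into indicator vectors. The function name `one_hot` and the example labels are hypothetical; they are not taken from the analysis scripts of this study.

```python
from typing import List


def one_hot(label: int, n_classes: int) -> List[int]:
    """Return an indicator vector of length S with a 1 at position `label` (1-based)."""
    if not 1 <= label <= n_classes:
        raise ValueError("label must lie in 1..n_classes")
    return [1 if k == label else 0 for k in range(1, n_classes + 1)]


# Hypothetical class labels c_i for four individuals and S = 3 classes.
labels = [3, 1, 2, 1]
encoded = [one_hot(c, 3) for c in labels]
print(encoded)
```

Each inner list plays the role of one target vector of dimension S × 1.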
The multi-layer perceptron (MLP) classifier is a typical feed-forward NN architecture with at least one hidden layer and an output layer, where both layers have nonlinear and differentiable transfer functions. The nonlinear transfer function in the hidden layer enables an NN to act as a universal approximator. The training process of an MLP for each individual $i$, with input vector $x_i$ and target class $c_i$, typically uses the error backpropagation learning algorithm [20]. This process is computationally expensive when the number of input variables is large.
The probabilistic neural network (PNN) was proposed by [21] and is widely used in pattern recognition and classification. PNN classifies an input vector x i into a specific k class such that the specific class has the maximum probability of being a correct assignment. PNN provides an optimum pattern classifier that minimizes the expected risk of wrongly classifying an object, and is a very efficient (in terms of computational time) classification method. The PNN training algorithm is simpler and faster than that of the MLP approach because PNN parameters are estimated directly from the input data and an iterative procedure is not required. Further, PNN guarantees convergence to a Bayes classifier if enough training examples are provided [22]. Several classification methods such as support vector machines and random forests have been applied in GS [23][24][25]. However, despite the apparent advantages of PNN, no PNN classifiers have been applied in GS so far.
The objective of this research was to assess the performance of two NN classifiers, MLP and PNN (based on Gaussian kernels), to select individuals belonging to a specific class of interest (target class). In an applied GS context, the problem should be formulated according to whether the focus is on selecting individuals into the upper, middle or lower classes, depending on the trait under selection. Then the question is how many of the predicted individuals classified in the target class are actually observed in that class. The problem is posed as follows: given an input vector x i of p markers for the i th individual, each individual i in the testing set must be classified in a class of interest of the phenotypic response. Classes were defined considering different percentiles of the target trait, specifically, 15 and 30 % for the upper and lower classes were analyzed.

Methods
This section has four parts: first the two datasets are described; second, the strategy for categorizing the datasets is explained; third, the multilayer perceptron neural network (MLP) and probabilistic neural network (PNN) are described, and finally, the criteria used to assess model accuracy for classifying the best individuals based on genomic information are described.

Maize datasets
The maize datasets include 16 trait-environment combinations measured on 300 tropical lines genotyped with 55,000 SNPs each; these datasets were previously used by [8]. Four datasets contain information on the complex trait grain yield (GY) evaluated under severe drought stress (GY-SS) and well-watered conditions (GY-WW), and in high yielding (GY-HI) and low yielding (GY-LO) environments. Another six datasets include information on days to anthesis or male flowering (MFL), on days to silking or female flowering (FFL), and the MFL to FFL interval (ASI) evaluated under severe drought stress (SS) and well-watered (WW) environments. The remaining six datasets contain information on gray leaf spot (GLS) resistance evaluated in six CIMMYT international trials (GLS-1 to GLS-6). The number of individuals and the type and number of markers are presented in Table 1; for further details, see [8].

Wheat datasets
These datasets include 306 wheat lines from the CIMMYT Global Wheat Program (GWP) that were genotyped with 1717 Diversity Array Technology (DArT) markers generated by Triticarte Pty. Ltd. (Canberra, Australia; http://www.diversityarrays.com), which is a whole-genome profiling service laboratory. Two traits were analyzed, grain yield (GY) and days to heading (DTH), which were evaluated in different environments (year-drought stress-agronomic treatments). GY was measured in seven environments and DTH in ten environments. The number of individuals and the type and number of markers are presented in Table 2; for further details, see [11].

Transforming phenotypic responses into three or two classes
The continuous phenotypic responses $y_i$ for each stratified random partition in the datasets were grouped into three classes (upper, middle and lower), based on the 15–85 % and 30–70 % percentiles of the response of each trait analyzed. For example, for the 15–85 % percentiles, the quantiles $q_{0.15}$ and $q_{0.85}$ were used to split $y_i$ into three classes: $y_i \in$ upper class if $y_i > q_{0.85}$; $y_i \in$ middle class if $q_{0.15} \le y_i \le q_{0.85}$; and $y_i \in$ lower class if $y_i < q_{0.15}$.
For the two species, the target classes were the upper 15 and 30 % classes (GY for maize and wheat); the middle 40 and 70 % classes (ASI for maize), and the lower 15 and 30 % classes (FFL, MFL, and GLS for maize and DTH for wheat).
Comparison of prediction accuracy of PNN based on two or three classes was performed only on the wheat datasets to simplify computations. Firstly, the phenotypic responses y i for each stratified random partition of the wheat datasets were grouped into two classes from the datasets previously grouped into three classes. The upper 15 % of the binary class was defined by using the upper 15 % of the trichotomous classes, and the lower class was the sum of the middle and lower classes of the trichotomous classes; a similar strategy was applied for the lower 15 % of the binary class. The same random partitions (training, testing sets) were used when comparing PNN with two classes versus PNN with three classes. Partitions of the wheat datasets into two classes for GY and DTH are shown in Table 3.
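The categorization into three classes, and the collapse into two classes, can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say which empirical quantile definition was used, so linear interpolation between order statistics is assumed here, and the names `quantile`, `three_classes`, `to_binary` and the label `"rest"` are hypothetical.

```python
from typing import List, Sequence


def quantile(values: Sequence[float], prob: float) -> float:
    """Empirical quantile with linear interpolation (an assumed definition)."""
    ordered = sorted(values)
    pos = prob * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def three_classes(y: Sequence[float], lower_p: float = 0.15,
                  upper_p: float = 0.85) -> List[str]:
    """Assign each response to the upper, middle or lower class."""
    q_lo, q_hi = quantile(y, lower_p), quantile(y, upper_p)
    return ["upper" if v > q_hi else "lower" if v < q_lo else "middle" for v in y]


def to_binary(classes: Sequence[str], target: str = "upper") -> List[str]:
    """Keep the target class and merge the other two classes into one."""
    return [c if c == target else "rest" for c in classes]


# Hypothetical phenotypes: the integers 1..20.
y = list(range(1, 21))
tri = three_classes(y)           # 15-85 % split
bi = to_binary(tri, "upper")     # upper 15 % versus the rest
print(tri.count("upper"), tri.count("middle"), tri.count("lower"))
```

Calling `three_classes(y, 0.30, 0.70)` gives the 30–70 % split in the same way.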

Multilayer perceptron neural network (MLP) classifier
An MLP can be trained to classify items into $S$ different disjoint classes. Each target class $c_i$ is transformed into a target vector $\mathbf{c}_i$ of zeroes except for a 1 in element $k$ ($k = 1, \ldots, S$), the class to be represented. We arranged the set of $n$ input vectors $x_i$ into a matrix $X$ of dimension $n \times p$. Then we arranged the $n$ target vectors $\mathbf{c}_i$ into a matrix $C$ of dimension $S \times n$. The rows of $X$ correspond to the columns of $C$, individual by individual. Statistical learning is inferred from the data only, with no assumption about the joint distribution of inputs and outcomes. This gives MLP great flexibility for capturing complex patterns frequently found in plant breeding [26].
We begin by describing a standard MLP for a categorical response (PNN is introduced subsequently). MLP is an NN that can be thought of as a two-stage regression (e.g., [18]). In the first stage (hidden layer), $M$ data-derived basis functions, $\{z_m\}_{m=1}^{M}$, are inferred; in the second stage (the output layer, which has $S$ neurons, one per class), each neuron's output is computed from the basis functions inferred in the hidden layer using a nonlinear transfer function (Fig. 1).
In the hidden layer, one data-derived predictor is inferred at each of the $M$ neurons. These data-derived predictors are computed by first inferring a score ($u_{mi}$), which is a linear combination of the input weights and the input markers plus a bias (intercept) term. Subsequently, this score is transformed using a nonlinear transfer function, $\varphi(\cdot)$, that is, $z_{mi} = \varphi(w_{m0} + \sum_{j=1}^{p} w_{mj} x_{ij})$, where $w_{m0}$ is the bias term and $W = \{w_{mj}\}$ ($m = 1, \ldots, M$; $j = 1, \ldots, p$) is the input weight matrix. The transfer function maps a score defined on the real line onto the interval [−1, 1]; an example is the hyperbolic tangent sigmoid transfer function, $\mathrm{tansig}(u) = 2/(1 + e^{-2u}) - 1$. Subsequently, in the output layer, phenotypes are regressed on the data-derived features $\{z_m\}$.
Training of an MLP (given a fixed number of transfer functions in the hidden layer) involves estimating all of the classifier's parameters by means of an iterative error backpropagation algorithm, based on the scaled conjugate gradient algorithm described by [27]. To improve the generalization capacity of MLP, an early stopping ensemble strategy can be applied [28]; early stopping effects non-Bayesian shrinkage of coefficients. In this approach, we divided the available data into three subsets. The first subset is the training set, used for computing the gradient and updating network weights and biases. The second subset is the validation set, on which the error is monitored during the training process.
The validation error normally decreases during the initial training phase, as does the training set error. However, when the network begins to over-fit the data, the error in the validation set typically begins to rise. When the validation error increases at some point in the iteration, the training is stopped, and the weights and biases at the minimum validation error are returned. The third subset is used as testing set.
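The stopping rule described above can be sketched in Python as below. This is a schematic illustration, not the training code of this study: the function name `early_stopping_epoch` and the `patience` parameter (how many consecutive non-improving epochs are tolerated before stopping) are assumptions introduced for the example.

```python
from typing import Sequence


def early_stopping_epoch(val_errors: Sequence[float], patience: int = 6) -> int:
    """Return the epoch with the minimum validation error seen before stopping.

    Training is considered stopped once the validation error has failed to
    improve for `patience` consecutive epochs (an assumed rule).
    """
    best_error = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_error, best_epoch, waited = error, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation error keeps rising: stop training
    return best_epoch


# Hypothetical validation errors: they fall, then rise as over-fitting sets in.
errors = [0.90, 0.70, 0.60, 0.65, 0.70, 0.80]
stop_at = early_stopping_epoch(errors, patience=2)
print(stop_at)
```

In a real training loop, the weights and biases saved at the returned epoch would be the ones kept for prediction on the testing set.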
The performance function to optimize an MLP is usually the mean squared error (mse), which is the average squared error between the predicted classes Ĉ and the target classes C. Ĉ is also a matrix of dimension S × n, where each column contains values in the [0,1] range. The index of the largest element in the column indicates which of the S classes that vector represents.
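A forward pass through such a network, followed by class assignment and the mse criterion, might look like the Python sketch below. The weights are arbitrary toy values, and the logistic output transfer function is an assumption made for the example, chosen only because it yields outputs in the [0,1] range; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def tansig(u: float) -> float:
    """Hyperbolic tangent sigmoid; algebraically equal to 2 / (1 + exp(-2u)) - 1."""
    return math.tanh(u)


def logsig(u: float) -> float:
    """Logistic sigmoid, assumed here as the output transfer function."""
    return 1.0 / (1.0 + math.exp(-u))


def mlp_forward(x: Sequence[float], w_hidden, b_hidden, w_out, b_out) -> List[float]:
    """Hidden layer: z_m = tansig(bias + sum_j w_mj * x_j); output layer: S values."""
    z = [tansig(b + sum(w * xj for w, xj in zip(row, x)))
         for row, b in zip(w_hidden, b_hidden)]
    return [logsig(b + sum(w * zm for w, zm in zip(row, z)))
            for row, b in zip(w_out, b_out)]


def predicted_class(c_hat: Sequence[float]) -> int:
    """1-based index of the largest element of the output vector."""
    return max(range(len(c_hat)), key=lambda k: c_hat[k]) + 1


def mse(columns_hat, columns) -> float:
    """Mean squared error between predicted and target columns."""
    sq = [(a - b) ** 2 for ch, c in zip(columns_hat, columns) for a, b in zip(ch, c)]
    return sum(sq) / len(sq)


# Toy network: p = 3 markers, M = 2 hidden neurons, S = 3 classes.
x = [0, 1, 2]
w_hidden = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0], [-0.5, 0.5], [0.2, 0.8]]
b_out = [0.0, 0.0, 0.0]
c_hat = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(predicted_class(c_hat), mse([c_hat], [[0, 0, 1]]))
```

Training would adjust the weights to reduce `mse` over the training set; that optimization step is omitted here.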

Probabilistic neural network (PNN) classifier
The architecture of a PNN is similar to that of a radial basis function NN [8]; a PNN has two layers, the pattern layer and the summation-output layer, as illustrated in Fig. 2. The pattern layer computes distances (using a Gaussian radial basis function (RBF)) between the input vector $x_i$ and the training (center) input vectors $c_m \in \mathbb{R}^p$, $m = 1, \ldots, M$ neurons ($M = n$ individuals of the input data set), and returns an output vector $u_i \in \mathbb{R}^M$ whose elements $u_{mi}$ are weighted distances between $x_i$ and $c_m$ that depend on $h$, the width of the Gaussian RBF, and indicate how close the input vector $x_i$ is to $c_m$ [22]. Then each $u_{mi}$ is transformed into a vector $z_i \in \mathbb{R}^M$, whose elements are defined by the Gaussian operation $z_{mi} = \exp(-u_{mi}^2)$. The summation-output layer sums these contributions for each class $k$, that is, $v_{ki} = \sum_{m=1}^{M} w_{km} z_{mi}$, where $w_{km}$ are weights obtained from the target classes matrix $C$, to generate a vector of probabilities $\hat{c}_i = \mathrm{softmax}(v_i)$ of dimension $S \times 1$ as its net output, where the softmax transfer function $\sigma(\cdot)$ is given by $\sigma(v_{ki}) = \exp(v_{ki}) / \sum_{l=1}^{S} \exp(v_{li})$, and $v_i$ is a vector of dimension $S \times 1$ with elements $v_{ki}$. The softmax transfer function on the summation-output layer maps the outputs of the processing units for each class $k$ into the interval [0,1].
The pattern layer of a PNN is a neural representation of a Bayes classifier, where the class density functions are approximated using a Parzen window estimator [29]. The standard training method for a PNN (given a value of $h$ for the Gaussian RBFs) requires a single pass over all the marker vectors $x_i$ of the training set. For this reason, PNN requires a short training time and produces as output ($\hat{c}_i$) posterior probabilities of class membership.
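The two PNN layers can be sketched in plain Python as follows. This is a simplified illustration rather than the implementation used in this study: the scaled distance is assumed to be the Euclidean distance divided by $h$, the summation weights are assumed to be class indicators (1 if training individual $m$ belongs to class $k$, 0 otherwise), and the name `pnn_predict` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def pnn_predict(x: Sequence[float], train_x: Sequence[Sequence[float]],
                train_labels: Sequence[int], n_classes: int, h: float) -> List[float]:
    """Return a vector of class membership values for input x (labels are 1-based)."""
    # Pattern layer: one Gaussian RBF neuron per training individual.
    z = []
    for center in train_x:
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, center)))
        u = dist / h                      # assumed scaling of the distance by h
        z.append(math.exp(-u * u))
    # Summation layer: add the contributions of each class (indicator weights).
    v = [0.0] * n_classes
    for z_m, label in zip(z, train_labels):
        v[label - 1] += z_m
    # Softmax output, shifted by max(v) for numerical stability.
    v_max = max(v)
    exps = [math.exp(v_k - v_max) for v_k in v]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: two training individuals per class, two markers each.
train_x = [[0, 0], [0, 1], [5, 5], [5, 6]]
train_labels = [1, 1, 2, 2]
probs = pnn_predict([0, 0.5], train_x, train_labels, n_classes=2, h=1.0)
print(probs)
```

No iterative fitting is involved: the training data are stored as centers, and only the width `h` has to be chosen, for example by maximizing the AUC of the target class on held-out data.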

Criteria for assessing classifier prediction accuracy
The prediction accuracy of MLP and PNN was evaluated using a cross-validation procedure. For each data set, 50 random partitions stratified by classes were generated. Each partition randomly assigned 90 % of the data to the training set and the remaining 10 % to the testing set. We used stratified sampling by class to make sure there were no empty classes in the training and testing sets. For each data set, partition index matrices PINDX(n, 50) were generated, where n is the number of individuals in each data set analyzed; PINDX(i,j) has a value equal to 1 (training) or 2 (testing) for the i th individual in the j th partition. Each model was trained and evaluated with the same pair of training and testing sets of each partition. For MLP the training sets defined in PINDX(n, 50) were subdivided by stratified random sampling by class into two disjoint sets, one for training (88 %) and another for validation (12 %); this was done with the objective of applying the training early stopping ensemble strategy [28]. For each random partition, ten replications (random seeds) were used to evaluate the performance of MLP.
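A partition index matrix of this kind could be generated as in the Python sketch below. It is an illustration under assumptions: the rounding rule for the number of testing individuals per class, the random seed handling and the function names are hypothetical, and only the coding 1 (training) and 2 (testing) is taken from the description above.

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence


def stratified_partition(classes: Sequence[str], test_fraction: float = 0.10,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Return one partition column: 1 = training, 2 = testing, stratified by class."""
    rng = rng or random.Random()
    members_by_class = defaultdict(list)
    for i, c in enumerate(classes):
        members_by_class[c].append(i)
    column = [1] * len(classes)
    for members in members_by_class.values():
        rng.shuffle(members)
        # At least one testing individual per class (assumed rounding rule).
        n_test = max(1, round(test_fraction * len(members)))
        for i in members[:n_test]:
            column[i] = 2
    return column


def partition_matrix(classes: Sequence[str], n_partitions: int = 50,
                     seed: int = 0) -> List[List[int]]:
    """Return an n x n_partitions matrix of partition indices."""
    rng = random.Random(seed)
    columns = [stratified_partition(classes, rng=rng) for _ in range(n_partitions)]
    return [list(row) for row in zip(*columns)]


# Hypothetical data set: 300 individuals with a 15-70-15 % class split.
classes = ["upper"] * 45 + ["middle"] * 210 + ["lower"] * 45
pindx = partition_matrix(classes, n_partitions=50, seed=1)
print(len(pindx), len(pindx[0]))
```

The same column can then be reused for every classifier, so that all models are trained and evaluated on identical training and testing sets.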
Two performance measures for assessing prediction accuracy of the two classifiers (averaged across 50 random partitions) were used: (1) the area under the receiver operating characteristic curve (AUC), and (2) the area under the precision-recall curve (AUCpr), or average precision.
For GY in both species, models were trained to maximize the AUC of the upper class; for FL, GLS, and DTH, models were trained to maximize the AUC of the lower class; for ASI, where the target value is zero (perfect synchrony between anthesis and silking), models were trained to maximize the AUC of the middle class.

The area under the receiver operating characteristic curve (AUC)
Rather than computing the recall (R) [also called sensitivity or true positive rate (tpr)] and the false positive rate (fpr) for a fixed threshold τ, a set of thresholds was defined and then tpr vs fpr (R vs fpr) was plotted as an implicit function of τ; this is called an ROC curve.
The recall or sensitivity is $R = \frac{tp}{tp + fn}$, where $tp$ is the number of positives predicted as positives and $fn$ is the number of positives predicted as negatives. This measure evaluates the number of individuals that are correctly classified as a proportion of all the observed individuals in the target class. The false positive rate is $fpr = \frac{fp}{fp + tn}$, where $fp$ is the number of negatives predicted as positives and $tn$ is the number of negatives predicted as negatives (Table 4).
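For a single threshold τ, these two rates can be computed as in the Python sketch below. The function names and the example scores are hypothetical; an individual is treated as a predicted positive when its score is at least τ, which is an assumed convention.

```python
from typing import Sequence, Tuple


def confusion_counts(scores: Sequence[float], is_target: Sequence[bool],
                     tau: float) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn) when scores >= tau are predicted as positives."""
    tp = fn = fp = tn = 0
    for score, positive in zip(scores, is_target):
        predicted_positive = score >= tau
        if positive and predicted_positive:
            tp += 1
        elif positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def recall_and_fpr(scores: Sequence[float], is_target: Sequence[bool],
                   tau: float) -> Tuple[float, float]:
    """Recall R = tp / (tp + fn) and false positive rate fpr = fp / (fp + tn)."""
    tp, fn, fp, tn = confusion_counts(scores, is_target, tau)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical predicted probabilities of the target class for four individuals.
scores = [0.9, 0.8, 0.4, 0.3]
is_target = [True, False, True, False]
print(recall_and_fpr(scores, is_target, tau=0.5))
```

Evaluating `recall_and_fpr` over a grid of thresholds gives the points of the ROC curve.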
To compare the performance of classifiers, the receiver operating characteristic curve (ROC) has to be reduced to a single scalar value representing the expected performance. A common method is to compute the area under the ROC curve (AUC), which produces a value between 0 and 1. If AUC(a) > AUC(b), then classifier a has a better average performance than classifier b. AUC can be interpreted as the probability that a randomly chosen individual is ranked as more likely to be of the target class than a randomly chosen individual of another class. The ROC graphs are a useful tool for visualizing the performance of the classifiers because they provide a richer measure of classification performance than other scalar measures [30].
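The probabilistic interpretation of AUC leads directly to a simple, if not the fastest, way of computing it: compare every target-class individual with every other individual and count how often the former receives the higher score. The Python sketch below follows that idea; the function name `auc`, the convention of counting ties as one half, and the example scores are assumptions for illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], is_target: Sequence[bool]) -> float:
    """Probability that a random target individual outranks a random non-target one."""
    pos = [s for s, t in zip(scores, is_target) if t]
    neg = [s for s, t in zip(scores, is_target) if not t]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties counted as half a win (assumed convention)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of the target class for four individuals.
scores = [0.9, 0.8, 0.4, 0.3]
is_target = [True, False, True, False]
print(auc(scores, is_target))
```

The pairwise loop costs on the order of the product of the two class sizes, which is negligible for testing sets of about 30 individuals.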

The area under the precision-recall curve (AUCpr)
A precision-recall curve is a plot of precision (P) vs R for a set of thresholds τ. $P = \frac{tp}{tp + fp}$ is defined as the fraction of positives predicted as positives with respect to all predicted positives (Table 4). Thus P measures the fraction of the predicted positives that is really positive, while R measures the fraction of the observed positives that was in fact detected. This curve is summarized as a single number using the average precision (AUCpr), which approximates the area under the precision-recall curve [31]. This measure is recommended for classes of different sizes; upper or lower classes of 15 % had a lower number of individuals than the corresponding upper or lower classes of 85 %. AUC is commonly used to present results of binary decision problems in machine learning algorithms. However, when dealing with unbalanced classes, precision-recall curves give a more informative picture of the performance of a machine learning algorithm than ROC curves [32,33].
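One common way of computing average precision is to rank individuals by decreasing score and average the precision obtained at the rank of each observed positive. The Python sketch below uses that definition; it is an assumption that this matches the exact variant used in [31], and the function name and example data are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], is_target: Sequence[bool]) -> float:
    """Mean of the precision values at the rank of each observed positive."""
    ranked = sorted(zip(scores, is_target), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, positive in ranked if positive)
    if n_pos == 0:
        raise ValueError("at least one positive is required")
    hits = 0
    total = 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            total += hits / rank   # precision among the top `rank` individuals
    return total / n_pos


# Hypothetical predicted probabilities of the target class for four individuals.
scores = [0.9, 0.8, 0.4, 0.3]
is_target = [True, False, True, False]
print(average_precision(scores, is_target))
```

Tied scores are ordered arbitrarily by the sort in this sketch, so results can depend on input order when ties occur.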

Software
Scripts for fitting models and performing cross-validations were written in MATLAB R2010b. All the analyses were performed on a Linux workstation.

Results and discussion
Results of the value of AUC for classifiers MLP and PNN in each trait-environment combination are depicted in histograms in Fig. 3a-d (maize datasets) and Fig. 4a-b (wheat datasets) for the traits selected in the upper and lower (15 and 30 %) and middle (40 and 70 %) classes, respectively. The first clear trend using the AUC criterion is that PNN outperformed MLP for most of the individuals in the upper, middle and lower classes. Depending on the trait-environment combination, the PNN30% or PNN15% upper and lower and the PNN40% and PNN70% middle were usually larger than those of MLP; the only exception was PNN15% for GY-SS (Fig. 3a), which was lower than MLP15% (Additional file 1: Table S1).
We also describe AUC and AUCpr results of comparing the performance of PNN for wheat trait-environment combinations using two or three classes. Table S1 shows the results based on the AUC criterion for the upper, middle and lower classes.
When using the AUCpr criterion, which relates P and R for the upper class, PNN outperformed MLP, which is clearly shown in Table 5 (as shown for the AUC criterion in Fig. 3a). Also, AUCpr for PNN30% was always better than PNN15% for all the traits in the upper class. These results lead to the conclusion that PNN was more accurate than MLP for assigning maize lines to the correct upper class for GY under WW and SS conditions. Also under the AUC criterion, PNN30% was similar to PNN15% for GY-HI and GY-WW, but better than PNN15% for GY-LO and GY-SS. Under the criterion AUCpr, PNN30% was always better than PNN15% for all GY. Concerning the AUC criterion for the middle class based on ASI-SS and ASI-WW, Fig. 3b shows a slight superiority of PNN over MLP for both 40 and 70 %; however, PNN40% was, on average, slightly better than PNN70%. On the other hand, results using the AUCpr criterion also show a slight superiority of PNN over MLP for MLP40% for ASI-SS and MLP70% for both ASI-SS and ASI-WW (Table 5). For this middle class, the AUCpr results favored PNN as a better predictor than MLP for assigning maize lines to the correct middle class.

Lower classes (15 and 30 %)
For the lower class, Fig. 3c for FL and Fig. 3d for GLS (both traits in different environments) show a clear superiority in terms of the AUC criterion of PNN over MLP for both lower classes. The better prediction accuracy of classifier PNN is reflected in AUCpr prediction accuracy, where PNN outperformed MLP for both lower classes, and PNN30% was higher than PNN15% for all 10 traits (Table 5).

Comparing classifiers for selecting individuals in the upper and lower classes in the wheat datasets

Upper classes (15 and 30 %)
Results of AUC for GY that were selected in the upper 15 and 30 % classes are presented in Fig. 4a and in Additional file 2: Table S2. PNN outperformed MLP for both upper classes for all GY. PNN30% gave better prediction accuracy than PNN15% in most traits, with the exception of GY-3 and GY-6, where PNN15% had better prediction than PNN30%.
The AUCpr criterion showed that PNN was better than MLP for both upper classes; PNN was the best class predictor for all GY traits. Furthermore, under the AUCpr criterion, PNN30% was higher than PNN15% in all wheat GY traits (Table 6). In summary, results of the upper 15 and 30 % classes show that PNN was a more accurate predictor than MLP when using the AUC and AUCpr criteria.

Lower classes (15 and 30 %)
For the lower classes involving wheat DTH, AUC of PNN was higher than MLP for both 15 and 30 % percentiles and all traits (Fig. 4b). In five instances (DTH-2, DTH-3, DTH-5, DTH-6 and DTH-9), the PNN15% model was slightly more accurate than PNN30% when classifying individuals in this lower class.
The best performance of PNN was reflected in the prediction accuracy given by the AUCpr criterion, where PNN was better than MLP in both lower classes for all DTH traits. Likewise, PNN30% was always higher than PNN15% (Table 6).

Prediction accuracy of PNN classifier with two and three classes in the wheat datasets
This section compares the performance of PNN in the upper and lower (15 and 30 %) classes for wheat GY and DTH traits, when two and three classes are formed and evaluated using the AUC (Table 7) and AUCpr (Table 8) criteria. For the AUC criterion, PNN with three classes was slightly better than PNN with two classes for most traits in the upper and lower 15 and 30 % classes (Table 7). For the AUCpr criterion, results were not as clear as for AUC; however, PNN with three classes was globally better than PNN with two classes (Table 8).
In summary, results for the wheat datasets comparing the performance of PNN for selecting individuals in the lower and upper 15 and 30 % classes, based on the splitting of continuous traits into two or three classes, showed that for the lower 15 %, the performance of PNN with three classes was better than PNN with two classes (in seven of ten traits). However, PNN with two classes gave better predictions than PNN with three classes in the upper 15 % (four of seven traits). This is not the case when predicting individuals in the upper and lower 30 %, where PNN with three classes was a better predictor than PNN with two classes for most traits.

ROC and precision-recall curves for the maize and wheat datasets
Some results of the ROC and precision-recall curves for various maize and wheat datasets for upper and lower 15 and 30 %, with the middle class in maize for 40 and 70 %, are displayed in a series of figures (for the maize datasets, Fig. 5a-f; for the wheat datasets, Fig. 6a-d). For the maize and wheat datasets, it is clear that the ROC curves of PNN for the upper and lower 15 and 30 % and the middle 40 and 70 % dominated the corresponding curves of MLP. Also, AUC values for PNN were always greater than those for MLP. Furthermore, the P vs R graphs show that for all the maize and wheat datasets, PNN was better than MLP, indicating that the precision of PNN remains better than that of the MLP for all recall values. The precision of PNN started declining at higher values of R than the values of R for MLP.

Accuracy of the MLP and PNN classifiers for selecting the best individuals
Genomic selection aims to accurately predict genetic values with genome-wide marker data using a three-step process: 1) model training and validation, 2) predicting genetic values, and 3) selecting based on these predictions [34].
We evaluated the performance of classifiers MLP and PNN for selecting the best individuals in maize and wheat datasets (Tables 1 and 2). Results indicated that, overall, PNN was more precise in identifying individuals in the correct class than MLP. Previous studies using RBFNN and Bayesian regularized NN on the same wheat datasets [8,11] used in this study showed their prediction advantage over the linear parametric models for complex traits such as GY because these models can capture cryptic epistatic effects in gene × gene networks such as those usually present in wheat (e.g., additive × additive interactions). The good performance of PNN for selecting individuals in the correct classes may also be due to its ability for capturing small and complex interactions, while MLP may fail to do so.
The fact that these classifiers are trained to maximize the probability of membership of an individual to the target class, rather than searching for an overall performance, makes it attractive for applying these tools in GS. Results from MLP and PNN indicated that PNN was much more efficient in maximizing the probability of membership for the upper, middle, and lower classes than MLP.
From a practical genome-assisted plant breeding perspective, this study attempts to mimic the breeder's decision, for example when selecting the upper 15 or 30 % class candidates for GY, or when selecting the lower 15 or 30 % class candidates for DTH, GLS or FL. In maize breeding, ASI synchrony close to zero is a crucial "middle class trait" under SS conditions because it will ensure selecting plants that will simultaneously produce pollen and silk; thus grains can be harvested. Therefore, PNN should help genomic-assisted breeding select appropriate candidates in each class of interest.
Breeding values have two main components: parental average (accounting for between-family variation) and Mendelian sampling (accounting for within-family variation). Genomic prediction should account for these two components and try to control potential population structure that could modify prediction accuracy between the selected training and testing populations. An important practical question is how well PNN and MLP predict the breeding value of non-phenotyped individuals between families and within families. Although the elite maize and wheat lines used in this study are not ideal as training sets, the cross-validation scheme used (where 50 random partitions stratified by classes were generated for each data set) attempts to mimic the prediction of non-phenotyped individuals belonging to different families (crosses) or to the same family. This cross-validation design may not have sampled individuals between and within families as precisely as occurs in reality. However, it is likely that the 50 random partitions explored the possible relationships between individuals in the training and testing sets: some partitions selected training subsets that had high correlations with the observed data, indicating a family relationship among individuals belonging to those training-testing subsets [11], whereas other partitions selected training individuals that had no family relationship with those in the testing set, thus producing low correlations with the observed values. Across the 50 random partitions, PNN consistently gave better average prediction accuracy of the genetic values of the unobserved individuals than MLP in all 33 maize and wheat data sets.

AUC and AUCpr
For both datasets, the results of the AUCpr criterion showed that the values of the upper and lower PNN30% were higher than those for the upper and lower PNN15%. Also, the values of the middle PNN70% were higher than those for the PNN40% (Tables 5 and 6). These results were similar but not identical to those found by AUC (which does not account for imbalances in the number of individuals comprising the upper, middle and lower classes): under AUC, in several instances, PNN15% was superior to PNN30% in the maize data (e.g., ASI-SS, ASI-WW, FFL-SS, MFL-SS, GL-1, GLS-4) and the wheat data (e.g., DTH-2, DTH-3, DTH-5, DTH-6, DTH-9). Prediction accuracy of individuals was clearly hampered under biotic stress in the maize data, as was also found in [6,8,11,35].
The ROC curves in Figs. 5a-f and 6a-d clearly indicate the advantage of PNN over MLP. The R vs fpr graph indicates that, for most of the traits, the probability of correctly classifying an individual in the upper, lower or middle classes was very often 0.80 or higher, even with a small fpr. In most cases, at a value of fpr = 0, the probability of classifying an individual in the correct class was 0.80 or greater for PNN. For all traits, the AUC of PNN15% was always better than the AUC of MLP15% and the AUC of PNN30% was better than the AUC of MLP30%.
For the precision-recall curves, Figs. 5a-f and 6a-d indicate that, in most cases, PNN had higher precision than MLP at higher sensitivity values. This criterion also indicates the superior performance of PNN over MLP.
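As a minimal illustration of the two criteria, the Python sketch below computes both areas for one class of interest from predicted membership probabilities. It makes several assumptions: the function names `roc_auc` and `pr_auc` are hypothetical, the area under the precision-recall curve is approximated by average precision (which may differ from the integration used in this study), tied scores are not treated specially in `pr_auc`, and the toy scores are invented.

```python
import numpy as np

def roc_auc(is_target, score):
    """Area under the ROC curve via the rank-sum identity: the probability
    that a random target-class individual scores higher than a random
    non-target one, counting ties as one half."""
    is_target = np.asarray(is_target, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[is_target], score[~is_target]
    diff = pos[:, None] - neg[None, :]          # all target vs non-target pairs
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def pr_auc(is_target, score):
    """Area under the precision-recall curve, approximated by average
    precision: the mean precision at the rank of each target individual."""
    is_target = np.asarray(is_target, dtype=bool)
    order = np.argsort(-np.asarray(score, dtype=float), kind="stable")
    hits = is_target[order]                     # True where a target is ranked
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].mean())

# Invented example: 3 target-class lines among 8, with predicted probabilities.
is_target = [1, 1, 0, 1, 0, 0, 0, 0]
score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
auc, aucpr = roc_auc(is_target, score), pr_auc(is_target, score)
```

The ROC-based area depends only on how target and non-target individuals are ranked relative to each other, whereas precision also depends on the proportion of target individuals, which is why AUCpr responds to class imbalance and AUC does not.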

Prediction accuracy for 30 vs 15 % classes with binary and trichotomous classes
Based on the AUC criterion, it is clear that PNN gave better prediction accuracy than MLP when assigning maize and wheat individuals to the classes of interest. Using the AUCpr criterion, the results were equally clear for the wheat and the maize datasets.
For the wheat datasets, the AUC criterion showed the superiority of PNN30% with three classes over PNN30% with two classes, as well as the superiority of PNN15% with three classes over PNN15% with two classes (Table 7). However, the differences given by the AUC criterion were not as marked as those shown by the AUCpr criterion. The AUCpr criterion applied with PNN shows that for the upper 15 % classes (GY traits), partitioning the data into two classes assigned more wheat lines to the correct observed classes than partitioning the data into three classes. However, for the lower 15 % classes (DTH traits) and for PNN 30 % upper and lower classes, results indicate that three classes gave better prediction than two classes (Table 8).
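The two- and three-class partitions compared above can be sketched in Python as follows. The helper names `three_classes` and `two_classes`, the simulated trait and the "target tail versus rest" form of the binary split are assumptions for illustration; they are not taken from the scripts used in this study.

```python
import numpy as np

def three_classes(trait, tail=0.15):
    """Label each value 'lower', 'middle' or 'upper' using the empirical
    tail and (1 - tail) quantiles of the trait."""
    trait = np.asarray(trait, dtype=float)
    lo, hi = np.quantile(trait, [tail, 1.0 - tail])
    return np.where(trait <= lo, "lower",
                    np.where(trait >= hi, "upper", "middle"))

def two_classes(trait, tail=0.15, target="upper"):
    """Label each value as the target tail class or 'rest' (assumed form
    of the binary split)."""
    trait = np.asarray(trait, dtype=float)
    if target == "upper":
        return np.where(trait >= np.quantile(trait, 1.0 - tail), "upper", "rest")
    return np.where(trait <= np.quantile(trait, tail), "lower", "rest")

rng = np.random.default_rng(1)
trait = rng.normal(size=300)   # simulated stand-in for a trait such as GY

# Class sizes under the 15-85 % and 30-70 % partitions.
for tail in (0.15, 0.30):
    values, counts = np.unique(three_classes(trait, tail), return_counts=True)
    print(tail, dict(zip(values.tolist(), counts.tolist())))
```

With the 15 % threshold, the binary split sets the target class against a "rest" class that holds about 85 % of the lines, whereas the three-class split separates the opposite tail from the middle class.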

Conclusions
We compared the performance of the multilayer perceptron (MLP) and the probabilistic neural network (PNN) classifiers for selecting the best individuals belonging to a class of interest (target class) in maize and wheat datasets using high-throughput molecular marker information (55 k and 1.4 k). PNN outperformed MLP in most of the datasets. The performance criteria used to judge the predictive accuracy of MLP and PNN for assigning individuals to the right observed class were the area under the ROC curve (AUC) and the area under the precision-recall curve (AUCpr); under both criteria, PNN had better accuracy than MLP. In genomic selection, where p markers >> n individuals is the norm, PNN seems promising because it has better generalization capacity than MLP and is faster than MLP in obtaining optimal solutions, thus presenting appealing computational advantages.

Availability of supporting data
The 33 datasets (16 maize and 17 wheat trials) and the MATLAB scripts used in this work are available at http://hdl.handle.net/11529/10576.