High polymorphism in pepper EcoTILLING platform
We developed a cDNA EcoTILLING platform for pepper to search for allelic variants of the eIF4E and eIF(iso)4E genes. A total of 233 accessions from South America and the Canary Islands were selected from the primary (C. chinense, C. frutescens, C. pubescens and C. baccatum) or secondary (C. annuum) centres of diversity of each species [5]. Maximum variability is found within these centres of diversity, which made it possible to find a high level of polymorphism in the eIF4E and eIF(iso)4E initiation factors.
Thirty-six nucleotide changes were detected in the 21 coding sequences of the eIF4E gene, as well as 26 changes in the 17 eIF(iso)4E sequences. Our results showed that the pepper accessions used are highly variable. It is possible that cultivated peppers have more genetic variability than other crops, such as tomato [45], but it is also possible that this high variability is due to our selection, which was based on the centres of diversity rather than on a worldwide sample. The only other EcoTILLING study of the eIF4E gene, carried out in melon, found very low diversity [14]: only six polymorphic sites were identified in 113 accessions of C. melo and one accession of Cucumis africanus L. Although our EcoTILLING assays covered only exons, our pepper cDNA platform shows a high level of polymorphism compared to melon, similar to that reported in other EcoTILLING studies of other genes in wild or natural populations. Comai et al. used 150 plants of A. thaliana and discovered 55 haplotypes in the five loci analysed [6]. In 25 natural variants of Mla in barley, from five to ten point mutations in 451 bp were identified [13]. EcoTILLING has also identified polymorphisms in black cottonwood (Populus trichocarpa T.), where 63 SNPs were identified in 8191 bp [18]. In Vigna radiata L., a total of 131 SNPs and 26 indels in 45 haplotypes from ten primer sets (6461 bp) were detected [19]. In common bean (Phaseolus vulgaris L.), 22 SNPs were identified in 37 EST candidates [21], whereas in 30 accessions of peanut, eight SNPs were identified in an amplicon of 1280 bp [17]. Therefore, our EcoTILLING collection has high genetic variation between the selected accessions, which could be very useful for identifying natural variation in other genes related to biotic or abiotic responses and quality.
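To put the figures quoted above on a common scale, SNP density can be expressed per kilobase screened. The following is a minimal sketch that uses only the SNP counts and sequence lengths stated in this paragraph; the dictionary layout and the function name `snps_per_kb` are illustrative choices, and indels are not counted. The studies differ in sample size and target genes, so the resulting values are only a rough guide. The pepper values are not included because the screened CDS lengths are not given in this section.

```python
# SNP counts and screened lengths (bp) as quoted in the text above.
# Indels (e.g. the 26 indels reported for Vigna radiata) are not counted.
studies = {
    "Populus trichocarpa": (63, 8191),
    "Vigna radiata": (131, 6461),
    "peanut": (8, 1280),
}


def snps_per_kb(snp_count, length_bp):
    """Return the number of SNPs per 1000 bp screened."""
    return 1000.0 * snp_count / length_bp


densities = {name: snps_per_kb(*values) for name, values in studies.items()}

# Print the studies from highest to lowest SNP density.
for name, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {density:.2f} SNPs/kb")
```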
Variability of eIF4E and eIF(iso)4E genes in pepper
A total of 21 haplotypes of the eIF4E gene and 15 haplotypes of the eIF(iso)4E gene were identified. Although most of the haplotypes were species-specific, eIF4E_B1 was detected in C. annuum and C. chinense, eIF4E_G1 in C. chinense and C. frutescens, eIF4E_F3 in all three of these species, and eIF(iso)4E_B1 in C. chinense, C. frutescens and C. eximium. This sharing may be due to gene flow between the species of the C. annuum complex (C. annuum, C. chinense and C. frutescens) [46]. However, in the case of C. eximium, it is more likely due to the presence of ancestral haplotypes, as this species does not cross viably with the others [47].
The 21 eIF4E nucleotide sequences coded for 19 different proteins, and the 15 eIF(iso)4E haplotypes coded for 10 eIF(iso)4E proteins. In total, 23 of these proteins were new variants. In previous works, 10 allelic variants of eIF4E proteins in C. annuum [23, 25] and 2 allelic variants in C. chinense [24], which together coded for 12 eIF4E proteins, were described. The published proteins of C. chinense and three of those of C. annuum were also identified by EcoTILLING in our pepper collection. In the case of the eIF(iso)4E initiation factor, only 2 proteins in C. annuum had been previously described [36]; our eIF(iso)4E_C1 haplotype was identical to the published allele, pvr6+. We identified 14 new eIF4E proteins and 9 new eIF(iso)4E proteins in our pepper collection, but alleles and proteins that had already been described were also identified, confirming that our EcoTILLING platform contains a good representation of Capsicum variability.
We detected 24 non-synonymous changes in eIF4E haplotypes and 14 in eIF(iso)4E. This high number of amino acid substitutions might be due to the selection and co-evolution of virus and host. In fact, Charron et al. described the evidence for co-evolution between eIF4E of C. annuum plants and potyviral VPg, as there is a strong evolutionary pressure to resist viral pathogens [25]. Amino acid changes in the central domain of the PVY-VPg protein have been demonstrated to be subject to positive selection [48]. Moreover, the existence of positive selection within the recessive resistance gene eIF4E has already been described in plants [49].
Functionality of new alleles identified by EcoTILLING
The main objective of EcoTILLING is to isolate useful new haplotypes or alleles in target genes. Thus, the molecular variation in eIF4E and eIF(iso)4E genes could be very useful for identifying new resistance alleles against important viruses in pepper. Among the pepper potyviruses, Potato virus Y is widespread throughout most of the cultivated areas [25]. PVY can be transmitted by many species of aphids, but chemical methods are effective enough to control the vector. Nevertheless, the impact of this virus has increased due to the restriction of phytosanitary treatments. In recent years, resistance alleles for this virus have not been very important in breeding programs, but now, with the treatment reduction, the use of resistant varieties is the most effective way to prevent damage from the virus. Therefore, a screening with PVY-F14K was done to study the response of the new eIF4E and eIF(iso)4E proteins discovered. The F14K isolate of PVY completed the viral cycle in some or all accessions of each analysed species, which indicated that resistance is not species-specific.
We analysed the results according to the protein combinations of eIF4E and eIF(iso)4E (see Additional file 4). From analysis of the correlation between proteins and disease resistance, we hypothesised that the eIF(iso)4E proteins were not involved in the resistance to PVY as most of the accession responses could be explained by eIF4E proteins. Accessions carrying the proteins eIF4E_C, eIF4E_F, eIF4E_G, eIF4E_J and eIF4E_P generated symptoms, while accessions with the proteins eIF4E_M, eIF4E_L, eIF4E_R and one accession with the eIF4E_Q protein (CDP04710) and another with eIF4E_F (CDP00614) showed systemic infection but were symptomless. Other accessions carrying new eIF4E proteins found in this study (eIF4E_D, eIF4E_H, eIF4E_K, eIF4E_O and eIF4E_N) showed a resistance response as the viral infection was not detected throughout the experiment.
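The three response categories used in this analysis (symptoms, symptomless systemic infection, and no detectable infection) can be written as a simple decision rule. The sketch below is an illustration only: the function names, the per-plant tuple format and the rule that an accession takes the most severe response seen among its plants are assumptions of this example, not the scoring procedure used in the study.

```python
# Ordering used to pick the most severe response within an accession
# (an assumption of this sketch).
SEVERITY = {"resistant": 0, "tolerant": 1, "susceptible": 2}


def classify_plant(elisa_positive, symptoms):
    """Classify one plant from virus detection and symptom data."""
    if symptoms:
        return "susceptible"   # virus-induced symptoms observed
    if elisa_positive:
        return "tolerant"      # systemic infection but symptomless
    return "resistant"         # no viral infection detected


def classify_accession(plants):
    """Return the most severe response among (elisa_positive, symptoms) pairs."""
    responses = [classify_plant(elisa, sym) for elisa, sym in plants]
    return max(responses, key=SEVERITY.get)


# Hypothetical accession: three of five plants are ELISA positive, none shows symptoms.
example = [(True, False), (True, False), (True, False), (False, False), (False, False)]
print(classify_accession(example))
```

Under this rule, an accession in which some plants accumulate virus without symptoms is scored as tolerant, and a single symptomatic plant is enough to score the accession as susceptible.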
When we compared our results with those previously published for known eIF4E proteins, we obtained the expected response in some of the accessions that contain these proteins. Thus, the accessions with the eIF4E_E and eIF4E_A proteins did not show systemic infection, in agreement with the pvr2^1 and pvr2^2 alleles, which break the PVY cycle when inoculated [23, 25, 48]. The eIF4E_I protein was identical to the eIF4E protein encoded by the pvr1 allele, which confers broad-spectrum resistance to strains of PVY [50]. In spite of this, the CDP07700 accession (eIF4E_I protein) showed a tolerant response, as 3/5 of the plants were DAS-ELISA positive without symptoms. Nevertheless, all CDP07700 plants blocked viral accumulation at 45 and 60 DPI. Different factors may explain the initial viral accumulation in some plants of this accession, such as environmental or experimental factors (heavy inoculation pressure, development stage, temperature) or distinct responses to different PVY isolates. All accessions that contain the eIF4E_B protein, identical to the product of the pvr2+ susceptible allele [23, 25], showed a susceptible response to PVY-F14K, except the CDP09688 accession, which showed neither PVY systemic infection nor symptoms. The resistance response of this accession appears to be related neither to eIF4E nor to eIF(iso)4E, as accessions that carry the same proteins in homozygosis (eIF4E_B, eIF(iso)4E_C or eIF(iso)4E_D) showed susceptible, tolerant or resistant responses. Thus, this response to PVY could be due to another resistance mechanism.
The CDP06234 accession, with the eIF4E_F and eIF(iso)4E_C proteins, did not show viral multiplication in any of the tested plants, unlike the other accession with the same protein variants of both genes, CDP00624. The resistance response of the CDP06234 C. baccatum accession could therefore not be explained by these translation initiation factors. The CDP00614 accession, which carries the eIF4E_F and eIF(iso)4E_A proteins, showed a different response from other accessions with the same eIF4E protein variant: its plants showed systemic infection but were symptomless. In this case, an effect of the combination of eIF4E_F and eIF(iso)4E_A could not be ruled out, as no accessions with eIF(iso)4E_A showed a susceptible response. Moreover, eIF(iso)4E_A could also explain the different responses of the accessions with the eIF4E_Q protein, CDP05929 and CDP04710. Nevertheless, this combination hypothesis is unlikely, as the eIF(iso)4E factor was not related to resistance to PVY in previous works [23, 25] and the hypothesis could only clarify the response of these two latter accessions. In fact, the response of CDP09688, CDP06234, CDP00614 and CDP05929 to PVY-F14K is more easily explained by the presence of another gene or resistance mechanism. For instance, the Pvr4 anonymous locus of the C. annuum line Criollo de Morelos 334 showed a complete dominant resistance to all PVY pathotypes as well as to Pepper mottle virus [51].
Nine newly identified eIF4E proteins seemed to interact with PVY-F14K, but five new eIF4E variants (D, H, K, N and O) were related to a resistance response to this virus. The published resistance alleles of pepper carry different combinations of non-conservative amino acid changes that are localised on the surface of eIF4E in regions I (exon 1) and II (exon 2) [25, 41]. The changes in these regions seem to be responsible for resistance, at least against PVY [23, 25]. Thirteen non-synonymous changes (amino acid positions 15, 65, 73, 77, 131, 160, 213, 214, 216, 218 and 219) of the eIF4E haplotypes analysed by mechanical inoculation in this survey were previously unknown (Table 2). Seven of the mutations detected by EcoTILLING in exons 1 and 2 of the eIF4E gene generate a change of charge. Six mutations are specific to the new resistance alleles and are localised in exon 1 or exon 5. The published proteins eIF4E_A (pvr2^2) and eIF4E_E (pvr2^1) have a V67E change (region I), which is sufficient to compromise PVY infection [25]. eIF4E_N, eIF4E_O and eIF4E_K of C. baccatum and eIF4E_H of C. chinense have a single amino acid substitution in exon 1 (N65D), which may also prevent multiplication of the virus. Although C. baccatum showed a low susceptibility to PVY-F14K, this mutation may explain the resistance response of some C. baccatum accessions. The other resistant protein, eIF4E_D (C. chinense), has other amino acid changes located in regions I (P66T, A73D), II (G107R) and V (L218F). These changes in regions I and II are also present in other resistance or tolerance eIF4E proteins (eIF4E_K and eIF4E_I).
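Whether a substitution changes the side-chain charge can be checked mechanically from its label. The sketch below applies a simple charge table to the six substitutions named in this paragraph. The charge assignments (D and E negative, K and R positive, histidine treated as neutral) are a common simplification adopted for this example, and the function names are illustrative; the paper does not state how charge changes were scored.

```python
import re

# Side-chain charge at roughly neutral pH. Treating histidine as neutral
# is a simplifying assumption of this sketch; residues not listed are neutral.
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(label):
    """Split a label such as 'N65D' into (reference, position, variant)."""
    match = SUBSTITUTION.fullmatch(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def changes_charge(label):
    """Return True if reference and variant residues differ in charge."""
    ref, _, alt = parse_substitution(label)
    return CHARGE.get(ref, 0) != CHARGE.get(alt, 0)


# Substitutions named in the text above.
substitutions = ["N65D", "V67E", "P66T", "A73D", "G107R", "L218F"]
charge_changing = [s for s in substitutions if changes_charge(s)]
print(charge_changing)  # ['N65D', 'V67E', 'A73D', 'G107R']
```

With this table, N65D, V67E and A73D introduce a negative charge and G107R a positive one, whereas P66T and L218F leave the charge unchanged.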
The new alleles of the eIF4E initiation factor have been named pvr2 with a numerical superscript for each new allele, continuing from the latest described allele (pvr2^9 of Chile de Arbol (C. annuum) [25]) and using the nomenclature of Kyle and Palloix [50]. The resistance alleles to PVY-F14K, D1, H1, K1, N1 and O1, have been designated as pvr2^10, pvr2^11, pvr2^12, pvr2^13 and pvr2^14, respectively. The new susceptible or tolerant alleles of the eIF4E gene, C1, G1, R1, J1, L1, M1, P1 and Q1, have been designated as pvr2^15, pvr2^16, pvr2^17, pvr2^18, pvr2^19, pvr2^20, pvr2^21 and pvr2^22, respectively. The new eIF(iso)4E alleles have been named pvr6 with a numerical superscript for each new allele, continuing from the only resistance allele, pvr6 (Perennial [Genbank: DQ022083] cultivar of C. annuum [36]). Thus, the eIF(iso)4E alleles that code for a new eIF(iso)4E protein, A1, B1, D1, E1, F1, H1 and K1, have been designated as pvr6^2, pvr6^3, pvr6^4, pvr6^5, pvr6^6, pvr6^7 and pvr6^8, respectively. Finally, the alleles that code for the eIF(iso)4E_G protein (eIF(iso)4E_G1 and eIF(iso)4E_G2) have been named pvr6^9.
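The numbering described above is sequential, so the mapping from haplotype to allele name can be reproduced in a few lines. In this sketch the superscript is written after a caret (for example pvr2^10 for pvr2 with superscript 10), and the helper `name_alleles` is a hypothetical name introduced for illustration.

```python
def name_alleles(gene_symbol, first_index, haplotypes):
    """Assign consecutive superscripts, starting at first_index, in list order."""
    return {
        hap: f"{gene_symbol}^{first_index + offset}"
        for offset, hap in enumerate(haplotypes)
    }


# eIF4E: resistance alleles first, then susceptible or tolerant alleles,
# continuing after the latest published allele (superscript 9).
eif4e_resistant = ["D1", "H1", "K1", "N1", "O1"]
eif4e_other = ["C1", "G1", "R1", "J1", "L1", "M1", "P1", "Q1"]
eif4e_names = name_alleles("pvr2", 10, eif4e_resistant + eif4e_other)

# eIF(iso)4E: numbering starts at 2; G1 and G2 code for the same protein
# and therefore share one allele name.
iso_names = name_alleles("pvr6", 2, ["A1", "B1", "D1", "E1", "F1", "H1", "K1"])
iso_names["G1"] = iso_names["G2"] = "pvr6^9"

for haplotype, allele in eif4e_names.items():
    print(f"eIF4E_{haplotype} -> {allele}")
```

The two genes are kept in separate dictionaries because some haplotype labels (D1, H1, K1) occur in both.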
The possible use of these five new eIF4E resistance alleles in heterozygosis could make their introgression into new commercial hybrids easier. Moreover, these recessive alleles are an excellent reserve against the changing nature of viral pathogens, as mutations are required in the VPg gene of PVY to restore its interaction with the different mutated eIF4E proteins [25, 48].
EcoTILLING platform implemented in pepper
The use of cDNA as starting material for endonuclease-based EcoTILLING has not been previously described. However, the cDNA-based mutation screening has been described using other technologies, such as HRM [52] or DNA sequencing [53]. This strategy has several advantages with respect to EcoTILLING assays based on DNA. Although EcoTILLING is a potent technique for discovering SNPs and examining DNA variation in natural populations, one problem is the amplification of target genes from wild species, due to sequence mispriming [14]. The use of cDNA and primers located in transcribed sequences avoids this problem, as these regions are more conserved. This facilitates the use of sequence data from related and model species. Another advantage is that this system avoids the intron sequence, and the complete CDS can be amplified in one or two reactions. In DNA EcoTILLING, the presence of introns may be a problem as most of the changes will be located in these regions. To avoid this, an exon-by-exon strategy is usually adopted, requiring as many reactions as there are exons in the candidate gene. In contrast, the cDNA EcoTILLING approach reduces the number of PCR and CEL I reactions per sample.
This work is an example of where cDNA EcoTILLING is both cost- and time-effective: it involves different species, only mutations in the CDS are of interest and the genes have several highly variable exons. Another question is the possibility of using oligo-dT in order to transcribe several genes with the same RT reaction. We used specific primers to improve the efficiency and specificity of the amplification of candidate genes. The expression of the candidate genes in the target tissue has to be previously studied.
One of the essential points of CEL I-based EcoTILLING is the analysis of the band pattern and the classification of the samples according to it. Several factors may make pattern identification in EcoTILLING assays more difficult. One is experimental variation, as different reactions and denaturing gels complicate band comparison. Another is the differential mismatch preference shown by the endonuclease CEL I [9], which may also be influenced by experimental variance, resulting in different band intensities in different samples. It is of note that this preference was not observed by Till et al. using IRDye labelling and the LI-COR system [10]. Other factors depend on the target sequence: if there are several mismatches between a sample and the standard, the detection of all internal cuts is difficult because the cDNA is only labelled at the 5' and 3' ends. Thus, the presence of several internal cut sites produces weak bands due to partial digestion. Moreover, very close mismatches are very difficult to differentiate by electrophoresis, especially if, due to the number of samples involved, several gels are used. The use of a second standard with a similar or identical sequence reduces these problems and facilitates the correct classification of the samples. These accuracy problems are not specific to complete CDS amplification; even if the analysis is carried out independently for each exon, a second standard is needed to detect the presence of very close mutations.
Thus, to confirm our initial classification, the samples were compared to a new related standard. After the second enzyme reaction, the samples were reclassified and the groups reduced to 46 for the eIF4E gene and increased to 31 for eIF(iso)4E. Nevertheless, after cDNA sequencing, the groups were reduced to 29 and 17, respectively. This discrepancy between CEL I groups and sequencing haplotypes is attributable to small differences in band pattern due to experimental variation, the high variability of eIF4E and eIF(iso)4E genes in the selected accessions and the presence of very close mutations. To minimise rare allele loss we designed our double-test protocol and took into account a large number of weak bands, which increased the false positive rate. This explains why most CEL I groups were represented by one or two accessions and why some of them showed the same sequence. This would also happen if a genomic DNA platform were used. In spite of the number of false positives in our assay, this double test is very efficient in detecting differences between very similar haplotypes. In our analysis, we detected six haplotypes that had been misclassified in the first analysis, one of which resulted in an eIF4E allele resistant to PVY-F14K.
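The reduction from CEL I groups to sequence haplotypes amounts to merging groups whose representative sequences are identical. The sketch below shows that step on toy data; the group labels and sequences are hypothetical placeholders, not data from this study, and `collapse_groups` is an illustrative helper name.

```python
from collections import defaultdict


def collapse_groups(group_sequences):
    """Merge band-pattern groups that share an identical sequence.

    group_sequences maps a group label to its representative cDNA sequence.
    Returns a dict mapping each distinct sequence to the groups that carry it.
    """
    haplotypes = defaultdict(list)
    for group, sequence in group_sequences.items():
        haplotypes[sequence.upper()].append(group)
    return dict(haplotypes)


# Hypothetical example: four band-pattern groups, two of them false positives
# that turn out to carry the same sequence as another group.
toy_groups = {
    "group_01": "ATGGCAACAGCT",
    "group_02": "ATGGCAACAGCT",
    "group_03": "ATGGCGACAGCT",
    "group_04": "atggcgacagct",
}

haplotypes = collapse_groups(toy_groups)
print(f"{len(toy_groups)} CEL I groups -> {len(haplotypes)} haplotypes")
```

Groups that merge in this way correspond to the false positives discussed above, where weak or variable bands separated samples that share the same sequence.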
This EcoTILLING platform contains a good representation of Capsicum genetic variability and is easy and efficient to screen using CEL I-based protocols. The collection is also useful for identifying new nucleotide variations using other techniques, such as direct sequencing with next-generation sequencing (NGS) systems, high resolution melting (HRM) or conformation-sensitive capillary electrophoresis (CSCE). Several new approaches based on these techniques have been published [54–56], and it is very likely that more efficient NGS-based strategies will be developed. For this collection, we have also built a DNA platform in 96-well format, which is available for use. In this new scenario, our cDNA and DNA pepper platforms will remain useful for the isolation of new mutations and SNPs related to traits of interest in the breeding of these species.