- Research article
- Open Access
Identification of SNP and SSR markers in eggplant using RAD tag sequencing
BMC Genomics volume 12, Article number: 304 (2011)
The eggplant (Solanum melongena L.) genome is relatively unexplored, especially compared to those of the other major Solanaceae crops, tomato and potato. In particular, no SNP markers are publicly available, although over 1,000 SSR markers have been developed and released. We have combined the recently developed Restriction-site Associated DNA (RAD) approach with Illumina DNA sequencing for the rapid, large-scale discovery of both SNP and SSR markers in eggplant.
RAD tags were generated from the genomic DNA of a pair of eggplant mapping parents and sequenced to produce ~17.5 Mb of sequence, which was assembled into ~78,000 contigs. The resulting non-redundant genomic sequence dataset consisted of ~45,000 sequences, of which ~29% were putative coding sequences and ~70% were shared between the mapping parents. The shared sequences allowed the discovery of ~10,000 SNPs and nearly 1,000 indels, equivalent to a SNP frequency of 0.8 per Kb and an indel frequency of 0.07 per Kb. Over 2,000 of the SNPs are likely to be mappable via the Illumina GoldenGate assay. A subset of 384 SNPs was used to successfully fingerprint a panel of eggplant germplasm, producing a set of informative diversity data. The RAD sequences also included nearly 2,000 putative SSRs, and primer pairs were designed to amplify 1,155 loci.
The high throughput sequencing of the RAD tags allowed the discovery of a large number of DNA markers, which will prove useful for extending our current knowledge of the genome organization of eggplant, for assisting in marker-aided selection and for carrying out comparative genomic analyses within the Solanaceae family.
Background
Eggplant (Solanum melongena L., 2n = 2x = 24) is a member of the Solanaceae family. It is thought to have been first domesticated in South and East Asia, and to have been brought to Europe by Arab traders and immigrants around 600 CE. In production terms, eggplant is the third most important Solanaceae crop species (after potato and tomato; http://faostat.fao.org); it is cultivated worldwide, but most intensively in China and India. About 2.4% of world production in 2009 came from Europe, with Italy the single largest European producer.
The estimated genome size of eggplant is 1.1 Gbp. Knowledge of its genome organization is rather limited compared to that of either tomato or potato (http://solgenomics.net/, http://www.potatogenome.net). Genetic maps based on both inter-specific [4, 5] and intra-specific [6–9] crosses have been developed. The most recent inter-specific map consists of 347 COS and RFLP markers spanning 1,535 cM, while the most recent intra-specific maps, constructed by Barchi et al. and Nunome et al., comprise 238 markers spanning 718.7 cM and 236 markers spanning 951.4 cM, respectively. Nevertheless, marker density is still too low for fine mapping or for analyses of genomic synteny. A small set of SSR markers was developed by Stagel et al. from genic DNA sequences deposited in public databases, while Nunome et al. reported the identification of over 1,000 SSR markers from a screen of enriched gDNA and cDNA libraries. Many of the latter proved informative for intra-specific mapping and have been used to generate what is currently the best available genetic linkage map. More recently, Fukuoka et al. published a dataset containing a large number (~16,000) of transcript sequences, but these have yet to be mined for either SSR or SNP markers.
The so-called "Restriction-site Associated DNA" (RAD) method was proposed by Miller et al. as a reliable means of reducing genome complexity. The concept is based on acquiring the sequence adjacent to a set of particular restriction enzyme recognition sites. The application of high-throughput sequencing technology has allowed significant progress in developing a RAD genotyping platform; specifically, large volumes of polymorphism data can now be generated by combining massively parallel sequencing with multiplexed RAD tag libraries.
In this report we describe the generation of genomic RAD tags from the two parents of an F2 segregating population used to generate an intra-specific eggplant genetic map; the RAD tags were sequenced using the Illumina platform and then annotated and categorized. These data allowed the discovery of a large number of SNP, indel and SSR markers, and some of the SNPs have been tested against a panel of eggplant accessions.
Results and Discussion
Sequencing and contig assembly
The sequencing procedure (Figure 1) generated 10.90 million reads for '305E40' and 12.12 million for '67/3', the parents of an F2 intra-specific mapping population (see Methods section), equivalent to ~13.3 Mb of sequence for '305E40' and ~13.8 Mb for '67/3'. After editing and trimming, ~17.5 Mb of high-quality sequence was available. Raw data have been deposited in the Sequence Read Archive (SRA) at NCBI (SRA035360.1). The reads were assembled into 77,876 contigs (38,935 from '305E40' and 38,941 from '67/3'); the '305E40' contigs had a mean length of 351 bp (range: 218-585 bp; N50: 362 bp), and the '67/3' contigs a mean length of 368 bp (range: 218-579 bp; N50: 382 bp). The final SM-I (Solanum melongena-Illumina) dataset comprised 45,390 sequences, including 31,635 sequences (12.5 Mbp) shared between the two mapping parents (Table 1), and formed the basis of the subsequent annotation and functional categorization (Additional file 1). The SM-I dataset was also screened for repetitive elements: about 6.7% (1.1 Mbp) of the sequence showed similarity to known plant mobile elements and was therefore excluded from SNP mining.
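The N50 values quoted above summarize contig length: N50 is the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of the calculation follows; the contig lengths are invented toy values, not the SM-I assembly, and assemblers may differ slightly in how they report edge cases.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (bp).

    Contigs are sorted from longest to shortest and summed; N50 is the
    length of the contig at which the running total first reaches half
    of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contigs supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids float rounding
            return length


# Toy contig lengths, not taken from the SM-I assembly.
print(n50([218, 300, 351, 362, 400, 585]))  # 362
```

Here the two longest contigs (585 + 400 bp) fall short of half the 2,216 bp total, and adding the 362 bp contig crosses it, so N50 is 362 bp.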
In all, 6,411 sequences (14.1%) of the SM-I dataset matched 4,761 entries in the Fukuoka 16 K annotated eggplant unigene dataset (hereafter 16 K). A BlastN search of the SGN Cornell unigene database (http://solgenomics.net/) produced significant hits for 9,476 (20.9%) of the SM-I sequences, matching 8,244 SGN unigenes, of which ~47% originated from tomato, ~38% from potato and ~11% from tobacco. Combining the 16 K and SGN hits produced 12,315 unique sequences; a total of 9,976 sequences were properly annotated, of which 2,123 were annotated in both the SGN and 16 K databases, 6,440 only in SGN, and 1,413 only in 16 K. The remaining 35,414 SM-I sequences were not represented in either database, and these were used as a batch BlastX query against the TAIR9 Arabidopsis thaliana protein database to assign a putative function. In all, 2,798 of these sequences (7.9%) produced a hit with an E value < 1e-15, corresponding to 1,853 A. thaliana genes. This rather small number of hits presumably reflects sequence divergence between eggplant and A. thaliana orthologs, although the BLAST algorithm is also known to be rather inefficient at identifying homologous sequences when short reads are involved. Overall, therefore, the SM-I dataset contains some 12,774 annotated sequences, which match 7,191 A. thaliana loci (Additional file 2).
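The annotation counts in this paragraph follow simple set logic: 2,123 + 6,440 + 1,413 = 9,976 properly annotated sequences, and 45,390 - 9,976 = 35,414 sequences left for the BlastX search against TAIR9 (and 9,976 + 2,798 = 12,774 annotated in total). The sketch below partitions sequence IDs in the same way; the function name and the toy IDs are hypothetical.

```python
def partition_annotations(all_ids, sgn_hits, k16_hits):
    """Split sequence IDs by the database(s) that annotated them."""
    sgn, k16 = set(sgn_hits), set(k16_hits)
    return {
        "both": sgn & k16,
        "only_sgn": sgn - k16,
        "only_16k": k16 - sgn,
        # neither database: candidates for the BlastX search against TAIR9
        "unannotated": set(all_ids) - (sgn | k16),
    }


ids = ["s1", "s2", "s3", "s4", "s5", "s6"]  # hypothetical sequence IDs
parts = partition_annotations(ids, sgn_hits=["s1", "s2", "s3"], k16_hits=["s3", "s4"])
print({key: sorted(value) for key, value in parts.items()})
```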
The annotated SM-I sequences were functionally assigned using their A. thaliana orthologs (AGI codes) as input (Additional file 2), and these functions were then grouped into GO slim categories (Figure 2). Since a given gene product can be associated with more than one GO term, the total number of GO terms exceeded the number of unigenes [14, 16]. The eggplant SM-I sequences resolved into 24,522 GO terms associated with "biological process", 15,137 with "cellular component" and 12,144 with "molecular function". The "response to biotic stimulus" category applied to 492 sequences (290 GO terms), the most represented of which were related to defense responses against bacterial (22.1%), nematode (10.3%) and fungal (9.3%) infection. These sequences, especially the fungal response ones, are of particular interest, as '305E40' carries a major gene conferring resistance to Fusarium oxysporum f. sp. melongenae [9, 17]. Among the "response to abiotic stimulus" sequences (937 sequences, 737 GO terms), 19.5% were associated with the response to salinity stress, 12.3% with low temperature and 7.7% with high temperature. Overall, the sequences were assigned to a wide range of gene ontology categories, indicating that the RAD tags sampled a broad range of genes. Since about 12,000 SM-I eggplant sequences were annotated, it seems plausible that we captured a substantial fraction of the eggplant gene space.
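The observation that GO term totals exceed the number of sequences follows from the one-to-many mapping between genes and GO slim categories. A minimal sketch with hypothetical gene IDs and category assignments shows the counting; it is not GOSlimViewer, only an illustration of why the totals diverge.

```python
from collections import Counter


def slim_counts(gene_to_slims):
    """Count genes per GO slim category; a gene mapped to several
    categories contributes once to each of them."""
    counts = Counter()
    for slims in gene_to_slims.values():
        counts.update(set(slims))  # ignore duplicate assignments within a gene
    return counts


genes = {  # hypothetical genes and category assignments
    "gene1": ["response to biotic stimulus", "response to stress"],
    "gene2": ["response to stress"],
    "gene3": ["response to abiotic stimulus", "response to stress", "transport"],
}
counts = slim_counts(genes)
print(len(genes), sum(counts.values()))  # 3 6
```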
Just over 10,000 SNPs were identified between the mapping parents, involving 5,179 of the 31,635 shared sequences (hereafter 10 K; Additional file 3), together with 874 indels (Table 1). To minimize false-positive SNP calls, paired-end reads were used and SNPs were called from deep multiple alignments (minimum 6x coverage). The overall SNP frequency between the two parents was 0.8 per Kb, and the indel frequency 0.07 per Kb. We consider that these SNPs mostly lie in the untranscribed portion of the eggplant genome, since we used two endonucleases recognizing GC-rich sites, one of which (SgrAI) is methylation sensitive. This SNP frequency is lower than that detected in potato (11.5 per Kb), grapevine (2.5 per Kb in coding and 5.5 in non-coding sequence), barley (6.3 per Kb in coding sequence), maize (8.9 per Kb in coding sequence) and Citrus spp. (6.1 per Kb), but similar to that found in tomato (0.6 per Kb), sweet pepper (1.0 per Kb) and rice (1.7 per Kb), and it confirms the low level of intra-specific genetic polymorphism previously observed in eggplant. As pointed out by Schneider et al., however, inter-specific comparisons of SNP frequency are problematic, since polymorphism depends on the germplasm, the genomic context and the mating system. About two thirds of the SNPs were transitions (Figure 3), which have generally been found to be the predominant type [23, 25–27]. The transition/transversion ratio has been suggested to be high when genetic divergence is low, decreasing as the genetic distance between the compared genomes increases [28, 29]. The relatively high ratio of 1.65 therefore probably reflects the overall low level of polymorphism between the two mapping parents, as is generally the case within the cultivated gene pool of eggplant. A rather high frequency of C/T alleles was observed, as also noted for bean, maize and Citrus spp. [21, 31].
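The frequencies above can be reproduced approximately if, as we assume, the denominator is the 12.5 Mbp of sequence shared between the parents: 10,000 SNPs / 12,500 kb = 0.8 per Kb, and 874 indels / 12,500 kb ≈ 0.07 per Kb. A Ts/Tv ratio of 1.65 likewise implies that 1.65 / 2.65 ≈ 62% of SNPs are transitions. The sketch below shows both calculations and the classification rule (A<->G and C<->T are transitions, every other substitution is a transversion); the function names are hypothetical.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine


def snp_class(ref, alt):
    """Return "Ts" for a transition and "Tv" for a transversion."""
    pair = frozenset((ref.upper(), alt.upper()))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError(f"not a biallelic nucleotide SNP: {ref}/{alt}")
    return "Ts" if pair in TRANSITIONS else "Tv"


def ts_tv_ratio(snps):
    """Transition/transversion ratio for an iterable of (ref, alt) pairs."""
    classes = [snp_class(ref, alt) for ref, alt in snps]
    n_tv = classes.count("Tv")
    if n_tv == 0:
        raise ValueError("no transversions: ratio undefined")
    return classes.count("Ts") / n_tv


def per_kb(n_variants, bases_screened):
    """Variant frequency per kilobase of screened sequence."""
    return n_variants / (bases_screened / 1000)


print(per_kb(10_000, 12_500_000))         # 0.8 SNPs per Kb
print(round(per_kb(874, 12_500_000), 2))  # 0.07 indels per Kb
print(round(1.65 / (1 + 1.65), 2))        # 0.62: share of transitions at Ts/Tv = 1.65
```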
In about 25% of the SNP loci (2,435), there was no additional sequence variation in either the 60 bp upstream or the 60 bp downstream. Almost all of these (2,354 of 2,435) had a quality score > 0.4 (the minimum threshold for the GoldenGate assay), and 2,201 had a score > 0.6. The identification of > 10,000 potential SNPs is clearly a major advance for eggplant genotyping: incorporating a sample of the 2,354 high-quality SNPs into a GoldenGate assay should saturate the '305E40' × '67/3' linkage map, while many of the remaining ~8,000 SNPs could be assayed with other technologies, such as the Affymetrix SNP chip or High Resolution Melting.
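The spacing rule used to shortlist SNPs (no other variant within 60 bp on either side) and the tallying against the 0.4 and 0.6 score thresholds can be sketched as below. This is only an illustration under stated assumptions: the contig names, positions and scores are hypothetical, the quality scores themselves come from Illumina's design tool and are not computed here, and other design constraints (such as having 60 bp of flanking sequence available at contig ends) are not modelled.

```python
def isolated_snps(variants_by_contig, flank=60):
    """Keep SNP positions with no other variant within `flank` bp on
    either side, contig by contig. Positions are integers on each contig."""
    kept = {}
    for contig, positions in variants_by_contig.items():
        pos = sorted(positions)
        keep = []
        for i, p in enumerate(pos):
            clear_left = i == 0 or p - pos[i - 1] > flank
            clear_right = i == len(pos) - 1 or pos[i + 1] - p > flank
            if clear_left and clear_right:
                keep.append(p)
        kept[contig] = keep
    return kept


def count_above(scores, thresholds=(0.4, 0.6)):
    """Number of SNPs whose design score exceeds each threshold."""
    return {t: sum(s > t for s in scores) for t in thresholds}


variants = {"ctg1": [10, 50, 200, 400], "ctg2": [75]}  # hypothetical positions
print(isolated_snps(variants))              # {'ctg1': [200, 400], 'ctg2': [75]}
print(count_above([0.35, 0.45, 0.7, 0.9]))  # {0.4: 3, 0.6: 2}
```

Positions 10 and 50 on the toy contig are only 40 bp apart, so both are dropped, mirroring why only about a quarter of the SNPs passed this filter.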
The successful identification of a large number of SNP (and indel and SSR) markers highlights the utility of the RAD approach for uncovering genome-wide polymorphisms, especially in materials with low polymorphism. The versatility of the method lies in the ease with which different samples of the genome can be accessed simply by changing the restriction enzyme(s) used to cleave the genomic DNA; its particular advantage for SNP discovery lies in the ease of aligning short DNA fragments between contrasting templates. Note also that Illumina sequencing allowed the identification of polymorphic sites outside the restriction enzyme recognition site.
Genetic diversity revealed by SNP markers
A sample of 384 of the 2,201 highest-quality SNPs (score > 0.6) was assembled into a GoldenGate assay, which was used to genotype 23 S. melongena accessions (Table 2), including '305E40' and '67/3' and representing a panel of eggplant germplasm that captures much of the variation in fruit shape and colour, together with one accession of S. aethiopicum. Of the 384 SNPs, 343 produced unambiguous data, a proportion in agreement with those previously reported in maize and soybean. The two duplicated genotypes included as internal controls gave consistent calls, indicating that the assay was highly robust. The frequency of missing calls was ~0.6% for the eggplant templates, but 16.0% for the S. aethiopicum template. PIC values ranged from 0.29 to 0.5 (mean 0.43), with 240 of the markers producing a PIC value > 0.4, a level suitable for genetic diversity analyses. The dendrogram of the germplasm accessions based on these SNPs suggested the presence of two major clades (Figure 4): one included '305E40' together with its progenitors 'Dourga', 'Tal1/1', 'DR2' and S. aethiopicum, while the second included '67/3'. Within each of these major clades, a number of sub-clades correlated with fruit shape could be recognized. Thus, the phenotypic divergence between the two mapping parents appears to be representative of the genetic variation present within the cultivated gene pool.
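The PIC values above are bounded by 0.5, which is the maximum for a biallelic SNP under the formulation PIC = 1 - Σ p_j², where p_j is the frequency of allele j at the locus. Assuming this is the Anderson et al. formula that was applied (it is consistent with the reported range), per-locus PIC can be computed from genotype calls as sketched below; the genotype strings and the "--" missing-call code are illustrative assumptions rather than GenCall output.

```python
from collections import Counter


def pic(calls, missing=("--", None)):
    """PIC = 1 - sum(p_j ** 2) over the allele frequencies at one locus.

    `calls` are genotype strings such as "AA", "AG" or "GG"; missing
    calls are skipped.
    """
    alleles = Counter()
    for genotype in calls:
        if genotype not in missing:
            alleles.update(genotype)  # counts each allele in the genotype
    total = sum(alleles.values())
    if total == 0:
        raise ValueError("no scorable calls at this locus")
    return 1 - sum((n / total) ** 2 for n in alleles.values())


print(pic(["AA", "GG", "AG", "AG"]))        # 0.5, the biallelic maximum
print(pic(["AA", "AA", "AA", "GG", "--"]))  # 0.375
```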
Identification of SSRs
A screen of the SM-I dataset identified 1,797 sequences containing 1,877 putative SSRs. A small number of these SSRs (22) were discarded because they had already been identified [7, 10, 11]. The SSR was present in both mapping parents for 1,145 sequences, in '305E40' alone for 381 sequences, and in '67/3' alone for 329 sequences. A total of 1,119 sequences permitted the design of PCR primers, leading to 1,155 putative markers (Additional file 4). About 4.1% of the SM-I sequences contained an SSR (equivalent to a density of one SSR per 9.0 Kb), which is comparable to the rate recorded from ESTs of eggplant [7, 10] and tomato, somewhat higher than in potato, but lower than in either coffee or sweet pepper [36, 37]. Thus the RAD technique appears to offer an effective means of discovering SSRs, which is notable given that SSRs are generally more common in transcribed than in genomic sequences.
The most abundant repeat motifs among the RAD SSRs were trinucleotides (34.6%), followed by dinucleotides (18.6%) and pentanucleotides (16.6%) (Table 3), consistent with the observations of Stagel et al. The most common di- and trinucleotide motifs were AT (9.6%) and AAC (19.0%) (Table 4), in contrast to previous studies in which AG and AAG were the predominant motifs [10, 14, 36–39]. On the other hand, Shirasawa et al. showed that among tomato genomic SSRs, AAT is the most abundant trinucleotide and AT the most abundant dinucleotide motif, while in wheat AAC is the predominant trinucleotide SSR motif. SSRs composed of AGG or CCG repeats were rather rare, as reported by Stagel et al. in eggplant and in other recent studies of Epimedium sagittatum and Vigna radiata. These motifs are relatively uncommon in dicotyledonous plant genomes [14, 36], although they do feature in monocotyledonous ones [38, 43]. Among the 160 mononucleotide SSRs detected, 151 were A/T; such loci have been suggested as a means of filling gaps in linkage maps constructed with higher-order SSRs.
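Motif frequencies such as AT (9.6%) and AAC (19.0%) presuppose a convention for grouping equivalent repeat units, since the same repeat can be read in different phases and on either strand. A common convention, assumed here (we cannot confirm it is the one SciRoKo applies), labels each repeat by the alphabetically smallest rotation of its unit or of the unit's reverse complement. The motifs in the example are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Smallest rotation of a repeat unit or of its reverse complement.

    AG, GA, CT and TC all describe the same repeat, so they share the
    class "AG"; likewise AAC, ACA, CAA, GTT, TTG and TGT map to "AAC".
    """
    motif = motif.upper()
    forms = []
    for m in (motif, reverse_complement(motif)):
        forms.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(forms)


observed = ["AT", "TA", "TTG", "CAA", "TC", "GGC"]  # hypothetical motifs
print(Counter(canonical_motif(m) for m in observed))
```

Under this convention AGG and CCG, the two motifs noted as rare, are themselves canonical labels, so they cannot be inflated by merging with other classes.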
Conclusions
The RAD method proved highly successful for the rapid, large-scale discovery of DNA markers, even in a species known for its low level of polymorphism. Applied to a pair of eggplant mapping parents, the approach identified over 10,000 SNPs, 874 indels and over 1,800 putative SSRs. The current eggplant genetic maps are far from saturated, and as such have had little impact on breeding. The early maps were based on a wide cross, as this was considered necessary to achieve a sufficient level of polymorphism for the markers then available. With the rapid advances being made in sequencing technology, it is now possible to work with intra-specific crosses, which are more relevant to the breeder. The present study has generated a large number of SNP, indel and SSR markers, which should permit the rapid saturation of the best available intra-specific genetic map.
Although our primary goal was the identification of SNP markers, the RAD tag sequence data also made it possible to identify SSR motifs and to design primer pairs for their amplification. Multi-allelic SSR markers are currently widely applied for both genetic mapping and diversity analyses, despite their development cost and limited throughput. During the last few years, the exploitation of publicly available EST sequences has led to the identification of several thousand new SSR markers in a wide range of vegetable species, such as tomato, pepper, globe artichoke and Brassica, as well as eggplant [10, 36, 37, 39, 44, 45].
The GoldenGate SNP array was highly robust for S. melongena germplasm, and it also shows potential for wide-cross populations, as 84% of the loci were scorable in S. aethiopicum, a relative of cultivated eggplant. Since these DNA markers define specific positions in the eggplant genome, they should be useful for merging the various genetic linkage maps currently available, some of which include loci related to important agronomic traits. Finally, the markers are highly informative for the analysis of genetic diversity, and should also serve comparative studies across species within the Solanaceae family.
Methods
Plant materials and DNA isolation
DNA was extracted from the two eggplant lines '305E40' and '67/3', the parents of an F2 intra-specific mapping population. The female parent, the doubled-haploid line '305E40', produces long, highly pigmented dark purple fruit. '305E40' is an introgression line derived from the somatic hybrid S. melongena cv. 'Dourga' (+) S. aethiopicum, which was backcrossed with a tetraploid plant of the eggplant line 'DR2' and then subjected to anther culture; an anther-derived dihaploid plant was backcrossed four times with the line 'Tal1/1', then selfed twice and finally made completely homozygous through anther culture [17, 44]. The male parent, line '67/3', is an F8 selection from the intra-specific cross cvs. 'Purpura' × 'CIN2'; its fruit is round and violet. DNA extracted from a set of 23 accessions (including the two mapping parents) representative of the S. melongena gene pool (Table 2), together with an accession of S. aethiopicum (a progenitor of '305E40'), was tested with a subset of the newly developed SNP assays. All DNA samples were extracted from young leaves using the GenElute™ Plant Genomic DNA Miniprep kit (Sigma, St. Louis, MO), following the manufacturer's protocol.
RAD library preparation, sequencing, assembly
The RAD library was constructed at Floragenex Inc. (USA) according to the protocol described by Baird et al., as follows. Genomic DNA (300 ng) was digested for 60 min at 37 °C in a 50 μL reaction containing 20 U each of SgrAI and PstI (New England Biolabs, Beverly, MA, USA). The reactions were stopped by holding at 65 °C for 20 min. The P1 adapter (a modified Illumina adapter; see Baird et al.) was ligated to the restriction products, and the samples were barcoded by means of a set of index nucleotides in the P1 adapter sequence. A 2.5 μL aliquot of 100 nM P1 adapter was added to each sample, along with 1 μL 10 mM ATP (Promega), 1 μL 10× NEB Buffer 4, 1 μL (1,000 U) T4 DNA ligase (Enzymatics, Inc.) and 5 μL water; the reaction was incubated at room temperature for 20 min and then heat-inactivated (20 min at 65 °C). The reactions were then pooled and the products randomly sheared to a mean size of 500 bp using a Bioruptor (Diagenode). The material was electrophoresed through a 1.5% agarose gel, and DNA in the range 300-800 bp was isolated using a MinElute Gel Extraction Kit (Qiagen). The dsDNA ends were treated with end-blunting enzymes (Enzymatics, Inc.) to remove overhangs, and the samples were purified through a MinElute column (Qiagen). 3'-adenine overhangs were then added using 15 U Klenow exo- (Enzymatics), with incubation at 37 °C for 10 min. Following re-purification, 1 μL 10 μM P2 adapter (a modified Illumina adapter; see Baird et al.) was ligated as described above for P1. The samples were then purified as above and eluted in 50 μL. Following quantification (Qubit fluorimeter), 20 ng was used as template for a 100 μL PCR containing 20 μL Phusion Master Mix (NEB), 5 μL 10 μM P1 adapter primer (Illumina), 5 μL 10 μM P2 adapter primer (Illumina) and water. Phusion PCR conditions followed the manufacturer's guidelines (NEB), over 18 cycles.
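With barcoded P1 adapters, each read from the P1 end begins with the sample barcode followed by the remnant of the restriction site, which is how pooled samples are separated after sequencing. The sketch sorts reads by barcode under explicit assumptions: the barcodes and reads are invented, the remnant "TGCAG" is what we would expect at a PstI-cut end (site CTGCA^G), SgrAI-cut ends would leave a different remnant, and the actual adapter design used for this library is not given in the text. Mismatches in the barcode are not tolerated here.

```python
def demultiplex(reads, barcodes, remnant="TGCAG"):
    """Assign reads to samples by leading barcode plus restriction remnant.

    `barcodes` maps barcode -> sample name. Barcodes are assumed to be
    distinct and of equal length, so at most one can match a read.
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode + remnant):
                by_sample[sample].append(read[len(barcode):])  # drop barcode, keep remnant
                break
        else:
            unassigned.append(read)
    return by_sample, unassigned


barcodes = {"AAACG": "305E40", "CCGTA": "67/3"}  # hypothetical barcodes
reads = ["AAACGTGCAGTTACG", "CCGTATGCAGGGA", "GGGGGTGCAG"]
samples, rejected = demultiplex(reads, barcodes)
print(samples, rejected)
```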
The amplicons were gel purified by excising the 300-700 bp size range, and the DNA concentration was adjusted to 3 ng/μL. The RAD library from each parent was sequenced on a Genome Analyzer II (Illumina, San Diego, CA) using 54 bp paired-end reads. The paired-end sequences from each parent were pooled and grouped according to their single-read RAD tag sequence. Velvet was used to assemble consensus LongRead contigs from the paired-end data. Repetitive elements were searched for with CENSOR, a software tool that screens query sequences against a reference collection of repeats (http://www.girinst.org/censor), using default parameters and Viridiplantae as the target database.
The CAP3 algorithm was used to identify sequences shared between the mapping parents, using default parameters except for the overlap length cut-off (80) and the overlap percent identity cut-off (95). The resulting dataset (SM-I; Solanum melongena-Illumina) included singlets from '67/3' and '305E40' as well as contigs derived from both RAD rounds. A stand-alone BLAST tool was used to obtain the optimal annotation for each dataset.
A BlastN search was performed against the SGN Cornell unigene database (http://solgenomics.net/), using a cut-off of 90% identity and a minimum alignment length of 100 bp. A second BlastN search was made against the Fukuoka 16 K eggplant unigene dataset (referred to in this article as 16 K; http://vegmarks.nivot.affrc.go.jp), using a cut-off of 95% identity and a minimum alignment length of 100 bp. A BlastX search was carried out against the TAIR9 dataset (http://www.arabidopsis.org), with a threshold E-value of 1e-15. The annotated sequences were assigned a function with the Gene Ontology tool available at TAIR (http://www.arabidopsis.org/tools/bulk/go/), using A. thaliana orthologs (AGI codes) as input, and mapped to higher-level categories (plant GO Slim) using GOSlimViewer, according to the three principal GO categories "molecular function", "biological process" and "cellular component".
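The identity, alignment-length and E-value cut-offs described here can be applied to BLAST+ tabular output (-outfmt 6, whose default 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The text does not say which hit was retained when a query had several; keeping the highest-bitscore hit is our assumption, and the rows below are invented.

```python
import csv
import io


def best_hits(tabular, min_identity=0.0, min_length=0, evalue_below=None):
    """Best hit (highest bitscore) per query from BLAST+ -outfmt 6 text.

    Columns: qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        pident, length = float(row[2]), int(row[3])
        evalue, bitscore = float(row[10]), float(row[11])
        if pident < min_identity or length < min_length:
            continue
        if evalue_below is not None and evalue >= evalue_below:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


rows = [  # invented BlastN rows
    ["SM1", "U100", "92.5", "150", "11", "0", "1", "150", "10", "159", "1e-60", "230"],
    ["SM1", "U200", "95.0", "120", "6", "0", "1", "120", "5", "124", "1e-50", "200"],
    ["SM2", "U300", "88.0", "300", "36", "0", "1", "300", "1", "300", "1e-90", "400"],
    ["SM3", "U400", "99.0", "80", "1", "0", "1", "80", "1", "80", "1e-30", "150"],
]
tabular = "\n".join("\t".join(r) for r in rows)
print(best_hits(tabular, min_identity=90, min_length=100))  # SGN-style cut-offs
print(best_hits(tabular, min_identity=95, min_length=100))  # 16 K-style cut-offs
```

For the BlastX search against TAIR9, only the E-value filter would apply (for example `evalue_below=1e-15`), since the alignment length is then reported in amino acids.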
SNPs were called using a short-read alignment algorithm that aligned non-assembled 50 bp Illumina reads from '67/3' against the '305E40' assembly, analogous to a MAQ-style sequence pileup, at a minimum coverage of 6x; indels were called using an SSAHA-based alignment strategy. SNPs and indels were regarded as true polymorphisms when each allele was observed at least three times.
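The calling rules stated here (minimum 6x coverage, each allele observed at least three times) can be expressed as a filter on a single pileup column. The sketch below is not the alignment algorithm the authors used; it assumes that the reference allele is the '305E40' assembly base together with a count of supporting '305E40' reads, that coverage is counted over the aligned '67/3' reads, and that the alternative allele is the majority '67/3' base, which seems reasonable for homozygous parents.

```python
from collections import Counter


def call_snp(ref_base, ref_support, pileup, min_depth=6, min_allele=3):
    """Return (ref, alt) if one pileup column passes the SNP rules, else None.

    ref_base:    base in the '305E40' assembly at this position
    ref_support: number of '305E40' reads carrying ref_base
    pileup:      Counter of bases seen in aligned '67/3' reads
    """
    depth = sum(pileup.values())
    if depth < min_depth or ref_support < min_allele:
        return None
    alt, alt_count = pileup.most_common(1)[0]  # majority base in '67/3'
    if alt == ref_base or alt_count < min_allele:
        return None
    return ref_base, alt


print(call_snp("A", 5, Counter("GGGGGGA")))  # ('A', 'G')
print(call_snp("A", 5, Counter("GGGGG")))    # None: only 5x coverage
```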
Each SNP was assigned a designability score with a dedicated assay design tool (http://www.illumina.com), which identified SNP loci free of other polymorphisms within 60 bp upstream and downstream. A quality score, reflecting the probability of good performance in the Illumina GoldenGate assay, was assigned to each SNP; a score > 0.6 indicated a high probability of success.
Genetic diversity assessment based on the GoldenGate assay
The GoldenGate assay (Illumina, San Diego, CA) was used for SNP genotyping at the UC Davis Genome Center. Alleles were called automatically for each locus with GenCall software (Illumina). As an internal control, two duplicate templates were included in each run. PIC (Polymorphism Information Content) was estimated following Anderson et al. Each SNP locus was scored in binary fashion. A co-phenetic distance matrix based on co-dominant markers was generated as described by Smouse et al. and used to construct a UPGMA dendrogram with the NTSYS software package v2.10.
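The dendrogram was built with UPGMA in NTSYS. The sketch below illustrates how UPGMA merges accessions; it is not NTSYS, and for simplicity it uses a plain allele-sharing distance on genotypes coded as 0, 1 or 2 copies of the alternative allele, standing in for the Smouse et al. distance used in the study. Accession names and genotypes are hypothetical.

```python
def allele_sharing_distance(g1, g2):
    """Mean per-locus allelic mismatch between two accessions.

    Genotypes are coded 0, 1 or 2 (copies of the alternative allele);
    None is a missing call and that locus is skipped for the pair.
    """
    diffs = [abs(a - b) / 2 for a, b in zip(g1, g2) if a is not None and b is not None]
    if not diffs:
        raise ValueError("no loci scored in both accessions")
    return sum(diffs) / len(diffs)


def upgma(names, dist):
    """Cluster `names` by UPGMA given a symmetric dict-of-dicts distance.

    Returns the tree as nested tuples. The distance from a merged cluster
    to any other cluster is the size-weighted mean of its parts' distances.
    """
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = {frozenset((a, b)): dist[a][b]
         for i, a in enumerate(names) for b in names[i + 1:]}
    while len(clusters) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair of clusters
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        for c in clusters:
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    ((tree, _),) = clusters.values()
    return tree


genotypes = {  # hypothetical accessions and SNP calls
    "P1": [0, 0, 2, 2, 0],
    "P2": [2, 2, 0, 0, 2],
    "L1": [0, 0, 2, 1, 0],
    "L2": [2, 2, 0, 0, None],
}
names = list(genotypes)
dist = {a: {b: allele_sharing_distance(genotypes[a], genotypes[b]) for b in names}
        for a in names}
print(upgma(names, dist))  # (('L1', 'P1'), ('L2', 'P2'))
```

P2 and L2 merge first because they agree at every locus scored in both; L1 then joins P1, giving two clades in the same spirit as the two major clades reported above.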
SSR motifs were identified with the SciRoKo software. Both perfect and imperfect mono-, di-, tri-, tetra-, penta- and hexanucleotide motifs were targeted. Primer pairs were designed from the flanking sequences using Primer3 in batch mode, as implemented in the SciRoKo package. The target amplicon size range was set to 125-250 bp, the optimal annealing temperature to 60 °C and the optimal primer length to 20 bp.
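SciRoKo and Primer3 did the actual work here. As a rough illustration of the two steps, the sketch below finds perfect SSRs with regular expressions and writes a Primer3 (version 2.x Boulder-IO) input record that targets each SSR with the settings quoted above. The minimum repeat counts are hypothetical (the text does not give SciRoKo's thresholds), imperfect repeats are not handled, and we assume that the 60 °C "annealing temperature" corresponds to Primer3's optimal melting temperature setting (PRIMER_OPT_TM). The toy template is far too short for a 125-250 bp product and only shows the record format.

```python
import re

# Hypothetical minimum repeat numbers per motif length; not SciRoKo's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def minimal_period(motif):
    """Length of the shortest unit that tiles the motif (e.g. "ATAT" -> 2)."""
    for p in range(1, len(motif) + 1):
        if len(motif) % p == 0 and motif[:p] * (len(motif) // p) == motif:
            return p
    raise ValueError("empty motif")


def find_perfect_ssrs(seq):
    """Return (0-based start, motif, repeat count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in MIN_REPEATS.items():
        # a k-mer captured in group 1, then at least n - 1 further copies of it
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if minimal_period(motif) != k:
                continue  # e.g. "AA" repeats are really a mononucleotide run
            hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


def primer3_record(seq_id, seq, target_start, target_len):
    """Primer3 2.x Boulder-IO record with the SSR as the target region."""
    return "\n".join([
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={seq}",
        f"SEQUENCE_TARGET={target_start},{target_len}",  # 0-based start by default
        "PRIMER_PRODUCT_SIZE_RANGE=125-250",
        "PRIMER_OPT_TM=60.0",
        "PRIMER_OPT_SIZE=20",
        "=",  # record terminator
    ]) + "\n"


seq = "CGTAGC" + "AT" * 7 + "GGCTAC"  # toy template with one (AT)7 repeat
for start, motif, n in find_perfect_ssrs(seq):
    print(primer3_record("toy_1", seq, start, len(motif) * n))
```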
Polignano G, Uggenti P, Bisignano V, Gatta Della C: Genetic divergence analysis in eggplant (Solanum melongena L.) and allied species. Genetic Resources and Crop Evolution. 2010, 57 (2): 171-181. 10.1007/s10722-009-9459-6.
Daunay M, Lester R, Ano G: Eggplant. Tropical plant breeding. Edited by: Charrier A, Jacquot M, Hamon S, Nicolas D. CIRAD, Paris, France, 199-222.
Arumuganathan K, Earle E: Nuclear DNA content of some important plant species. Plant Molecular Biology Reporter. 1991, 9 (3): 208-218. 10.1007/BF02672069.
Doganlar S, Frary A, Daunay M, Lester R, Tanksley S: A comparative genetic linkage map of eggplant (Solanum melongena) and its implications for genome evolution in the Solanaceae. Genetics. 2002, 161 (4): 1697-1711.
Wu F, Eannetta N, Xu Y, Tanksley S: A detailed synteny map of the eggplant genome based on conserved ortholog set II (COSII) markers. Theoretical and Applied Genetics. 2009, 118 (5): 927-935. 10.1007/s00122-008-0950-9.
Nunome T, Ishiguro K, Yoshida T, Hirai M: Mapping of fruit shape and color development traits in eggplant (Solanum melongena L.) based on RAPD and AFLP markers. Breeding Science. 2001, 51 (1): 19-26. 10.1270/jsbbs.51.19.
Nunome T, Negoro S, Kono I, Kanamori H, Miyatake K, Yamaguchi H, Ohyama A, Fukuoka H: Development of SSR markers derived from SSR-enriched genomic library of eggplant (Solanum melongena L.). Theoretical and Applied Genetics. 2009, 119 (6): 1143-1153. 10.1007/s00122-009-1116-0.
Nunome T, Suwabe K, Iketani H, Hirai M: Identification and characterization of microsatellites in eggplant. Plant Breeding. 2003, 122 (3): 256-262. 10.1046/j.1439-0523.2003.00816.x.
Barchi L, Lanteri S, Portis E, Stagel A, Vale G, Toppino L, Rotino GL: Segregation distortion and linkage analysis in eggplant (Solanum melongena L.). Genome. 2010, 53 (10): 805-815. 10.1139/G10-073.
Stàgel A, Portis E, Toppino L, Rotino GL, Lanteri S: Gene-based microsatellite development for mapping and phylogeny studies in eggplant. BMC Genomics. 2008, 9: 357. 10.1186/1471-2164-9-357.
Fukuoka H, Yamaguchi H, Nunome T, Negoro S, Miyatake K, Ohyama A: Accumulation, functional annotation, and comparative analysis of expressed sequence tags in eggplant (Solanum melongena L.), the third pole of the genus Solanum species after tomato and potato. Gene. 2010, 450 (1-2): 76-84. 10.1016/j.gene.2009.10.006.
Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA: Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Research. 2007, 17 (2): 240-248. 10.1101/gr.5681207.
Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, Selker EU, Cresko WA, Johnson EA: Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers. PLoS ONE. 2008, 3 (10): e3376. 10.1371/journal.pone.0003376.
Zeng S, Xiao G, Guo J, Fei Z, Xu Y, Roe B, Wang Y: Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. et Zucc.) Maxim. BMC Genomics. 2010, 11 (1): 94. 10.1186/1471-2164-11-94.
Harris M, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, et al: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research. 2004, 32 (Database): D258-261.
Varshney R, Hiremath P, Lekha P, Kashiwagi J, Balaji J, Deokar A, Vadez V, Xiao Y, Srinivasan R, Gaur P, Siddique KHM, Town CD, Hoisington DA: A comprehensive resource of drought- and salinity-responsive ESTs for gene discovery and marker development in chickpea (Cicer arietinum L.). BMC Genomics. 2009, 10: 523. 10.1186/1471-2164-10-523.
Toppino L, Vale G, Rotino GL: Inheritance of Fusarium wilt resistance introgressed from Solanum aethiopicum Gilo and Aculeatum groups into cultivated eggplant (S. melongena) and development of associated PCR-based markers. Molecular Breeding. 2008, 22 (2): 237-250. 10.1007/s11032-008-9170-x.
Simko I, Haynes KG, Jones RW: Assessment of Linkage Disequilibrium in Potato Genome With Single Nucleotide Polymorphism Markers. Genetics. 2006, 173 (4): 2237-2245. 10.1534/genetics.106.060905.
Velasco R, Zharkikh A, Troggio M, Cartwright DA, Cestaro A, Pruss D, Pindo M, Fitzgerald LM, Vezzulli S, Reid J, Malacarne G, Iliev D, Coppola G, Wardell B, Micheletti D, Macalma T, Facci M, Mitchell JT, Perazzolli M, Eldredge G, Gatto P, Oyzerski R, Moretto M, Gutin N, Stefanini M, Chen Y, Segala C, Davenport C, Dematté L, Mraz A, et al: A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety. PLoS ONE. 2007, 2 (12): e1326. 10.1371/journal.pone.0001326.
Barker G, Edwards K: A genome-wide analysis of single nucleotide polymorphism diversity in the world's major cereal crops. Plant Biotechnology Journal. 2009, 7 (4): 318-325. 10.1111/j.1467-7652.2009.00412.x.
Jiang D, Ye QL, Wang FS, Cao L: The Mining of Citrus EST-SNP and Its Application in Cultivar Discrimination. Agricultural Sciences in China. 2010, 9 (2): 179-190. 10.1016/S1671-2927(09)60082-1.
Van Deynze A, Stoffel K, Buell CR, Kozik A, Liu J, van der Knaap E, Francis D: Diversity in conserved genes in tomato. BMC Genomics. 2007, 8: 9. 10.1186/1471-2164-8-9.
Jung J, Park S, Liu W, Kang B: Discovery of single nucleotide polymorphism in Capsicum and SNP markers for cultivar identification. Euphytica. 2010, 175 (1): 91-107. 10.1007/s10681-010-0191-2.
Feltus FA, Wan J, Schulze SR, Estill JC, Jiang N, Paterson AH: An SNP Resource for Rice Genetics and Breeding Based on Subspecies Indica and Japonica Genome Alignments. Genome Research. 2004, 14 (9): 1812-1819. 10.1101/gr.2479404.
Schneider K, Kulosa D, Soerensen T, Mohring S, Heine M, Durstewitz G, Polley A, Weber E, Jamsari, Lein J, Hohmann U, Tahiro E, Weisshaar B, Schulz B, Koch G, Jung C, Ganal M: Analysis of DNA polymorphisms in sugar beet (Beta vulgaris L.) and development of an SNP-based map of expressed genes. Theoretical and Applied Genetics. 2007, 115 (5): 601-615. 10.1007/s00122-007-0591-4.
Riju A, Arunachalam V: Interspecific differences in single nucleotide polymorphisms (SNPs) and indels in expressed sequence tag libraries of oil palm Elaeis guineensis and E. oleifera. Available from Nature Precedings. 2009, [http://precedings.nature.com/documents/3593/version/2]
Batley J, Barker G, O'Sullivan H, Edwards K, Edwards D: Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data. Plant Physiology. 2003, 132 (1): 84-91. 10.1104/pp.102.019422.
Holmquist R: Transitions and transversions in evolutionary descent: An approach to understanding. Journal of Molecular Evolution. 1983, 19 (2): 134-144. 10.1007/BF02300751.
Yang Z, Yoder AD: Estimation of the Transition/Transversion Rate Bias and Species Sampling. Journal of Molecular Evolution. 1999, 48 (3): 274-283. 10.1007/PL00006470.
Ramirez M, Graham M, Blanco-Lopez L, Silvente S, Medrano-Soto A, Blair M, Hernandez G, Vance C, Lara M: Sequencing and Analysis of Common Bean ESTs. Building a Foundation for Functional Genomics. Plant Physiology. 2005, 137: 1211-1227. 10.1104/pp.104.054999.
Terol J, Naranjo M, Ollitrault P, Talon M: Development of genomic resources for Citrus clementina: characterization of three deep-coverage BAC libraries and analysis of 46,000 BAC end sequences. BMC Genomics. 2008, 9: 423. 10.1186/1471-2164-9-423.
Wittwer CT, Reed GH, Gundry CN, Vandersteen JG, Pryor RJ: High-Resolution Genotyping by Amplicon Melting Analysis Using LCGreen. Clinical Chemistry. 2003, 49 (6): 853-860. 10.1373/49.6.853.
Wu X, Ren C, Joshi T, Vuong T, Xu D, Nguyen H: SNP discovery by high-throughput sequencing in soybean. BMC Genomics. 2010, 11 (1): 469. 10.1186/1471-2164-11-469.
Yan J, Yang X, Shah T, Sánchez-Villeda H, Li J, Warburton M, Zhou Y, Crouch J, Xu Y: High-throughput SNP genotyping with the GoldenGate assay in maize. Molecular Breeding. 2010, 25 (3): 441-451. 10.1007/s11032-009-9343-2.
Hyten D, Song Q, Choi I, Yoon M, Specht J, Matukumalli L, Nelson R, Shoemaker R, Young N, Cregan P: High-throughput genotyping with the GoldenGate assay in the complex genome of soybean. Theoretical and Applied Genetics. 2008, 116 (7): 945-952. 10.1007/s00122-008-0726-2.
Kumpatla S, Mukhopadhyay S: Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species. Genome. 2005, 48: 985-998. 10.1139/g05-060.
Portis E, Nagy I, Sasvari Z, Stagel A, Barchi L, Lanteri S: The design of Capsicum spp. SSR assays via analysis of in silico DNA sequence, and their potential utility for genetic mapping. Plant Science. 2007, 172: 640-648. 10.1016/j.plantsci.2006.11.016.
Morgante M, Hanafey M, Powell W: Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nature Genetics. 2002, 30: 194-200. 10.1038/ng822.
Nagy I, Stagel A, Sasvari Z, Roder M, Ganal M: Development, characterization, and transferability to other Solanaceae of microsatellite markers in pepper (Capsicum annuum L.). Genome. 2007, 50: 668-688. 10.1139/G07-047.
Shirasawa K, Asamizu E, Fukuoka H, Ohyama A, Sato S, Nakamura Y, Tabata S, Sasamoto S, Wada T, Kishida Y, Tsuruoka H, Fujishiro T, Yamada M, Isobe S: An interspecific linkage map of SSR and intronic polymorphism markers in tomato. Theoretical and Applied Genetics. 2010, 121 (4): 731-739. 10.1007/s00122-010-1344-3.
Kantety R, La Rota M, Matthews D, Sorrells M: Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Molecular Biology. 2002, 48: 501-510. 10.1023/A:1014875206165.
Tangphatsornruang S, Sangsrakru D, Chanprasert J, Uthaipaisanwong P, Yoocha T, Jomchai N, Tragoonrung S: The Chloroplast Genome Sequence of Mungbean (Vigna radiata) Determined by High-throughput Pyrosequencing: Structural Organization and Phylogenetic Relationships. DNA Research. 2010, 17 (1): 11-22. 10.1093/dnares/dsp025.
La Rota M, Kantety R, Yu J, Sorrells M: Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley. BMC Genomics. 2005, 6: 23. 10.1186/1471-2164-6-23.
Tang J, Baldwin S, Jacobs J, Van der Linden CG, Voorrips RE, Leunissen JAM, Van Eck HJ, Vosman B: Large-scale identification of polymorphic microsatellites using an in silico approach. BMC Bioinformatics. 2008, 9: 374. 10.1186/1471-2105-9-374.
Scaglione D, Acquadro A, Portis E, Taylor C, Lanteri S, Knapp S: Ontology and diversity of transcript-associated microsatellites mined from a globe artichoke EST database. BMC Genomics. 2009, 10: 454. 10.1186/1471-2164-10-454.
Rizza F, Mennella G, Collonnier C, Shiachakr D, Kashyap V, Rajam M, Prestera M, Rotino GL: Androgenic dihaploids from somatic hybrids between Solanum melongena and S. aethiopicum group gilo as a source of resistance to Fusarium oxysporum f. sp melongenae. Plant Cell Reports. 2002, 20 (11): 1022-1032. 10.1007/s00299-001-0429-5.
Zerbino DR, Birney E: Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Research. 2008, 18 (5): 821-829. 10.1101/gr.074492.107.
Kohany O, Gentles AJ, Hankus L, Jurka J: Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics. 2006, 7: 474. 10.1186/1471-2105-7-474.
Huang X, Madan A: CAP3: A DNA Sequence Assembly Program. Genome Research. 1999, 9 (9): 868-877. 10.1101/gr.9.9.868.
McCarthy F, Wang N, Magee GB, Nanduri B, Lawrence M, Camon E, Barrell D, Hill D, Dolan M, Williams WP, Luthe DS, Bridges SM, Burgess SC: AgBase: a functional genomics resource for agriculture. BMC Genomics. 2006, 7 (1): 229. 10.1186/1471-2164-7-229.
Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology. 1970, 48 (3): 443-453. 10.1016/0022-2836(70)90057-4.
Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research. 2008, 18 (11): 1851-1858. 10.1101/gr.078212.108.
Ning Z, Cox A, Mullikin J: SSAHA: A fast search method for large DNA databases. Genome Research. 2001, 1725-1729.
Anderson J, Churchill G, Autrique J, Tanksley S, Sorrells M: Optimizing parental selection for genetic linkage maps. Genome. 1992, 36: 181-186.
Smouse PE, Peakall R: Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity. 1999, 82 (5): 561-573. 10.1038/sj.hdy.6885180.
Rohlf F: NTSYS-pc Numerical Taxonomy and Multivariate Analysis System version 2.02 User Guide. 1998.
Kofler R, Schlotterer C, Lelley T: SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics. 2007, 23 (13): 1683-1685. 10.1093/bioinformatics/btm157.
Rozen S, Skaletsky H: Primer3 on the www for general users and for biologist programmers. Methods in Molecular Biology. 2000, 132: 365-386.
This research was partially supported by the Italian Ministry of Agricultural, Food and Forestry Policies within the framework of the "PROM", "ESPLORA" and "AGRONANOTECH" projects.
SL and GLR planned and supervised the work. LB carried out the BLAST analyses, SSR primer design and sequence annotation; EP carried out the diversity analysis; AA supervised the BLAST analyses; LT and GLR provided plant materials; GV contributed to SNP identification. All authors read and approved the final version of the manuscript.