A high-density genetic map of Arachis duranensis, a diploid ancestor of cultivated peanut

Background: Cultivated peanut (Arachis hypogaea) is an allotetraploid species whose ancestral genomes are most likely derived from the A-genome species, A. duranensis, and the B-genome species, A. ipaensis. The very recent (several millennia) evolutionary origin of A. hypogaea has imposed a bottleneck for allelic and phenotypic diversity within the cultigen. However, wild diploid relatives are a rich source of alleles that could be used for crop improvement and their simpler genomes can be more easily analyzed while providing insight into the structure of the allotetraploid peanut genome. The objective of this research was to establish a high-density genetic map of the diploid species A. duranensis based on de novo generated EST databases. Arachis duranensis was chosen for mapping because it is the A-genome progenitor of cultivated peanut and also in order to circumvent the confounding effects of gene duplication associated with allopolyploidy in A. hypogaea.
Results: More than one million expressed sequence tag (EST) sequences generated from normalized cDNA libraries of A. duranensis were assembled into 81,116 unique transcripts. Mining this dataset, 1236 EST-SNP markers were developed between two A. duranensis accessions, PI 475887 and Grif 15036. An additional 300 SNP markers also were developed from genomic sequences representing conserved legume orthologs. Of the 1536 SNP markers, 1054 were placed on a genetic map. In addition, 598 EST-SSR markers identified in A. hypogaea assemblies were included in the map along with 37 disease resistance gene candidate (RGC) and 35 other previously published markers. In total, 1724 markers spanning 1081.3 cM over 10 linkage groups were mapped. Gene sequences that provided mapped markers were annotated using similarity searches in three different databases, and gene ontology descriptions were determined using the Medicago Gene Atlas and TAIR databases. Synteny analysis between A. duranensis, Medicago and Glycine revealed significant stretches of conserved gene clusters spread across the peanut genome. A higher level of colinearity was detected between A. duranensis and Glycine than with Medicago.
Conclusions: The first high-density, gene-based linkage map for A. duranensis was generated that can serve as a reference map for both wild and cultivated Arachis species. The markers developed here are valuable resources for the peanut and, more broadly, the legume research community. The A-genome map will have utility for fine mapping in other peanut species and has already had application for mapping a nematode resistance gene that was introgressed into A. hypogaea from A. cardenasii.


Background
Cultivated peanut (Arachis hypogaea L.) is a major crop in most tropical and subtropical areas of the world and provides a significant source of oil and protein to large segments of the population in Asia, Africa and the Americas. In the U.S., peanut is a high-value cash crop of regional importance, with major production areas concentrated in the Southeast. Plant breeding efforts to pyramid genes for disease and insect resistances, quality, and yield are hampered by the polyploid genetics of the crop species, the multigenic nature of many traits (e.g., yield), and the difficulty of selecting for many traits in the field (e.g., soil borne diseases). Thus, secondary selection methods that are environmentally neutral would greatly facilitate crop improvement efforts. Molecular markers fit this criterion, but only recently have markers been developed that reveal sufficient polymorphisms in A. hypogaea and related species to have widespread application in peanut breeding. Preliminary steps for utilizing molecular markers for crop improvement are developing collections of polymorphic markers and utilizing them to construct dense and high-resolution genetic maps.
Constructing a high-quality genetic map depends largely upon finding one or more marker systems that can detect high levels of polymorphism between two individual parents. Unfortunately, low levels of molecular polymorphism were observed within tetraploid (2n = 4x = 40) A. hypogaea throughout the 1990s and early 2000s with the marker systems available at that time [1,2]. However, compared with the limited numbers of polymorphic markers detected for the tetraploid, the same marker systems can uncover high levels of molecular polymorphism within and between the diploid (2n = 2x = 20) peanut species. This polymorphism led researchers to create molecular maps for Arachis. The first molecular map in peanut was constructed between the diploids A. stenosperma Krapov. and W.C. Gregory × A. cardenasii Krapov. and W.C. Gregory by Halward et al. [3] who used Restriction Fragment Length Polymorphisms (RFLPs) to associate 117 markers into 11 linkage groups. Additional maps were subsequently published using Randomly Amplified Polymorphic DNA (RAPD) [4] and Simple Sequence Repeats (SSRs) [5,6]. Burow et al. [7] published the first tetraploid map in peanut based on 370 RFLP loci across 23 linkage groups by utilizing the complex interspecific cross, Florunner × 4x [A. batizocoi Krapov. and W.C. Gregory (A. cardenasii × A. diogoi Hoehne)]. Another interspecific tetraploid linkage map of 298 loci and 21 linkage groups was derived from a backcross population between A. hypogaea and a synthetic amphidiploid [8].
Only recently have linkage maps been developed from crosses between A. hypogaea genotypes, most with fewer than 200 loci and with more than the expected 20 linkage groups [9-13]. An exception is the recently published map containing 1114 loci across 21 linkage groups that was constructed in part with highly polymorphic markers derived from sequences harboring miniature inverted repeat transposable elements [14]. Therefore, there is a continuing need to generate dense linkage maps for the cultivated tetraploid peanut that will not only cluster the markers into the expected 20 linkage groups to cover the haplotype chromosomes, but also facilitate marker-trait association and eventually assist in its genetic improvement.
The domesticated peanut is thought to have arisen from a single hybridization event between two diploid wild species followed by whole genome duplication approximately 3,500 years ago [15]. This short evolutionary history, along with hybridization barriers between diploids and the tetraploid, has resulted in a narrow genetic base for the cultivated tetraploid peanut. In contrast, diploid Arachis species are genetically diverse, have simpler inheritance patterns, and most importantly, contain a rich source of agronomically important traits for peanut improvement. Due to these attributes, diploid Arachis species have been proposed as model systems to map the peanut genome. Because the genomes of progenitor diploid species [i.e., A. duranensis (A-genome donor) and A. ipaensis (B-genome donor)] are closely allied to the cultivated peanut [16], mapping the genome of one or both of these species should be useful for predicting the positions of loci in the cultivated peanut. This approach has been employed in wheat [17,18], alfalfa [19,20], oat [21], and other crop species.
One accession of A. ipaensis and 67 accessions of A. duranensis have been collected in South America. The largest concentration of A. duranensis is in southern Bolivia and northern Argentina, with a few populations being reported in Paraguay and one in central Brazil [22,23]. The species is morphologically diverse and the Bolivia and Argentina types can be separated cytogenetically and morphologically [24]. Due to the availability of diverse accessions to produce intraspecific crosses in the greenhouse, a dense linkage map in the diploid species A. duranensis was produced using large numbers of molecular markers derived from transcribed sequences.

Species relationships
A preliminary study of SSR marker variation among 37 A. duranensis accessions using 556 markers indicated that the species is highly polymorphic at the molecular level and individual accessions could be separated based on a cluster analysis (Figure 1). Interestingly, we found that A. ipaensis, the proposed B-genome (BB) progenitor species, clustered with the A-genome (AA) species A. stenosperma and not with the B-genome species A. batizocoi. Recent molecular cytogenetic analysis of A- and non-A- (i.e., B-) genome species suggests that karyotype diversity among non-A-genome species is extensive enough to support separation into additional genome classes where A. ipaensis remains in B sensu stricto while A. batizocoi is placed into a separate group [25]. Therefore, A. batizocoi is less typical of B-genome species.
The number of polymorphic SSR markers between paired A. duranensis accessions ranged from 160 to 375 out of 556, which is 29 to 67% of the total number of SSR markers screened. This is a significant amount of variation, which indicates the high genetic diversity within the species. Based on cluster analysis, success of crosses, and fertility of F1s, accessions PI 475887 and Grif 15036 were selected for subsequent mapping studies using 94 F2 progenies. Screening of the parental accessions with 2,138 SSR markers derived from A. hypogaea EST sequences resulted in 1,768 (82.7%) that were scorable (detected by ABI3730XL genotyping systems) and 896 (41.9%) that were polymorphic (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted). The same markers were used to create a map between two A. batizocoi accessions and to determine syntenic relationships between the A and B genome species (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted).
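The percentages quoted above follow directly from the marker counts. A minimal sketch that recomputes them is given here; the helper name `percent` is introduced only for illustration and is not part of the original analysis.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Counts taken from the text; the variable names are illustrative only.
pairwise_low = percent(160, 556)    # fewest polymorphic SSRs between two accessions
pairwise_high = percent(375, 556)   # most polymorphic SSRs between two accessions
scorable = percent(1768, 2138)      # scorable EST-SSR markers in the two parents
polymorphic = percent(896, 2138)    # polymorphic EST-SSR markers in the two parents

print(pairwise_low, pairwise_high, scorable, polymorphic)
```

The first two values round to the 29 to 67% range given in the text, and the last two reproduce the 82.7% and 41.9% figures.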

Arachis duranensis genetic map
The total number of published SSR markers has now risen beyond the 2,847 cataloged in a related paper by Guo et al. (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted) to around 6,000 [26]. Those most recently reported include: 14 by Gimenes et al. [27]; 51 by Mace et al. [28]; 188 by Proite et al. [29]; 104 by Cuc et al. [30]; 138 by Yuan et al. [31]; 33 by Song et al. [32]; 123 by Wang et al. [33]; 290 by Liang et al. [34]; and 1,571 by Koilkonda et al. [35]. Five hundred and ninety-eight of these markers are included in the A. duranensis map (Figure 2). Of the 34 genomic SSR markers mapped in the current study (Table 1), 24 were mapped previously in an interspecific population between A. duranensis and A. stenosperma [6,36]. These markers served to anchor and align the current and previously published peanut maps (Figure 2). Linkage group assignments of all markers were consistent between the current map and that of Bertioli et al. [36] except for the marker GM117 (AC3C02 on map in reference 36 derived from GenBank accession DQ099133) that was localized on chromosome 2A (the 'A' following a chromosome number is presented in this study to represent chromosomes in the A genome of peanut) in their interspecific map, while mapping to chromosome 10A in the A. duranensis intraspecific map. Although detailed information for parental alleles in the study by Bertioli et al. [36] was not presented, GM117 amplified only one locus from each parent in both their population and ours. It is, therefore, unlikely that the marker location discrepancy was due to mapping of multiple loci and perhaps could reflect a small chromosomal rearrangement. Chromosomal rearrangements are not unexpected based on previous cytological observations in the genus [24,37].
EST libraries of A. duranensis were developed to produce Single Nucleotide Polymorphism (SNP) markers for mapping (Table 2). Of the 1,536 SNP markers developed (Additional file 1), 1,054 were included in the A. duranensis map (Figure 2). The remaining 482 SNP markers were either of low quality (GC quality score <0.25) or they showed segregation patterns (extremely distorted) that could not be mapped. Of the 1,054 mapped SNP markers, 815 were derived from the cDNA sequencing project while the other 239 were genomic legume orthologs.
The A. duranensis map produced in this study contained 1,724 markers combined into 10 linkage groups with a total genetic distance of 1081.3 cM. MSTMap, a software program that accommodates large numbers of markers and utilizes a "minimum spanning tree" algorithm, was used to construct an initial genetic map using only the codominant markers. The 1,673 codominant markers were distributed into 810 co-segregating groups (bins). Although this program has been reported to be accurate for large-scale mapping projects [38], few independent studies are available establishing consistency between MSTMap and other commonly used mapping software [39]. To confirm the linkage group assignments, marker orders, and genetic distances with alternative software, both codominant and dominant markers were mapped with JoinMap 3.0. Marker orders and genetic distances were highly consistent between MSTMap and JoinMap 3.0 (Additional file 2).
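The idea of co-segregating groups (bins) can be illustrated with a short sketch. It assumes, for simplicity, that two markers belong to the same bin only when their genotype calls are identical across all progeny and that there are no missing calls; this is not a description of MSTMap's actual procedure. The function name, marker names, and the A/H/B genotype coding are hypothetical.

```python
from collections import defaultdict


def bin_markers(genotypes: dict[str, str]) -> list[list[str]]:
    """Group markers whose genotype strings are identical across all progeny.

    `genotypes` maps a marker name to one character per F2 plant
    (assumed coding: A = parent 1 homozygote, H = heterozygote,
    B = parent 2 homozygote). Missing data are not handled here.
    """
    bins = defaultdict(list)
    for marker, calls in genotypes.items():
        bins[calls].append(marker)
    # Sort for a deterministic result: marker names within bins, then the bins.
    return sorted(sorted(members) for members in bins.values())


# Toy example with six progeny and four hypothetical markers.
toy = {
    "SNP_x1": "AHHBAH",
    "GM_x2": "AHHBAH",   # identical to SNP_x1, so it falls in the same bin
    "SNP_x3": "AHHBHH",  # differs at one plant, so it forms a separate bin
    "GM_x4": "BBHAAH",
}
print(bin_markers(toy))
```

In this toy case four markers collapse into three bins, in the same way that the 1,673 codominant markers in the study collapsed into 810 bins.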
Significant segregation distortion (p = 0.05) was observed for 513 (29.8%) markers (Figure 2). Guo et al. (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted) found that a single linkage group (4/9B) in A. batizocoi was syntenic with chromosomes 4A and 9A of A. duranensis implicating inversion and reciprocal translocation events as the underlying chromosomal rearrangements in this B-genome species. Recombination frequencies were generally low in the central, presumably centromeric chromosomal regions of A. duranensis and increased toward the telomeres, a pattern typical of many plant species [40,41]. More even distribution was observed along chromosome 3A and only slightly suppressed recombination was observed around the presumable location of the centromere (Figure 2). Across the A. duranensis linkage map, each linkage group spanned on average 108.1 cM (77.3-145.6 cM) and included 172.4 markers (119-266) (Table 3). This is considerably denser than previously published maps. The A-genome map produced using the interspecific hybrid A. duranensis × A. stenosperma had 339 SSRs that were mapped into 11 linkage groups [6,42]. For A. hypogaea, there are now several maps with the most dense consensus map containing 324 loci on 21 linkage groups [11]. The map produced in the current study is the first high-density map available in peanut, and because it was generated from a progenitor species of A. hypogaea, we anticipate that it will have significant applications for analyzing the cultivated genome. For example, the data generated in this map were used by Nagy et al. [43] to more precisely map the Rma gene for nematode resistance that originated from an introgression line between A. hypogaea and A. cardenasii. The A-genome SNP array also has been useful at the tetraploid level for genotyping a recombinant inbred line population derived from a cross between cultivated peanut and a synthetic A. ipaensis × A. duranensis tetraploid (Ozias-Akins, unpublished).
Figure 2. High-density linkage map of Arachis duranensis including 1,724 markers. SNP and SSR markers are prefixed by 'SNP' and 'GM', respectively, resistance gene candidate markers are prefixed by 'RGC' and 'GS'. Twenty-four previously published markers (underlined) were selected from an interspecific map between A. duranensis and A. stenosperma [36] to establish synteny between the current and former linkage groups. The original linkage group assignments are given in the marker names separated by the pound (#) sign. Loci with significant segregation distortion (p = 0.05) are labeled with an asterisk. Graphs to the right of the linkage groups represent recombination frequencies. Each data point represents genetic distances between adjacent markers averaged for a window of 20 markers.
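Segregation distortion at the p = 0.05 level is commonly assessed with a chi-square goodness-of-fit test. The sketch below assumes a codominant marker in an F2 population with an expected 1:2:1 genotype ratio; the text does not state which test was used, and dominant markers (expected 3:1) are not covered. The function name and the genotype counts are hypothetical. For two degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def f2_distortion_test(n_a: int, n_h: int, n_b: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test against a 1:2:1 F2 ratio.

    Returns (chi-square statistic, p-value). With three genotype classes
    there are 2 degrees of freedom, for which the upper-tail probability
    of the chi-square distribution equals exp(-x / 2).
    """
    n = n_a + n_h + n_b
    observed = (n_a, n_h, n_b)
    expected = (n / 4, n / 2, n / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Hypothetical counts for two markers scored on 94 F2 plants.
for label, counts in {"balanced": (24, 47, 23), "skewed": (40, 44, 10)}.items():
    chi2, p = f2_distortion_test(*counts)
    flag = "*" if p < 0.05 else ""
    print(f"{label}: chi2={chi2:.2f} p={p:.4g} {flag}")
```

A marker would be flagged with an asterisk, as in Figure 2, when its p-value falls below 0.05.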

Gene annotation and comparative mapping
Homology search of the 1,724 mapped loci resulted in significant hits for 1,463 loci in at least one of the three databases: Medicago, Uniprot and GenBank NR database, and 580 of the mapped loci gave significant similarities in either of the two gene ontology databases: Medicago Gene Atlas and TAIR (Additional file 4). Altogether 1,366 gene ontology terms were assigned to the 580 genes. These were distributed among the three major gene ontology categories as follows: 521 molecular functions, 534 biological processes, and 311 cellular components (Additional file 4).
The sequences used to create the A. duranensis map also were compared to the genomes of two legumes where 995 loci on the A. duranensis map could be mapped to M. truncatula, and 2,711 matches could be found in G. max (with potentially two hits per mapped locus). While a majority of the dots in the synteny plots appear to be random (Figure 3), there are definite clusters of markers, and striking examples of colinearity (red arrows), especially for the comparisons to Glycine. Presumably there has been extensive single gene movement since the last common ancestors in one or both species, but many genes remain in the ancestral locations and can be detected. Overall, the synteny patterns for G. max showed the recent whole genome duplication within Glycine, with each location in peanut showing corresponding synteny at two locations in Glycine. Colinearity between Medicago and Arachis is much less conserved than between Arachis and Glycine. This could be due to extensive inversions in either genome, or more likely, due to preliminary ordering of sequences within the Medicago unfinished genome assembly. In general, the patterns showed strong synteny on the chromosomal ends in both genetic and physical distance, while the central regions of chromosomes tended to show less synteny. Presumably this could be attributed to pericentromeric heterochromatin which is known to define less recombinogenic regions where genomic rearrangements are more likely to persist [44]. Chromosome arms tend to be maintained as syntenic between Glycine and Arachis, but there is evidence that chromosome arms have been translocated in some cases so that synteny exists at the chromosome arm level, but less so at the whole chromosome level.

Conclusions
This investigation provided a large number of de novo EST sequences that were deposited into GenBank. The markers developed here are valuable resources for the peanut and, more broadly, the legume research community. This research presents the first high-density molecular map in peanut with 1,724 markers grouped into the 10 expected linkage groups for an A-genome species. Because the map was produced with the progenitor species A. duranensis which contributed the A genome of A. hypogaea, it will serve as the reference map for both wild and cultivated species. Lastly, synteny was found between Arachis and the Glycine and Medicago genomes, which indicates that markers developed for other legume species may be of value for crop improvement in peanut. The A-genome map will have utility for fine mapping in other peanut species and has already had application to mapping a nematode resistance gene that was introgressed to A. hypogaea from A. cardenasii.
Figure 3. Arrows indicate clusters of genes in common between the two genomes. For plotting the data on the Y axis, the peanut genome for each chromosome is proportional in size to the total map size in centimorgans. For the X axis, the unit of measure is scaled to bp within the chromosomal assemblies of the respective genomes. The plot was obtained with a Visual Basic program that plotted the x-y coordinates of each hit. The total number of matches for each pairwise comparison is listed at the upper left corner.
Unigenes in the transcript assembly were screened for perfect repeat motifs using SSR-IT (http://www.gramene.org/db/markers/ssrtool) and for imperfect motifs using FastPCR (http://primerdigital.com/fastpcr.html). The repeat count (n) threshold for each motif type was set for n ≥ 5. SSR markers were genotyped on an ABI3730XL Capillary DNA Sequencer (Applied Biosystems, Foster City, CA) using forward primers labelled with FAM, HEX, or TAMRA fluorophores. PCR was performed in a 12 μL reaction mixture containing 1.0 × PCR buffer, 2.5 mM Mg++, 0.2 mM each of dNTPs, 5.0 pmol of each primer, 0.5 unit of Taq polymerase, and 10 ng of genomic DNA. Touchdown PCR was used to reduce spurious amplification. The SSR markers were screened for length polymorphisms using GeneMapper 3.0 software (Applied Biosystems, Foster City, CA). Of the 2,138 EST-SSR primer pairs tested, markers derived from 598 could be mapped. A set of 34 SSR markers from genomic sequences of Arachis previously screened for polymorphism between parents of the A. duranensis mapping population (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted) were also mapped (Table 1).
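The screen for perfect repeat motifs with a repeat count of n ≥ 5 can be approximated with a regular expression. The sketch below is not the SSR-IT implementation. The motif lengths considered (2 to 6 bp), the exclusion of mononucleotide runs, and the function name are assumptions made for illustration.

```python
import re


def find_perfect_ssrs(seq, min_repeats=5, motif_lengths=(2, 3, 4, 5, 6)):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    A hit is a motif of length k repeated at least `min_repeats` times
    without interruption. Motifs that are themselves built from a shorter
    unit (e.g. "AGAG" or "AA") are skipped, so each repeat is reported
    once under its shortest motif and mononucleotide runs are ignored.
    """
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        # Group 1 captures the motif; \1{n-1,} requires n-1 or more further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical unigene fragment with an (AG)6 and a (CTT)5 repeat.
fragment = "GGC" + "AG" * 6 + "CCGTA" + "CTT" * 5 + "G"
print(find_perfect_ssrs(fragment))
```

Raising `min_repeats` makes the screen stricter, while imperfect repeats of the kind detected with FastPCR would require a different, mismatch-tolerant approach.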

Single-stranded conformational polymorphism (SSCP) markers
SSCP markers were developed from genomic DNA templates for previously described NBS sequences isolated by targeting conserved sequence motifs in NBS-LRR encoding genes [56,57] and from Arachis unigenes showing similarity to R-gene homologs identified by mining a peanut transcript assembly [43]. SSCP fragments were amplified using touch-down PCR and detected by silver-staining as previously described [58][59][60]. A total of 380 SSCP markers were evaluated for polymorphism between the parents PI 475887 and Grif 15036. The resistance gene analog markers are prefixed by either 'GS' or 'RGC' in the map. cDNA sequences for unigenes targeted for SSCP marker development in the present study were deposited in GenBank (Acc. No. GF100476-GF100638). One additional marker, the SCAR marker S197 linked to a root-knot nematode resistance gene in Arachis hypogaea [43,61] was also mapped.

Development of single nucleotide polymorphism (SNP) markers
Total RNA was isolated from roots of young seedlings (up to four trifoliate) and from developing seeds (up to developmental stage R6) of the two parental genotypes, PI 475887 and Grif 15036 (alias DUR25 and DUR2, respectively). cDNA libraries were developed using the Mint cDNA synthesis kit (Evrogen) and normalized using the Trimmer cDNA normalization kit (Evrogen). cDNA sequences were generated by Sanger and 454 GS-FLX sequencing methods and assembled using the tool Mira [62]. Altogether, more than one million cDNA sequence reads were generated from A. duranensis PI 475887 and Grif 15036. These were assembled into 81,116 unique transcripts (unigenes) (GenBank Accn. No. HP000001-HP081116). Assemblies were searched for single nucleotide polymorphisms (SNPs) that fulfilled the following two criteria: (a) the SNP position is covered by at least two reads from each genotype, and (b) at least 80% of the reads call the SNP in the particular genotype. Using these criteria, we identified 8,478 SNPs in 3,922 unigenes. To facilitate the selection of candidate SNPs for designing and building Illumina GoldenGate SNP genotyping arrays, putative intron positions were predicted by aligning Arachis contigs with Arabidopsis and Medicago genomic DNA sequences identified by BLAST analyses. SNPs within 60 bp of a putative intron were eliminated, thereby reducing the collection of candidate SNPs to 6,789 in 3,264 unigenes from which 1,236 high-quality SNPs, each representing separate unigenes, were selected for genotyping. SNPs were also detected by allele re-sequencing in a subset of 768 conserved legume orthologs identified by coauthors (R.V. Penmetsa, N. Carrasquilla-Garcia, A. D. Farmer and D.R. Cook), and 300 of these SNPs were added to the GoldenGate array. SNP genotyping on the GoldenGate array was conducted at the Emory Biomarker Service Center, Emory University. The BeadStudio (Illumina) genotyping module was used for calling genotypes.
Markers with GC quality scores lower than 0.25 were excluded from subsequent analysis.
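The two read-support criteria used for SNP discovery can be expressed as a small filter. The sketch below reflects one reading of those criteria: each parental genotype needs at least two reads at the position, at least 80% of its reads must agree on one base, and the two genotypes must differ in that base. The function names and the representation of a pileup column as a string of base calls are assumptions, not the pipeline used in the study.

```python
from collections import Counter


def major_allele(bases: str, min_reads: int = 2, min_fraction: float = 0.8):
    """Return the base supported by at least `min_fraction` of the reads.

    `bases` holds one base call per read covering the position in a single
    genotype. Returns None if coverage is below `min_reads` or if no base
    reaches the required fraction.
    """
    if len(bases) < min_reads:
        return None
    allele, count = Counter(bases).most_common(1)[0]
    return allele if count / len(bases) >= min_fraction else None


def is_candidate_snp(reads_parent1: str, reads_parent2: str) -> bool:
    """Apply criteria (a) and (b) to both parents and require different alleles."""
    allele1 = major_allele(reads_parent1)
    allele2 = major_allele(reads_parent2)
    return allele1 is not None and allele2 is not None and allele1 != allele2


# Hypothetical base calls at one assembly position for each parent.
print(is_candidate_snp("AAAAG", "GG"))  # 4 of 5 reads support A; both reads support G
print(is_candidate_snp("AAG", "GG"))    # only 2 of 3 reads support A
print(is_candidate_snp("A", "GG"))      # a single read from the first parent
```

The thresholds are exposed as parameters so that stricter coverage or agreement cut-offs can be tried without changing the logic.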