A first genetic map of date palm (Phoenix dactylifera) reveals long-range genome structure conservation in the palms
- Lisa S Mathew†1,
- Manuel Spannagl†2,
- Ameena Al-Malki3,
- Binu George1,
- Maria F Torres4,
- Eman K Al-Dous1,
- Eman K Al-Azwani1,
- Emad Hussein5,
- Sweety Mathew1,
- Klaus FX Mayer2,
- Yasmin Ali Mohamoud1,
- Karsten Suhre6 and
- Joel A Malek1, 4, 6
© Mathew et al.; licensee BioMed Central Ltd. 2014
Received: 27 December 2013
Accepted: 4 April 2014
Published: 15 April 2014
The date palm is one of the oldest cultivated fruit trees. It is critical in many ways to cultures in arid lands by providing highly nutritious fruit while surviving extreme heat and environmental conditions. Despite its importance from antiquity, few genetic resources are available for improving the productivity and development of the dioecious date palm. To date there has been no genetic map and no sex chromosome has been identified.
Here we present the first genetic map for date palm and identify the putative date palm sex chromosome. We placed ~4000 markers on the map using nearly 1200 framework markers spanning a total of 1293 cM. We have integrated the genetic map, derived from the Khalas cultivar, with the draft genome and placed up to 19% of the draft genome sequence scaffolds onto linkage groups for the first time. This analysis revealed approximately 1.9 cM/Mb on the map. Comparison of the date palm linkage groups revealed significant long-range synteny to oil palm. Analysis of the date palm sex-determination region suggests it is telomeric on linkage group 12 and recombination is not suppressed across the full chromosome.
Based on a modified genotyping-by-sequencing approach, we have overcome challenges due to lack of genetic resources and provide the first genetic map for date palm. Combined with the recent draft genome sequence of the same cultivar, this resource offers a critical new tool for date palm biotechnology, palm comparative genomics and a better understanding of sex chromosome development in the palms.
The date palm (Phoenix dactylifera L.) is a critical crop tree in a large section of North Africa, the Middle East and South Asia. It is one of 3 commercially important palms (Arecaceae) that include the monoecious oil and coconut palms. The date palm is dioecious with separate male and female trees. Only the female bears the commercially important date fruit. Trees grown from seed require approximately 6–8 years to flower before gender can be determined and no sex chromosome has been identified. Sex determination in plants is highly heterogeneous and sex chromosomes in the palms appear to have developed more than once, offering an excellent model for studying this process [2, 3].
Despite the clear importance of the date palm, few genetic resources exist, mainly due to a long generation time of approximately 6 years. A breeding program begun in the United States in the 1940s resulted in backcrossed varieties that exhibited depressed breeding ability and other physiological problems. Previous research into the number of chromosomes in date palm suggests it has 18 chromosome pairs (2n = 36), though some evidence for alternate numbers has been presented [6, 7]. Technological advances allowed us to generate a draft sequence of a commercial cultivar of date palm that was recently improved on and revealed a genome size of approximately 670 Mb. The draft sequence, while critical to understanding gene content and polymorphism, does not provide structural information about the genome. We were able to use the draft sequence to identify scaffolds containing polymorphisms segregating with date palm gender, but other studies such as a search for selective sweeps and quantitative trait loci (QTL) would benefit from a genetic map.
Construction of a genetic map in the absence of defined genetic resources is challenging. Specifically, outcrossed pedigrees require the identification of sufficient polymorphic markers between the male and female parents to allow map construction. While date palm females, which bear the fruit, are well defined by cultivar, the males are much less so as they are simply used for pollination. The introduction of genotyping-by-sequencing (GBS) was important in overcoming these challenges and provides a method for genotyping thousands of new markers in potentially hundreds of offspring [11, 12]. Accuracy in the GBS data is critical in our case as errors would result in lack of linkage or highly inflated map distances. We have adapted the method to our requirements by using a reduced representation method of GBS. We employed this adapted approach as each marker needed to be accurately genotyped by high depth sequencing rather than summed over a region as by Andolfatto and colleagues [13, 14]. Here we present a first genetic map for date palm and its comparison to the other commercially important palms.
Results and discussion
Using various genotype quality requirements (see Methods below) such as minimum coverage and alignment scores, a total of 64,783 Single Nucleotide Polymorphisms (SNPs) were considered as genotyped at high quality. Using this method, comparison of genotypes between two Khalas control libraries carried separately through the entire process revealed a genotype accuracy rate of 99.4%. Our previous genome sequence and SNP data from the Khalas cultivar allowed detection of male parent-specific SNPs where Khalas was known to be homozygous. These male parent-specific SNPs were used to segregate progeny by the male parent from which they derived. Four major populations derived from the same female parent but different male parents were selected for use in the genetic map and contained 29, 24, 17, and 15 individuals.
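The replicate comparison described above amounts to a concordance calculation over sites called in both control libraries. The following is a minimal sketch of that idea, not the authors' pipeline: the function name, the dictionary layout keyed by (scaffold, position), and the toy genotypes are all assumptions made for illustration, and alleles are assumed to be written in a consistent order.

```python
def genotype_concordance(rep_a, rep_b):
    """Fraction of sites called in both replicates with identical genotypes.

    rep_a, rep_b: dict mapping (scaffold, position) -> genotype string such
    as "A/A" or "A/G"; None marks a missing call (hypothetical layout).
    """
    shared = [site for site in rep_a
              if rep_a[site] is not None and rep_b.get(site) is not None]
    if not shared:
        raise ValueError("no sites called in both replicates")
    matches = sum(1 for site in shared if rep_a[site] == rep_b[site])
    return matches / len(shared)


# Toy data: five sites, one discordant, one missing in the second replicate.
khalas_rep1 = {("s1", 100): "A/A", ("s1", 250): "A/G", ("s2", 40): "C/C",
               ("s3", 7): "T/T", ("s4", 9): "G/G"}
khalas_rep2 = {("s1", 100): "A/A", ("s1", 250): "A/A", ("s2", 40): "C/C",
               ("s3", 7): "T/T", ("s4", 9): None}
print(genotype_concordance(khalas_rep1, khalas_rep2))
```

Sites missing in either replicate are excluded from the denominator, so the value reflects agreement among called genotypes only.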
We obtained 5873 markers that fit our quality requirements, showed similar segregation in the populations and were called in at least 53 of the 85 individuals genotyped. Markers that shared the same segregation pattern across all progeny were grouped and a single representative marker was selected from each group. This collapsing resulted in 1235 representative markers of which 1199 were placed on the framework map. Each representative placed on the map stood for as few as one marker (especially male/female segregating markers) and as many as 28 markers. The 1199 representative markers represent 3976 total markers and these derive from 2115 unique segregating parent/scaffold combinations. That is, a scaffold could contribute two markers but this was only allowed if the markers segregated in different parents. The mapped marker set derived from 1823 scaffolds and these scaffolds span a total of 74,079,588 bp or ~20% of the 380 Mb of sequence in the assembly.
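As a quick check of the figures quoted above, the short calculation below recomputes the fraction of the assembly carried by mapped scaffolds and the mean number of markers behind each representative. It uses only numbers stated in the text; the variable names are illustrative.

```python
# Figures quoted in the paragraph above.
mapped_scaffold_bp = 74_079_588
assembly_bp = 380_000_000          # "380 Mb of sequence in the assembly"
representative_markers = 1199
total_markers = 3976

fraction_placed = mapped_scaffold_bp / assembly_bp
markers_per_representative = total_markers / representative_markers

print(f"assembly placed on linkage groups: {fraction_placed:.1%}")
print(f"mean markers per representative:   {markers_per_representative:.2f}")
```

The first value comes to about 19.5%, which is consistent with both the "up to 19%" and the "~20%" roundings used in the article.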
Genetic map features
Sex chromosome features
Comparison to the recently updated draft sequence
Markers were mapped to the Saudi date palm genome by BLAST alignment of ~1000 bp surrounding the SNP. To be considered, BLAST alignments were required to be unique in the genome. Of the marker regions, 3884 (98%) aligned uniquely to 702 scaffolds in the Saudi date palm sequence. These 702 scaffolds span ~304 Mb (54%) of the 563.5 Mb assembled genome.
High synteny in palm genome structure
To study palm genome structure, sequence surrounding markers on the genetic map was aligned to the recently released oil palm genome. For each marker, 2000 bp of surrounding sequence was aligned to the oil palm scaffolds and oil palm chromosome locations were documented. It is clear from this comparison (Figure 5) that the two genomes maintain high levels of synteny. Indeed, most of the 18 date palm linkage groups were syntenic with one of the 16 oil palm chromosomes (Figure 6). How the two genomes diverged from 16 to 18 chromosomes is of interest. Based on telomere repeats detected in putative oil palm chromosome centromeres, it has been suggested that a Robertsonian fusion caused the collapse of chromosomes into 2 and 14 in oil palm. Our data support this theory for oil palm chromosome 2. Synteny determined between the date palm genetic map and the oil palm chromosomes suggests that oil palm chromosome 2 constitutes a fusion of date palm chromosomes 1 and 10 whereas date palm LG11 shows blocks shared with oil palm chromosomes 1, 6 and 10. In addition, longer stretches on oil palm chromosome 1 correspond to regions on date palm chromosomes 4 and 16.
Despite the fact that the oil palm genome is more than double the size (1.535 Gb) of the date palm genome (0.67 Gb), long-range synteny is maintained and overall chromosome structure is very well preserved. It will be of interest to understand if the oil palm has added nearly double the sequence to the genome while maintaining chromosome structure or the date palm has lost ancestral DNA.
We provide a first genetic map for the commercially important date palm tree and strong evidence for identifying the sex chromosome. Using next-generation sequencing we have overcome challenges due to lack of genetic resources for this tree. While markers on our map are not guaranteed to be polymorphic in other mapping populations, they nevertheless provide ordering of scaffolds for improved genome analysis. Additionally, by using SNPs from sequence scaffolds, other polymorphic markers within the same scaffolds could be selected for other populations. The high density of the markers on the map sets a target for future date palm genome sequence projects. Future research should aim for greater than 500 kb average scaffold length to ensure that most of the genome sequences will be properly ordered on the genetic map. To resolve more markers onto the map, it will be critical to expand these approaches with a larger pedigree (such as 200 offspring) derived from a single male and female parent. Extending the methods presented here will allow for an even denser genetic map of date palm and an improved resource for quantitative trait locus mapping.
For the first time we have localized the gender segregating region in date palm to LG12, suggesting this region may contain the sex determination genetic alteration. Using the genetic map we estimate the size of the region to be approximately 5 to 13 Mb, and more work will be needed to establish the true recombination boundaries. Identification of the gender segregating region will facilitate future studies on dioecy and the development of sex chromosomes. Indeed, our comparison of the region to the monoecious oil palm revealed shorter blocks of synteny in this region than in the rest of the genome.
Analysis of the genetic map reveals important features in the date, oil and coconut palm genomes including maintenance of long range synteny despite dramatic genome size differences. Analysis of the genetic maps combined with genomic data will assist in the comparison and improvements of these 3 commercially important palms.
Date palm seeds were collected from a single Khalas cultivar individual at the agricultural affairs department farm in Rawdat Al-Faras, Al-Ghuwayriyah, Qatar. Seeds were washed and germinated at 29 °C for 1 week and then transferred to potting soil. They were grown with minimal watering until primary leaves were ~15 cm. Leaf samples of 10 cm were collected for DNA extraction. Leaf material from each plant was arrayed into individual wells of 96-well plates, crushed with the TissueLyser II (Qiagen), and DNA extracted with the DNeasy 96 Plant Kit (Qiagen) according to the manufacturer’s protocol. As a control for the full process, Khalas (female parent) leaves replaced two individuals in the 96-well plate. DNA was quantified and normalized using a Nanodrop (ThermoScientific). Genotyping-by-sequencing (GBS) libraries were constructed according to a modified protocol of Elshire and colleagues. Samples were processed in batches of 24 to match the available barcodes. Briefly, 150 ng of DNA per sample were digested with 1.25 units of ApeKI (New England Biolabs, USA) and then ligated to 10 pmol of a universal PCR adapter and one of 24 possible barcoded adapters. DNA from the 24 samples, each with unique barcodes, was then pooled and cleaned with 1.2X Agencourt AMPure XP beads (Beckman Coulter, USA). To increase coverage of fewer SNPs and allow high quality calling of heterozygous alleles, a reduced-representation approach was used. Specifically, after pooling of 24 samples, the libraries were size selected on a 1.5% agarose gel from 350-550 bp using the automated Pippin Prep system (Sage Science, USA). Size selected samples were then amplified by PCR for 16 cycles and cleaned with 1.2X Agencourt AMPure XP beads (Beckman Coulter, USA). Pools containing 24 libraries were sequenced on separate lanes of the HiSeq2000 (Illumina, USA) according to the manufacturer’s paired-end sequencing protocol. Five lanes of high-quality data representing 120 individuals were collected.
Libraries were ‘spiked’ with 20% of a balanced nucleotide library to improve base-calling within the barcode.
DNA sequences were separated for each barcoded library and aligned to the Date Palm V3 genome using BOWTIE2. SNPs were then called with SAMTOOLS to generate a VCF file. To ensure high-quality genotype calling, we required a SNP to have at least 15X coverage to be called. We required a SNP to be called in at least 80% of individuals to be considered. Individuals were then clustered in the Partek Genomics Suite (Partek, USA) using ~4000 SNPs that segregated in the male parent to determine their respective population. To detect the sex chromosome, markers in the clustered data with inheritance patterns similar to those of markers in scaffolds previously shown to segregate with sex were collected and added to the set of markers used to construct the map. That is, even single occurrence markers were used. While this may increase error rates in the region, it provides higher numbers of mapped gender related scaffolds for future analysis.
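The two thresholds described above (at least 15X coverage for a genotype to count as called, and a call in at least 80% of individuals) can be sketched as a simple filter. This is an illustrative sketch only: the function name, the per-SNP list of read depths, and the toy values are assumptions, and the actual pipeline operated on VCF output rather than on this simplified structure.

```python
def filter_snps(depths, min_depth=15, min_call_rate=0.80):
    """Return ids of SNPs whose call rate meets the threshold.

    depths: dict mapping snp_id -> list of per-individual read depths
    (one entry per individual, 0 where no reads aligned). A genotype is
    treated as called only when its depth is >= min_depth.
    """
    kept = []
    for snp_id, per_individual in depths.items():
        called = sum(1 for d in per_individual if d >= min_depth)
        if per_individual and called / len(per_individual) >= min_call_rate:
            kept.append(snp_id)
    return kept


toy_depths = {
    "snpA": [20, 31, 15, 18, 40],   # called in 5 of 5 individuals
    "snpB": [20, 31, 14, 18, 40],   # called in 4 of 5 (exactly 80%)
    "snpC": [20, 3, 14, 18, 40],    # called in 3 of 5 (below threshold)
}
print(filter_snps(toy_depths))
```

Applying the depth requirement per genotype before computing the call rate means a marker is only as good as its worst-covered individuals, which suits heterozygote calling where shallow coverage easily misses one allele.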
Genetic map construction
Utilizing the collected female parent genotype data we inferred male parent data by looking for 1:1 or 1:2:1 segregation patterns in the 4 populations. SNPs that shared segregation patterns in all populations were selected for use in the genetic map. Different markers having the same genotypes in all individuals were collapsed and the marker with the highest call-rate was selected as a representative. In the case of markers known to segregate male and female [8, 21], we used all markers that clustered with markers on scaffolds known to segregate with sex (see above).
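The collapsing step described above can be sketched as follows. This is one possible greedy implementation under stated assumptions, not the authors' code: markers are treated as matching when they agree at every individual called in both, and the names `call_rate`, `compatible`, `collapse_markers` and the toy genotype codes are hypothetical.

```python
def call_rate(calls):
    """Fraction of individuals with a non-missing genotype."""
    return sum(c is not None for c in calls) / len(calls)


def compatible(a, b):
    """True when two markers agree at every individual called in both."""
    return all(x == y for x, y in zip(a, b) if x is not None and y is not None)


def collapse_markers(genotypes):
    """Group markers with matching genotype vectors.

    genotypes: dict mapping marker_id -> tuple of per-individual calls
    (None for a missing call). Returns a dict mapping each representative
    marker_id -> list of marker ids in its group.
    """
    # Visit markers from highest to lowest call rate so the first member
    # of every group (its representative) has the highest call rate.
    order = sorted(genotypes, key=lambda m: call_rate(genotypes[m]), reverse=True)
    groups = {}
    for marker in order:
        for rep in groups:
            if compatible(genotypes[rep], genotypes[marker]):
                groups[rep].append(marker)
                break
        else:
            groups[marker] = [marker]
    return groups


toy = {
    "m1": ("h", "a", "h", "a", None),
    "m2": ("h", "a", "h", "a", "h"),
    "m3": ("a", "a", "h", "h", "h"),
    "m4": ("h", None, None, "a", "h"),
}
print(collapse_markers(toy))
```

A caveat of the greedy approach is that a sparsely called marker may be compatible with more than one representative; here it simply joins the first one it matches.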
Genotypes were phased separately for each population and all populations merged using TMAP. Two-point recombination data and LOD scores were calculated using CarthaGene. Markers were grouped and ordered using JoinMap 4.0. Grouping was conducted with a LOD score of 10, the regression algorithm was used for marker ordering and the Kosambi function was used in marker distance calculation.
To study the sex control region, we utilized 5 genomes sequenced in our original study including Khalas, Deglet Noor, Deglet Noor Backcross 5 male, Medjool, and Medjool Backcross 4 male. We documented SNPs that were homozygous in all females while heterozygous in both males. We then plotted the counts of these SNPs per 100 kb of scaffolds within a 5 cM sliding window and step size of 2.5 cM (Figure 4).
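For reference, the Kosambi mapping function converts a recombination fraction r into a map distance d = (1/4) ln((1 + 2r) / (1 - 2r)) Morgans, which allows for crossover interference. The sketch below implements this standard formula directly; it is not JoinMap's internal code, and the function name is illustrative.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in centimorgans for recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    # 0.25 * ln(...) gives Morgans; multiplying by 100 gives centimorgans.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.01, 0.05, 0.10, 0.20):
    print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the distance in cM is close to 100r, and it grows faster than r as the recombination fraction approaches 0.5. This is why genotyping errors, which inflate apparent recombination, can inflate map length disproportionately.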
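The windowed density described above can be sketched as below, with SNP counts normalized to the total scaffold length falling inside each window. The input layout (one tuple of map position, scaffold length and SNP count per scaffold), the function name and the toy values are assumptions for illustration; the original analysis may have handled scaffold placement differently.

```python
def sliding_window_density(scaffolds, map_length_cm, window_cm=5.0, step_cm=2.5):
    """SNP density per 100 kb of scaffold in sliding windows along a linkage group.

    scaffolds: list of (position_cm, length_bp, snp_count) tuples, one per
    scaffold placed on the linkage group (hypothetical layout).
    Returns a list of (window_start_cm, density); density is None when no
    scaffold falls inside the window.
    """
    results = []
    start = 0.0
    while start < map_length_cm:
        end = start + window_cm
        in_window = [(length, count) for pos, length, count in scaffolds
                     if start <= pos < end]
        total_bp = sum(length for length, _ in in_window)
        total_snps = sum(count for _, count in in_window)
        density = total_snps / (total_bp / 100_000) if total_bp else None
        results.append((start, density))
        start += step_cm
    return results


toy_scaffolds = [
    (0.5, 200_000, 2),
    (3.0, 100_000, 10),
    (6.0, 300_000, 30),
]
for start, density in sliding_window_density(toy_scaffolds, map_length_cm=10.0):
    print(start, density)
```

Because the step is half the window width, each scaffold contributes to two consecutive windows, which smooths the profile along the linkage group.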
Syntenic relationships between date palm genetic map and oil palm finished genome sequence
To determine syntenic relationships between the genetic map of Phoenix dactylifera (date palm) and its close relative Elaeis guineensis (pisifera fruit form; oil palm) we first downloaded the latest genome sequence release of oil palm from http://genomsawit.mpob.gov.my (16 EG5 chromosome sequences, without unanchored scaffolds) and extracted the position of 4850 distinct genetic markers from the date palm genetic map (Additional file 1). From their position on a date palm sequence scaffold, 1,000 bp upstream and downstream sequences were extracted for each marker where possible. In case of conflicts with a scaffold end or start we extracted sequence from the SNP until the beginning or end of the scaffold. BLASTN with an e-value cutoff of 10e-05 was used to identify homologous regions on the oil palm chromosomes for the 2000 bp marker-flanking date palm sequences. For a total of 4513 (out of 4850) date palm markers one or more homologous regions were found on the oil palm chromosome sequences; in case of multiple hits, only best BLAST hits (BBH) were considered for downstream analyses (in case of identical BBH bitscores all hits with this bitscore were considered).
The syntenic relationships established between date palm genetic map and oil palm chromosome sequences were plotted using the CIRCOS software and are illustrated in Figure 5. Chromosomes with strong syntenic relations between date palm and oil palm were assigned the same color.
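Two steps in the paragraph above lend themselves to a short sketch: clipping the flanking window at scaffold ends, and keeping only the best BLAST hit per marker while retaining ties. The code below is illustrative only; the function names, the 1-based inclusive coordinates, and the (query, subject, bitscore) tuple layout are assumptions rather than details taken from the authors' scripts.

```python
def flank_window(snp_pos, scaffold_len, flank=1000):
    """1-based inclusive (start, end) around a SNP, clipped to the scaffold."""
    start = max(1, snp_pos - flank)
    end = min(scaffold_len, snp_pos + flank)
    return start, end


def best_blast_hits(hits):
    """Keep, per query, the subject(s) with the highest bitscore.

    hits: iterable of (query_id, subject_id, bitscore) tuples, for example
    parsed from tabular BLAST output (hypothetical layout). Hits tied for
    the top bitscore are all retained.
    """
    best = {}
    for query, subject, bitscore in hits:
        current = best.get(query)
        if current is None or bitscore > current[0]:
            best[query] = (bitscore, [subject])
        elif bitscore == current[0]:
            current[1].append(subject)
    return {query: subjects for query, (_, subjects) in best.items()}


print(flank_window(500, 5000))     # SNP near the scaffold start
print(flank_window(4800, 5000))    # SNP near the scaffold end

toy_hits = [
    ("mk1", "chr2", 950.0),
    ("mk1", "chr7", 310.0),
    ("mk2", "chr1", 400.0),
    ("mk2", "chr6", 400.0),        # tie: both hits are kept
]
print(best_blast_hits(toy_hits))
```

Keeping tied hits preserves ambiguous placements for inspection instead of choosing arbitrarily between equally supported oil palm chromosomes.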
Syntenic relationships between date palm genetic map, oil palm genome sequence and coconut genetic markers
To determine syntenic relationships between date palm, oil palm and coconut, markers from coconut were downloaded from GENBANK (http://www.ncbi.nlm.nih.gov) selecting only microsatellite DNA markers with significant (~500 bp) surrounding sequence. Location of the downloaded markers on the coconut genetic map was obtained from http://tropgenedb.cirad.fr/.
Fifty-six coconut markers were associated with genomic sequence. BLASTN with an e-value cutoff of 10e-05 was used to identify homologous regions on the oil palm chromosomes for the coconut marker sequences. For a total of 45 (out of 56) coconut markers one or more homologous regions were found on the oil palm chromosome sequences; in case of multiple hits, only best BLAST hits (BBH) were considered for downstream analyses (in case of identical BBH bitscores all hits with this bitscore were considered).
The syntenic relationships established between date palm genetic map, coconut markers and oil palm chromosome sequences were plotted individually for all 18 date palm linkage groups using the CIRCOS software and are shown in Figure 7. Chromosomes with strong syntenic relations between date palm and coconut palm were assigned the same color.
Availability of supporting data
The draft genome sequence used in this research is available at: http://qatar-weill.cornell.edu/research/datepalmGenome/download.html.
And at http://www.ncbi.nlm.nih.gov/genbank under the accession number: ACYX00000000.
Funding for this research was provided by Qatar Foundation’s National Priorities Research Program – Exceptional Award (NPRP-EP) NPRPX-014-4-001.
- Bekheet SA, Hanafy MS: Towards sex determination of date palm. Date palm biotechnology, XVIII, 551–566. Edited by: Jain SM, Al-Khayri JM, Johnson DV. 2011, New York, New York, USA: Springer
- Weiblen G, Oyama R, Donoghue M: Phylogenetic Analysis of Dioecy in Monocotyledons. Am Nat. 2000, 155: 46-58. 10.1086/303303.
- Ming R, Bendahmane A, Renner SS: Sex chromosomes in land plants. Annu Rev Plant Biol. 2011, 62: 485-514. 10.1146/annurev-arplant-042110-103914.
- Chao CT, Krueger RR: The Date Palm (Phoenix dactylifera L.): Overview of Biology, Uses, and Cultivation. HortScience. 2007, 42 (5): 1077-1082.
- Beal JM: Cytological Studies in the Genus Phoenix. Bot Gaz. 1937, 99: 400-407. 10.1086/334708.
- ON THE STATUS OF CHROMOSOMES OF THE DATE PALM (PHOENIX DACTYLIFERA L). http://www.actahort.org/books/882/882_28.htm
- Al-Salih AA, Al-Rawi AMA: A study of the cytology of two female cultivars of date palm. Date Palm J. 1987, 5 (2): 123-142.
- Al-Dous EK, George B, Al-Mahmoud ME, Al-Jaber MY, Wang H, Salameh YM, Al-Azwani EK, Chaluvadi S, Pontaroli AC, Debarry J, Arondel V, Ohlrogge J, Saie IJ, Suliman-Elmeer KM, Bennetzen JL, Kruegger RR, Malek JA: De novo genome sequencing and comparative genomics of date palm (Phoenix dactylifera). Nat Biotechnol. 2011, 29: 521-527. 10.1038/nbt.1860.
- Al-Mssallem IS, Hu S, Zhang X, Lin Q, Liu W, Tan J, Yu X, Liu J, Pan L, Zhang T, Yin Y, Xin C, Wu H, Zhang G, Ba Abdullah MM, Huang D, Fang Y, Alnakhli YO, Jia S, Yin A, Alhuzimi EM, Alsaihati BA, Al-Owayyed SA, Zhao D, Zhang S, Al-Otaibi NA, Sun G, Majrashi MA, Li F, Tala , et al: Genome sequence of the date palm Phoenix dactylifera L. Nat Commun. 2013, 4: 2274.
- Grattapaglia D, Sederoff R: Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics. 1994, 137: 1121-1137.
- Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE: A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 2011, 6: e19379. 10.1371/journal.pone.0019379.
- Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML: Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet. 2011, 12: 499-510. 10.1038/nrg3012.
- Beissinger TM, Hirsch CN, Sekhon RS, Foerster JM, Johnson JM, Muttoni G, Vaillancourt B, Buell CR, Kaeppler SM, de Leon N: Marker density and read depth for genotyping populations using genotyping-by-sequencing. Genetics. 2013, 193: 1073-1081. 10.1534/genetics.112.147710.
- Andolfatto P, Davison D, Erezyilmaz D, Hu TT, Mast J, Sunayama-Morita T, Stern DL: Multiplexed shotgun genotyping for rapid and efficient genetic mapping. Genome Res. 2011, 21: 610-617. 10.1101/gr.115402.110.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1016/S0022-2836(05)80360-2.
- Singh R, Ong-Abdullah M, Low E-TL, Manaf MAA, Rosli R, Nookiah R, Ooi LC-L, Ooi S-E, Chan K-L, Halim MA, Azizi N, Nagappan J, Bacher B, Lakey N, Smith SW, He D, Hogan M, Budiman MA, Lee EK, DeSalle R, Kudrna D, Goicoechea JL, Wing RA, Wilson RK, Fulton RS, Ordway JM, Martienssen RA, Sambanthamurthi R: Oil palm genome sequence reveals divergence of interfertile species in Old and New worlds. Nature. 2013, 500: 335-339. 10.1038/nature12309.
- Lebrun P, Baudouin L, Bourdeix R, Konan JL, Barker JH, Aldam C, Herrán A, Ritter E: Construction of a linkage map of the Rennell Island Tall coconut type (Cocos nucifera L.) and QTL analysis for yield characters. Genome. 2001, 44: 962-970. 10.1139/gen-44-6-962.
- Billotte N, Marseillac N, Risterucci A-M, Adon B, Brottier P, Baurens F-C, Singh R, Herrán A, Asmady H, Billot C, Amblard P, Durand-Gasselin T, Courtois B, Asmono D, Cheah SC, Rohde W, Ritter E, Charrier A: Microsatellite-based high density linkage map in oil palm (Elaeis guineensis Jacq.). Theor Appl Genet. 2005, 110: 754-765. 10.1007/s00122-004-1901-8.
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25. 10.1186/gb-2009-10-3-r25.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.
- Cherif E, Zehdi S, Castillo K, Chabrillange N, Abdoulkader S, Pintaud J-C, Santoni S, Salhi-Hannachi A, Glémin S, Aberlenc-Bertossi F: Male-specific DNA markers provide genetic evidence of an XY chromosome system, a recombination arrest and allow the tracing of paternal lineages in date palm. New Phytol. 2013, 197: 409-415. 10.1111/nph.12069.
- Cartwright DA, Troggio M, Velasco R, Gutin A: Genetic mapping in the presence of genotyping errors. Genetics. 2007, 176: 2521-2527. 10.1534/genetics.106.063982.
- De Givry S, Bouchez M, Chabrier P, Milan D, Schiex T: CARHTA GENE: multipopulation integrated genetic and radiation hybrid mapping. Bioinformatics. 2005, 21: 1703-1704. 10.1093/bioinformatics/bti222.
- Ooijen JW: Multipoint maximum likelihood mapping in a full-sib family of an outbreeding species. Genet Res (Camb). 2011, 93: 343-349. 10.1017/S0016672311000279.
- Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA: Circos: an information aesthetic for comparative genomics. Genome Res. 2009, 19: 1639-1645. 10.1101/gr.092759.109.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.