New insights into domestication of carrot from root transcriptome analyses
© Rong et al.; licensee BioMed Central Ltd. 2014
Received: 10 July 2014
Accepted: 7 October 2014
Published: 14 October 2014
Understanding the molecular basis of domestication can provide insights into the processes of rapid evolution and crop improvement. Here we demonstrated the processes of carrot domestication and identified genes under selection based on transcriptome analyses.
The root transcriptomes of widely differing cultivated and wild carrots were sequenced. A method accounting for sequencing errors was introduced to optimize SNP (single nucleotide polymorphism) discovery. 11,369 SNPs were identified. Of these, 622 (out of 1000 tested SNPs) were validated and used to genotype a large set of cultivated carrot, wild carrot and other wild Daucus carota subspecies, primarily of European origin. Phylogenetic analysis indicated that eastern carrot may originate from Western Asia and western carrot may be selected from eastern carrot. Different wild D. carota subspecies may have contributed to the domestication of cultivated carrot. Genetic diversity was significantly reduced in western cultivars, probably through bottlenecks and selection. However, a high proportion of genetic diversity (more than 85% of the genetic diversity in wild populations) is currently retained in western cultivars. Model simulation indicated high and asymmetric gene flow from wild to cultivated carrots, spontaneously and/or by introgression breeding. Nevertheless, high genetic differentiation exists between cultivated and wild carrots (Fst = 0.295) showing the strong effects of selection. Expression patterns differed radically for some genes between cultivated and wild carrot roots which may be related to changes in root traits. The up-regulation of water-channel-protein gene expression in cultivars might be involved in changing water content and transport in roots. The activated expression of carotenoid-binding-protein genes in cultivars could be related to the high carotenoid accumulation in roots. The silencing of allergen-protein-like genes in cultivated carrot roots suggested strong human selection to reduce allergy. These results suggest that regulatory changes of gene expressions may have played a predominant role in domestication.
Western carrots may originate from eastern carrots. The reduction in genetic diversity in western cultivars due to domestication bottleneck/selection may have been offset by introgression from wild carrot. Differential gene expression patterns between cultivated and wild carrot roots may be a signature of strong selection for favorable cultivation traits.
Keywords: Crop and wild relative; Daucus carota; Domestication gene; Gene expression difference; High-throughput sequencing; Single nucleotide polymorphism; Root transcriptome
Understanding the molecular basis of crop domestication, especially identifying target genes under selection during domestication, can provide insight into the processes of rapid evolution and crop improvement [1–3]. The transcriptome represents all mRNA transcripts of actively expressed genes. Identifying sequence variants (e.g. single nucleotide polymorphisms: SNPs) and detecting differential gene expression patterns in transcriptomes are of primary interest in any attempt to characterize the effects of selection and identify target genes under selection. The rapid development of high-throughput sequencing technology enables us to perform genome/transcriptome-scale studies not only by re-sequencing a few model species but also by de novo sequencing of many non-model species. This makes it feasible to compare the genomes/transcriptomes of a wide range of crops and progenitor species, permitting more solid conclusions to be drawn about the effects of domestication and revealing domestication genes. In this study, carrot was used as a model species to demonstrate how to study the effects of domestication and identify domestication genes based on transcriptome analyses.
Cultivated carrot (Daucus carota L. ssp. sativus) is one of the most popular vegetables in the world, providing the main source of dietary provitamin A [5–7]. According to the pigmentation of the roots, cultivated carrot can be divided into two main groups: the anthocyanin or eastern-type carrot (e.g. yellow or purple carrot), and the carotene or western-type carrot (e.g. yellow, orange or red carrot). For human consumption the eastern-type carrot has nowadays been largely replaced by the western-type carrot. It is generally agreed that the eastern-type cultivated carrot originated in southwestern Asia in the area around Afghanistan only about 1100 years ago [5, 7]. However, the origin of the western-type cultivated carrot is still uncertain. Banga demonstrated that an orange-colored carrot similar to the “Long Orange”-type western carrot first appeared in Dutch paintings at the beginning of the 17th century, suggesting a Dutch origin of the western orange carrot, probably directly selected from yellow eastern carrots. The Netherlands was the center of carrot breeding during the 18th century, and most of the modern varieties of western cultivated carrot may descend from the old orange Dutch carrots [7–9]. Because of the huge differences in root and leaf traits between eastern and western carrots, Heywood disagreed with the idea that western carrot originated directly from eastern carrot. By summarizing the morphological evidence from different studies, he proposed a secondary domestication event, namely that the western cultivated carrot was selected from hybrids among yellow eastern carrots, cultivated white-rooted derivatives of wild carrot (D. carota L. ssp. carota) and adjacent wild populations of D. carota subspecies. Iorizzo et al. reported the first molecular study on carrot domestication, indicating that eastern cultivated carrots originated in Central Asia and that western cultivated carrots may have originated directly from eastern carrots.
They focused mainly on wild carrot D. carota ssp. carota. However, other wild D. carota subspecies may also have played important roles in carrot domestication, because different D. carota subspecies within the D. carota complex can successfully hybridize in nature and the taxonomy is much disputed. Therefore, in this study, various D. carota subspecies from different geographic regions were used to further investigate the process of carrot domestication.
Usually domestication decreases the genetic diversity of crops through genetic bottlenecks and selection. For instance, maize has only about 57% of the genetic diversity found in its progenitor. In contrast, two previous studies using allozymes, amplified fragment length polymorphisms (AFLPs) and inter-simple sequence repeat (ISSR) markers found that carrot domestication did not result in a significant reduction of genetic diversity [12, 13]. However, the conclusions of these studies were based on only small regions of the carrot genome. Using thousands of SNPs, a new study by Iorizzo et al. also detected similar levels of genetic diversity between cultivated and wild carrots, suggesting the absence of a genetic bottleneck during carrot domestication. Considering the predominantly outcrossing nature of carrots and the relatively short time period of carrot domestication, the effects of domestication bottlenecks on cultivated carrots may have been offset by a high level of introgression from wild carrot and other D. carota subspecies after the bottlenecks. Further studies are required to test this hypothesis using different domestication models.
Key genes underlying valuable cultivation traits are mostly unknown in carrots. Since not all genes are targeted in domestication and/or breeding processes, we need to focus on those influencing favored traits to identify key genes under selection. In the case of carrot, as a root crop, most of the traits of interest are related to the root, such as root color, shape, size, flavor etc. [5, 7]. Cultivated carrot differs from wild carrot in forming relatively large, unbranched, smooth and juicy storage roots with high sugar and carotenoid contents [5–7, 14]. The main varietal groups of cultivated carrot in use today are categorized by root type according to root shape, size and color. Examples include the European carrot groups “Amsterdam Forcing”, “Berlicum”, “Chantenay”, “Flakkee”, “Nantes” and “Paris Market”. Thus, the variation in the root transcriptomes between cultivated and wild carrots may provide essential information about the differentiation of cultivated carrot from wild carrot.
To develop SNP markers polymorphic in the transcriptomes within and between diverse cultivated and wild carrots;
To infer the origin of cultivated carrot based on validated SNPs;
To show the effects of domestication on genetic diversity in the transcriptome;
To reveal gene expression changes between cultivated and wild carrots and identify key functional genes under selection.
As most of the domesticated traits may be related to the expression of functional genes in carrot roots, we sequenced and compared the root transcriptomes of several cultivated and wild carrots. SNPs were discovered and validated using diverse cultivated carrots, wild carrots and other wild D. carota subspecies. Phylogenetic analysis was performed to infer the origin of the cultivated carrot with different Daucus species as outgroup. Genetic diversity was calculated to evaluate the effects of domestication on genetic diversity. Domestication models were constructed to simulate the processes of carrot domestication. Key functional genes underlying cultivation traits were identified based on differential gene expression patterns between cultivated and wild carrots.
Table 1. Number of reads and mean coverage to the reference sequence of cultivated and wild carrot transcriptomes
Cultivated carrot (D. carota ssp. sativus): CA (Amsterdamse Bak)
Wild carrot (D. carota ssp. carota): WIL (Lachish, Israel: 31.565°N, 34.849°E); WNL-M (Meijendel, Netherlands: 52.156°N, 4.380°E); WPT (Esposende, Portugal: 41.533°N, 8.783°W); WSK (Trenčin, Slovakia: 48.892°N, 18.037°E); WNL-SP (Schermer Polder, Netherlands: 52.621°N, 4.861°E)
To further validate the SNPs and infer the origin of cultivated carrots, an additional set of 49 cultivated carrots with both eastern and western cultivars, 18 wild carrots (D. carota ssp. carota), 32 accessions of 10 other wild D. carota subspecies, and 6 accessions of 4 different wild Daucus species (D. muricatus, D. aureus, D. guttatus and D. broteri) from Mediterranean, Southern, Western and Northern Europe, Western, Central, Southern and Eastern Asia were used (Additional file 1: Table S1).
RNA extraction and purification
RNA was extracted from each root sample with the RNeasy Plant Mini Kit (QIAGEN, Venlo, The Netherlands). About 2000 ng RNA was taken from each sample and adjusted to a volume of 12 μL with RNase-free water. For DNA digestion, this was mixed with RNase-free 1.5 μL 10× DNase I reaction buffer, 0.75 μL of 2 U/μL DNase I (Ambion) and 0.75 μL water to a total volume of 15 μL. The mixture was placed at room temperature for 15 min. To inactivate DNase I, 1.5 μL RNase-free 25 mM EDTA was added to the mixture, which was then incubated at 65°C for 10 min. Subsequently, the three RNA samples of plants of the same cultivated carrot variety or wild population (two samples for WPT) were equimolarly pooled and adjusted to a volume of 100 μL with RNase-free water. The RNA was purified with the RNeasy Mini Kit (QIAGEN, Venlo, The Netherlands). The RNA samples were stored at -80°C.
Transcriptome sequencing (RNA-Seq)
RNA-Seq analysis was performed at Leiden Genome Technology Center (LGTC). First, cDNA fragments were synthesized and amplified from each RNA sample with the Ovation RNA-Seq System (NuGEN). Then, sample preparation for Illumina multiplexing paired-end (PE) sequencing was performed according to the Illumina protocol. Each sample was tagged with a unique index tag (Index primer 1–11 for sample ID 1–11 in Table 1), permitting discrimination of sequences from different samples after multiplex sequencing. The quality and quantity of each sample was measured with an Agilent 2100 Bioanalyzer (Agilent Technologies). Each sample was diluted to 10 nmol/L. We then equimolarly pooled cultivated carrot samples into one tube and wild carrot samples into another for sequencing. Cluster generation was performed with the pooled cultivated carrot sample in one lane of the Illumina flow cell and the pooled wild carrot sample in another. The PE sequencing was carried out on the Illumina Genome Analyzer IIx for 75 cycles.
Sequence assembly and mapping
The default Illumina pipeline filter (chastity ≥0.6) was used for cleaning up raw reads. CLC Genomics Workbench 4.0 (CLC bio) was used for a de novo assembly (Insertion cost = 3; Deletion cost = 3; Mismatch cost = 2) of all obtained sequences from both cultivated and wild carrots into contigs. All resulting contigs with a coverage ≥40 or length ≥500 bases were selected and concatenated to create a single consensus reference sequence. The coverage of at least 40 was chosen in order to obtain coverage of at least 3–4 per transcript per sample. This allowed us to genotype each sample and compare gene expression between samples later. In the reference sequence, adjacent contigs were separated by a 30-letter string of 10 Ns, 10 Cs, and 10 Ns. This artificial spacer sequence was designed not to disturb read alignment at the end of the contig. Then, reads from each cultivated or wild carrot were aligned to the reference sequence with the program Burrows-Wheeler Aligner (BWA). The alignments were processed in the Sequence Alignment/Map (SAM) format with the program SAMtools. Afterwards the alignment data were processed in R (version 2.12.1) for additional quality control, for genotyping each cultivated carrot or wild carrot population, for SNP discovery and for further statistical analysis.
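The contig selection and concatenation step can be illustrated with a minimal Python sketch. This is not the authors' pipeline (the assembly was done in CLC Genomics Workbench and the downstream processing in R); the function names and the dictionary layout of a contig are hypothetical, and only the thresholds and the spacer layout are taken from the text.

```python
# Hypothetical helpers; only the thresholds and spacer layout follow the text.
SPACER = "N" * 10 + "C" * 10 + "N" * 10  # 30-letter artificial spacer


def select_contigs(contigs, min_coverage=40, min_length=500):
    """Keep contigs with coverage >= 40 or length >= 500 bases."""
    return [c for c in contigs
            if c["coverage"] >= min_coverage or len(c["seq"]) >= min_length]


def build_reference(contigs):
    """Concatenate contig sequences, separating neighbours with the spacer."""
    return SPACER.join(c["seq"] for c in contigs)


# Toy contigs (made-up sequences, for illustration only).
contigs = [
    {"seq": "ACGT" * 5, "coverage": 55.0},  # kept: coverage >= 40
    {"seq": "GATTACA", "coverage": 3.0},    # dropped: short and low coverage
    {"seq": "T" * 600, "coverage": 12.0},   # kept: length >= 500
]
selected = select_contigs(contigs)
reference = build_reference(selected)
```

Because the spacer is placed only between contigs, the reference length here is the sum of the selected contig lengths plus 30 letters per junction.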
n_E = qbinom(0.99, n, ϵ),
where qbinom is an R function calculating the quantile (in our case p = 0.99) of a binomial distribution with given number of reads n = n_1 + n_2 + n_3 + n_4 and error rate ϵ. If the observed number of a nucleotide was larger than n_E, the chance of the observation being due to error is smaller than 0.01 and the nucleotide was taken into consideration as a valid allele. To reduce false positive rates, if the value of ϵ of a sample at a position (e.g. ϵ = 0) was less than the mean ϵ over all samples and positions, the mean ϵ was used for the calculation. If no nucleotide had a count larger than n_E or more than two nucleotides had counts larger than n_E, the sample was assigned an ‘N’ at the position.
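The error threshold described above can be sketched in Python. This is an illustration under stated assumptions rather than the authors' R code: binom_quantile is a hand-written stand-in for R's qbinom, valid_alleles is a hypothetical name, and the per-position error rate is taken as an input because its definition is not given in this section.

```python
from math import comb


def binom_quantile(p, n, eps):
    """Smallest k with P(X <= k) >= p for X ~ Binomial(n, eps).

    Stand-in for R's qbinom(p, n, eps); adequate for small error rates,
    where the loop stops after a few terms.
    """
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * eps**k * (1.0 - eps)**(n - k)
        if cdf >= p:
            return k
    return n


def valid_alleles(counts, eps, mean_eps, p=0.99):
    """Return the one or two nucleotides whose counts exceed n_E, else 'N'.

    counts   : dict nucleotide -> read count at one position in one sample
    eps      : error rate of this sample at this position (assumed input)
    mean_eps : mean error rate over all samples and positions (assumed input)
    """
    eps_used = max(eps, mean_eps)  # never go below the mean error rate
    n = sum(counts.values())
    n_e = binom_quantile(p, n, eps_used)
    alleles = sorted(base for base, c in counts.items() if c > n_e)
    return alleles if 1 <= len(alleles) <= 2 else "N"
```

For example, with 100 reads and an error rate of 0.01 the threshold n_E is 4, so a nucleotide seen twice is treated as error while one seen 38 times is kept as an allele.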
where (n_2 - n_E) is the corrected number of nucleotides, which should be higher than the minimum expected number of nucleotides given n and the minimum ratio of an allele in the mixture (1/6 or 1/4); the value 0.01 in Equations 3 and 4 means that the chance of a value equal to or less than the expected value is no more than 0.01. Otherwise, the sample was scored as homozygous for the most-observed nucleotide. With the same strategy as indicated above, the genotypes of different samples at different SNP positions were scored. Finally, for further analysis we selected SNP positions with no more than 1 ‘N’ genotype, at least two different genotypes other than ‘N’, and no more than 2 alleles over all cultivated and wild carrot samples.
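The final position filter can be expressed as a short Python sketch. The genotype encoding (two-letter strings such as "AA" or "AG" with alleles in sorted order, and "N" for missing) and the function name keep_position are assumptions made for illustration; only the three criteria come from the text.

```python
def keep_position(genotypes, max_missing=1):
    """Apply the three selection criteria to one candidate SNP position.

    genotypes : list of per-sample calls, e.g. ["AA", "AG", "N"]
                (hypothetical encoding; alleles within a call are sorted).
    """
    called = [g for g in genotypes if g != "N"]
    alleles = {base for g in called for base in g}
    return (len(genotypes) - len(called) <= max_missing  # at most 1 'N'
            and len(set(called)) >= 2                    # not all identical
            and len(alleles) <= 2)                       # biallelic only
```

A position where every called sample has the same genotype is dropped because it carries no information about differences between samples.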
The KBioscience Competitive Allele-Specific PCR (KASP) genotyping system (LGC KBioscience, UK) was applied for SNP validation. Primers were designed for 1000 SNPs based on sequences with 50 bases on either side of a SNP. Besides the carrot samples used for sequencing (10 × 3 + 1 × 2 = 32 samples), an independent set of 37 cultivated carrots, 15 wild carrots and 32 accessions of 10 other wild D. carota subspecies (part of the accessions in Additional file 1: Table S1) was used for SNP validation (116 samples in total). As a result, 622 SNPs were confirmed to be polymorphic. Afterwards, another 21 samples (indicated in bold in Additional file 1: Table S1) involving eastern-type carrots (as comparison to western carrots) and different Daucus species (as outgroup) were genotyped at 89 SNP positions, a subset of the 622 SNPs. Thus, we had two sets of genotypic data: 1) the 622-SNP dataset containing the genotypic data at 622 SNP positions of 115 carrot samples (WNL-SP3 was deleted for having too many missing data; without outgroup); 2) the 89-SNP dataset involving the data at 89 SNP positions of 136 samples (with outgroup).
A combined dataset of both the 622-SNP and 89-SNP datasets was used for the phylogenetic analysis, i.e. 115 samples genotyped at 622 SNP positions and 21 samples genotyped at 89 SNP positions. MrModeltest version 2.3 was used for selecting the best-fit model of nucleotide substitution. The GTR + G model was the best fit, with the smallest Akaike information criterion (AIC) value and the highest Akaike weight. Then, a Bayesian estimation of phylogeny was performed using MrBayes version 3.1.2 from the CIPRES Science Gateway (http://www.phylo.org/portal2/tools.action) [19–21]. Population structure of cultivated carrots, wild carrots and other wild D. carota subspecies (using the 622-SNP dataset) was inferred using Structure 2.3.4. An admixture ancestry model was used and allele frequencies were assumed to be independent among populations. Population number (K) was set from 1 to 8. Three replicate runs were carried out for each K. Each run had a burn-in length of 50,000 iterations and 100,000 iterations after burn-in. Using the 622-SNP dataset, the Fst between cultivated and wild carrots was calculated with the software package ∂a∂i (dadi version 1.6.3). The 95% confidence interval (95% CI) of the estimate was inferred by resampling SNP positions (1000 bootstrap samples).
The genetic diversity estimates were calculated using the 622-SNP dataset. The proportion of polymorphic loci (P) was calculated for cultivated carrots, wild carrots, and wild carrots plus other wild D. carota subspecies separately. A polymorphic locus is defined as having more than 1 allele. The 95% CIs of the P estimate were calculated from 1000 bootstrap samples of SNP positions. Nucleotide diversity (θ_π), Watterson’s estimator of theta (θ_w) and Tajima’s D of cultivated carrots, wild carrots, and wild carrots plus other wild D. carota subspecies were calculated with the software package ∂a∂i (dadi version 1.6.3). The 95% CIs of the estimates were inferred by resampling SNP positions (1000 bootstrap samples).
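The bootstrap over SNP positions can be sketched as follows, using the proportion of polymorphic loci as the statistic. This is a minimal illustration and not the authors' code: the percentile method for reading off the interval, the genotype encoding and all function names are assumptions, since the text only states that SNP positions were resampled 1000 times.

```python
import random


def is_polymorphic(genotypes):
    """A locus is polymorphic when more than one allele is observed."""
    alleles = {base for g in genotypes if g != "N" for base in g}
    return len(alleles) > 1


def proportion_polymorphic(loci):
    """Fraction of loci (each a list of genotype calls) that are polymorphic."""
    return sum(is_polymorphic(g) for g in loci) / len(loci)


def bootstrap_ci(loci, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI (assumed method) from resampling loci with replacement."""
    rng = random.Random(seed)
    values = sorted(
        stat([rng.choice(loci) for _ in loci]) for _ in range(n_boot)
    )
    lower = values[int((alpha / 2) * n_boot)]
    upper = values[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: four loci genotyped in two samples each (made-up calls).
loci = [["AA", "AG"], ["CC", "CC"], ["GG", "TT"], ["AA", "N"]]
p_hat = proportion_polymorphic(loci)
ci = bootstrap_ci(loci, proportion_polymorphic)
```

Resampling whole SNP positions, rather than samples, keeps each position's genotypes together, which matches the description of resampling SNP positions in the text.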
Putative genes under selection
Genes under selection may show very different expression patterns between cultivated and wild carrots. Because the total number of reads varied across samples (Table 1), we first normalized the coverage of contigs. Normalized gene expression was calculated as the coverage of a contig from a given sample divided by the mean coverage of all the contigs in the reference sequence from the sample (Table 1). Then, the difference in gene expression of a contig between cultivated and wild carrots was calculated as (mean coverage of cultivated carrots - mean coverage of wild carrots)/(mean coverage of cultivated and wild carrots). The 95% CIs of the mean gene expression difference were calculated from 1000 bootstrap samples of contigs. Genes represented by contigs with coverage from only cultivated or wild carrot were termed “unique expression”. Putative functions for these unique expression contigs were determined by BLAST (Basic Local Alignment Search Tool: http://blast.ncbi.nlm.nih.gov/) in Genbank.
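The normalization and the expression-difference measure described above can be sketched in Python. The function names are hypothetical, and the denominator is read here as the mean over all cultivated and wild samples together, which is one possible interpretation of "mean coverage of cultivated and wild carrots".

```python
from statistics import mean


def normalize(sample_coverage):
    """Divide each contig's coverage by the sample's mean coverage."""
    m = mean(sample_coverage.values())
    return {contig: cov / m for contig, cov in sample_coverage.items()}


def expression_difference(cultivated, wild):
    """(mean cultivated - mean wild) / mean over all samples (assumed reading).

    cultivated, wild : lists of normalized coverages of one contig,
                       one value per sample; at least one must be non-zero.
    """
    overall = mean(cultivated + wild)
    return (mean(cultivated) - mean(wild)) / overall


def unique_expression(cultivated, wild):
    """Label a contig covered in only one group; None if covered in both."""
    if any(cultivated) and not any(wild):
        return "cultivated"
    if any(wild) and not any(cultivated):
        return "wild"
    return None
```

Under this reading, a contig covered only in cultivated carrots gets a difference of +2 when both groups contain the same number of samples, and a contig covered only in wild carrots gets -2.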
Results and discussion
For the high-throughput transcriptome sequencing, we obtained over 57 million reads from cultivated carrot roots, and over 40 million reads from wild carrot roots. 97% of the reads of cultivated carrot had tags and were assigned to one of the cultivated varieties, and 94% of the reads of wild carrot had tags and were assigned to one of the wild populations (Table 1). Each read was 75 bases long. 91% of the reads were assembled de novo into 252,715 contigs (mean length = 216; mean coverage = 122). 45,165 contigs were selected (coverage ≥40 or length ≥500; mean length = 411) representing the consensus/majority sequence of heterozygous and long contigs, and concatenated to form a single consensus reference sequence. The final reference sequence for the root transcriptome contained 18,600,079 bases (excluding artificial strings between contigs). The size of the protein-coding region in the carrot haploid genome (473 Mb) is estimated to be about 47.7 Mb. The selected reference sequence of the root transcriptome therefore corresponds to about 39% of the size of the complete carrot transcriptome. 41% of the reads from cultivated carrots and 40% of those from wild carrots were aligned to the reference sequence. The mean coverage of the various cultivated carrots was 31.3 ± 6.7 (mean ± standard error); for the wild carrots it was 29.4 ± 9.9 (excluding WNL-M, with very low mean coverage). The selected reference sequence is therefore not expected to cause a significant bias in comparing the read alignments of cultivated and wild carrots. Further analyses were all based on the alignments to the selected reference sequence. 11,369 SNP positions were identified in the reference sequence. Considering the conservative method of SNP discovery (to reduce false positive rates), the true number of SNPs is most likely higher. The ratio of transition substitutions (32.2% A/G and 31.4% C/T) to transversions (11.4% A/C, 10.8% G/T, 7.8% A/T and 6.4% C/G) was about 1.75 to 1.
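The two ratios quoted in this paragraph can be checked with a few lines of Python. The percentages and sizes are copied from the text; the variable and function names are illustrative only.

```python
# Purine<->purine (A/G) and pyrimidine<->pyrimidine (C/T) changes are transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(a, b):
    """True for A/G or C/T substitutions, False for transversions."""
    return frozenset((a, b)) in TRANSITIONS


# Percentage of SNPs per substitution type, as reported in the text.
snp_percent = {"AG": 32.2, "CT": 31.4, "AC": 11.4, "GT": 10.8, "AT": 7.8, "CG": 6.4}

ts = sum(p for pair, p in snp_percent.items() if is_transition(*pair))
tv = sum(p for pair, p in snp_percent.items() if not is_transition(*pair))
ts_tv_ratio = ts / tv  # about 1.75

# Reference length relative to the estimated protein-coding region (47.7 Mb).
reference_fraction = 18_600_079 / 47.7e6  # about 0.39
```

Both values agree with the figures given above, about 1.75 to 1 and about 39%.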
Primers were designed for testing 1000 SNPs in a KASP assay, of which 871 generated PCR products. Of these, 79 were monomorphic or had many unreliable data points in the sequencing samples. The unreliable data points may be due to mismatches of primers (e.g. flanking SNPs). 792 (79.2% of the total SNPs tested) showed the expected SNP patterns in the sequencing samples. In the independent set of cultivated carrots, wild carrots and other wild D. carota subspecies (Additional file 1: Table S1), 170 out of the 792 SNPs showed only one genotype for most samples or many unreliable data points, and 622 (62.2% of the total SNPs tested) were polymorphic. Iorizzo et al. published the first large-scale transcriptome of carrot in 2011. They computationally identified 20,058 SNPs. However, only 60% of their 354 tested SNPs had the expected SNPs in their sequencing samples, and 14% of the 354 tested SNPs were polymorphic in an unrelated mapping population. They sequenced the transcriptomes of three cultivated carrots and a pool of F4 RILs from a cross between cultivated and wild carrots, which may have led to ascertainment bias towards SNPs polymorphic in cultivated carrots. The higher success rate of our SNPs in both the sequencing and independent sets of samples indicates that the use of sequences from diverse cultivated and wild accessions, together with a conservative SNP discovery method across these sequences, has effectively reduced the false positive rate. Primers for the 622 validated SNPs are reported in Additional file 2: Table S2. They can be used for carrot genetic mapping and breeding as well as for population and evolutionary genetics studies.
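The validation counts in this paragraph form a simple funnel, which can be retraced as a small arithmetic check in Python. All numbers are taken from the text; the variable names are illustrative.

```python
tested = 1000                       # SNPs with KASP primers designed
amplified = 871                     # generated PCR products
failed_in_sequencing_samples = 79   # monomorphic or unreliable
failed_in_independent_set = 170     # one genotype or unreliable

expected_pattern = amplified - failed_in_sequencing_samples  # 792
polymorphic = expected_pattern - failed_in_independent_set   # 622

rate_expected = 100 * expected_pattern / tested   # percent of SNPs tested
rate_polymorphic = 100 * polymorphic / tested     # percent of SNPs tested
```

Both rates are expressed relative to all 1000 SNPs tested, not to the 871 that amplified.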
The eastern-type cultivated carrots may have originated in the areas from Western to Central Asia (Figure 3), which is in close agreement with the results of Iorizzo et al. Their study indicated that cultivated carrots most likely originated in Central Asia. With respect to the origin of the western-type cultivated carrots, our results strongly support the view that they were derived from eastern carrot cultivars, but introgression from wild carrots may have played a role as well, as proposed by Heywood. The Structure clustering results imply that “Long Orange” may be the original root type of western-type orange carrots (CHR05 and CHR20 in Figure 4). Although the “Yellow Belgian” root type clusters closer to wild carrots, these accessions have white (CHR08 and CHR26) or yellow (CHR04 and CHR30) roots. The “Long Orange” type carrot was the first type of orange carrot observed in Dutch paintings, as early as about 1600 [7, 8]. Thus, our results support the notion that the western-type orange carrot may have originated in The Netherlands prior to the 17th century. However, the phylogenetic analysis does not support this hypothesis (Figure 3). On the other hand, the Structure clustering in our study was based on cultivated and wild carrots primarily of European origin. While Turkey was regarded as one of the places of origin of western carrot in previous studies, our study did not include cultivated and wild carrots from Turkey. Therefore, a more detailed study involving more carrot samples from the Middle East (e.g. Turkey) needs to be conducted to further determine the place of origin of western carrot.
Effects of domestication on genetic diversity
Table 2. Genetic diversity estimates and Tajima’s D of cultivated carrot, wild carrot and wild carrot plus other wild Daucus carota subspecies
Group | H_e | % polymorphic loci | θ_π | θ_w | Tajima’s D
Cultivated carrot | 0.303 (0.288 - 0.317) | 72.1 (69.2 - 74.7) | 0.559 (0.532 - 0.584) | 0.470 (0.452 - 0.487) | 0.947 (0.846 - 1.042)
Wild carrot Daucus carota ssp. carota | 0.349 (0.336 - 0.360) | 84.0 (82.0 - 86.0) | 0.643 (0.620 - 0.664) | 0.548 (0.535 - 0.561) | 0.869 (0.773 - 0.960)
Wild carrot plus other wild D. carota subspecies | 0.344 (0.333 - 0.355) | 84.3 (82.5 - 85.9) | 0.635 (0.614 - 0.655) | 0.550 (0.538 - 0.560) | 0.776 (0.684 - 0.863)
The domestication model we used is illustrated in Figure 2. For both the 622-SNP dataset without outgroup polarization and the 89-SNP dataset with outgroup polarization, the domestication model assuming asymmetric migration between cultivated and wild carrots is a much better fit to the data than models assuming symmetric migration or no migration (parameter estimates and likelihoods for both datasets and all three migration models are given in Additional file 3: Table S3). The maximum-likelihood estimates of parameters specified in Figure 2 with different datasets were virtually identical and here only the results based on the 622-SNP dataset are shown. Compared to the current effective population size of cultivated carrot N_C, the bottleneck size was small: N_B = 0.0200 N_C (95% CI: 0.0024 - 0.0346 N_C). However, the duration of the bottleneck T_B was also much shorter than the period of exponential growth T after the bottleneck: T_B = 0.0113 T (95% CI: 0.0054 - 0.0195 T), which may limit the loss of genetic diversity. Following the bottleneck, the effective population size of cultivated carrot increased exponentially to a present population size N_C of 0.1039 N_W (95% CI: 0.0170 - 0.2508 N_W), which is smaller than the population size of wild carrot N_W. The population growth took about T = 1.3138 N_W (95% CI: 0.0964 - 2.0036 N_W) generations. During the population growth, asymmetric gene flow occurred between cultivated and wild carrots. The gene flow from cultivated to wild carrot m_WC was estimated at 0.1452/N_W (95% CI: 0.0002 - 0.3889/N_W) while the gene flow from wild to cultivated carrot m_CW was 6.4537/N_W (95% CI: 2.0731 - 15.9550/N_W). The significantly higher gene flow from wild to cultivated carrot may be the result of efforts to introduce genetic diversity from wild carrot germplasm into cultivated carrot for breeding purposes.
Still, the final effective population size of cultivated carrot is significantly smaller than that of wild carrot and the genetic differentiation between them is high (Fst = 0.295). Moreover, as mentioned above, the Structure analyses provided some evidence of recent introgression, although cultivated and wild carrots remain in fairly distinct clusters (Figure 4). These results suggest that human selection had a strong impact on the genetic differentiation between cultivated and wild carrots.
Wild carrot is a widely distributed species native to temperate areas in the Mediterranean region, Europe and Western Asia. Our results as well as those of Iorizzo et al. suggest a single origin of cultivated carrot from wild carrot in Western and Central Asia, which represents only a subset of the total genetic diversity in wild carrot. However, Iorizzo et al. detected no reduction of genetic diversity in cultivated compared to wild carrots and proposed that the genetic bottleneck might be absent in carrot domestication. In our opinion, it is unlikely that the domestication of carrot did not go through a bottleneck at the beginning, and the results from our model simulations support this notion. Based on the simulations with different domestication models in our study, we propose another explanation for the relatively high genetic diversity maintained in cultivated carrot. First, our model simulation suggests a small size of the domestication bottleneck but also a relatively short duration of the bottleneck, which implies a limited reduction in genetic diversity. Second, a relatively large amount of genetic diversity was recruited in cultivated carrot after the bottleneck through introgression from wild carrot. Because carrot is a predominantly outcrossing species, introgression may be relatively high between cultivated and wild carrots [12–14, 29, 30], either spontaneously or artificially, which is also supported by the results of the model simulation above. For these reasons, the level of genetic diversity retained in cultivated carrot is higher than that found in other genome-wide studies of major crop species under strong pressure from bottlenecks and selection: for instance, maize and rice each retain about 57% (θ_w per kb) of the diversity in their progenitors [11, 31]. Our result is closer to that retained in the whole genome and in the protein coding sequences (CDS) of soybean, about 73.2% and 75.5% (θ_w per kb), respectively.
All major crops had much longer histories of domestication than carrot and the associated stronger effects of bottlenecks and selection may be responsible for the more severe loss of genetic diversity in the former.
Putative genes under selection
Table 3. Putative gene functions of unique expression contigs in either cultivated or wild carrots
Significant alignments in NCBI nucleotide collection database | Cultivated carrot | Wild carrot
26S ribosomal RNA | 1.8 ± 0.5 | 0.0 ± 0.0
 | 4.9 ± 0.8 | 0.0 ± 0.0
Light harvesting protein | 33.4 ± 22.0 | 0.0 ± 0.0
 | 19.4 ± 8.9 | 0.0 ± 0.0
 | 182.1 ± 89.8 | 0.0 ± 0.0
 | 38.9 ± 13.8 | 0.0 ± 0.0
Dihydroflavonol 4-reductase (DFR2) | 5.6 ± 1.7 | 0.0 ± 0.0
 | 17.5 ± 3.3 | 0.0 ± 0.0
 | 2.9 ± 1.1 | 0.0 ± 0.0
Peptidyl-prolyl cis-trans isomerase B | 25.8 ± 8.0 | 0.0 ± 0.0
Phosphatidic acid phosphatase alpha | 9.8 ± 3.6 | 0.0 ± 0.0
 | 16.6 ± 10.2 | 0.0 ± 0.0
Photosystem I reaction center subunit | 21.4 ± 4.0 | 0.0 ± 0.0
 | 15.5 ± 7.9 | 0.0 ± 0.0
Plastid division regulator MinD mRNA | 16.7 ± 5.3 | 0.0 ± 0.0
Ribosomal protein S3 | 3.2 ± 1.2 | 0.0 ± 0.0
Tonoplast aquaporin 1;1 | 23.5 ± 7.0 | 0.0 ± 0.0
Daucus carota major allergen isoform Dau c1.0201 | 0.0 ± 0.0 | 52.1 ± 33.9
 | 0.0 ± 0.0 | 209.8 ± 58.0
Phloem protein 2-2 | 0.0 ± 0.0 | 28.1 ± 17.8
Apium graveolens var. dulce | |
Receptor protein kinase | 0.0 ± 0.0 | 32.0 ± 9.2
An interesting finding is the activated expression of the genes (Lhcb-like) encoding the light-harvesting complex proteins of photosystem II (LHC-II) in cultivated carrot roots (Table 3). LHC-II proteins are chloroplast membrane proteins encoded by a nuclear multigene family. They bind mainly chlorophyll, and therefore are often referred to as chlorophyll a/b binding proteins [33–35]. They play important roles in photosynthesis, especially in the regulation of energy flow between photosystem I and II and control of the dissipation of excess energy under light stress [34, 35]. LHC-II proteins also bind yellow or orange carotenoids, in particular lutein, zeaxanthin, violaxanthin, neoxanthin and β-carotene [34, 35]. The expression of Lhcb genes appears to be regulated by light, and plants grown in darkness contain a very low amount of Lhcb mRNA [33, 34]. Carotenoid-deficient leaves contain only trace amounts of Lhcb mRNA, suggesting that carotenoid biosynthesis and Lhcb gene expression are directly related. The Lhcb genes were thought to be silenced in roots. The high expression of Lhcb genes that we have found in cultivated carrot roots but not in wild carrot roots may be related to the high carotenoid accumulation in the former. Cultivated carrot is renowned for the high carotenoid content of its roots (xanthophylls for yellow, α- and β-carotene for orange roots), while wild carrot contains only traces of carotenoids (mainly xanthophylls) in roots. The activated expression of Lhcb genes may lead to the production of LHC-II proteins, and the binding of carotenoids to LHC-II may stimulate the accumulation of carotenoids in cultivated carrot. Carotenoid biosynthesis and the binding of carotenoids to LHC-II occur within plastids. Thus, the expression of Lhcb genes may be related to the differentiation of plastids into chromoplasts in cultivated carrot roots [33, 36]. A plastid division regulator MinD gene was also found to be activated only in cultivated carrot roots (Table 3).
The expression of the MinD gene may help to increase the number of chromoplasts, promote the expression of Lhcb genes and encourage the accumulation of carotenoids, as shown by Galpaz et al. (2008) in tomato. Further studies are required to determine the roles these genes play in the accumulation of carotenoids in carrot roots.
Putative allergen-related protein genes were expressed only in wild carrot roots (Table 3). The allergen-related proteins are presumed to be involved in plant defenses against microbial pathogens and abiotic stresses, but may also cause allergic reactions in humans. The silencing of such genes in cultivated carrot may be the result of human selection for reduced allergenicity and/or of different responses to stresses.
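The idea of genes "expressed only in" one group of roots can be illustrated with a small sketch. It is not the procedure used in this study: the function name, the sample and contig labels, the toy values and the `min_expr` threshold are all hypothetical, and the rule applied (detected in every sample of one group, zero in every sample of the other) is only one possible way to define group-specific expression.

```python
from typing import Dict, List, Tuple


def group_specific_contigs(
    expr: Dict[str, Dict[str, float]],
    cultivated: List[str],
    wild: List[str],
    min_expr: float = 1.0,  # hypothetical detection threshold
) -> Tuple[List[str], List[str]]:
    """Return (cultivated_only, wild_only) lists of contig names.

    A contig counts as group-specific when its expression is at least
    `min_expr` in every sample of one group and exactly zero in every
    sample of the other group.
    """
    cultivated_only: List[str] = []
    wild_only: List[str] = []
    for contig, by_sample in expr.items():
        cult_vals = [by_sample.get(s, 0.0) for s in cultivated]
        wild_vals = [by_sample.get(s, 0.0) for s in wild]
        cult_on = all(v >= min_expr for v in cult_vals)
        wild_on = all(v >= min_expr for v in wild_vals)
        cult_off = all(v == 0 for v in cult_vals)
        wild_off = all(v == 0 for v in wild_vals)
        if cult_on and wild_off:
            cultivated_only.append(contig)
        elif wild_on and cult_off:
            wild_only.append(contig)
    return cultivated_only, wild_only


# Toy expression table (made-up values, not data from this study).
toy_expr = {
    "contig_A": {"cult1": 12.0, "cult2": 8.5, "wild1": 0.0, "wild2": 0.0},
    "contig_B": {"cult1": 0.0, "cult2": 0.0, "wild1": 5.0, "wild2": 9.0},
    "contig_C": {"cult1": 3.0, "cult2": 4.0, "wild1": 2.0, "wild2": 6.0},
    "contig_D": {"cult1": 7.0, "cult2": 0.0, "wild1": 0.0, "wild2": 0.0},
}
cult_only, wild_only = group_specific_contigs(
    toy_expr, cultivated=["cult1", "cult2"], wild=["wild1", "wild2"]
)
print(cult_only, wild_only)
```

In this toy table, contig_D is not reported because it is detected in only one of the two cultivated samples. A looser rule, such as detection in a majority of samples, would change the resulting counts.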
We studied carrot domestication based on transcriptome analyses of a diverse set of cultivated carrot, wild carrot and other wild D. carota subspecies. The results support the hypothesis that eastern-type carrot may have been domesticated from wild carrots in Western Asia. In addition to wild carrot, other wild D. carota subspecies may have contributed to the origin of cultivated carrots. Western-type orange carrot may originate from eastern carrot, though introgression from wild carrots may also have played a role in the process. The genetic bottleneck during domestication reduced the genetic diversity in cultivated carrot, but a large amount of genetic diversity is still present. Model simulations support an important role of introgression from wild carrot in increasing the genetic diversity of cultivated carrot after the bottleneck, by breeding and/or through frequent gene flow between cultivated and wild carrots. Still, the high genetic differentiation between cultivated and wild carrots indicates the strong effects of selection. Our study demonstrated that high-throughput transcriptome sequencing of diverse cultivars and wild accessions may be very helpful in identifying functional genes under selection. The results of the gene expression analysis suggest that carrot domestication significantly altered gene expression patterns, generally by down-regulating gene expression in cultivated carrot roots. In addition, the expression of some genes was radically different between cultivated and wild carrots. We found 174 contigs that were expressed only in cultivated carrot roots and 47 only in wild carrot roots. Transcriptional changes may be predominant among the major putative domestication genes controlling the differences between cultivated and wild carrots. Many of these genes are still unknown, however, and require further analysis.
Future studies should devote special attention to functional analysis of the genes under selection identified in the present study and to the detailed molecular mechanisms by which those genes change root traits in carrot.
Availability of supporting data
The data sets supporting the results of this article are included within the article (and its additional files). RNA-seq data are available in the ArrayExpress Archive database of functional genomics experiments at the European Bioinformatics Institute (EBI) under accession E-MTAB-1340 (http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1340/). The phylogenetic tree and associated data matrix are available in TreeBASE (Accession URL: http://purl.org/phylo/treebase/phylows/study/TB2:S16441?format=html).
We thank Sophie Greve and colleagues of Leiden Genome Technology Center for their support of our study and Yu Sun and Songting Shi of Leiden University Medical Center for their help and comments. We thank Cilia Grebenstein for providing carrot seeds and Prof. Martien Groenen, Dr. Hendrik-Jan Megens, Laura Bertola and Dick Groenenberg for helping with phylogenetic analysis. We are grateful to Dandan Cheng, Karin A. M. van der Veen, Cilke M. Hermans and Henk W. Nell for helping grow the carrots. Peter J. Steenbergen is thanked for genotyping 21 cultivated and wild accessions at 89 SNP positions. Prof. Allison A. Snow of The Ohio State University is thanked for comments on the manuscript. We thank Warwick Genetic Resources Unit in the United Kingdom and The Genebank of the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK Gatersleben) in Germany for providing carrot materials. We are grateful to Dorien Postma-Haarsma, Henk Huits and colleagues of Bejo Zaden B.V. for their support in SNP validation. Finally, we thank Nigel Harle for his revision of our English. This work was supported by the research program “Ecology Regarding Genetically Modified Organisms” (ERGO) No. 838.06.031 of the Dutch Ministries for the Environment, Economic Affairs, Agriculture and Science and Education, implemented by the Earth and Life Sciences Council (ALW) of The Netherlands Organisation for Scientific Research (NWO).
- Doebley JF, Gaut BS, Smith BD: The molecular genetics of crop domestication. Cell. 2006, 127: 1309-1321. 10.1016/j.cell.2006.12.006.
- Purugganan MD, Fuller DQ: The nature of selection during plant domestication. Nature. 2009, 457: 843-848. 10.1038/nature07895.
- Tang HB, Sezen U, Paterson AH: Domestication and plant genomes. Curr Opin Plant Biol. 2010, 13: 160-166. 10.1016/j.pbi.2009.10.008.
- Renaut S, Nolte AW, Bernatchez L: Mining transcriptome sequences towards identifying adaptive single nucleotide polymorphisms in lake whitefish species pairs (Coregonus spp. Salmonidae). Mol Ecol. 2010, 19: 115-131.
- Heywood VH: Relationships and evolution in the Daucus carota complex. Israel J Bot. 1983, 32: 51-65.
- Just BJ, Santos CAF, Fonseca MEN, Boiteux LS, Oloizia BB, Simon PW: Carotenoid biosynthesis structural genes in carrot (Daucus carota): isolation, sequence-characterization, single nucleotide polymorphism (SNP) markers and genome mapping. Theor Appl Genet. 2007, 114: 693-704. 10.1007/s00122-006-0469-x.
- Simon PW, Freeman RE, Vieira JV, Boiteux LS, Briard M, Nothnagel T, Michalik B, Kwon YS: Carrot. Handbook of Plant Breeding: Vegetables II: Fabaceae, Liliaceae, Solanaceae, and Umbelliferae. Edited by: Prohens J, Nuez F. 2008, New York: Springer, 327-357.
- Banga O: The development of the original European carrot material. Euphytica. 1957, 6: 64-76.
- Stein M, Nothnagel T: Some remarks on carrot breeding (Daucus carota sativus Hoffm.). Plant Breeding. 1995, 114: 1-11. 10.1111/j.1439-0523.1995.tb00750.x.
- Iorizzo M, Senalik DA, Ellison SL, Grzebelus D, Cavagnaro PF, Allender C, Brunet J, Spooner DM, Van Deynze A, Simon PW: Genetic structure and domestication of carrot (Daucus carota subsp. sativus) (Apiaceae). Am J Bot. 2013, 100: 930-938. 10.3732/ajb.1300055.
- Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, McMullen MD, Gaut BS: The effects of artificial selection on the maize genome. Science. 2005, 308: 1310-1314. 10.1126/science.1107891.
- St. Pierre MD, Bayer RJ: The impact of domestication on the genetic variability in the orange carrot, cultivated Daucus carota ssp. sativus and the genetic homogeneity of various cultivars. Theor Appl Genet. 1991, 82: 249-253.
- Bradeen JM, Bach IC, Briard M, le Clerc V, Grzebelus D, Senalik DA, Simon PW: Molecular diversity analysis of cultivated carrot (Daucus carota L.) and wild Daucus populations reveals a genetically nonstructured composition. J Am Soc Hortic Sci. 2002, 127: 383-391.
- Wijnheijmer EHM, Brandenburg WA, Ter Borg SJ: Interactions between wild and cultivated carrots (Daucus carota L.) in the Netherlands. Euphytica. 1989, 40: 147-154. 10.1007/BF00023309.
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25: 1754-1760. 10.1093/bioinformatics/btp324.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.
- R Development Core Team: R: A Language and Environment for Statistical Computing. 2010, Vienna: R Foundation for Statistical Computing.
- Nylander JAA: MrModeltest v2. Program Distributed by the Author. 2004, Uppsala University: Evolutionary Biology Centre.
- Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17: 754-755. 10.1093/bioinformatics/17.8.754.
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19: 1572-1574. 10.1093/bioinformatics/btg180.
- Miller MA, Pfeiffer W, Schwartz T: Creating the CIPRES Science Gateway for inference of large phylogenetic trees. Proceedings of the Gateway Computing Environments Workshop (GCE): 14 November 2010; New Orleans. 2010, 1-8.
- Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155: 945-959.
- Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD: Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 2009, 5: e1000695. 10.1371/journal.pgen.1000695.
- Molina J, Sikora M, Garud N, Flowers JM, Rubinstein S, Reynolds A, Huang P, Jackson S, Schaal BA, Bustamante CD, Boyko AR, Purugganan MD: Molecular evidence for a single evolutionary origin of domesticated rice. Proc Natl Acad Sci USA. 2011, 108: 8351-8356. 10.1073/pnas.1104686108.
- Lam HM, Xu X, Liu X, Chen W, Yang G, Wong FL, Li MW, He W, Qin N, Wang B, Li J, Jian M, Wang J, Shao G, Wang J, Sun SS, Zhang G: Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nature Genet. 2011, 42: 1053-1059.
- Cavagnaro PF, Chung SM, Szklarczyk M, Grzebelus D, Senalik D, Atkins AE, Simon PW: Characterization of a deep-coverage carrot (Daucus carota L.) BAC library and initial analysis of BAC-end sequences. Mol Genet Genomics. 2009, 281: 273-288. 10.1007/s00438-008-0411-9.
- Iorizzo M, Senalik DA, Grzebelus D, Bowman M, Cavagnaro PF, Matvienko M, Ashrafi H, Van Deynze A, Simon PW: De novo assembly and characterization of the carrot transcriptome reveals novel genes, new markers, and genetic diversity. BMC Genomics. 2011, 12: 389. 10.1186/1471-2164-12-389.
- Emerson KJ, Merz CR, Catchen JM, Hohenlohe PA, Cresko WA, Bradshaw WE, Holzapfel CM: Resolving postglacial phylogeography using high-throughput sequencing. Proc Natl Acad Sci USA. 2010, 107: 16196-16200. 10.1073/pnas.1006538107.
- Magnussen LS, Hauser TP: Hybrids between cultivated and wild carrots in natural populations in Denmark. Heredity. 2007, 99: 185-192. 10.1038/sj.hdy.6800982.
- Rong J, Janson S, Umehara M, Ono M, Vrieling K: Historical and contemporary gene dispersal in wild carrot (Daucus carota ssp. carota) populations. Ann. Bot-London. 2010, 106: 285-296. 10.1093/aob/mcq108.
- Caicedo AL, Williamson SH, Hernandez RD, Boyko A, Fledel-Alon A, York TL, Polato NR, Olsen KM, Nielsen R, McCouch SR, Bustamante CD, Purugganan MD: Genome-wide patterns of nucleotide polymorphism in domesticated rice. PLoS Genet. 2007, 3: 1745-1756.
- Bramley H, Turner DW, Tyerman SD, Turner NC: Water flow in the roots of crop species: The influence of root structure, aquaporin activity, and waterlogging. Advances in Agronomy. Edited by: Sparks DL. 2007, San Diego, CA: Academic Press, 96: 133-196.
- Mayfield SP, Taylor WC: Carotenoid-deficient maize seedlings fail to accumulate light-harvesting chlorophyll a/b binding protein (LHCP) mRNA. Eur J Biochem. 1984, 144: 79-84. 10.1111/j.1432-1033.1984.tb08433.x.
- Schmid VHR: Light-harvesting complexes of vascular plants. Cell Mol Life Sci. 2008, 65: 3619-3639. 10.1007/s00018-008-8333-6.
- Barros T, Kuhlbrandt W: Crystallisation, structure and function of plant light-harvesting Complex II. Biochim Biophys Acta-Bioenergetics. 2009, 1787: 753-772. 10.1016/j.bbabio.2009.03.012.
- Fuentes P, Pizarro L, Moreno JC, Handford M, Rodriguez-Concepcion M, Stange C: Light-dependent changes in plastid differentiation influence carotenoid gene expression and accumulation in carrot roots. Plant Mol Biol. 2012, 79: 47-59. 10.1007/s11103-012-9893-2.
- Galpaz N, Wang Q, Menda N, Zamir D, Hirschberg J: Abscisic acid deficiency in the tomato mutant high-pigment 3 leading to increased plastid number and higher fruit lycopene content. Plant J. 2008, 53: 717-730. 10.1111/j.1365-313X.2007.03362.x.
- Peters S, Imani J, Mahler V, Foetisch K, Kaul S, Paulus K, Scheurer S, Vieths S, Kogel KH: Dau c 1.01 and Dau c 1.02-silenced transgenic carrot plants show reduced allergenicity to patients with carrot allergy. Transgenic Res. 2011, 20: 547-556. 10.1007/s11248-010-9435-0.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.