The Brassicaceae family is an exemplary model for studying plant polyploidy. The Brassicaceae knowledge-base includes the well-annotated Arabidopsis thaliana reference sequence; well-established evidence for three rounds of whole genome duplication (WGD); and the conservation of genomic structure, with 24 conserved genomic blocks (GBs). The recently released Brassica rapa draft genome provides an ideal opportunity to update our knowledge of the conserved genomic structures in Brassica, and to study evolutionary innovations of the mesohexaploid plant, B. rapa.
Three chronological B. rapa genomes (recent, young, and old) were reconstructed with sequence divergences, revealing a trace of recursive WGD events. A total of 636 fast evolving genes were unevenly distributed throughout the recent and young genomes. The representative Gene Ontology (GO) terms for these genes were ‘stress response’ and ‘development’ both through a change in protein modification or signaling, rather than by enhancing signal recognition. In retention patterns analysis, 98% of B. rapa genes were retained as collinear gene pairs; 77% of those were singly-retained in recent or young genomes resulting from death of the ancestral copies, while others were multi-retained as long retention genes. GO enrichments indicated that single retention genes mainly function in the interpretation of genetic information, whereas, multi-retention genes were biased toward signal response, especially regarding development and defense. In the recent genome, 13,302, 5,790, and 20 gene pairs were multi-retained following Brassica whole genome triplication (WGT) events with 2, 3, and 4 homoeologous copies, respectively. Enriched GO-slim terms from B. rapa homomoelogues imply that a major effect of the B. rapa WGT may have been to acquire environmental adaptability or to change the course of development. These homoeologues seem to more frequently undergo subfunctionalization with spatial expression patterns compared with other possible events including nonfunctionalization and neofunctionalization.
We refined Brassicaceae GB information using the latest genomic resources, and distinguished three chronologically ordered B. rapa genomes. B. rapa genes were categorized into fast evolving, single- and multi-retention genes, and long retention genes by their substitution rates and retention patterns. Representative functions of the categorized genes were elucidated, providing better understanding of B. rapa evolution and the Brassica genus.
The genus Brassica belongs to the Brassiceae tribe, Brassicaceae family, Brassicales order. The genus contains 38 species and several varieties, as well as numerous hybrids. The six major Brassica species are described by the “Triangle of U”: three diploid genomes of B. rapa (AA genome, 2n = 2x = 20), B. nigra (BB genome, 2n = 2x = 16), and B. oleracea (CC genome, 2n = 2x = 18), formed into the three amphidiploid plants B. juncea (AABB genome, 2n = 4x = 36), B. napus (AACC genome, 2n = 4x = 38), and B. carinata (BBCC genome, 2n = 4x = 34) through interspecific hybridization . Genomic orders are conserved between diploid and amphidiploid Brassica species according to marker-based studies [2–4]. Therefore, the construction of reference Brassica A, B, and C genomes provides a framework for many various Brassica species. The whole genome sequence (WGS) of Brassica A has been released with the B. rapa ssp. pekinensis line Chiifu-401-42  and the WGS of B. oleraceae (C genome) will be available in the near future . These valuable resources enable us to elucidate species identity as a consequence of whole genome triplication (WGT), to discover molecular markers useful in breeding, and to profile gene variants, all further enhancing our understanding of evolution within the group.
One of the more interesting outcomes of the increase in plant genomic research is the plethora of species expansion and diversification studies available. The polyploidy event, known as whole genome duplication (WGD), is a major contributor to genome evolution and species radiation through its ability to increase the odds of obtaining new functions in a genome [7–9]. The Brassicaceae (formerly Cruciferae) family is an exemplary model for studying polyploidy events because the well annotated Arabidopsis thaliana (A. thaliana) genome exists as a reference , with its well supported three rounds of WGD (At-α, At-β, and At-γ) . In addition, sub-classification of the Brassicaceae species is relatively clear for lineages I–III . The genus Brassica experienced an additional WGT around 13–17 million years ago (Mya) [13, 14]. The timing of this WGT makes Brassica an important model genus for evolutionary study because genomic collinearity among the species is maintained with their ancestral genome, a decisive factor in estimating ancestral genomes. The model plants, A. thaliana and B. rapa, belong to the core-Brassicaceae lineage I and II, respectively . Conservation of genomic structure from the Ancestral Crucifer Karyotype (ACK; n = 8) has been reported in Brassicaceae , and 24 conserved genomic blocks (GBs) based on A. thaliana loci have also been established . The common ancestor of lineage II in Brassicaceae (Proto-Calepineae Karyotype (PCK); n = 7) experienced chromosomal reduction . Additional translocation was also experienced translocation-PCK (tPCK) in several genera of the Brassicaceae lineage II, including the genus Brassica
. Information about conserved GBs and their loci makes it easy to compare genomic structures as well as gene expansions related to Brassica diversity.
After WGD, a plant genome is reorganized via chromosomal rearrangements, excessive gene fractionation, and epigenetic changes [9, 19]. In Arabidopsis, 80 Mb and 33.2 Mb of the genome originated from recent (α-WGD), and from old (βγ-WGD) polyploidy events, respectively, according to a recent synonymous genomic blocks substitution analysis . After reorganization resulting from WGD, genomes preferentially retain genes or gene families [21–23]. In Arabidopsis these genes have been reported to be dosage sensitive and to be functionally involved in transcriptional and/or developmental regulation , biological networks and signal cascades [22, 24], as well as in protein complexes . Furthermore, longer retained genes contribute to species radiations by subfunctionalization or neofunctionalization after polyploidy . In the B. rapa genome multi-retained genes have been reported to be involved in environmental stress, hormone response, transcription factors (TFs), ribosome structure, cell wall, and cytoskeleton organization . Specifically, auxin-related gene families, which control a plant’s growth and morphological development, are over-retained in the B. rapa genome, which is an indicator that these genes are potential contributors to morphological diversification . Multi-retained genes possessing biased function are not specific to B. rapa, but are also common in other duplicated genomes . The innovative features of the B. rapa genome introduced by its recent WGT, and the major fate of those duplicated genes in the genome are not yet fully understood.
In this study we aim to refine GB information using the latest genomic data, and to distinguish the historic B. rapa genomes chronologically for further studies. Fast evolving and multi-retention genes have been elucidated, and genome innovations after the WGT event are discussed. This analysis will contribute to understanding B. rapa evolution in general, as well as suggest future experimental designs for studying Brassica diversity.
Reconstruction of three chronological B. rapagenomes with 24 refined genomic blocks
We identified putative homologous chromosomal segments between A. thaliana and B. rapa genomes using the MCScan algorithm. A total of 683 syntenic segments were identified between the two genomes with 36,683 collinear gene pairs. Sequence divergence between the collinear gene pairs, calculated as synonymous substitutions per synonymous site (Ks), ranged from 0.1 to 6.2, with an average value of 0.51 (Additional file 1). The average Ks values of the collinear gene pairs in each syntenic segment were distributed into three waves, which were attributed to traces of paleo-WGD and recent triplication events (Figure 1). The first wave contained 302 syntenic segments with 29,239 collinear gene pairs; we named these “recent” segments (Table 1). The second and third waves corresponded to “young” and “old” segments; and contained 366 syntenic segments, with 7,335 collinear gene pairs; and 15 syntenic segments, with 109 collinear gene pairs; respectively. Approximately 5–13 times more collinear genes were identified in recent syntenic segments than in other older segments. Twenty-four Brassicaceae GBs in the Arabidopsis genome were refined by identifying collinear genes between A. thaliana and B. rapa with evidence of the syntenic segment’s continuity. This represented an expansion to the GBs proposed by Schranz et al.  and Cheng et al. . A total of 24 GBs excluding “G”, “R”, and “W” blocks were expanded by 0.01–2.52 Mb, assigning 2,997 more genes for a total of 113.34 Mb (99.41%) of the GBs defined in the Arabidopsis genome (Table 2). Finally, three historic B. rapa genomes, identified by the sequence divergence values of their syntenic segments, were reconstructed, clearly showing the trace of recursive WGD events (Figure 2).
Classification of three chronological genomes by sequence divergence of the collinear gene pairs*
Syntenic segment information
No. of syntenic segments
No. of collinear protein pairs
Information ofB. rapagenome segment
Size of genome segment
No. of integrated GBs
Average collinear gene pairs
No. of distinct A. thaliana genes
*Detailed information about this table is represented in Additional file 2.
Reconstructed genomic blocks based on the synteny betweenA. thalianaandB. rapa
No of genes (%)
No of genes (%)
*Genomic blocks updated by Cheng et al. (2013) . AK: ancestral karyotype defined by Lysak et al. (2006) , GB: genomic block.
A total of 172 GBs were assigned to the recent genome, covering 250 Mb (98.69%) of the B. rapa genome by revealing 0.29 (“G” block) – 2.84 (“T” block) times of A. thaliana GB size (Additional file 2). There were 1,217 collinear gene pairs, preserving 53.48% of the Arabidopsis genome with collinearity (Table 1). The most conserved GB in terms of number of genes with collinearity was “R”, with 61.52% of the A. thaliana genes preserved in synteny, covering the A. thaliana “R” block 2.06 times. The B. rapa “G” block was the most fractionated, with 13.97% of the remaining collinear genes. The young genome was constructed with 203 Mb (79.24%) of B. rapa coverage, retaining 14.90% of A. thaliana collinear genes (Table 1). The most conserved and fractionated GBs in the young blocks were “A” and “Q” with 19.44% and 9.21% of A. thaliana genes, respectively (Additional file 2). The genomic block “G” was not detected in the young blocks. The old genome barely remained with 5.50 Mb of reconstructed blocks and 109 collinear gene pairs. The GB conservations were less intact in the older genome than in the younger genome, with 107, 37, and 10 collinear gene pairs per Mb in the recent, young, and old genomes, respectively (Table 1). Comparative analysis of GB arrangements in recent and young genomes showed that the “A”, “U”, and “F” GBs of the recent genome conservatively contained eight “O-V-J-I-C-D-L-K-E” blocks, eight “V-O-P-W-H-I-H-I” blocks, and seven “R-Q-R-W-R-C-T” block arrangements from the young genome, respectively (Figure 2). Three GBs (“H”, “K”, and “T”) in the recent genome consisted of only one GB (“U”, “A”, or “F”) from the young genome. The ancestral copies of the “G” block in the recent genome had been lost.
Fast evolving genes in B. rapagenome with recursive WGD
We selected fast evolving genes in the syntenic segments based on nucleotide substitution rates. A total of 636 fast evolving genes were identified from 265 syntenic segments by selecting genes with Ks values significantly higher than the average Ks value of their syntenic segments (p < 0.001). Among them 543 (85.38%) and 93 (14.62%) fast evolving genes were detected, from 181 recent and 84 young syntenic segments, respectively, whereas no fast evolving genes were identified in old syntenic segments (Additional file 3). The fast evolving genes were unevenly distributed throughout the B. rapa genome (Figure 3). In the “recent” genome, A03 chromosome had the highest number of fast evolving genes with 78 genes, while A04 had the lowest with 29 genes, not including scaffolds. In “young” genomes, A01 and A08 were the chromosomes with the most (17 genes) and the least (3 genes) fast evolving genes, respectively. The quantity of fast evolving genes in the 24 GBs were varied, with 1 (0.18% in “G”) – 67 (12.34% in “F”) genes/GB in recent blocks, and 0–11 (11.83% in “A”) genes/GB in young blocks (Additional file 3). No fast evolving genes were detected in the “G”, “L”, and “Q” GBs in the young genome. Only five genes were commonly identified as fast evolving genes in both the recent and young genomes. Fast evolving gene function was estimated using Gene Ontology (GO) annotation. A total of 631 fast evolving genes were assigned to GO terms in all hierarchies; 555 biological processes (BP), 244 molecular functions (MF) and 103 cellular components (CC) (Additional file 3). To simplify the presentation, GO terms were re-categorized into GO-slim terms. High proportions of fast evolving genes remained unknown (Figure 4A). Nevertheless, four BP-terms show moderate frequencies (>15%), ‘protein metabolism’ , ‘response to abiotic or biotic stimulus’ , ‘developmental process’ , and ‘cell organization and biogenesis.’ These genes function in ‘binding (to protein, DNA, RNA, nucleotide)’ , or as ‘enzymes (hydrolase, transferase, kinase)’ , and a small number of them are involved as ‘transporters’ or ‘receptors’ in MF-terms. In CC terms, fast evolving genes mainly localized at the ‘nucleus’ , ‘cytoplasmic’ , and ‘intracellular regions.’ To understand the representative functions of the fast evolving genes, an enrichment test was performed based on GO-slim. Despite many functions that remained unclear, several notable functions were identified such as ‘protein metabolism’ and ‘structural molecule activity’ (Figure 4B). The BP term ‘protein metabolism’ included ‘protein folding’ , ‘translation’ , ‘post-translational protein modification (myristoylation, phosphorylation, methylation, glycosylation, ubiquitination, dephosphorylation, deubiquitination, autophosphorylation)’ , ‘proteolysis’ , and ‘positive and negative regulation of serine/threonine kinase activity’. The ‘structural molecule activity’ MF terms contains ‘constituent of ribosome.’ Despite high frequencies of ‘stress’ and ‘biotic/abiotic response’ genes, terms such as ‘receptor binding/activity’ and ‘transporter’ , which facilitate the recognition and transportation of environmental signals, were observed at relatively lower frequencies in MF terms (Figure 4A). The term ‘plasma membrane’ , which indicates a recognition function, was not represented (Figure 4B).
Functional bias according to gene retention patterns after recursive WGD
We investigated the retention patterns of the 36,638 collinear gene pairs for the GO term ‘gene birth and death’ in the three chronological genomes (Table 3). The results show that 29,784 (72.61%) of B. rapa genes had more than one collinear gene pair with A. thaliana; 11,235 (27.39%) of them had lost collinearity. The collinear gene pairs were classified into eight groups based on their retention patterns, as shown in Table 3. Most collinear gene pairs were observed in the recent genome, and 76.58% of them (22,371) were retained in the recent genome following the death of their ancestral copy (Index 1). The remaining 6,841 gene pairs retained older copies of young and/or old genomes (Indexes 2–4), and are defined as long retention genes. The higher standard deviation of Ks (0.98) in Index 2 compared to Index 1 (0.31) demonstrated a longer retention period for the gene pairs. However, the standard deviation of Ks values for other multi-retention gene pairs in Indexes 3–4 showed no significant difference with those of the single retention gene pairs because of the small number of gene pairs analyzed. The retention rate of A. thaliana paralogues between young (6,740, 98.77%) and old (84, 1.23%) genomes was similar to the retention rate of B. rapa genes in recent (29,212, 98.08%) and old (572, 1.92%) genomes. This suggests that approximately 98% synteny is generated from the direct ancestor’s gene copies, while 1–2% is generated from older ancestor gene copies.
Classification of divergence time ofB. rapagene based onA. thalianacollinear pairs
Standard deviation ofKsvalues
No. of B. rapaproteins (%)
aPresence (O) and/or absence (X) of collinear gene pairs in recent, young and old genomic segments. ‘-‘ denotes absence of collinear gene pairs.
Functional bias depending on gene retention patterns was investigated using GO term analysis. We detected 304, 408, 10, 14, and 19 GO terms enriched in Indexes 1, 2, 3, 4, and 6, respectively (Additional file 4). However, there were no enriched GO terms detected in Index 7 and 8 due to their small numbers of gene pair datasets. The enriched GO terms were re-categorized into GO-slim terms (Figure 5A). The major frequent GO terms in Index 1 (recent specific genes) were ‘DNA-dependent transcription’ , ‘DNA or RNA metabolism’ , ‘protein metabolism’ , and ‘other metabolic process;’ while those in Index 2 (sharing recent and young genomes as multi-retention genes) were ‘stress or biotic/abiotic response genes’ , ‘developmental process’ , ‘cellular/biological process’ , ‘cell organization and biogenesis’ , ‘transport’ , and ‘signal transduction.’ Despite a low occurrence of GO terms in Index 3 (frequent in ‘transport’ and ‘development process’) and Index 4 (frequent in ‘biotic/abiotic stimulus’ and ‘signal transduction’), GO enrichments in these gene pairs showed patterns similar to Index 2. However, the enrichment pattern observed with gene pairs in Index 6 (young specific genes) was similar to that of Index 1 (Additional file 4). This result suggests that genes retained in the recent or young genomes as single retention genes mainly function in the interpretation of genetic information, whereas, multi-retention gene function is biased towards response to signals, especially for development and defense. Manual investigation of over 755 enriched GO terms in ‘development’ revealed six and 17 enriched GO terms over-represented in Index 1 and 2 representatives of single and multi-retention genes, respectively. Various reproductive structures (floral organs, seeds) were preferentially observed in Index 2, while embryo and embryo sac ‘development’ were over-represented in Index 1. ‘Development’ for vegetative tissues (leaf, root) was common in both indexes (Figure 5B). Seven and 38 enriched GO terms were manually detected in Indexes 1 and 2, respectively, using the term ‘response to’ (Figure 5C). The multi-retention gene pairs (Index 2) showed more enrichments with this term, including genes responding to phytohormones, nutrients, and other biological stimuli. Many biotic stress defense genes (bacterium, fungus, and insect) were observed in Indexes 1 and 2; however, molecules originating from bacteria and fungus were biased to Index 1 and to Index 2, respectively.
Fate of homoeologues in the recent genome
The multi-retention genes in the recent genome of the B. rapa homoeologues were identified from 18,713 recent genome collinear gene pairs. A total of 10,127 (54.12%) gene pairs were retained as single genes, while 13,302 (35.54%), 5,790 (10.31%), and 20 (0.03%) gene pairs were retained as two, three and four homoeologues, multi-retention genes in the recent genome, respectively. Functional enrichments provided by the single and homoeologous genes were evaluated. The enriched GO-slim terms of single-retention genes in the recent genome were similar to those of recent genome specific genes (Index 1 in Table 3), representing roles in the construction of functional proteins (Figure 6A). The enriched GO-slim terms in multi-retention genes in the recent genome (B. rapa homoeologues) were ‘response to stress’, ‘developmental process’ and ‘metabolic process’ (Figure 6B), similar to long-retention genes that had undergone recursive WGD (Figure 5A). These data demonstrate that the major effect of the B. rapa WGT event may have been a change in environmental adaptability or in developmental processes. The B. rapa homoeologues were analyzed for the existence of different expression patterns and/or positive selection (Ka/Ks > 1) (Additional file 5). A total of 145 homoeologues (1.69%) were defined to a “dead” class, showing no expression evidence in mRNA-Seq data. We also defined 690 (8.04%) homoeologues that only had one instance of expression evidence as a “nonfunctionalization” class, and 2,942 (34.27%) homoeologues with spatially differentiated expression patterns as a “subfunctionalization” class (Table 4). The “neofunctionalization” class, containing nine (0.10%) homoeologues was identified with by the existence of both expression evidence and Ka/Ks > 1. These results imply that the most frequent evolutionary tract for multi-retention genes following WGT were “subfunctionalization” through a changing in spatial expression patterns (Table 5).
Functional differentiation of theB. rapahomoeologues
Subfunctionalization ofB. rapagenes involved in root and leaf developments
Absolute expression inA. thalianac
Extra-large G protein 3
Glycerophosphoryl diester phosphodiesterase-like protein (GDPD)
aDigital expression patterns by Audic’s test, bold characters represent over-expressed tissues with p-value < 0.001. bspecific organs, sub-functionalized B. rapa genes involved in the developmental process, cexpression patterns of A. thaliana in eFP web site (http://bbc.botany.utoronto.ca/efp/cgi-bin/efpWeb.cgi) in developmental map.
Extraction of three chronological genomic segments from the B. rapagenome
The mesohexaploid B. rapa genome underwent four rounds of polyploidy events after the diversification of the Eudicots . Three paleo-WGDs, known as At-γ, β, α events (in chronologic order) are shared with the entire core Brassicaceae family, whereas the last genome triplication is specific to the Brassica genus . Brassica WGT yielded three or six copies of genome colinearity in diploid and amphidiploid Brassica species, respectively [2–4, 26]. Advancements in next-generation sequencing (NGS) technology and bioinformatics analyses have increased the number of WGS projects for commercial and/or evolutionarily important plants. However, NGS based approaches often bias genome assembly toward gene rich regions, leaving most intergenic and repetitive regions unassembled . In this study, we used 283.8 Mb of B. rapa draft genome (several ‘N’s occur in unassembled regions and unanchored scaffolds), which covered 98% of gene space . This was of sufficient quality to allow our comparative analysis illuminating the effects of WGD/WGT events on Brassica gene and genome evolution. There must be some amount of gene loss and underestimation in our present data; however, that should not change the overall evidence leading to our conclusions. As a part of these studies, Cheng et al. (2013) showed that ten chromosomes of the B. rapa genome formed from an ancestral tPCK structure (n = 7), revealing one to three copies with GB associations conserved in the ancestral genome, and showing a trace of ancestral centromeres in the B. rapa genome . We rebuilt ancestral genomic segments based on the amount of synteny between A. thaliana and B. rapa, as measured by the average Ks values of each syntenic segment (Figure 2). This was based on the assumption that genes in synteny should share similar substitution rates. These assessments clearly categorize syntenic segments into three chronologic classes: recent, young, and old genomes (Figure 1). The average Ks value of the recent genomic segments (0.53; Table 1) indicate that the birth of the recent genome was concurrent with the split of A. thaliana and B. rapa (24–40 Mya) . The average Ks value of young (1.16) and old (2.19) genomic segments in Brassica were similar to the paralogous gene sets in Arabidopsis recent (0.8–1 Ks) and old (2.0–2.2 Ks) segments . These values suggest that the birth of the young genome in Brassica is slightly older than the Arabidopsis recent polyploidy event, whereas the old Brassica genome is close in birth age to the old Arabidopsis polyploidy event (Table 1). Rare traces of the oldest paleo-WGD are explained by the broken collinearity resulting from recursive WGD and fractionation during 120 million years of evolutionary history after the emergence of the Eudicot plants .
Fast evolving genes may mediate stress response or development by changing protein metabolism
Genome-wide screening for fast evolving genes in plants has not been widely pursued. However, several proteins have been reported to belong to rapidly evolving gene families, including the nucleotide binding-leucine rich repeats (NB-LRRs) involved in plant resistance , and transcription factors (TFs) , as well as several protein-coding genes in plastids involved in RNA polymerase subunits and ribosomal proteins . In this study, a total of 636 potentially fast evolving genes were detected genome-wide. The distribution of the fast evolving genes was different among chronological genomes, chromosomes (Figure 3), and GBs (Additional file 3), suggesting a biased location of these genes. The fast evolving genes identified in our study had a high frequency of multi-retention suggesting that fast evolving genes may be affected by dosage-sensitive genes during recursive WGD events . Our GO terms and enrichment analysis of B. rapa fast evolving genes suggest that B. rapa has undergone positive evolution through rapid base substitution in these genes, enhancing environmental stress adaptability. The NB-LRR homologues in the whole-genome triplicated Brassica ancestor were deleted or lost quickly, and seem to have experienced species-specific amplification by tandem duplication [32, 33]. In our study, only three copies of NB-LRR genes were identified as fast evolving genes (Additional file 3) because tandem duplicated genes were filtered out based on our syntenic analysis criteria. Defense and developmental processes usually have three steps: recognition of signaling (biotic and abiotic stimulus for stress or hormones for development), signal transduction, and the expression of target genes. In our study, the GO-slim term ‘protein modification’ was enriched in fast evolving genes, and contained many sub-terms including the regulation of Ser/Thr signaling, effects on translation, and post-translational modifications enabling actual protein function (Figure 4B). These terms imply that defense and developmental processes may be enhanced in signaling levels and/or functional protein levels. Development and defense system signaling levels have been reported to be tightly linked . The results of our study suggest that stress response and developmental processes may have been enhanced by rapidly changing protein metabolism during the course of B. rapa evolution.
Evolutionary innovations of the recent B. rapagenome compared to its ancestral genome
We classified B. rapa genes into eight retention patterns (Table 3). The retention rates of B. rapa genes were similar to that of A. thaliana paralogs . The multiple collinear gene pairs in the recent and young genomes (Index 2) were older than genes specific to the recent genome with a higher standard deviation of Ks values, although both gene sets were classified into the same category of recent genome. Based on this evidence, we estimated functional bias among seven patterns of gene sets in a step-wise manner, excluding genes with lost synteny (Index 5 in Table 3). The results of these analyses suggest that genes retained in specific chronological genomes (Indexes 1 and 6) were enriched with the function of genetic material interpretations, such as DNA/RNA/protein metabolism and transcription (Figure 5A). This functional bias was similar to the single retention genes of A. thaliana
[24, 35]. However, genes with multiple synteny in different chronological genomes (Indexes 2–4) had frequent GO enrichments in ‘signal transducer’ or ‘transport’ mainly related to defense or development (Figure 5A), implying a more detailed process of adaptive evolution following WGD. Manual inspection of enriched GO terms mainly detected ‘development’ (Figure 5B) or ‘response to stimulus’ (Figure 5C) from the recent genome specific gene set. This data represented the functional innovative patterns following WGD or WGT events. Our study showed that the young genome was enriched with the GO-terms ‘reproductive structures (floral organs, seeds)’ , implying that reproductive organ development may be functionally diversified in the young genome . Interestingly, we observed that the GO terms ‘embryo’ and ‘embryo sac development’ were over-represented in genes specific to the recent genome, while ‘developments for vegetative tissues (leaf, root)’ were shared in two chronological genomes (Index 1 and 2). Embryos contain primordial tissue layers and drive morphogenetic diversity by regulating cell specification and cell-cell communication . Therefore, our GO enrichments cautiously suggest that the morphological diversity of B. rapa may be expanded during embryogenesis by concerted evolution. The specific GO enrichment patterns of signal response genes indicate that many pathogen (bacteria, fungi, and insects) or environmental stress (cold, heat, freezing, water deprivation) response genes were over-represented in both categories of Index 1 and 2 (Figure 5C), with duplicates continuously retained during recursive WGD events. Many phytohormones were also enriched in Index 2, which are important in regulating plant developments, as well as in defense by way of cross-talk signal transductions [34, 37, 38]. These retention patterns suggest that the B. rapa genome was more innovatively evolved to adapt to biotic/abiotic stress than to phytohormone stimuli.
Subfunctionalization is the primary fate of multi-retention genes in the recent genome
Functional diversification of surviving genes has been reported to be a major characteristic of long-term evolution in polyploids . Two times as many multi-retention genes were present in the B. rapa recent genome than there were in the A. thaliana recent genome, after the α-WGD event , in our study. There were 3.4 times as many two-copy retention genes than that of three copy retention genes. These results support the two-step theory of WGD events for the B. rapa mesohexaploid genome . GO functional annotation enrichments for single- and multi-retention gene sets were biased toward genetic control and the regulation of stress and/or development, respectively (Figure 6). In previous research duplicated genes were reported to acquire new functions via neofunctionalization or to alter their functions via subfunctionalization and pseudogenization . mRNA-Seq data have been published  providing tissue and developmental stage specific expression data, which enable us to study subfunctionalization for several developmental stages. We suggest that subfunctionalization is the major drive for the evolution of multi-retention genes (Table 4), because 34.27% of the homologues studied have spatially differentiated expression patterns, In previous research, 50% and 36–49% of homologous genes had undergone subfunctionalization in Glycine max and Triticum aestivum L., respectively [41, 42]. Comparatively lower distributions of B. rapa subfunctionalization were observed because of the limited number of the mRNA-Seq libraries used in this study. A. thaliana and B. rapa homoeologous gene expression patterns did not completely coincide in spite of their syntenic orthologous relationship (Table 5). Differences in expression patterns suggest a gain or alteration of function in duplicated B. rapa genes. Those genes, and/or their homologues, may have acquired new functions or altered their ancestral function after the Brassica WGT event. Several genes that we categorized as “dead” or “nonfunctionalization” classes could be expressed in other tissues or under specific conditions because of the dearth of public mRNA-seq data . Tissue-specific genes could be co-expressed in other tissues when expression conditions change. Our study shows a frequent subfunctionalization fate in duplicated genes, with small exceptions of nonfunctionalization via reciprocal gene loss after B. rapa WGD events. However, the mechanisms for gene loss, in both subfunctionalization and neofunctionalization, have not been fully resolved. Different duplicate retained gene expression levels were recently reported to be partially a result of epigenetic modifications such as methylation, histone modification, small RNAs, and transposable element genes (review in ). The patterns and processes driving gene retention and evolution in Brassica will be further elucidated through gene expression and function analysis, combined with epigenetic studies of the paralogous homologues.
Our interpretation of the B. rapa genome, based on sequence diversities, led to our construction of three chronological genomes. Furthermore, we identified fast evolving genes, and single- and multi-retention genes in the recent genome; long retention genes in young/old genomes; and three chronological genome specific genes. Both the fast evolving and the multi-retention genes were enriched with the GO terms ‘stress response’ and ‘development.’ However, detailed functions appeared to be related to the regulation of signal cascades and/or transport systems, rather than in recognition; while gene functions under ‘transcription and translation’ were highly enriched in recent or young genome specific gene sets. High numbers of multi-retention genes in the recent genome had undergone subfunctionalization, rather than neofunctionalization. The results of the present study will be useful in understanding innovative features of the B. rapa genome following Brassica WGT, and contribute to experimental design for studying Brassica diversity.
Construction of syntenic regions
We downloaded the model Brassicaceae plant A. thaliana’s genomics resource from The Arabidopsis Information Resource (TAIR, ver. 10) website (ftp://ftp.arabidopsis.org/Sequences/whole_chromosomes/), and the B. rapa genome from the BRAssica Database (BRAD, ftp://brassicadb.org/Bra_Chromosome_V1.2/). We identified all possible homologous proteins between A. thaliana and B. rapa using BLASTp, with a cut-off Expectation Value less than 1e-5, to build syntenic regions. After removing redundancies, the collinear gene-pairs within an adjacent 10 Kb were identified as syntenic segments using MCScan (ver. 0.8). MCScan is able to search for collinear protein pairs throughout a genome , and is downloaded from the Plant Genome Duplication Database (PGDD, http://chibba.agtec.uga.edu/duplication/mcscan/). The “mcl” algorithm in MCScan was used with the default parameters “--abc --abc-neg-log -abc-tf ‘mul (0.4343), ceil (200)’” to define syntenic segments.
Estimation of the levels of Ksand Ka
Protein sequences for each collinear protein pairs were aligned by clustalw2 and Ks and Ka were estimated using the maximum likelihood method in the PAML and PAL2NAL package [44, 45]. Finally, we applied the Yang-Nielson method.
Identification of fast evolving genes
The standard deviations of Ks for a syntenic block were calculated using in-house scripts with the equation below
where xi is Ks for individual gene pairs in a syntenic block and is average Ks for a syntenic block. Z-score and its p-value were calculated with the scipy.stats module in the python package. We defined fast-evolving genes with a p-value less than 0.001.
GO enrichment analysis
GO terms for A. thaliana proteins were downloaded from TAIR web-site (ftp://ftp.arabidopsis.org/home/tair/Ontologies/Gene_Ontology/). GO and GO-Slim terms for B. rapa genes were assigned based on their A. thaliana syntenic counterpart. GO enrichments were analyzed by Fisher’s exact test (in the python module (ver.0.1.4)) specifying a p-value < 0.001 .
Identifying of B. rapahomoeologues and the fate of genes
Homologues were defined as those collinear gene pairs, as filtered by BLASTp and MCScan above, observed in both the B. rapa and A. thaliana genome. The fate of the B. rapa homologues was determined using gene expression pattern and sequence diversification rates compared to pivotal genes. To analyze gene expression patterns the mRNA-Seq data of B. rapa was downloaded from the BRAD database (http://brassicadb.org/brad/genomeDominanceData.php), including reads per kilobase of exon model per million mapped reads (RPKM) values as expression evidence from three tissues (leaf, root, and stem of B. rapa accession Chiifu-401-42), as well as two pooled mRNA libraries for B. rapa Chiifu-401-42, and a cultivar line L 58 . The genes with an RPKM value of “0” were defined as pseudo-genes without expression evidence. Based on RPKM, differentially expressed genes in specific libraries were analyzed using Audic’s test (p-value < 0.001) . Enrichment values were applied to define “subfunctionalization”. “Neofunctionalization” was defined as genes with Ka/Ks values larger than one.
This work was financially supported by grants from Cabbage Genomics assisted breeding supporting Center (CGC) research programs and Golden Seed Project (2013003-04-1-SB330) funded by the Ministry of Agriculture, Food and Rural Affairs, Republic of Korea.
Korea Research Institute of Bioscience and Biotechnology (KRIBB)
Biosystems & Bioengineering, University of Science and Technology
College of Agriculture and Life Science, Chungnam National University
Division of Environment-Friendly Research, Gyeongsangnam-do Agricultural Research and Extension Services
Department of Horticulture, Sunchon National University
Nagaharu U: Genome analysis in Brassica with special reference to the experimental formation ofB. napusand peculiar mode of fertilization.Japan J Bot 1935, 7:389–452.
Priya P, Jagannath A, Bisht NC, Padmaja KL, Sharma S, Gupta V, Pradhan AK, Pental D: Comparative mapping ofBrassica junceaandArabidopsis thalianausing Intron Polymorphism (IP) markers: homoeologous relationships, diversification and evolution of the A, B and C Brassica genomes.BMC Genomics 2008, 9:113.View Article
Lagercrantz U: Comparative mapping betweenArabidopsis thalianaandBrassica nigraindicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements.Genetics 1998,150(3):1217–1228.PubMed CentralPubMed
Parkin IA, Gulden SM, Sharpe AG, Lukens L, Trick M, Osborn TC, Lydiate DJ: Segmental structure of theBrassica napusgenome based on comparative analysis withArabidopsis thaliana.Genetics 2005,171(2):765–781.PubMed CentralPubMedView Article
Wang X, Wang H, Wang J, Sun R, Wu J, Liu S, Bai Y, Mun JH, Bancroft I, Cheng F, Huang S, Li X, Hua W, Wang J, Wang X, Freeling M, Pires JC, Paterson AH, Chalhoub B, Wang B, Hayward A, Sharpe AG, Park BS, Weisshaar B, Liu B, Li B, Liu B, Tong C, Song C, Duran C, et al.: The genome of the mesopolyploid crop speciesBrassica rapa.Nat Genet 2011,43(10):1035–1039.PubMedView Article
Yu J, Zhao M, Wang X, Tong C, Huang S, Tehrim S, Liu Y, Hua W, Liu S: Bolbase: a comprehensive genomics database forBrassica oleracea.BMC Genomics 2013, 14:664.PubMed CentralPubMedView Article
Levin DA: Polyploidy and novelty in flowering plants.Am/Nat 1983, 122:1–25.View Article
Stephens SG: Possible significance of duplications in evolution.Adv Genet 1951, 4:247–265.PubMedView Article
Inititiative TG: Analysis of the genome sequence of the flowering plantArabidopsis thaliana.Nature 2000,408(6814):796–815.View Article
Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH: Synteny and collinearity in plant genomes.Science 2008,320(5875):486–488.PubMedView Article
Franzke A, Lysak MA, Al-Shehbaz IA, Koch MA, Mummenhoff K: Cabbage family affairs: the evolutionary history of Brassicaceae.Trends Plant Sci 2011,16(2):108–116.PubMedView Article
Yang YW, Lai KN, Tai PY, Li WH: Rates of nucleotide substitution in angiosperm mitochondrial DNA sequences and dates of divergence between Brassica and other angiosperm lineages.J Mol Evol 1999,48(5):597–604.PubMedView Article
Town CD, Cheung F, Maiti R, Crabtree J, Haas BJ, Wortman JR, Hine EE, Althoff R, Arbogast TS, Tallon LJ, Vigouroux M, Trick M, Bancroft I: Comparative genomics ofBrassica oleraceaandArabidopsis thalianareveal gene loss, fragmentation, and dispersal after polyploidy.Plant Cell 2006,18(6):1348–1359.PubMed CentralPubMedView Article
Lysak MA, Berr A, Pecinka A, Schmidt R, McBreen K, Schubert I: Mechanisms of chromosome number reduction inArabidopsis thalianaand related Brassicaceae species.Proc Natl Acad Sci U S A 2006,103(13):5224–5229.PubMed CentralPubMedView Article
Schranz ME, Lysak MA, Mitchell-Olds T: The ABC's of comparative genomics in the Brassicaceae: building blocks of crucifer genomes.Trends Plant Sci 2006,11(11):535–542.PubMedView Article
Mandáková T, Lysak MA: Chromosomal phylogeny and karyotype evolution in x = 7 crucifer species (Brassicaceae).Plant Cell 2008,20(10):2559–2570.PubMed CentralPubMedView Article
Cheng F, Mandáková T, Wu J, Xie Q, Lysak MA, Wang X: Deciphering the diploid ancestral genome of the MesohexaploidBrassica rapa.Plant Cell 2013,25(5):1541–1554.PubMed CentralPubMedView Article
Hufton AL, Panopoulou G: Polyploidy and genome restructuring: a variety of outcomes.Curr Opin Genet Dev 2009,19(6):600–606.PubMedView Article
Blanc G, Hokamp K, Wolfe KH: A recent polyploidy superimposed on older large-scale duplications in theArabidopsisgenome.Genome Res 2003,13(2):137–144.PubMed CentralPubMedView Article
Edger PP, Pires JC: Gene and genome duplications: the impact of dosage-sensitivity on the fate of nuclear genes.Chromosome Res 2009,17(5):699–717.PubMedView Article
Freeling M, Thomas BC: Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity.Genome Res 2006,16(7):805–814.PubMedView Article
Rosado A, Raikhel NV: Application of the gene dosage balance hypothesis to auxin-related ribosomal mutants in Arabidopsis.Plant Signal Behav 2010,5(4):450–452.PubMed CentralPubMedView Article
Blanc G, Wolfe KH: Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution.Plant Cell 2004,16(7):1679–1691.PubMed CentralPubMedView Article
Van de Peer Y, Maere S, Meyer A: The evolutionary significance of ancient genome duplications.Nat Rev Genet 2009,10(10):725–732.PubMedView Article
Lysak MA, Koch MA, Pecinka A, Schubert I: Chromosome triplication found across the tribe Brassiceae.Genome Res 2005,15(4):516–525.PubMed CentralPubMedView Article
Bolger ME, Weisshaar B, Scholz U, Stein N, Usadel B, Mayer KF: Plant genome sequencing - applications for crop improvement.Curr Opin Biotechnol 2014, 26:31–37.PubMedView Article
De Bodt S, Maere S, Van de Peer Y: Genome duplication and the origin of angiosperms.Trends Ecol Evol 2005,20(11):591–597.PubMedView Article
Zhang X, Feng Y, Cheng H, Tian D, Yang S, Chen JQ: Relative evolutionary rates of NBS-encoding genes revealed by soybean segmental duplication.Mol Genet Genomics 2011,285(1):79–90.PubMedView Article
Van der Hoeven R, Ronning C, Giovannoni J, Martin G, Tanksley S: Deductions about the number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing.Plant Cell 2002,14(7):1441–1456.PubMed CentralPubMedView Article
Sloan DB, Alverson AJ, Wu M, Palmer JD, Taylor DR: Recent acceleration of plastid sequence and structural evolution coincides with extreme mitochondrial divergence in the angiosperm genusSilene.Genome Biol Evol 2012,4(3):294–306.PubMed CentralPubMedView Article
Kim J, Lim CJ, Lee BW, Choi JP, Oh SK, Ahmad R, Kwon SY, Ahn J, Hur CG: A genome-wide comparison of NB-LRR type of resistance gene analogs (RGA) in the plant kingdom.Mol Cells 2012,33(4):385–392.PubMed CentralPubMedView Article
Kazan K, Manners JM: Linking development to defense: auxin in plant–pathogen interactions.Trends Plant Sci 2009,14(7):373–382.PubMedView Article
Raes J, Vandepoele K, Simillion C, Saeys Y, Van de Peer Y: Investigating ancient duplication events in theArabidopsisgenome.J Struct Funct Genomics 2003,3(1–4):117–129.PubMedView Article
Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, Soltis DE, Clifton SW, Schlarbaum SE, Schuster SC, Ma H, Leebens-Mack J, de Pamphilis CW: Ancestral polyploidy in seed plants and angiosperms.Nature 2011,473(7345):97–100.PubMedView Article
Wendrich JR, Weijers D: The Arabidopsis embryo as a miniature morphogenesis model.New Phytol 2013,199(1):14–25.PubMedView Article
Pernisová M, Kuderová A, Hejátko J: Cytokinin and auxin interactions in plant development: metabolism, signalling, transport and gene expression.Curr Protein Pept Sci 2011,12(2):137–147.PubMedView Article
Cheng F, Wu J, Fang L, Sun S, Liu B, Lin K, Bonnema G, Wang X: Biased gene fractionation and dominant gene expression among the subgenomes ofBrassica rapa.PloS One 2012,7(5):e36442.PubMed CentralPubMedView Article
Moore RC, Purugganan MD: The evolutionary dynamics of plant duplicate genes.Curr Opin Plant Biol 2005,8(2):122–128.PubMedView Article
Roulin A, Auer PL, Libault M, Schlueter J, Farmer A, May G, Stacey G, Doerge RW, Jackson SA: The fate of duplicated genes in a polyploid plant genome.Plant J 2012,74(1):143–153.
Pont C, Murat F, Confolent C, Balzergue S, Salse J: RNA-seq in grain unveils fate of neo- and paleopolyploidization events in bread wheat (Triticum aestivumL.).Genome Biol 2011,12(12):R119.PubMed CentralPubMedView Article
He G, Elling AA, Deng XW: The epigenome and plant development.Annu Rev Plant Biol 2011, 62:411–435.PubMedView Article
Galtier N, Gouy M: Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis.Mol Biol Evol 1998,15(7):871–879.PubMedView Article
Suyama M, Torrents D, Bork P: PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments.Nucleic Acids Res 2006, 34:W609–612.PubMed CentralPubMedView Article
Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models.Mol Biol Evol 2000,17(1):32–43.PubMedView Article
Audic S, Claverie JM: The significance of digital gene expression profiles.Genome Res 1997,7(10):986–995.PubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.