Chromosome doubling to overcome the chrysanthemum cross barrier based on insight from transcriptomic and proteomic analyses

Background: Cross breeding is the most commonly used method in chrysanthemum (Chrysanthemum morifolium) breeding; however, cross barriers commonly exist in these combinations. Many studies have shown that paternal chromosome doubling can often overcome hybridization barriers during cross breeding, although the underlying mechanism has seldom been investigated.
Results: In this study, we performed two crosses: C. morifolium (pollen receptor) × diploid C. nankingense (pollen donor) and C. morifolium × tetraploid C. nankingense. Seeds were obtained only from the latter cross. RNA-Seq and isobaric tags for relative and absolute quantitation (iTRAQ) were used to investigate differentially expressed genes and proteins during key embryo development stages in the latter cross. A previously performed cross, C. morifolium × diploid C. nankingense, was compared to our results and revealed that transcription factors (i.e., the agamous-like MADS-box protein AGL80 and the leucine-rich repeat receptor protein kinase EXS), hormone-responsive genes (auxin-binding protein 1), genes and proteins related to metabolism (ATP-citrate synthase, citrate synthase and malate dehydrogenase) and other genes reported to contribute to embryo development (i.e., LEA, elongation factor and tubulin) had higher expression levels in the C. morifolium × tetraploid C. nankingense cross. In contrast, genes related to senescence and cell death were down-regulated in the C. morifolium × tetraploid C. nankingense cross.
Conclusions: The data resources helped elucidate the gene and protein expression profiles and identify functional genes during different development stages. When the chromosomes from the male parent are doubled, the genes contributing to normal embryo development are more abundant. However, genes with negative functions were suppressed, suggesting that chromosome doubling may epigenetically inhibit the expression of these genes and allow the embryo to develop normally.
Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2939-0) contains supplementary material, which is available to authorized users.


Background
Intergeneric/interspecific crosses between cultivars and their wild species are widely used to improve the tolerance of crop plants to biotic and abiotic stresses. The offspring of these crosses might have greater environmental adaptability and species diversity. Crossing efficiency is evaluated by whether the embryos develop normally and whether seeds can be successfully obtained. Many factors influence the efficiency of interspecific crosses. Differences in the ploidy levels of the parental plants were thought to be one major barrier causing endosperm malformation and the inhibition of germination [1]. Indeed, hybrid embryos are often spontaneously aborted due to the absence of endosperm or retarded development after successful interspecific pollination, which is a typical post-fertilization barrier that strongly hampers embryo development [2]. There is growing evidence that polyploid breeding has the potential to overcome this barrier, and it has emerged as one of the most efficient methods. The development of the major crop species common wheat (Triticum aestivum) is a universally acknowledged textbook example of an allohexaploid derived through hybridization between a domesticated form of the tetraploid T. turgidum ssp. dicoccoides and the diploid Aegilops tauschii [3,4]. In 1988, Badger found that tetraploid azaleas could overcome interspecific barriers. The cross Rhododendron calendulaceum × tetraploid evergreen Obtusum was highly successful and produced many viable seeds. However, the cross between R. calendulaceum and the diploid evergreen Obtusum was seldom successful [5]. Although chromosome doubling can also overcome reproductive barriers during the cross breeding of many other plant species, the underlying mechanism remains unknown.
C. morifolium is an important ornamental crop that, like the rose and lily, is in high demand in the consumer market. To meet increasing consumer needs, breeders must search for novel traits, improve plant qualities and increase the resistance of existing species to biotic or abiotic stresses. Cross breeding is the conventional path to improve genetic variability and develop modified species. Successful intergeneric hybridization efforts have been reported between the chrysanthemum and related genera, such as C. lavandulifolium × Ajania remotipinna, A. remotipinna × C. chanetii [6] and Opisthopappus taihangensis × C. lavandulifolium [7]. However, interspecific crosses with the chrysanthemum are difficult because of its limited genetic diversity, especially between the hexaploid chrysanthemum and diploid wild species. In a cross between C. morifolium and diploid C. nankingense, six interspecific hybrids were generated but required ovary rescue [8]. In the cross between the chrysanthemum cultivar 'Zhongshanzixing' and diploid C. nankingense, no seeds were obtained, whereas seeds were obtained in the cross between 'Zhongshanzixing' and tetraploid C. nankingense [9]. These results suggest that polyploidization of the male parent can overcome cross barriers between incongruous groups. In these studies, the reason for interspecific cross inhibition was carefully examined. A pre- or post-fertilization barrier was confirmed to exist between different hybrids in a large number of morphological and cytological studies [8,10]. However, the expression patterns of genes and proteins underlying the morphological and cytological traits remain unclear. The effects of polyploidization on interspecific chrysanthemum crosses have not been studied at this level, and the mechanisms by which different ploidy levels regulate chrysanthemum embryonic development remain elusive.
Currently, RNA-Seq is often used for gene discovery and transcript abundance measurements during a certain state in particular organs or tissues and has been successfully applied in different species [11]. Recently, transcriptome and proteome methodologies were applied to developing Medicago truncatula seeds [12] and Arabidopsis seed germination [13]. Deep sequencing analysis was applied to the peanut pod transcriptome to identify candidate genes related to early embryo abortion [14]; this approach was also used to study the early Arabidopsis embryo [15]. In a previous study, we explored genes and proteins associated with chrysanthemum embryo abortion in the cross C. morifolium × diploid C. nankingense using RNA-Seq and isobaric tags for relative and absolute quantitation (iTRAQ), focusing on the perspective of embryo abortion [16]. Here, we performed the cross C. morifolium × tetraploid C. nankingense, which decreased the embryo abortion rate and produced seeds, but the genes and proteins that aided normal embryo development were unclear. Therefore, the combination of transcriptomic and proteomic data is suitable to study chrysanthemum embryo development and explore the differential expression patterns in different interspecific crosses and samples. Our objectives were as follows: (i) to analyze gene and protein expression in the developing chrysanthemum embryo and characterize expression changes at different stages of embryonic development from the cross C. morifolium × tetraploid C. nankingense and to identify over-expressed genes and proteins that promote normal embryo development; (ii) to compare the expression of key genes and proteins related to chrysanthemum embryo abortion in the two different crosses using a previous study [16] and point out the genes and proteins with positive roles promoting chrysanthemum embryo development. 
This study will also provide a better understanding of the mechanism underlying the different seed setting rates when the male parent has different ploidy levels.

Ovule development and seed setting
In the cross C. morifolium × tetraploid C. nankingense, many ovules were morphologically well developed with a full form at 12 days after pollination (DAP); a total of 93.5 ± 1.7 % of the ovules were normal. The percentage was 89.7 ± 3.1 % in the cross C. morifolium × diploid C. nankingense, similar to that of the C. morifolium × tetraploid C. nankingense cross. With continued observation, many ovules had shriveled, and only 54.8 ± 0.9 % of the ovules appeared normal at 18 DAP; however, this value was significantly higher than 43.3 ± 1.8 %, the corresponding rate in the C. morifolium × diploid C. nankingense cross. Eventually, a few seeds were obtained, giving a seed set rate of 1.45 ± 0.03 % in this study. No seeds were obtained when the diploid C. nankingense was the male parent [16] (Table 1). In this study, transmission electron microscopy (TEM) analysis showed that the normal cells at 12 DAP contained many organelles, including mitochondria and plastids with normal shapes. Moreover, the cells exhibited an intact cell wall structure with well-defined edges. In the normal embryos at 18 DAP, although the cytoplasm contracted slightly as the embryos developed, the organelles were well developed and metabolism was robust in the mitochondria. However, abnormal embryos at 18 DAP showed significant differences, including clearly shrinking nuclei and a decrease in the number of mitochondria and plastids. Additionally, the organelle structures were aberrant, with degradation and obvious variation and thickening of the cell wall (Fig. 1).

Transcriptome sequencing and read assembly
Using Illumina high-throughput sequencing, we constructed three cDNA libraries from normal embryos at 12 DAP (NE12), normal embryos at 18 DAP (NE18) and abnormal embryos at 18 DAP (AE18) from the C. morifolium × tetraploid C. nankingense cross. After cleaning the raw data, we obtained 52,206,996, 51,935,854 and 51,720,046 clean reads from NE12, NE18 and AE18, respectively, containing 4,698,629,640, 4,674,226,860 and 4,654,804,140 clean nucleotides (Table 2). Based on these high-quality clean reads, a total of 99,119 unigenes were assembled with a mean length of 726 nt, including 45,770 clusters and 53,349 singletons. The numbers of unigenes for NE12, NE18 and AE18 were 88,909, 91,971 and 97,889, respectively, with a mean length of 550-580 nt (Table 3).
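As a consistency check on the figures above, the following minimal sketch divides the reported clean nucleotides by the reported clean reads for each library and sums the cluster and singleton counts. The read length it recovers is only implied by these numbers and is not stated in the text; the variable and function names are illustrative.

```python
# Reported clean reads and clean nucleotides per library (Table 2 values from the text).
clean_reads = {"NE12": 52_206_996, "NE18": 51_935_854, "AE18": 51_720_046}
clean_nt = {"NE12": 4_698_629_640, "NE18": 4_674_226_860, "AE18": 4_654_804_140}


def implied_read_length(reads: int, nucleotides: int) -> int:
    """Return nucleotides / reads, requiring an exact whole-number result."""
    length, remainder = divmod(nucleotides, reads)
    if remainder != 0:
        raise ValueError("nucleotide count is not a whole multiple of the read count")
    return length


# Read length implied by the reported totals (an inference, not a stated parameter).
read_lengths = {
    sample: implied_read_length(clean_reads[sample], clean_nt[sample])
    for sample in clean_reads
}

# Clusters plus singletons should reproduce the reported total number of unigenes.
total_unigenes = 45_770 + 53_349

print(read_lengths)
print(total_unigenes)
```

All three libraries give the same whole-number quotient, which suggests that the reads were trimmed to a uniform length before counting.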

Unigene functional annotation
Annotation analysis of the chrysanthemum embryo provided information on gene expression and the function of all unigenes detected at different developmental stages. The annotation consisted of protein functional annotation, pathway annotation, Clusters of Orthologous Groups of proteins (COG) functional annotation and gene ontology (GO) functional annotation. As a result, 58,799 unigenes were annotated with the NR, NT, Swiss-Prot, KEGG, COG and GO databases; the numbers of annotations obtained from each database were 56,665, 39,101, 36,897, 33,594, 20,391 and 43,526, respectively ( Table 4). The largest number of annotated unigenes was obtained from the NR database.
To investigate unigene function and evaluate the effectiveness of the annotation process, 20,391 sequences accounting for 34.68 % of the total annotated unigenes were assigned a COG functional annotation. Among the 25 COG categories, the three largest clusters were 'general function prediction only' (6749); 'transcription' (3844); and 'replication, recombination and repair' (3323). Each of these clusters accounted for more than 15 % of all unigenes with a COG annotation. The two clusters with the fewest unigenes were 'extracellular structures' (14) and 'nuclear structure' (11); the percentage of unigenes in each of these categories was less than 1 % (Additional file 1: Figure S1).
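The percentages quoted for the annotation results can be recomputed directly from the reported counts. The sketch below is a plain arithmetic check; the dictionary keys and variable names are illustrative, and all numbers are taken from the text.

```python
# Annotated unigenes per database, as reported in the text (Table 4).
annotated_by_db = {
    "NR": 56_665, "NT": 39_101, "Swiss-Prot": 36_897,
    "KEGG": 33_594, "COG": 20_391, "GO": 43_526,
}
total_annotated = 58_799   # unigenes annotated in at least one database
total_unigenes = 99_119    # all assembled unigenes

# Share of all unigenes that received any annotation, in percent.
annotated_share = 100 * total_annotated / total_unigenes

# Share of annotated unigenes that carry a COG annotation, in percent.
cog_share = 100 * annotated_by_db["COG"] / total_annotated

# The three largest and two smallest COG clusters, as shares of COG-annotated unigenes.
cog_clusters = {
    "general function prediction only": 6749,
    "transcription": 3844,
    "replication, recombination and repair": 3323,
    "extracellular structures": 14,
    "nuclear structure": 11,
}
cluster_share = {
    name: 100 * count / annotated_by_db["COG"] for name, count in cog_clusters.items()
}

largest_db = max(annotated_by_db, key=annotated_by_db.get)
print(f"annotated: {annotated_share:.1f} %, COG: {cog_share:.2f} %, largest: {largest_db}")
```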
Using NR annotation, we obtained GO functional annotations describing the properties of genes and their products in chrysanthemum. Based on sequence homology, 43,526 sequences could be categorized into 55 functional groups, including the three main GO classifications (biological process, cellular component, and molecular function). There were 24, 15 and 16 functional subcategories in each main classification; 'cellular process' , 'cell' and 'cell part' (equal), and 'catalytic activity' were the most common subcategories (Additional file 2: Figure S2). Additionally, the classes 'organelle' and 'metabolic process' contained many unigenes that might play important roles in the metabolic pathways involved in chrysanthemum embryo development.
Next, the KEGG pathway database was applied to identify the biological pathways activated in the chrysanthemum embryo. A total of 33,594 annotated unigenes were assigned to 128 KEGG pathways (Additional file 3: Table S1). The majority of these pathways were 'metabolic

Genes related to embryo development and differentially expressed genes at three stages
Using FPKM (Fragments Per kb per Million fragments), we explored the gene expression levels in normal and abnormal embryos. In three comparisons (NE12 vs NE18, NE18 vs AE18, and NE12 vs AE18), the number of differentially expressed genes (DEGs) was 6537 (4302 were up-regulated), 3276 (1119 were up-regulated), and 7794 (5082 were up-regulated). The first and third comparisons exhibited more up-regulated genes than down-regulated genes (Fig. 2). The details of the DEGs are presented in Additional file 4: Table S2. During the embryonic developmental stage from 12 DAP to 18 DAP (including both normal and abnormal embryos), the number of up-regulated genes at 18 DAP was nearly twice as high as the number of down-regulated genes. Interestingly, the number of down-regulated genes in abnormal embryos was higher than the number of up-regulated genes during the later developmental stage (18 DAP) (Fig. 2). After deep analysis of these DEGs, we found that several categories of genes had similar expression patterns in normal and abnormal embryos. Then, we used the CDS sequences of these DEGs as a query to identify the same unigenes in the transcriptome library sequenced from chrysanthemum embryos in the C. morifolium × diploid C. nankingense cross [16]. We analyzed the expression of some of the genes to obtain clues concerning their roles during chrysanthemum embryo development (i.e., CmLEA, CmLEC, CmEM, CmSSP, CmOLE, CmTUB and CmEXT). The expression patterns of these genes differed between the two crosses and various samples. For instance, CmTUB in NE18 had the highest expression level in the cross C. morifolium × tetraploid C. nankingense; however, in the cross C. morifolium × diploid C. nankingense, it was most highly expressed during NE12 (Fig. 3).
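FPKM normalizes a raw fragment count by transcript length and by library size. The sketch below uses the commonly cited definition, FPKM = 10^9 × C / (N × L), where C is the number of fragments mapped to a unigene, N the total number of mapped fragments and L the unigene length in nucleotides. The text does not give the exact formula or software used, so this form is an assumption, and the example counts are hypothetical. The second part derives the down-regulated gene counts from the totals reported above.

```python
def fpkm(fragments: int, gene_length_nt: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return 1e9 * fragments / (total_mapped_fragments * gene_length_nt)


# Hypothetical example: 100 fragments on a 1000 nt unigene in a library of 1,000,000 fragments.
example_fpkm = fpkm(fragments=100, gene_length_nt=1000, total_mapped_fragments=1_000_000)

# Reported DEG totals and up-regulated counts for the three comparisons.
deg_counts = {
    "NE12 vs NE18": {"total": 6537, "up": 4302},
    "NE18 vs AE18": {"total": 3276, "up": 1119},
    "NE12 vs AE18": {"total": 7794, "up": 5082},
}

# Down-regulated genes are the remainder; the ratio shows which direction dominates.
for counts in deg_counts.values():
    counts["down"] = counts["total"] - counts["up"]
    counts["up_to_down"] = counts["up"] / counts["down"]

for name, counts in deg_counts.items():
    print(f"{name}: up={counts['up']}, down={counts['down']}, ratio={counts['up_to_down']:.2f}")
```

The ratios for the two comparisons against NE12 come out slightly below 2, matching the statement that up-regulated genes were nearly twice as numerous, whereas the NE18 vs AE18 ratio falls below 1.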

Pathway classification by KEGG
KEGG is a database that analyzes gene products during metabolism and related gene functions involved in   4). Auxin response factor (ARF) is the key factor in the auxin signal transduction pathway and receives auxin signals, leading to activation or repression of downstream genes. Two genes (AUX/IAA and GH3) involved in cell enlargement and plant growth reported to be regulated by ARF were down-regulated in the abnormal embryos when compared with the normal embryos (Fig. 5).

qRT-PCR validation
To validate the quality of the RNA-Seq data in this study, quantitative real-time RT-PCR (qRT-PCR) was performed on 29 randomly selected genes with differential expression levels. The differential expression patterns in the three samples and the expression trends of almost all of the genes were consistent with the sequencing data (Fig. 6). Most of these genes were related to plant embryonic development (i.e., globulin seed storage protein, ethylene-responsive transcription factor, auxin-induced protein, embryonic protein, leafy cotyledon 1-like protein and senescence-related protein). Moreover, 40 DEGs (in Tables 5 and 6) from the two crosses were validated, and 35 of the 40 DEGs (87.5 %) showed the same direction of altered expression as in the RNA-Seq data (Additional file 5: Figure S3 and Additional file 6: Figure S4).
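One simple way to express this kind of validation is the fraction of genes whose change has the same direction in both assays. The sketch below is illustrative only: the log2 fold-change values are hypothetical toy data, and the text does not describe how agreement was scored, so the sign-based criterion is an assumption. The last lines recompute the reported 35 of 40 figure.

```python
def direction_concordance(rnaseq_log2fc, qpcr_log2fc):
    """Fraction of genes whose log2 fold changes share the same sign in both assays."""
    if len(rnaseq_log2fc) != len(qpcr_log2fc):
        raise ValueError("both assays must cover the same genes")
    agree = sum(1 for a, b in zip(rnaseq_log2fc, qpcr_log2fc) if a * b > 0)
    return agree / len(rnaseq_log2fc)


# Hypothetical log2 fold changes for five genes (not data from the study).
toy_rnaseq = [2.1, -1.4, 3.0, -0.8, 1.2]
toy_qpcr = [1.7, -0.9, 2.2, 0.5, 0.9]
toy_rate = direction_concordance(toy_rnaseq, toy_qpcr)

# Reported validation: 35 of 40 DEGs changed in the same direction.
reported_rate_percent = 100 * 35 / 40

print(toy_rate, reported_rate_percent)
```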

Comparative analysis of identical genes during the same developmental stage between the two crosses
Based on the CDS sequence comparison between the two NE12 libraries from the two crosses, we identified 18 candidate genes (Table 5) that had not been analyzed in the C. morifolium × diploid C. nankingense cross and that might enhance chrysanthemum embryo development, including transcription factors, energy metabolism-related genes and other genes that might function in embryo development. Then, the same comparative analysis was applied to the NE18 libraries. A total of 22 candidates (Table 6) were found that might facilitate the transformation from heart embryos to torpedo and cotyledonary embryos during seed development. These candidates were similar to those of NE12 except for unigene30868 (fumarate hydratase 1) and unigene9484 (defender against cell death), which were differentially expressed only at this stage. During both stages, we found that the auxin-binding protein, citrate synthase and other genes associated with energy metabolism were more highly expressed in cross I (C. morifolium × tetraploid C. nankingense) than in cross II (C. morifolium × diploid C. nankingense). Conversely, genes related to senescence and programmed cell death were down-regulated (Tables 5 and 6).

Differentially-expressed proteins during chrysanthemum embryo development
In the embryos from cross I (C. morifolium × tetraploid C. nankingense), a total of 23 differentially expressed proteins (DEPs) ( Table 7) were identified, 10 of which were considered predicted, hypothetical or unknown proteins, which was not useful for the elucidation of their functions. Thus, we aligned their peptides to the CDS sequences from the chrysanthemum embryo transcriptome library, and as a result, only two of the proteins remained unannotated, suggesting that studying the transcriptome was conducive to proteomics research for crops without genome sequences. Most of these DEPs were involved in energy metabolism (i.e., acetoacetyl CoA thiolase, pyruvate kinase, isopropylmalate synthase and malate dehydrogenase) and had high expression levels in normal embryos at 18 DAP.
To analyze the expression levels of these 23 DEPs in cross II (C. morifolium × diploid C. nankingense), peptides were searched in a proteomic library. As a result, 10 proteins were aligned with the same peptides, and their expression levels were variable. The proteins associated with energy metabolism, such as pyruvate kinase (gi|92870921), transitional endoplasmic reticulum ATPase (gi|326527541) and cytosolic nucleoside diphosphate kinase (gi|73808794), had significantly higher NE18/AE18 ratios in cross I compared with those of cross II (Table 8).
Figure caption: The expression patterns of DEGs in two transcriptome libraries. Blue columns represent the cross C. morifolium × tetraploid C. nankingense, and green columns represent the cross C. morifolium × diploid C. nankingense.

Discussion
Polyploid breeding can improve breeding efficiency by adjusting the optimal chromosome number to successfully match in interspecific crosses [17]. In this study, a few seeds were obtained following hybridization between a hexaploid maternal parent and a tetraploid male parent. Conversely, the cross with the diploid male failed to produce seeds [16], suggesting that the ploidy of the male parent might affect the formation of hybrid plants.
The closer the chromosome ploidy of the parents, the more likely the cross will be successful. The same phenomenon has been previously reported in chrysanthemum interspecific hybridization, in which the cross C. morifolium × C. nankingense failed, and cross ability was greatly affected by the pollen grains on stigmas and embryo abortion [10]. In another cross, C. morifolium × diploid C. nankingense, six hybrids were created, but they required ovary rescue, suggesting that the post-fertilization barrier played an important role in embryo abortion [8]. In the previous study in which C. morifolium 'Zhongshanzixing' was the female parent, the cross using diploid C. nankingense failed to produce seeds, whereas hybrids were obtained when the male parent was a tetraploid C. nankingense [9]. Therefore, the doubled chromosome number of C. nankingense likely enabled a successful interspecific cross in the cultivated chrysanthemum.
Currently, high-throughput sequencing technology is used in embryo abortion research [14,16] as well as for de novo genome assembly, molecular marker and genome diversity studies, the discovery of novel genes and investigations into gene expression patterns. Therefore, studying the genes or proteins related to embryo or endosperm development in various crosses is also helpful.

Transcriptome and proteome data analysis during chrysanthemum embryo development
In this study, COG and GO functional annotations were performed. In the COG functional annotation (Additional file 1: Figure S1) from the three transcriptomic libraries, the top five of the 25 obtained COG categories were the same as those in cross II [16]. Interestingly, the same situation was observed in the GO functional annotation (Additional file 2: Figure S2), suggesting that the genes involved in the regulation of these processes had widespread expression during chrysanthemum embryonic development. Next, we detected 3276 DEGs (Fig. 2) and 23 DEPs (Table 7) between the normal and abnormal embryos at 18 DAP that might be related to embryo and endosperm development in this cross. Finally, using KEGG annotation, two pathways were chosen in NE18 and AE18: 'Citrate cycle (TCA cycle)' and 'Plant hormone signal transduction'. Between the two crosses, qRT-PCR confirmed the reliability of 35 of the DEGs in Tables 5 and 6. A previous study has shown that DEG detection depends on whether pooled or individual samples are used [18]. Pooled samples were used in this study, which might have caused the false positives observed for the other 5 DEGs. RNA-Seq of individual samples is a better way to detect DEGs. However, because of the difficulty of collecting material and the limited florescence period, pooled samples are also worthwhile when qRT-PCR verification is used as a complementary approach to exclude false-positive DEGs.

Expression of genes involved in energy metabolism pathways
Energy metabolism is necessary for nearly all biological processes, including plant embryo development. In our results, the KEGG pathway analysis between NE18 and AE18 in cross I demonstrated the importance of energy metabolism for normal embryo development. Most of the enzymes involved in the TCA cycle exhibited reduced expression levels in abnormal embryos at 18 DAP (Fig. 4), suggesting that the decreased energy supply was not sufficient for continued embryo growth and resulted in abortion. The genes and proteins involved in energy metabolism (especially in cytosolic glycolysis and the mitochondrial TCA cycle) were identified as differentially expressed during embryogenesis (Fig. 4, Tables 5, 6, 7 and 8). In normal embryos at 12 DAP in the two crosses, the genes related to citrate synthase, aconitase protein, NADP-isocitrate dehydrogenase and malate dehydrogenase were up-regulated in cross I (Table 5); moreover, normal embryos at 18 DAP exhibited the same expression pattern, and the genes associated with the TCA cycle and glycolysis pathway in cross I were also more highly expressed (Table 6). During maize embryo development, genes related to metabolism showed altered expression; these gene transcripts accumulated at higher levels between 10 and 20 DAP during the active process of metabolism [19]. Using comparative proteome analysis, we identified some proteins associated with energy metabolism, such as pyruvate kinase (the ratios of NE18/AE18 for cross I and II were 2.26 and 1.41, respectively) and the transitional endoplasmic reticulum ATPase (the ratios were 2.62 and 1.65, respectively).
In this study, the higher expression of these genes in NE12 and NE18 and the clearly higher ratios (NE18/AE18) of these proteins in cross I (Table 8) suggested that more active energy metabolism occurred in cross I during the development of chrysanthemum embryos. This would help provide sufficient energy for improved embryo development and reduced rates of abortion, thereby increasing the production of seeds.
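To make the comparison of the NE18/AE18 protein ratios concrete, the sketch below divides the cross I ratio by the cross II ratio for the two proteins quoted above and also reports the difference on a log2 scale. The ratios are taken from the text; the dictionary layout and names are illustrative, and no statistical claim is implied by this arithmetic.

```python
import math

# NE18/AE18 protein abundance ratios reported in the text: (cross I, cross II).
ne18_ae18_ratio = {
    "pyruvate kinase": (2.26, 1.41),
    "transitional endoplasmic reticulum ATPase": (2.62, 1.65),
}

# How many times larger the normal/abnormal ratio is in cross I than in cross II.
cross_fold = {
    protein: cross_i / cross_ii
    for protein, (cross_i, cross_ii) in ne18_ae18_ratio.items()
}

# The same comparison on a log2 scale, where 0 would mean no difference between crosses.
cross_log2 = {protein: math.log2(fold) for protein, fold in cross_fold.items()}

for protein in ne18_ae18_ratio:
    print(f"{protein}: fold={cross_fold[protein]:.2f}, log2={cross_log2[protein]:.2f}")
```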

Expression of genes involved in hormonal signaling
Plant hormones, such as auxins, cytokinins, and gibberellins, are involved in the regulation of seed development [20]. For example, in an auxin-binding protein 1 (abp1) mutant, the embryos develop abnormally after the globular embryo stage because of a failure of cell elongation [21]. In NE12 and NE18 of the two crosses, the expression of ABP1 was significantly higher in cross I (Tables 5 and 6), suggesting that ABP1 might facilitate normal embryo development in chrysanthemums by mediating auxin-induced cell elongation and cell division. In cross I, the KEGG pathway related to auxin signal transduction indicated down-regulated expression of the DNA binding ARF activators (Fig. 5), which were shown to regulate the expression of auxin-responsive genes. These auxin-responsive genes belong to three major groups: Aux/IAA, Small Auxin Up RNA (SAUR) and Gretchen Hagen 3 (GH3) [22]. In abnormal

Expression of transcription factors
Transcription factors (TFs) are critically important during plant flower and fruit development. Several types of TFs were differentially expressed during chrysanthemum embryo development in the two crosses. However, how these TFs are regulated from early embryogenesis to maturity is unknown. Transcriptome analysis revealed that some of the transcription factors were significantly up-regulated in cross I, such as AGAMOUS-LIKE62 (AGL62), AGL80, EXS and LEAFY COTYLEDON1 (LEC1), which perform important functions necessary for plant embryonic development. In Arabidopsis, AGL80 has been demonstrated to be involved in endosperm development. In fem111 plants, female gametophytes contain a T-DNA insertion in AGL80, resulting in the reduced size of the central cell's nucleolus and vacuole and a failure to generate endosperm [23]. In NE12, the expression level of AGL80 (CL1478.Contig3) in cross I was far greater than in cross II (Table 5). In Arabidopsis, AGL80 was expressed exclusively in the endosperm at the stage from 8 to 72 h after pollination, and the expression level was strongest in young seeds, decreasing gradually as the seeds aged [23]. Similar gene expression was observed in this study, in which the expression of AGL80 (Unigene33530) decreased in NE18 compared to NE12 (Table 6). The globular-to-heart transition stage is essential for endosperm development and provides nutrient supplies for embryo development [20]. Thus, the high expression of AGL80 at 12 DAP observed in our study might promote normal chrysanthemum endosperm development by ensuring the energy supply for embryo development at 18 DAP, suggesting that AGL80 in cross I was required for endosperm development and the improved success rate of the interspecific cross. Another indispensable transcription factor involved in embryogenesis is LEC1. LEC regulates embryo development and is required for normal development during the morphogenesis and maturation stages in Arabidopsis [24,25]. Studies on the functions of LEC demonstrated that LEC was required to specify the suspensor cell fate and cotyledon identity during early embryogenesis [26,27]. However, during late embryogenesis, LEC was required for the acquisition of desiccation tolerance and the expression of many maturation-specific genes [28]. Our findings showed that LEC1 was expressed mainly in cross I, especially in NE18 (Fig. 3 and Table 6), and was significantly reduced in cross II (Table 6). However, there was no significant difference in NE12 between the two crosses. This LEC1 expression pattern suggested that it might be a transcriptional regulator of chrysanthemum seed development and required for normal early stage embryo development. Moreover, a deficiency in LEC1 during the late development stage might cause the seeds to lose their desiccation tolerance, resulting in abnormal development or abortion in the C. morifolium × diploid C. nankingense cross.
Table footnote: Cross I: C. morifolium × tetraploid C. nankingense; Cross II: C. morifolium × diploid C. nankingense; no hit: These proteins in cross II have not been retrieved in cross I by their peptide.

Expression of other genes related to embryo development
Plant embryogenesis is a complicated process regulated by numerous genes and regulators. Moreover, various proteins are synthesized steadily, such as late embryogenesis abundant (LEA) protein, embryonic protein, oleosin and proteins related to senescence or cell death. In plants, the LEA protein is associated with desiccation tolerance during embryo maturation when seeds undergo a developmentally regulated dehydration period. A study of an A. thaliana mutant carrying a T-DNA insertion allele of the AtEM6 gene, which belongs to the group 1 LEA family, found that this gene might be required for normal seed development. Studies on regulation and expression patterns indicated that the LEA genes in diverse species, such as rice, barley and soybean, were primarily expressed in developing or mature embryos [29], whereas OsLEA1a transcripts accumulated to high levels in dried mature embryos [30]. Here, LEA expression in NE18 from cross I was higher than that from cross II (Fig. 3), suggesting that the LEA proteins were required for normal seed development and expressed primarily at the late embryonic stage, when they might function as a buffer to protect from the water loss that occurs during embryo maturation. Therefore, more LEA proteins were expressed at NE18 when tetraploid C. nankingense was the male parent, which might be one reason that we were able to obtain seeds.
In dicotyledons, seed storage proteins (SSPs) mainly accumulate in the endosperm [31], including the 7S and 11S globulin classes [32]. Arabidopsis lec mutants exhibit defective synthesis and the accumulation of specific storage molecules, including SSPs [28]. Additionally, oleosins, which are seed-specific lipid storage proteins, have specific functions in seed tissues controlling oil body structure and lipid accumulation [33]. We showed that the expression levels of the embryonic protein (CmEM), 11S globulin seed storage protein (CmSSP) and oleosin (CmOLE) were all most abundant in NE18 in cross I, and their expression in both NE12 and NE18 was substantially higher than that in cross II (Fig. 3). This analysis provides evidence that more storage proteins accumulated quickly during the phase from 12 -18 DAP when the interspecific cross occurred between the hexaploid C. morifolium and tetraploid C. nankingense. Thus, these specific proteins might contribute to the completion of embryonic development and maturation.
In contrast, particular genes and proteins related to senescence or cell death had different expression patterns and were down-regulated in cross I. Previous evidence indicated that programmed cell death occurred concomitantly with the development of the embryo and endosperm [16,34,35]. In this study, genes related to programmed cell death in NE12 (Table 5) and the senescence-induced receptor and regulator of cell death in NE18 (Table 6) were down-regulated in the C. morifolium × tetraploid C. nankingense cross. However, the gene defender against cell death (Unigene9484) was up-regulated, suggesting that chromosome doubling of the male parent might epigenetically inhibit the expression of senescence-induced genes and allow normal embryo development.

Conclusions
We sequenced and characterized the transcriptome and proteome of normal and abnormal chrysanthemum ovules and analyzed the differentially expressed genes and proteins associated with embryogenesis in two interspecific crosses in which the male parent was diploid or tetraploid. The comparative analysis of the two crosses demonstrated important roles for energy metabolism, auxin signal transduction, transcription factors and proteins related to cell death during chrysanthemum embryo development. The results provide valuable evidence at the molecular level that doubling the chromosome number in C. nankingense might overcome the distant hybridization barrier in the cultivated chrysanthemum.

Plant materials and artificial hybridization
The chrysanthemum 'Yuhualuoying' (2n = 6X = 54) (Fig. 7a, b) is a ground-cover cultivar with desirable ornamental traits for landscape applications. The tetraploid C. nankingense (2n = 4X = 36) (Fig. 7c, d) has a stronger tolerance to both abiotic and biotic stresses and is an autopolyploid generated by colchicine doubling of the diploid C. nankingense (2n = 2X = 18) [36] (Fig. 7e, f). The plants were grown in the Chrysanthemum Germplasm Resource Preserving Center, Nanjing Agricultural University, China. The interspecific cross C. morifolium × tetraploid C. nankingense was performed according to a previously reported method [37]. To obtain abundant, high-quality samples, we planted a total of 60 chrysanthemum plants in three areas to generate approximately 1500 inflorescences (18-23 female ligulate florets or ray florets, each containing one ovule, per inflorescence) for artificial hybridization. At the same time, the interspecific cross C. morifolium × diploid C. nankingense was performed with approximately 100 inflorescences to generate the seed setting rate statistics. Statistical analysis of the seed setting rate was performed with a one-way analysis of variance using SPSS version 20.0 (IBM SPSS Statistics, IBM Corporation, Chicago, IL), and the means were compared using Student's t-test with alpha = 0.05.

Sample preparation and RNA extraction
We collected chrysanthemum ovules during two developmental stages. At 12 DAP, almost all of the embryos were normal; therefore, we collected 0.6 g of normal ovules from each of the three planted areas (0.2 g from each was stored separately, and the remaining three 0.4 g samples were mixed for NE12). Thereafter, the proportion of normal embryos gradually decreased as embryonic development proceeded. At 18 DAP, some embryos reached the heart-shaped embryo stage, whereas others underwent degeneration; therefore, we collected more than 1.0 g each of normal and abnormal ovules and processed them in the same way as for NE12. All of the samples were immediately frozen in liquid nitrogen and stored at -80°C.
Meanwhile, for TEM observation, we collected NE12, NE18 and AE18 ovules from cross I, immediately immersed them in 2.5 % (v/v) glutaraldehyde (in 0.1 mol/L phosphate buffer, pH 7.2), gently extracted the air using a syringe, and stored the samples at 4°C. Next, the ovules were washed five times in the same phosphate buffer and post-fixed in 1.5 % osmium tetroxide for 5 h. The ovules were then treated in graded PHEM buffer (60 mmol/L PIPES; 25 mmol/L HEPES; 10 mmol/L EGTA; 2 mmol/L MgCl2; pH 7.0) and 100 % ethanol and embedded in Epon 812. Sections were cut to a thickness of 80 nm using an LKB-V ultramicrotome (Bromma, Sweden) and stained with uranyl acetate and lead citrate. Finally, the sections were imaged under a transmission electron microscope (Hitachi H-7650) at 80 kV [38].
Total RNA was isolated using TRIzol reagent according to the manufacturer's protocol (Takara Bio Inc., Otsu, Japan). The RNA quantity and quality were determined using an Agilent 2100 RNA 6000 Kit (Agilent Technologies, Santa Clara, CA, USA) and electrophoresis on a 1 % agarose gel.

cDNA preparation and Illumina deep sequencing
After RNA extraction and DNase I treatment, mRNA was isolated using magnetic beads with Oligo (dT) and fragmented in fragmentation buffer. cDNA was then synthesized using the mRNA fragments as templates. Short fragments were purified and dissolved in EB buffer for end repair and single nucleotide A (adenine) addition. The short fragments were then ligated to adapters, and suitable fragments were selected for PCR amplification. The Agilent 2100 Bioanalyzer and ABI StepOnePlus Real-Time PCR System were used for quantification and qualification of the sample library [14]. Finally, the libraries were sequenced using Illumina HiSeq™ 2000 according to the manufacturer's instructions (Illumina, San Diego, CA, USA) [39] at the Beijing Genomics Institute [(BGI)-Shenzhen, Shenzhen, China; http://www.genomics.cn/index.php].

Fig. 7 Flower morphology of C. morifolium 'Yuhualuoying', tetraploid C. nankingense and diploid C. nankingense. a, b C. morifolium 'Yuhualuoying'. c, d Tetraploid C. nankingense. e, f Diploid C. nankingense. Bar = 1 cm

Filtering raw reads and de novo assembly
The raw reads produced by sequencing contain adapters and unknown or low-quality bases. Therefore, clean reads were first obtained by removing these unqualified reads. Transcriptome de novo assembly was then performed with Trinity software [39]. The TGICL program (version 2.0; http://sourceforge.net/projects/tgicl) was used to further process the unigenes assembled by Trinity. This program splices the unigenes from each sample's assembly and removes redundancy to acquire non-redundant unigenes that are as long as possible. Then, the unigenes were divided into two classes by gene family clustering. The first class consisted of clusters, denoted by the prefix CL followed by the cluster number, with each cluster containing several unigenes sharing more than 70 % similarity. The second class consisted of singletons, denoted by the prefix Unigene.
In the final step, the unigenes were aligned by BlastX (E-value < 0.00001) to protein databases such as NR, Swiss-Prot, KEGG and COG, and the sequence direction of each unigene was decided according to the best alignment result. If the results from the different databases conflicted, the sequence direction was determined by the priority order of NR, Swiss-Prot, KEGG and COG. If a unigene was not aligned to any of the above databases, the software ESTScan [40] was used to predict the sequence direction.
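The priority rule for resolving conflicting alignment directions can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the function name `choose_direction` and the dictionary layout of the BlastX hits are hypothetical, introduced only for illustration.

```python
from typing import Dict, Optional

# Priority order stated in the text: NR, then Swiss-Prot, KEGG and COG.
DB_PRIORITY = ["NR", "Swiss-Prot", "KEGG", "COG"]


def choose_direction(best_hits: Dict[str, str]) -> Optional[str]:
    """Pick a unigene's sequence direction from its best BlastX hits.

    `best_hits` (hypothetical layout) maps a database name to the strand
    of the best hit in that database, '+' or '-'. Databases without a hit
    are simply absent. Returns None when no database has a hit, which is
    the case the text hands over to ESTScan.
    """
    for db in DB_PRIORITY:
        strand = best_hits.get(db)
        if strand is not None:
            return strand  # highest-priority database wins on conflict
    return None


# Swiss-Prot outranks KEGG, so the conflict is resolved as '+'.
print(choose_direction({"KEGG": "-", "Swiss-Prot": "+"}))
```

A unigene with no hits at all returns `None`, signalling that a separate prediction step is needed.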

Unigene functional annotation
First, unigene sequences were aligned to protein databases including NR, Swiss-Prot, KEGG and COG by BlastX and to the nucleotide database NT by BlastN. Then, the putative functional annotations of the unigenes were retrieved. Next, we used the Blast2GO program [40] to obtain gene ontology (GO) (http://www.geneontology.org) annotations of the unigenes based on the NR annotations. After GO annotations were obtained for each unigene, WEGO software [41] was used to obtain GO functional classifications for all unigenes. GO has three ontologies: molecular function, cellular component and biological process. Finally, we used the KEGG database to study the genes' functions in cellular processes and obtain pathway annotation for the unigenes based on KEGG annotation.

Differential unigene expression analysis
To estimate unigene expression levels in the different samples, we calculated FPKM values (fragments per kilobase of transcript per million mapped fragments) [42]. Differentially expressed genes (DEGs) between two samples were then identified using a rigorous algorithm based on the method of Audic and Claverie [43]. In our analysis, DEGs were required to meet the criteria of an FDR ≤ 0.001 and an expression ratio larger than 2. Then, GO functional analysis and KEGG pathway analysis were performed for the DEGs.
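As a worked illustration, the FPKM calculation and the DEG thresholds can be sketched in Python. The FPKM formula below is the standard definition, FPKM = 10^9 × C / (N × L), where C is the number of fragments mapped to the unigene, N is the total number of mapped fragments and L is the unigene length in bases. The function names, the treatment of the ratio as a fold change in either direction, and the small floor used to avoid division by zero are assumptions made for this sketch, not details given in the text. The FDR value is taken as an input; its computation is not shown here.

```python
def fpkm(fragments: int, total_fragments: int, length_bp: int) -> float:
    """Standard FPKM: fragments per kilobase per million mapped fragments."""
    return 1e9 * fragments / (total_fragments * length_bp)


def is_deg(fpkm_a: float, fpkm_b: float, fdr: float,
           fdr_max: float = 0.001, min_ratio: float = 2.0,
           floor: float = 0.01) -> bool:
    """Apply the two thresholds from the text: FDR <= 0.001 and ratio > 2.

    Assumptions for this sketch: the ratio is a fold change in either
    direction, and zero expression values are replaced by `floor`
    (a hypothetical value) so that the ratio is always defined.
    """
    a = max(fpkm_a, floor)
    b = max(fpkm_b, floor)
    ratio = max(a, b) / min(a, b)
    return fdr <= fdr_max and ratio > min_ratio


# 500 fragments on a 1000-bp unigene in a library of 10 million fragments.
print(fpkm(500, 10_000_000, 1000))          # 50.0
print(is_deg(50.0, 10.0, fdr=0.0005))       # True: fivefold change
print(is_deg(50.0, 30.0, fdr=0.0005))       # False: ratio below 2
```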
In the GO functional analysis, all of the DEGs were mapped to each term of the GO database and the gene numbers associated with each GO term were calculated. After obtaining a gene list and gene numbers for each included GO term, we used the hypergeometric test to identify significantly enriched GO terms in the DEGs compared to the genome background. In KEGG pathway analysis, pathway enrichment analysis identifies significantly enriched metabolic pathways or signal transduction pathways that involve the DEGs compared with the whole genome background.
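The hypergeometric test used for enrichment can be written out as a short Python sketch. It computes the probability of observing at least m DEGs annotated to a term when n DEGs are drawn from a background of N annotated genes, M of which carry that term. The function name and argument names are chosen here for illustration, and any multiple-testing correction applied by the original pipeline is not shown.

```python
from math import comb


def enrichment_p_value(N: int, M: int, n: int, m: int) -> float:
    """Upper-tail hypergeometric probability P(X >= m).

    N: background genes with an annotation
    M: background genes annotated to the term of interest
    n: DEGs with an annotation
    m: DEGs annotated to the term of interest
    """
    upper = min(M, n)  # the count cannot exceed either group size
    favourable = sum(comb(M, i) * comb(N - M, n - i)
                     for i in range(m, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 10 background genes, 4 in the term, 3 DEGs, 2 of them in the term.
print(enrichment_p_value(N=10, M=4, n=3, m=2))
```

In the toy case the favourable outcomes number 6 × 6 + 4 × 1 = 40 out of C(10, 3) = 120, so the probability is 1/3. A small p-value for a term indicates that the DEGs carry it more often than expected from the background.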

Quantitative real-time PCR analysis
To verify the libraries' reliability, we randomly selected 29 differentially expressed genes and validated the data by qRT-PCR using three biological replicate samples. The qRT-PCR assays were conducted as described by Song et al. [44] on a Mastercycler ep realplex device (Eppendorf, Hamburg, Germany). In addition, qRT-PCR was also performed on the 40 DEGs shown in Tables 5 and 6 using three biological replicates from the two crosses. Gene-specific primers (sequences shown in Additional file 7: Table S3) were designed using PRIMER3 RELEASE 2.3.4 [45]; the reference sequence for the quantitative expression analysis was the Elongation Factor 1a (EF1a) gene, which is stably expressed in chrysanthemum [16,46]. Relative transcript abundances were calculated using the 2^−ΔΔCt method [47].
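The relative quantification step can be illustrated with a minimal Python sketch of the 2^−ΔΔCt calculation. The Ct values below are invented for illustration only, the function name is hypothetical, and the calculation assumes, as the method itself does, that the target and reference genes amplify with approximately equal, near-100 % efficiency.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float,
                        ct_ref_calibrator: float) -> float:
    """Relative transcript abundance by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference gene, here EF1a) within one sample
    ddCt = dCt(sample) - dCt(calibrator)
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Invented Ct values: the target crosses threshold two cycles earlier
# (relative to EF1a) in the sample than in the calibrator.
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

A value above 1 indicates higher transcript abundance in the sample than in the calibrator, and a value below 1 indicates lower abundance.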

Protein preparation, sample labeling and iTRAQ analysis
For each sample, approximately 0.5 g of ovules was used for protein extraction by the trichloroacetic acid/acetone method [48]. The extracted proteins were acetone precipitated and redissolved in dissolution buffer [16]. The protein concentration was determined according to the Bradford assay. Prior to digestion, 100 μg of protein from each sample was denatured and alkylated and the cysteines were blocked according to the 8-plex iTRAQ reagent kit instructions (Applied Biosystems, California, USA). Then, the protein was digested [16]. The NE12, NE18 and AE18 samples were labeled with 113, 119, and 121 iTRAQ tags, respectively. The three labeled samples from each individual replicate experiment were combined and vacuum-dried. The pooled sample was