Global transcriptome analysis and characterization of Dryopteris fragrans (L.) Schott sporangium in different developmental stages

Background Dryopteris fragrans (D. fragrans) is a potential medicinal fern distributed in volcanic magmatic rock areas under tough environmental condition. Sporangia are important organs for fern reproduction. This study was designed to characterize the transcriptome characteristics of the wild D. fragrans sporangia in three stages (stage A, B, and C) with the aim of uncovering its molecular mechanism of growth and development. Results Using a HiSeq 4000, 79.81 Gb clean data (each sample is at least 7.95 GB) were obtained from nine samples, with three being supplied from each period, and assembled into 94,705 Unigenes, among which 44,006 Unigenes were annotated against public protein databases (NR, Swiss-Prot, KEGG, COG, KOG, GO, eggNOG and Pfam). Furthermore, we observed 7126 differentially expressed genes (DEG) (Fold Change > 4, FDR < 0.001), 349,885 SNP loci, and 10,584 SSRs. DEGs involved in DNA replication and homologous recombination were strongly expressed in stage A, and several DEGs involved in cutin, suberin and wax biosynthesis had undergone dramatic changes during development, which was consistent with morphological observations. DEGs responsible for secondary metabolism and plant hormone signal transduction changed clearly in the last two stages. DEGs homologous to those known genes associated with the development of reproductive organs of flowering plants have also been validated and discussed, such as AGL61, AGL62, ONAC010. In particular, a Unigene encoding TFL1, an important flower–development regulator in flowering plants, was identified and exhibited the highest expression level in stage B in D. fragrans sporangia. Conclusions This study is the first report on global transcriptome analysis in the development of sporangia of wild D. fragrans. DEGs related to development and homologous to flower-seed development in flowering plants were discussed. All DEGs involved in DNA replication and homologous recombination were consistent with morphological observations of paraffin slices. The results of this study provide rare resources for further investigation of the D. fragrans sporangium development, stress resistance and secondary metabolism. Electronic supplementary material The online version of this article (10.1186/s12864-018-4843-2) contains supplementary material, which is available to authorized users.


Background
The sporangium of ferns, the place that produces spores, plays an important role in the propagation of the plants, similar to that of the flower and fruit of flowering plants. At present, a number of investigations of the morphological development, evolution and functions of the sporangium have been reported [1][2][3][4], but little is known regarding the transcriptome of fern sporangium.
D. fragrans (L.) Schott, also known as lava grass in China, is a perennial fern in the Dryopteris genus of the Dryopteris family. This species belongs to the single Fragrantes clade of a sister to the rest of Dryopteris in a molecular circumscription study [5]. Leaves and sori have dense glandular trichomes. The plants have a slight fragrance and are hence called D. fragrans. D. fragrans is primarily distributed in Asia including China, Japan, Korea and Russia. In China, this species mainly grows in Heilongjiang, Jilin and Liaoning Provinces. This species could survive under harsh natural environmental conditions, such as uncovered volcanic magmatic rock areas. For instance, in Wudalianchi, Heilongjiang Province, one of the most prosperous places for D. fragrans in China, this grass grows on the lava rock where lava flowed hundreds of years ago without any shade. There are insect stressors because the habitat is next to farmlands, lakes, copses and orchards with fertile soil from the molten lava, and the lowest and highest temperature were − 35°C and 35°C, respectively, in 2016.
Different from other types of ferns, wild D. fragrans faces a variety of biotic and abiotic stressors during sporangia development, which provides a unique resource for investigating both the development and the stress resistance of sporangia.
In this study, we collected three periods of sporangia from those D. fragrans growing on the lava near Bagua Lake in Wudalianchi in July 2016. A transcriptome database was established using HiSeq 4000, and it will further our understanding of the development of fern sporangia with natural stress responses.

Methods
D. fragrans sporangia (include sporangiophore and indusium) were collected from the lava rock near Bagua Lake, Wudalianchi, Heilongjiang, China located at 48.733334 N, 126.167966E as the center with a radius of 30 m between July 1, 2016 and July 6, 2016 under permission from the government.
In our research, D. fragrans sporangia were divided into three stages, A, B and C, according to their obviously different morphology. Sporangia in Stage A were light green and visible to the naked eye on spore leaves fully expanded. Those in Stage B were pale green and extended to the edge of the leaf blade, containing sand-like particles. The hoar sporangia with mature spores were assigned to Stage C (Fig. 1). To ensure the uniformity of samples, only those having at least 15 leaves, including circinate leaves and covering all of the three stages of sporangia plants, were chosen. The sporangia were kept in 1.5 ml tubes in liquid nitrogen immediately after being removed from the plant. All of the samples brought back to Northeast Agricultural University were stored in liquid nitrogen before use.
For the transcriptome, we did three biological repeats at each stage (Stage A: T01, T04, T07; Stage B: T02, T05, T08; Stage C: T03, T06, T09). In the qRT-PCR assay, three biological replicates with three technique repeats were performed using the same samples for RNAseq.
Paraffin section samples were collected from Wudalianchi, fixed with formalin/acetic acid/alcohol (FAA) immediately after collection, and later brought back to the laboratory for follow-up treatment.
The optical microscope pictures were photographed in the laboratory, and the samples were transplanted from the field.

RNA quantification and qualification
Total RNA was extracted with a TIANGEN RNAprep Pure Plant Kit.
The purity, concentration and integrity of RNA samples were determined by a Nanodrop 2.0 and Aglient 2100 to ensure the use of qualified samples for transcriptome sequencing.

Library preparation for transcriptome sequencing
The library was constructed by enrichment of eukaryotic mRNAs with magnetic beads with Oligo(dT), and mRNA was randomly interrupted by adding Fragmentation Buffer. The first cDNA chain was synthesized using mRNA as the template and random hexamers as a six-base random primer. Next, the first cDNA chain was added to the buffer solution. The second cDNA chain was synthesized by RNase H and DNA polymerase I, and cDNA was purified by AMPure XP beads. The purified double-stranded cDNA was followed by terminal repair followed by the addition of a tail and the attachment of the sequenced connector. Next, AMPure XP beads were used to select the fragment size. Finally, the cDNA library was obtained by PCR enrichment.
After the completion of the library construction, we used a Qubit 2.0 and Agilent 2100 to detect the library's concentration and the size of the inserted fragment (Insert Size). The effective concentration of the library was accurately quantified with Q-PCR. After being qualified, an HiSeq 4000 was used for high-throughput sequencing.

Transcriptome assembly and gene annotation
After obtaining high quality sequencing data, the sequence was assembled by Trinity [6].
The Unigene sequence was compared with NR, Swiss-Prot, GO, COG, KOG, eggNOG4.5 and KEGG databases using BLAST software. Using KOBAS2.0 to get the KEGG Orthology results from Unigene in KEGG. The Unigene amino acid sequence was predicted and compared with the Pfam database by HMMER software to obtain the annotated Unigene information.
The relevant software and methods for gene structure analysis such as Open Reading Frames (ORFs), Simple Sequence Repeats (SSRs) and Single Nucleotide Polymorphisms (SNPs) are shown in Additional File 1.

Sequence analysis and assembly
To obtain a comprehensive overview of the sporangia of D. fragrans (L.) Schott, we used high-throughput sequencing by HiSeq4000. After quality control, we obtained 79.81 GB clean data, and the percentage of Q30 in different simples are not less than 90.72%. The read number (pair-end reads from clean data) of each sample ranges from 25,642,892 to 34,674,344, and all GC contents were between 48.48 and 48.06% (Table 1).
A total of 94,705 Unigenes with a mean length of 913.87 bp and an N50 of 1631 bp (50% of the assembled bases were equal or greater than 1631 bp). The length distribution of sporangia of D. fragrans is shown in Fig. 2 and Table 2, with 28.80% of all Unigenes showing lengths longer than 1 kb. A Venn diagram of the expressed Unigenes is shown in Fig. 3. A total of 52,995 Unigenes were expressed in three periods, and the numbers of genes specifically expressed in stage A, B and C were 4988, 3366 and 14,516, respectively. All reads generated in this study are shown in Additional file 1. Simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) analysis SSR analysis was based on Unigenes longer than 1 kb, and 10,584 SSRs were identified in 27,275 Unigenes (Unit size/minimum number repeats were setting (1/10) (2/6) (3/5) (4/5) (5/5) (6/5) and interruptions of max difference for 2 SSRs is 100). The most abundant repeat motifs were di-nucleotides (5541, 52.35%) followed by mono-nucleotides (2874, 27.15%) and tri-nucleotides (1197, 11.30%) (Fig. 4). By comparing the reads and Unigene sequences of the nine samples, a total of 349,885 single nucleotide polymorphisms (SNPs) were determined. The density statistics of SNPs in Unigenes indicated that most Unigenes (43,907, 46.36%) had 0-1 SNPs per Kb, and 8589 (9.07%) Unigenes had SNP density over 8 per Kb (Fig. 5).

Global analysis showing the difference between the three stages of sporangia
To understand the difference in samples of biological repetitions and between each stage, 36 scatter diagrams of gene pairwise correlations were drawn (Additional file 2), and a cluster was drawn showing the biological repetition correlations (Fig. 6). The correlation coefficients among biological repetitions showed high correlations, and the differences between stages were evident. These data     The SNP density distribution map. The abscissa is SNP density, that is, the number of SNP per Kb gene sequence; the ordinate is the number of genes that have the corresponding density eggNOG Annotation, and nr Annotation, respectively (Table 3).

Nr homologous species distribution
The outcome of the homology search of Unigenes in Nr databases showed 4128 (10.19%) Unigenes were homologous to sequences of Picea sitchensis, 3721 (9.18%) of which were mapped in Physcomitrella patens, and 3106 (7.67%) were Selaginella moellendorffii. Unigenes matched to the remaining species were less than 5% (Fig. 9).

DEGs between different stages
DESeq was used for differential expression analyzing the sporangia of D. fragrans at different stages, and 7126 differentially expressed genes were obtained (Fold Change > 4, FDR < 0.001) (Additional file 3). AvsC had the biggest difference. There were 3068 up-regulated and 3069 down-regulated Unigenes in stage A compared with stage C (Fig. 10), and annotated DEGs were listed in Table 4.

GO function enrichment of D. fragrans in different stages
We plotted the annotated statistical graph of differentially expressed genes GO two class nodes (Fig. 11). We performed hierarchical cluster analysis of all the DEGs and clustered them into a heatmap of enrichment of GO terms with 66 groups (Additional file 4). The expression levels of most DEGs were highly similar among the biological replicates in each stage, and certain of them were composed of larger groups (e.g., group 4, group 9, group 10, and group 63), while others were smaller groups (e.g., group 18, group 31, group 36 and group 51). Some unigenes expression levels were not stable in three repeats in the same stage, but we think it was not caused by experimental errors because these unigenes can still be clustered according to GO terms (e.g., group1, group.34, and group 66).

Enrichment analysis of DEGs of D. fragrans in different stages in the KEGG pathway
Using enrichment factor concentrations of the pathway, and after the calculation was significantly enriched by the Fisher exact test method, we obtained a KEGG enrichment sheet (Additional file 5), and the top 20 pathways with highest enrichment factor are displayed in the figure (Fig. 12

Paraffin section observations
Paraffin sections of the three stages of D. fragrans sporangia (including sporangiophore and indusium) were obtained (Fig. 13). From the paraffin sections, stage B and stage C have similar cell numbers of sporangia, sporangiophore and indusium. Compared with B period, the tapetal cells disappeared completely at stage C. Paraffin sections showed great differences in sporangia in stage A, and according to the order of development, six paraffin sections were made (Fig. 13).
Paraffin sections (Fig. 13) showed that the size of sporangia at stage B and C was similar, and except for the disappearance of tapetum cells, the number of cells in other structures had little change. By counting the number of cells in multiple paraffin sections (not yet contributed), we found that the number of sporangial cells in stage A was much smaller than that in stage B and C.
The paraffin sections showed that the sporangia of stage A could be subdivided into at least 6 more precise stages of development with the number of cells from less to more. In the six subdivisions, the number of parietal cells gradually increased (Fig. 13 1-6), outer tapetum cells produced by mitosis of sporangial initials cell continue to divide into tapetum (Fig. 13.2-5). Spore mother cells were found to be produced in the second subdivision stage (Fig. 13-2), and 8 cells were formed in the sixth subdivision stage by meiosis (Fig. 13-6).

Discussion
Global gene transcription in the D. fragrans sporangia High-throughput sequencing is an effective approach for studying the different developmental stages of organs. In this study, the expression patterns of genes in D. fragrans sporangia in different stages were identified by transcript profiling and comparative transcriptome analysis. A total of 94,705 Unigenes from 79.81 GB clean data were obtained, and approximately 46.47% Unigenes (44,006 Unigenes) could be functionally annotated in at least one public protein database (NR, Swiss-Prot, KEGG, COG, KOG, GO, egg-NOG and Pfam). Approximately 53.53% of Unigenes (50,699 Unigenes) remained for functional annotation. These Unigenes either matched an unknown function protein, or none of their homologues was found. These Unigenes may be special in D. fragrans sporangia as novel transcripts.
Heat map of all differentially expressed genes can help us clearly see what happens to each stage in future research. It is worth pointing out that some poorly repeated data should not be abandoned, but more in-depth study. So far, we speculate that certain DEGs expression levels were not susceptible to external influences, and they regulate the basic development of the spores. Certain DEGs Fig. 11 Heat map of all differentially expressed genes were clustered were susceptible to outside influence (such as group1, group.34, and group 66), and these DEGs may involve many resistance genes and metabolic pathways of some known/unknown;, and we will research a number of these DEGs in the future.
It is worth noting that Stage A has many different precise stages in sporangia, and transcription would vary by the phases. In this case, each precise stage specific expression gene may be averaged by different levels of expression at other precise stages.
Multiple database analysis reveals possible core and family/species-specific features of the sporangia developmental process Through comparative analysis of transcriptome data, we observed several characteristics of wild D. fragrans sporangia development at the transcriptome level as the core and family/species-specific features in D. fragrans or ferns.
First, secondary metabolism and plant hormone signal transduction were the main enrichment pathways between stage B and C. There were 6 KEGG pathways that were reliable at a corrected P-value less than 0.05 between stages B and C, and they were phenylpropanoid biosynthesis (4.486E-10), phenylalanine metabolism (6.370E-06), stilbenoid, diarylheptanoid and gingerol biosynthesis (0.005343), pentose and glucuronate interconversions (0.005863), cutin, suberin and wax biosynthesis (0.04584) and plant hormone signal transduction (0.04936). In addition to cutin, suberin and wax biosynthesis and glucuronate interconversions pathways, the other pathways seem to be related to plant stress resistance. We examined the detailed data in the heatmap (Additional file 4) and observed that the stress resistance-related GO term underwent dramatic changes during the BC period (most of which were up-regulated). These changes in gene expression may indicate a number of unknown plant stress resistance strategies.
Further research into these mechanisms will not only help us understand how ferns adapt to the environment but also may provide new ideas for crop improvement related to fruit and flowers.
Second, there were 660 DEGs between stage A and the B, and we observed that functional annotation and enrichment analyses of DGEs showed there were significant differences mainly associated with DNA replication and homologous recombination during this period. With further examination, we observed that all of these Unigenes in the two KEGG pathways were down-regulated. For instance, c116208.graph_c0, c109434.graph_c0, c114 931.graph_c0, c116571.graph_c0, c117358.graph, c0, c10 1985.graph_c0 (Additional File 8) were matched to MCM Fig. 12 Differential gene expression of KEGG pathway enrichment scatter diagram. a. stage A vs B b. stage A vs C c. stage B vs C. The abscissa is the enrichment factor, the bigger the enrichment factor, the more significant the enrichment level of differentially expressed genes in the pathway. The ordinate is log10 (Q value), the greater the ordinate, the greater the significance of differentially expressed genes in the pathway 2, MCM 3, MCM 4, MCM5, MCM6 and MCM 7, respectively. The minichromosome maintenance proteins (MCMs) are highly conserved proteins widely observed from ancient creatures to higher eukaryotes and involved in the initiation of eukaryotic genome replication. MCM2~7 are the components of the minichromosome maintenance (MCM) complex, which acts as a DNA helicase during replication and elongation of DNA. There is a positive correlation between the overall level of MCM and cell proliferation and growth capacity. Therefore, the level of MCM also represents the different cell proliferation states [7][8][9].
Transcriptome data showed that MCM 2-7 declined significantly during stage B compared to stage A. This finding may mean that the frequency of cell division in stage A is significantly higher than that in stage B. It can be seen that the change of DNA replication related unigenes were consistent with the morphological observation ( Fig. 13 1-6).
RAD51 and RAD54 (Additional File 8) were putative Unigenes for c110186.graph_c0 and c114077.graph_c0, respectively. Both of these Unigenes are important in meiotic recombination [10,11]. In the model plant Arabidopsis thaliana, RAD51 has higher expression in reproductive tissues than other tissues and exhibits the highest expression level in young flower buds. At a cellular level, RAD 51 has lower levels of expression in flower primordia, higher levels in young anthers, highest levels in both female meiocytes and male meiocytes and is not detected in gametopjytes [10,12,13].
RAD54 has the highest levels of expression in flower buds and is present in flower buds at the protein level in Arabidopsis thaliana [11]. Both RAD51 and RAD54 showed very pronounced down regulation during the A to B stage (dropped by approximately 19 times and 16 times, respectively). qRT-PCR also showed the same changes (dropped by approximately 75 times and 16 times, respectively) (Additional File 8).
Based on the changes in DEG expression related to DNA replication, we speculated that there was a higher mitosis frequency in stage A than stage B and C. RAD51 and RAD54 were associated with homologous recombination and usually occurred at the meiosis stage of spore mother cells. These changes were consistent with our paraffin sections (Fig. 13).
All 8 omega-hydroxypalmitate O-feruloyl transferase-related Unigenes declined significantly over the three periods (Additional File 8). This putative gene is involved in the synthesis of aromatics of the suberin polymer and specifically affects the accumulation of the ferulate constituent of suberin in seeds. It is interesting that this gene was detected at the protein level in seed coats of Arabidopsis [14,15]. Suberin is a polyester polymer in the cell walls of terrestrial plants, which protects plants from infection and other environmental pressures by controlling the transport of water and nutrients. This change may be related to the early development of sporangial stress resistance mechanism or the common characteristic of reproductive organ development of ferns and flowering plants, which is worthy of further study in the future.
Peroxygenase may be involved in the interaction between oil bodies and vacuoles during seed germination in spermatophytes. There is no visible phenotype after gene ablation but retarded growth during the first 48 h after germination. As described in literatures, peroxygenase was specific expressed in seeds and restricted to the developing embryo where its transcription was enhanced in protoderm and vasculature during the cotyledon stage of seed maturation and decreased gradually after germination [16][17][18][19]. Fatty acyl-CoA reductase 2 was involved in the synthesis of the lipid component in sporopollenin and expressed in the tapetum [20,21].
As we can see, the expression pattern of DEGs in cutin, suberin and wax biosynthesis during the development of sporangia was consistent with that in the flowering to seed development process of flowering plants. From an evolutionary point of view, flowering plants appear after ferns. Whether their reproductive organs have similar manifestations is worth further study. Paraffin sections showed that tapetal cells and spore mother cells were produced simultaneously ( Fig. 13.2), gradually collapsed when spore mother cells split into eight cells (Fig. 13.6), and disappeared completely at stage C. This finding is in accordance with the changes in the expression of c120866.graph_c0.

Discussion of genes related to the reproductive organs of flowering plants
A large number of DEGs were annotated to transcription factors associated with floral and seed development.
MADS-box genes play very important roles in the process of plant development, and we observed 3 putative DEGs annotated to known genes related to the seed and gametophyte development peak at stage C (c109839.graph_c1, c116857.graph_c0, c93709.graph_c0, and the putative genes were AGL62, AGL62, AGL61, respectively) (Additional File 8). AGL62 expressed in the endosperm but not in the embryo during syncytial endosperm development, and plants lacking AGL62 show a seed-lethal phenotype with premature endosperm cellularization [22]. AGL61 controls central cell differentiation during female gametophyte development and is expressed in the final stages of embryo sac development [23,24].
In the NAC family, c117598.graph_c0 (putative gene NAC transcription factor ONAC010) is associated with male fertility. Reduced expression of NAC5 via RNAi leads to male sterility [25]. c117039.graph_c0 (putative gene B3 domain-containing transcription factor ABI3) (Additional File 8) participates in abscisic acid-regulated gene expression during seed development. This gene regulates the transcription of SGR1 and SGR2, which are involved in leaf and embryo degreening [26,27]. These characteristics imply that there may be similarity between the sporangia development in stage C and the development of plant seeds.
We noted that there were few Unigene expression peaks in stage B, and c92612.graph_c0 putative gene TFL1) (Additional File 8) was an exception. This gene was the Unigene with the highest expression in stage B. TFL1 not only controls inflorescence meristem identity, required for maintenance of an indeterminate inflorescence, but also plays a role in the regulation of the time of flowering in the long-day flowering pathway [28,29]. The significance of such high levels of expression in sporangia is worth studying in the in future.

Conclusions
Using D. fragrans sporangia from nine samples of three periods, 79.81 Gb Clean data were obtained (a sample of not less than 7.95GB, Q30 = 90.72%). After assembly and data analysis, 94,705 Unigenes were obtained, including 27,275 Unigenes with lengths over 1 KB, and 44,006 Unigenes were annotated in the database. In all, 10,584 SSR markers and 349,885 SNPs were obtained through data analysis. Eight Unigenes were verified by qRT-PCR, and the results were consistent with transcriptome data.
In stage A, DEGs associated with DNA replication and homologous recombination suggest that plants may undergo frequent mitosis and meiosis at this time. In stage AB and stage BC, the most reliable pathway of KEGG enrichment of DEGs were dominated by secondary metabolic pathways. Several genes associated with floral and seed development may also perform biological functions in sporangium.