Identification and expression profiles of sRNAs and their biogenesis and action-related genes in male and female cones of Pinus tabuliformis

Background Small RNA (sRNA) play pivotal roles in reproductive development, and their biogenesis and action mechanisms are well characterised in angiosperm plants; however, corresponding studies in conifers are very limited. To improve our understanding of the roles of sRNA pathways in the reproductive development of conifers, the genes associated with sRNA biogenesis and action pathways were identified and analysed, and sRNA sequencing and parallel analysis of RNA ends (PARE) were performed in male and female cones of the Chinese pine (Pinus tabuliformis). Results Based on high-quality reference transcriptomic sequences, 21 high-confidence homologues involved in sRNA biogenesis and action in P. tabuliformis were identified, including two different DCL3 genes and one AGO4 gene. More than 75 % of genes involved in sRNA biogenesis and action have higher expression levels in female than in male cones. Twenty-six microRNA (miRNA) families and 74 targets, including 46 24-nt sRNAs with a 5’ A, which are specifically expressed in male cones or female cones and probably bind to AGO4, were identified. Conclusions The sRNA pathways have higher activity in female than in male cones, and the miRNA pathways are the main sRNA pathways in P. tabuliformis. The low level of 24-nt short-interfering RNAs in conifers is not caused by the absence of biogenesis-related genes or AGO-binding proteins, but most likely caused by the low accumulation of these key components. The identification of sRNAs and their targets, as well as genes associated with sRNA biogenesis and action, will provide a good starting point for investigations into the roles of sRNA pathways in cone development in conifers. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1885-6) contains supplementary material, which is available to authorized users.


Background
The functional differentiation and adaptability to different environments of cells and tissues harbouring the same genetic material are dependent on epigenetic regulation at different levels. Small RNA (sRNA)-mediated gene silencing and chromatin modification play important roles in regulation [1]. The sRNA pathways in plants mainly include the microRNA (miRNA) and shortinterfering RNA (siRNA) pathways [2]. According to the biogenesis and action mechanisms of sRNAs, the siRNA pathway is divided into trans-acting siRNA (tasiRNA), natural-antisense siRNA (natsiRNA) and RNA-directed DNA methylation (RdDM) pathways [3].
The miRNAs are a family of small endogenous noncoding single-stranded RNA molecules that regulate gene expression posttranscriptionally by directing mRNA degradation or translational repression and control many biological functions, including development and tissuespecific processes in both plants and animals [4,5]. Plant miRNAs are generally 21 nucleotides long and regulate endogenous gene expression by recruiting silencing factors assembled into the RNA-induced silencing complex (RISC) to complementary binding sites in target transcripts [6,7]. In most studied plants, such as Arabidopsis [8], rice [9], tomato [10], soybean [11], peanut [12], apple [13], miRNAs are the second most abundant sRNAs, followed by siRNAs [14]. siRNAs are distinguished from miRNAs in that they are derived from double-stranded RNA precursors. In plants, 24-nt siRNAs are associated with DNA methylation through the RdDM pathway at homologous loci guided by AGO4 proteins [15][16][17][18].
Despite this broad knowledge of sRNA biogenesis and the action mechanisms underlying growth and development of angiosperm plants, there is still a considerable lack of corresponding research on gymnosperms. With the popularisation of next-generation sequencing technology, sRNA sequencing and identification were also performed for some conifers [9,36]. The sRNA expression profiles of infectious diseases [37], somatic embryonic induction and germination [38,39], and male and female gametophytes [40,41] were analysed in different conifer trees. However, these studies focused mainly on changes in expression of specific sRNAs, while research on the sRNA biogenesis and action pathways is very limited.
To improve our understanding of the roles of sRNA pathways in male and female cones of Pinus tabuliformis, the genes associated with sRNA biogenesis and action pathways were identified and analysed, and highthroughput sequencing of sRNAs and degradome tags of P. tabuliformis male and female cones was performed. These data provide compelling new insights into the regulation of sRNA pathways involved in male and female cone development in P. tabuliformis.

Results
Identification of homologues involved in sRNA biogenesis and action in P. tabuliformis The sRNA biogenesis and action pathways are well defined in Arabidopsis [3]. Through a Blast search of the P. tabuliformis transcriptomic sequences [42] using the amino acid sequences of proteins from Arabidopsis, several highly similar sequences were selected and mapped to the Picea abies genome [43]. Specific screening primers were designed based on the longest sequence in each cluster to isolate the full-length sequences from the P. tabuliformis SMART cDNA library (Clonetech, USA). Finally, 24 candidate genes with complete coding regions were isolated, and the phylogenetic relationships between these P. tabuliformis genes and those of other land plants were inferred using the ML method. Surprisingly, the sRNA pathway genes were highly conserved during evolution, except for methyltransferases involved in the anRdDM pathway (Additional file 1). Twenty-one high-confidence homologues involved in sRNA biogenesis and action in P. tabuliformis were identified (Table 1).
Two different DCL3 genes exist in conifers DCL enzymes are large proteins that catalyse primary sRNA transcript cleavage and produce mature sRNAs of different sizes [44]. Four different AtDCL enzymes were found in Arabidopsis and were divided into four groups, corresponding to DCLs from other plants. All four classes of DCLs exist in P. tabuliformis, indicating that they evolved before the divergence of angiosperms and gymnosperms (Additional file 1).
Our results demonstrated two different DCL3 genes in P. tabuliformis (Table 1, Fig. 1). The identities between the PtDCL3a and PtDCL3b cDNA sequences are only 68.5 %; however, the identity of PtDCL3a to its Pinus taeda and Picea abies homologues are 98 % and 94 %, respectively, while the identity of PtDCL3b to its homologues are 97.0 % and 93 %, respectively. These results indicate that DCL3a and DCL3b were separated for a long time before the divergence of conifer species.
The AGO4s binding to the 24-nt DCL3-derived siRNAs were conserved during land plant evolution AGO proteins are key components of the RNA-induced silencing complex (RISC) [48,49]. Phytogenetic analyses showed that plant AGO proteins group into three clades (Fig. 2a). Five AGOs were found in P. tabuliformis. PtAGO1, 5, and 10 belong to the AGO1 clade, and PtAGO4 and PtAGO7 belong to the AGO4 and AGO7 clades, respectively (Fig. 2b). The catalytic DDH amino acid core in the PIWI domain of land plant AGOs was extremely conserved (Fig. 2c).
Despite the fact that 24-nt DCL3-derived siRNAs are only present at very low levels in conifers [43] and that  the AGO4 clade ago mutants in Arabidopsis (ago4, ago6, ago9) have no obvious developmental defects [48], AGO4s were conserved during land plant evolution. Moreover, the number, position, and size of exons of AGO4 homologues in land plants remained surprisingly consistent (Fig. 3). Greater efforts are needed to understand the specific role of AGO4 in species maintenance and evolution.
The sRNA biogenesis and action pathways have higher activity in female than in male cones of P. tabuliformis The expression profiles of genes involved in the sRNA biogenesis and action pathways in male and female cones were analysed. The results show that more than 75 % of genes have higher expression levels in female than in male cones (Fig. 4a). These differences were confirmed by microarray data (Additional file 2). Interestingly, the female structures (carpels) in Arabidopsis also had similarly higher activities than those of the male structures (stamens) (Fig. 4b). Moreover, AGO1 had the highest expression level, and AGO4 and AGO10 were highly differentially expressed between male and female structures in both P. tabuliformis and Arabidopsis, indicating that a similar sRNA regulatory mechanism probably underlies the development of male and female structures in both gymnosperms and angiosperms. sRNAs in male and female cones were then analysed by high-throughput sequencing. The results showed that 21-nt sRNAs were the major sRNAs in both male and female cones in P. tabuliformis, with more in female than male cones (Fig. 5). Proportionally, the male cones had relatively high levels of 24-nt sRNAs (Fig. 5), but AGO4, which plays a key role in the action of 24-nt sRNAs, was expressed at a very low level in male cones (Fig. 4a), indicating that both miRNA and siRNA pathways have higher activities in female than male cones.

Identification of miRNAs and targets in male and female cones of P. tabuliformis
To globally and directly identify miRNAs and miRNAdirected targets of cleavage, a parallel analysis of RNA ends (PARE), also known as degradome analysis, was applied. Twenty-six miRNA families and 74 targets were identified by sRNA sequencing and PARE analysis. Three novel miRNAs with unknown functions were isolated (Table 2, Additional file 3). When a two-fold change (FC) in expression was used to filter the differentially expressed miRNAs between male and female cones, 50 miRNAs were identified (Additional file 4). Eighteen genes had higher expression levels in male cones, while the other 32 miRNAs had higher expression levels in female cones (Additional file 4). This result is consistent with the sRNA biogenesis and action pathways having higher activities in female than in male cones in P. tabuliformis (Fig. 4).
Identification of 24-nt sRNAs containing a 5' "A" terminal differentially expressed between male and female cones in P. tabuliformis Compared with the miRNA pathway, the role of the 24nt siRNA-mediated RdDM pathway in the reproductive development of plants is largely unknown [48]. Only one AGO4 homologue, the key component of RISC associated with 24-nt siRNAs, was found in P. tabuliformis (Table 1, Fig. 2). Because AGO4 was revealed to predominantly bind 24-nt sRNAs with a 5' A [56], the 24-nt sRNAs containing 5' "A" termini differentially expressed   between male and female cones of P. tabuliformis were identified. Eleven and 35 sRNAs specifically expressed in male and female cones, respectively, were isolated (Additional file 6). The functional identification of these 24-nt sRNAs in reproductive development will be instructive to our future research.

Discussion
The sRNA-mediated transcriptional regulation of genes, including the miRNA and siRNA pathways, is an important epigenetic regulatory mechanism in plants [1].
In this study, we first isolated the key regulatory factors involved in miRNA and siRNA biogenesis and action in P. tabuliformis. Phylogenetic analysis indicated that sRNA pathways were very ancient regulatory mechanisms during the evolution of land plants, and most homologous genes, such as DCLs, AGOs and RDRs, had already diverged in the primitive vascular plants. However, the siRNA pathways probably evolved later than the miRNA pathways. The sRNA binding and guiding protein AGOs and the 24nt siRNA-mediated DNA methylation catalytic genes have expanded and diversified in angiosperms [57]. In addition to the sRNA target genes, the sRNA biogenesis and action pathways also play important roles in the regulation of growth and development in plants [58,59]. The expression profiles of the sRNA biogenesis and action pathway genes and sRNA sequencing indicated that the miRNA pathway is the main sRNA pathway in male and female cones of P. tabuliformis. Previous studies showed that the siRNA pathway has weak activity in other organs  The miRNAs that shown in the table were isolation and sequencing from at least two independent libraries and the targets cleavage by miRNAs were identified by PARE analysis. * indicate the unigenes with same name were found as same gene after cloned compared with cones [35,43]. In angiosperms, the miRNA pathway is also the most important sRNA pathway in reproductive regulation [20]. Based on sRNA sequencing and PARE analysis, the cleavage of 74 target sequences by 26 corresponding miRNA families was identified. The complete CDS of 36 genes from these target sequences were cloned, while other genes were difficult to obtain by PCR as the mRNA of these genes was almost completely degraded by the high abundance of related miRNAs (average RPM > 3700) in the cones of P. tabuliformis. The roles of turn off of these genes in reproductive development remain unclear. It is noteworthy that we found that at least a portion of these genes were probably non-coding RNAs, and may be indirectly involved in developmental regulation.
DNA methylation is involved in the control of all genetic functions including transcription, replication, DNA repair, gene transposition and cell differentiation in plants [61]. It is a common and very ancient epigenetic regulatory mechanism in plants that is found in the DNA of all archegoniates investigated; however, the degree and features of DNA methylation are species-, tissue-, organelle-and age-specific [61]. 24-nt siRNAmediated site-specific DNA methylation through the RdDM pathway is an important DNA methylation mechanism [62]. Previous studies suggested that gymnosperms have lower DNA methylation levels than those of flowering plants [63], which may be associated with the high degree of conservation and low morphological diversity between conifer species [43]. The 24-nt sRNAs involved in RdDM only represent a small proportion of all sRNAs in conifers [35,43], but the proportions are opposite in the flowering plants [9]. Therefore, some researchers have speculated that the RdDM pathway in conifers is incomplete [46]. Our results have shown that, except for methyltransferase, all RdDM pathway components are present and conserved in P. tabuliformis, including PtDCL3, PtAGO4, PtRDR2, PtHEN1, PtNRPD1a, PtNRPD1b and PtNRPD2. The low level of 24-nt sRNAs is not because of a lack of biogenesis enzymes. The real reason may be, the low expression levels of RDR2-NRPD1a-DCL3 coding genes necessary for 24-nt sRNA accumulation.
AGO proteins are sRNA binding and guiding proteins and the most important proteins downstream of the sRNA pathways [64]. Despite the RdDM pathway having only weak activity in conifers, the components of RdDM were still conserved at a high degree through time. The structures of AGO4 in moss, lycophyte, gymnosperm and angiosperm plants maintain a high level of consistency. Interestingly, the role of RdDM in mosses and lycophytes is unclear, as the ago4 mutant has no obvious developmental defects [65,66] and the evolutionary significance and selective pressure of the conservation of AGO4 and RdDM is difficult to understand. Some evidence indicates that the absence of AGO4 makes the plants more sensitive to disease [65]. Investigating the role of PtAGO4 in P. tabuliformis in disease resistance may be valuable for understanding the role of RdDM in evolution and may facilitate disease resistant breeding of P. tabuliformis.
We found 46 24-nt sRNAs with a 5' A that probably bind to AGO4 [56]. They were specifically expressed in either male cones or female cones, and more than 75 % of these sRNAs have significant accumulation in female cones but were not detected in all male samples. This is consistent with the higher activity of sRNA biogenesis and action pathway genes in female cones compared with male cones of P. tabuliformis. Because of the huge genome size, the analysis of large-scale genome methylation is difficult in conifers, and the function of these specifically expressed 24-nt sRNAs is unclear and deserves more attention in future studies.

Conclusions
Based on high-quality reference transcriptome sequences [42], 21 high-confidence homologues involved in sRNA biogenesis and action in P. tabuliformis were identified. Phylogenetic analysis indicated that the sRNA pathways are highly conserved from mosses and ferns to higher plants. The expression profiles of these genes suggested that the sRNA pathways have higher activities in female than in male reproductive structures. In contrast to the angiosperms [14], both biogenesis-and action-related gene expression and sRNA sequencing revealed that the miRNAs are the most abundant sRNAs in P. tabuliformis, rather than siRNAs. In this study, 26 miRNA families and the miRNA-directed cleavage of 74 corresponding targets were identified though correlation analysis of sRNA and PARE sequencing data. The miRNAs and their targets participating in reproductive development in angiosperms, such as miR156-SPLs, miR159-MYBs, miR172-AP2Ls, miR319-TCP and miR396-GRFs, were also found in P. tabuliformis. They have ancient evolutionary histories similar to the sRNA pathways.
In conifers, the low level of 24-nt DCL3-derived siR-NAs was not caused by the absence of DCL3 and AGO4. Two DCL3 genes and one AGO4 gene were found in P. tabuliformis, its ortholog PgAGO in Picea glauca [67] was previously identified. Forty six 24 nt sRNAs with a 5' A, which probably bind to AGO4, specifically expressed in either male or female cones were isolated. The specific, highly expressed 24-nt sRNAs identified in conifers will provide a good starting point for investigations into the function and evolution of siRNAs in conifers.

Methods
Plant material and sample collection P. tabuliformis immature male and female cones were collected from 3 individual trees selected at random (genetically distinct) in the botanic gardens in Beijing, China (116°33.9116′ E, 40°00.0861′ N and 44 m a.s.l.). Cones were sampled at 11:00 am on April 21, 2013. Each experiment was performed with at least three biological replicates per event. Samples were immediately placed in liquid nitrogen in the field after collection and all samples were stored at −80°C in the laboratory before analysis.
Identification of homologues involved in sRNA pathways in P. tabuliformis Amino acid sequences of Arabidopsis thaliana genes (Table 1) were downloaded from the TAIR database (http://Arabidopsis.org). The protein sequences of Arabidopsis were used in queries to screen the P. tabuliformis transcriptome sequences (NCBI accession number SRA 056887) based on the TBLASTN method. The candidate sequences were selected and compared with other available conifer transcriptome sequences (http://dendrome.ucdavis.edu/resources/) and the Picea abies genome (http://congenie.org). The P. tabuliformis complete-length SMART cDNA library (Clonetech, USA) was screened using specific primers. The full-length sequences were obtained and compared with the original sequences. The nucleotide sequences of candidate genes were selected for preliminary phylogenetic analysis based on the NJ method using the MEGA software [68] and renamed.

Phylogenetic analysis
Homologues of 41 land plant species, which have been whole genome sequenced (http://phytozome.jgi.doe.gov), were selected for phylogenetic analysis. Multiple alignments of protein sequences were obtained using the MUSCLE software [69] and a maximum-likelihood tree, based on the JTT model, was generated using MEGA software [68]. Bootstrap values were obtained from 1000 replicates.

sRNA sequencing and PARE analysis
Total RNA isolation from samples and cDNA library construction were performed as described previously [39]. Pooled libraries were used for cluster generation on Illumina's Cluster Station (Illumina, San Diego, USA) and then sequenced on an Illumina Hiseq2000 at YQYK-BIO (Beijing, China) following the vendor's recommended protocol. The sRNA abundance was measured as reads per million reads (RPM). The PARE library construction and sequencing were performed as described previously [70,71]. The identification of miRNA and miRNA-directed targets of cleavage though correlation analysis of sRNA and PARE sequencing results was performed as previously described [72,73]. More details are available in the supplementary material (Additional file 7).

Gene expression analysis
RNA sequencing and gene expression analysis were described previously [74]. mRNA abundance was measured as reads per kilobase per million (RPKM) [75]. Each experiment was performed with at least three biological replicates per event. The mean RPKM of three biological replicates was compared among different samples.
Identification of differentially expressed 24-nt sRNAs containing a 5' "A" terminal between male and female cones The 24-nt sRNAs containing a 5' "A" terminal were extracted. Comparison of the expressions of these sRNAs was conducted between small RNA libraries of male and female cones. We first normalised the expression of sRNA in six libraries (F and M, three biological replicates each) to obtain the expression of reads per million reads (RPM). Then, the data were analysed using Fisher's exact test with a Bonferroni correction for multiple hypothesis testing. Those sRNAs with a p-value below 0.01 and specifically expressed in either male cones or female cones were isolated.