Degradome sequencing-based identification of phasiRNAs biogenesis pathways in Oryza sativa

Background The microRNAs(miRNA)-derived secondary phased small interfering RNAs (phasiRNAs) participate in post-transcriptional gene silencing and play important roles in various bio-processes in plants. In rice, two miRNAs, miR2118 and miR2275, were mainly responsible for triggering of 21-nt and 24-nt phasiRNAs biogenesis, respectively. However, relative fewer phasiRNA biogenesis pathways have been discovered in rice compared to other plant species, which limits the comprehensive understanding of phasiRNA biogenesis and the miRNA-derived regulatory network. Results In this study, we performed a systematical searching for phasiRNA biogenesis pathways in rice. As a result, five novel 21-nt phasiRNA biogenesis pathways and five novel 24-nt phasiRNA biogenesis pathways were identified. Further investigation of their regulatory function revealed that eleven novel phasiRNAs in 21-nt length recognized forty-one target genes. Most of these genes were involved in the growth and development of rice. In addition, five novel 24-nt phasiRNAs targeted to the promoter of an OsCKI1 gene and thereafter resulted in higher level of methylation in panicle, which implied their regulatory function in transcription of OsCKI1,which acted as a regulator of rice development. Conclusions These results substantially extended the information of phasiRNA biogenesis pathways and their regulatory function in rice. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07406-7.

(LOC_Os12g42380.1 and LOC_Os12g42390.1) were also uncovered by our screening procedure (Additional le 5: Table S3), which have been identi ed as two parts of a long non-coding RNAs [11].
In addition, for all these newly found PHAS loci, only the biogenesis of LOC_Os04g25740.2-derived phasiRNAs were triggered by a known miRNA, miR2118f. The phasiRNAs generated from the other newly found PHAS loci, to our knowledge, were all triggered by novel sRNAs (Table 1). Analysis of the regulatory function of novel phasiRNAs generated from 21-nt PHAS loci 21-nt phasiRNAs have been revealed function in trans-regulation of target genes by cleaving mRNAs in plant, these phasiRNAs were named as trans-action siRNAs (ta-siRNAs).
PhasiRNAs generated from the transcripts of LOC_Os02g18750.1, LOC_Os06g30680.1 and LOC_Os12g42380.1 were detected in panicle, and the LOC_Os05g43650.1-derived phasiRNAs were detected in seedling, which suggest the requirement of these phasiRNAs in different stages of development.
The regulatory network of the phasiRNAs pathways that mentioned above were constructed based on the target information (Fig. 4). RdDM is an important regulatory event involved in repressive epigenetic modi cations that can trigger transcriptional gene silencing. In order to analysis the novel 24-nt phasiRNA mediated RdDM in rice, all the known promoter sequences in rice were employed for scanning of the target sites of novel phasiRNAs generated from the newly found ve 24-nt PHAS loci. The DNA methylation status of target promoter was further analyzed by utilizing the bisul te-seq data sets of panicle (GSM4232038) and root (GSM4232039) of rice. As a result, a promoter of LOC_Os02g40860.1 gene was found targeted by ve -LOC_Os01g37325.1-derived phasiRNAs ( Table 3).
As CG and CHG methylation contexts are maintained by DNA methyltransferases and histone modi cations, while CHH methylation is associated with 24-nt siRNA guided RdDM [25].
The CHH methylation status of promoter was found signi cantly higher in panicle than that in other tissues (Fig. 5), which consistent with the nding that LOC_Os01g37325.1-derived phasiRNAs only expressed in panicle tissue (Additional le 2: Table S2). Therefore, the results suggesting a methylation mediated transcriptional silencing of the promoter of LOC_Os02g40860.1.
LOC_Os02g40860.1 encodes a Casein kinase I1 (OsCKI1) protein, which belongs to the CKIs protein family. CKIs have been identi ed highly conserved in eukaryotes, they are believed involving in a variety of important biological events, since they have a wide substrate speci city in vitro [26]. The expression level of LOC_Os02g40860.1 in root and panicle tissues of rice have been analyzed by utilizing the RNA-seq libraries (panicle: GSM4230036 and GSM4230037; root: GSM4230038 and GSM4230039) which contributed by Zhao et al [27]. As shown in Fig. 5C, the expression level of LOC_Os02g40860.1 was signi cantly lower in panicle than that in root, therefore we speculated that, the biogenesis of LOC_Os01g37325.1-derived phasiRNAs are speci c required in panicle for the regulation of the transcriptional level of LOC_Os02g40860.1. In another word, the OSsRNA-14-LOC_Os01g37325.1-phasiRNA pathway might involve in the development of panicle in rice.

Discussion
In recent years, researches on Oryza sativa have shown that 21-or 24-nt phasiRNAs distribute in genomic clusters and preferentially accumulate in reproductive tissues [28]. To date, considerable PHAS loci have been discovered in rice [29]. According to the previous reports, the miR2118 and miR2275 triggered 21-nt and 24-nt phasiRNA biogenesis pathways seem to be the primary phasiRNA production sources in rice. However, considering there are rich sources for phasiRNA production in other plant species, we speculated that the miRNA-phasiRNA pathways have not been fully discovered in rice, it is necessary to continue the mining work for better understanding the mechanism of phasiRNA biogenesis and the miRNA derived regulatory network. In this study, we performed a degradome-based screening of novel PHAS loci in rice by utilizing sRNA libraries from different tissues. As a result, ve novel 21-nt PHAS loci and ve novel 24-nt PHAS loci were identi ed, and only one of these PHAS loci was triggered by known miRNA, miR2118, which con rmed our suspicion. In addition, the novel 21-nt PHAS loci, LOC_Os05g43650.1, is a miniature inverted-repeat transposable element (MITE) gene [30], and two 24-nt PHAS loci, LOC_Os01g37325.1 and LOC_Os02g20200.1, are two retrotransposon genes, which indicated that the transcripts of transponsons and retrotransponsons are capable of producing secondary siRNAs. Same phenomenon has also been reported by Creasey et al. in Arabidopsis [31,32].
Transponsons and retrotransponsons are ubiquitous in plants and play important roles in plant gene and genome evolution [33]. According to the information of targets of phasiRNAs, the OSsRNA-3-LOC_Os05g43650.1-phasiRNA and OSsRNA-14-LOC_Os01g37325.1-phasiRNA pathways are required for the development of rice. We speculated that, the transcripts of transponson and retrotransponson might also function as important sources of phasiRNA in plants. Further exploration of this kind of phasiRNA biogenesis pathways could bene ts the in-depth investigation of their biogenesis mechanism and the miRNA/sRNA directed regulatory networks in plants.
Recently, some protein-coding genes were identi ed as phasiRNA precursors, which draws a lot of attention [29,34]. Among the newly identi ed PHAS loci, LOC_Os04g45834.2 encoding a DUF584 domain containing protein. These protein family has been proved that involved in leaf senescence in plant [35]. LOC_Os09g14490.1 encoding aTIR-NBS type disease resistance protein, which has been identi ed in resistance to multiple viruses in plant [36][37][38].LOC_Os02g55550.1 encoding a F-box/LRR-repeat protein 14, the LRR-repeat protein is involved in plant immune response [39]. These genes have been proved to play important roles in plants, however, their capability of producing secondary phasiRNAs suggest they might be involved in much more complex function than we expected. Systematically investigation of the temporal and spatial expression speci city of phasiRNAs generated from the transcripts of protein-coding genes in our future work might gain insight into these phasiRNAs biogenesis requirement mechanism.
In this study, besides those identi ed PHAS loci and other 21-nt PHAS loci candidates (table S1a and S1b), two cDNA sequences, LOC_Os09g00999.1 and LOC_Os09g01000.1, which were able to produce plenty of Dicer-independent secondary siRNAs in almost every tissues of rice have attracted our attention. We further employed the phasiRNAs generated from LOC_Os09g00999.1 and LOC_Os09g01000.1 for target prediction and identi cation. The results indicated plenty of siRNA-target interaction pairs were discovered (data not shown). We speculated that this might be a novel pattern of secondary siRNAs biogenesis pathways. Further investigation of Dicer-independent secondary siRNAs biogenesis pathways in plant could provide more strong evidence of this biogenesis pattern, which also provide meaningful information of the small RNA regulatory mechanism in plant.

Conclusions
We performed degradome-based screening of novel phasiRNA biogenesis pathways in rice. Five novel 21-nt phasiRNA biogenesis pathways and ve novel 24nt phasiRNA biogenesis pathways were identi ed. In addition, two known 21-nt phasiRNA biogenesis pathways were also identi ed. Further analysis on the targets of the detectable novel phasiRNAs with 21-nt and 22-nt length revealed total eleven novel phasiRNAs involving in forty-one siRNA-target interactions and suggest these phasiRNAs might play important roles in rice growth and development (Table 1, Additional le 8: Figure S5). These results demonstrated the effectiveness of degradome-based screening in mining novel phasiRNA biogenesis pathways and substantially extend the information of phasiRNA biogenesis pathways in rice. We believed that, more novel phasiRNA biogenesis pathways might be identi ed if extend our approach to other plant species.

Data source
The Oryza sativa small RNA high-throughput sequencing (sRNA HTS) data sets of seedling, root, shoot and panicle samples under control and stress conditions, the sRNA HTS data sets of wild type, osdcl4 and osdcl3 mutants and the degradome sequencing data sets (Additional le 5: Table S3) were retrieved from GEO (Gene Expression Omnibus; http://www. ncbi.nlm.nih.gov/geo/).The cDNAs, full-length genomic sequences of Oryza sativa were retrieved from PlantGDB (http://plantgdb.org/XGDB/phplib/). The promoter sequence of Oryza sativa were retrieved from PlantProm DB (http://linux1.softberry.com/ ). All the high-throughput sequencing data were pre-processed before use, the data of each library was normalized in RPM (reads per million) as described in our previous report [40].

Identi cation of phasiRNA biogenesis pathways in Oryza sativa
The phasiRNA loci identi cation criteria were established based on the revised trans-acting siRNA (ta-siRNA) ta-siRNA biogenesis model as we reported previously [12]. The screening of PHAS loci in rice was followed by four steps: (1) cDNA/genome sequences-derived 21-nt phased duplexes were computational predicted by "phase processing", each of these duplexes has a 2nt overhang at 3'-end.
(2) Each of these duplexes was separated into two increments and used for matching with small RNAs from small RNA high throughput sequencing data set of Rice seedling. A potential phasiRNA production region shall contains at least 5 tandem "processing" duplexes and each of these duplexes shall contains detectable phasiRNA from sense strand (plus siRNA) and/or antisense strand (minus siRNA). (3) Degradome high throughput sequencing (HTS) libraries were employed for systematically scanning the degradome-supported cleavage signatures on the screened possible phasiRNA production regions as we described in our previous work [12], and maintain the PHAS loci candidates with cleavage signatures which located in the phasiRNA production region. (4) The sRNAs bound to the PHAS loci were analyzed by using miRU algorithm [41], and the sRNA cleavage sites on those loci were further veri ed by using degradome sequencing libraries. The degradome-supported cleavage site of a sRNA trigger shall reside within 10 to 11-nt from the 5' end of the binding site [42].
Phasing scores of phasiRNA regions were calculated based on the formula which contributed by Zheng et al [29]: Phasing score where N represents the number of phase register occupied by at least one unique 21-nt/24-nt small RNA within a ve-phase register window, p represents the total number of reads for all 21-nt/24-nt small RNA falling into a given phase in a given window, U represents the total number of unique reads for all 21-nt/24nt small RNA falling out of a given phase.

Identiti cation of phasiRNA--target interaction based on degradome sequencing
The expressed novel phasiRNAs generated from 21-nt PHAS loci were predicted based on previously modi ed model of ta-siRNA biogenesis in plant [12]. The predicted phasiRNAs were recruited for target prediction by using miRU with default parameters [41], and followed by degradome sequencing-based veri cation, as described previously [40,43].

Gene expression level analysis
The sequences of each RNA-seq data set were employed for removing adapter, and then be normalized in RPM (reads per million) as we described previously [40]. The sequences were then mapped to the reference cDNA sequences, and each gene expression level was calculated by the total RPM of mapped sequences. The gene expression were pro led with the log2 transformed values of abundance with HemI [44].
Identi cation of 24nt phasiRNA target The promoter sequences in rice were obtained from PlantProm (www.softberry.com). In order to identify the potential 24-nt phasiRNA target sites in promoter sequences, BLAST analysis was performed for nding the location of the complementary sequence of 24-nt phasiRNA with no mismatch [45]. The promoters possessed phasiRNA binding sites were remained as potential target promoters. As each of the downloaded promoter sequence containing partial mRNA sequence, we identi ed the corresponding potential target genes by mapping the partial mRNA sequence to cDNA sequences. The DNA methylation status of potential target promoters were analyzed by utilizing the bisul te-seq data sets of panicle (GSM4232038) and root (GSM4232039) of rice. The expression speci city of phasiRNA in different tissues should consistent with the occurring of increasing methylation of the target promoter.
The DNA methylation analysis of promoters were performed according to the method of Zhao et al [27]. The sequences of bisul te sequencing libraries was processed by adapter cut and normalization in RPM (reads per million) as we described previously [40]. Then the sequences were mapped to the potential promoter sequences, and the uniquely mapped sequences were used for further DNA methylation level analysis. The DNA methylation level of each cytosine was obtained by calculation of the total coverage of individual cytosines in RPM. Availability of data and materials

Abbreviations
The small RNA high-throughput sequencing data sets and degradome sequencing data sets used for the study were retrieved from GEO (     The regulatory networks of phasiRNA pathways in Oryza sativa. The OSsRNA-2-LOC_Os02g18750.1-phasiRNA (A), OSsRNA-3-LOC_Os05g43650.1 -phasiRNA (B), OSsRNA-4-LOC_Os06g30680.1-phasiRNA (C) and OSsRNA-5-LOC_Os12g42380.1-phasiRNA (D) regulatory network were constructed by Cytoscape based on the validated phasiRNAs and their targets. The phasiRNAs are the gray nodes, the orange nodes represent the targets involving plant development, stress response, disease resistance or signaling transport. The blue nodes represent the expressed proteins with unknown functions.