Drought responses and ssRNA-seq of cassava
Compared with the control (0 h), leaves of cassava seedlings were badly wilted after 24 h of PEG-simulated drought stress (Additional file 1: Figure S1). Similar phenotypes were observed in our previous study [16], which also demonstrated that physiological traits such as peroxidase activity, proline, and soluble protein content were significantly altered, and the expression levels of thousands of genes were dramatically changed after 3 and 24 h of 20% PEG treatment. As an extended research, similar PEG treatments were performed in this study but we mainly focused on the systematic identification and functional characterization of drought-responsive lncRNAs through a ssRNA-seq approach.
After trimming adapters and removing low quality and contaminated reads, in total, 1.49 billion clean reads of 150-bp in length were generated from 24 libraries (12 samples × 2 replicates) by paired-end sequencing with Illumina HiSeq 4000 platform, and ~ 81.3% of them were mapped to the cassava reference genome for further analysis. The total length of all the mapped reads was over 181.8 gigabases (Gb), representing about 351-fold coverage of the cassava genome. Subsequently, a computational pipeline based on ssRNA-seq data was implemented to identify cassava lncRNAs (Fig. 1).
Identification and characterization of lncRNA in cassava
In total, 111,585 transcripts were obtained after transcriptome re-construction of all ssRNA-seq data using cufflinks pipeline. Subsequently, a few filtering steps were applied to identify the drought-responsive lncRNAs of high-confidence (Fig. 1). Firstly, the transcripts overlapped with known protein-coding genes in the same strand were removed. In this step, a total of 92,759 (~ 83%) transcripts, which were overlapped with 33,033 protein-coding genes representing all annotated genes of the cassava genome, were filtered. Secondly, the transcripts with exon < 2 and length < 200 bp were removed, which resulted in 2761 remained transcripts. Thirdly, the transcripts with FPKM > 1 in less than two samples were removed, to make sure the remaining transcripts were expressed. Next, the transcripts with coding potential, which was evaluated by Coding Potential Calculator (CPC), Coding-Non-Coding Index (CNCI), and the protein families database (Pfam), were removed. Finally, a total of 833 transcripts were obtained, and later they were classified into 652 intergenic and 181 anti-sense lncRNAs according to their genomic locations.
To characterize the features of these lncRNAs, the distributions of chromosome location, transcript length, exon number, and expression level were evaluated in two groups of intergenic and anti-sense lncRNAs. Overall, intergenic and anti-sense lncRNAs were distributed in all 18 chromosomes of the cassava genome, although different emphases were revealed. For examples, higher percents of intergenic lncRNAs were found in chromosome 1, 2, 9, and 17, while higher percents of anti-sense lncRNAs were found in chromosome 5, 7, 13, and 14 (Fig. 2a). Notably, at the expression level, the percent of expressed anti-sense lncRNAs (FPKM > 1) was higher than that of intergenic lncRNAs in all examined samples (except sample ‘FL00’, Fig. 2b), indicating that anti-sense lncRNAs have a higher probability to be expressed in a given sample compared with intergenic lncRNAs. However, no obvious differences were observed between intergenic and anti-sense lncRNAs regarding the distribution of transcript length and exon number: the median lengths of these intergenic and anti-sense lncRNAs were 787 and 839 nucleotides (nt), respectively, and most of them were shorter than 2200 nt (Fig. 2c); approximately 55% of the cassava intergenic and anti-sense lncRNAs contained two exons, and ~ 26% and ~ 11% contained three and four exons, and only ~ 8% contained at least five exons (Fig. 2d). Together, these results provide a general overview of the characterization of lncRNA in response to drought stress in cassava.
Identification of differentially expressed (DE) lncRNAs
To explore the transcriptional changes of lncRNAs affected by PEG treatment, DE lncRNAs were identified by pair-wise comparison of samples collected at different time-points within the same tissue, respectively.
Overall, in FL, FEL, and RT, only a few lncRNAs were differentially expressed at 3 h of PEG treatment, while the numbers of DE lncRNAs were increased more than twice at 24 h. On the contrary, this tendency was clearly different in BL: a great number of lncRNAs were differentially expressed at 3 h upon PEG treatment, whereas only a few lncRNAs were significantly changed at 24 h when compared with 3 h (Fig. 3a). Consistently, a gradient change of DE lncRNA number was observed among different tissues: BL (44) > FEL (39) = FL (39) > RT (25) (Fig. 3b). These results suggested that lncRNAs had a faster and stronger response in old leaf (e.g., BL) than in young leaves (e.g., FEL and FL) and root upon PEG stress.
In total, 124 DE lncRNAs were identified in response to PEG treatment. Most of them were exclusively identified in FL (31), FEL (26), BL (27), and RT (19), and 21 were commonly identified in at least two tissues (Fig. 3c). However, none of lncRNAs were identified in all four tissues. These results indicated that lncRNAs preferred to function in a tissue-specific manner.
Functional characterization of DE lncRNAs in trans-regulation
To explore the potential functions of drought-responsive lncRNAs, a total of 124 DE lncRNAs, together with 5187 DE genes, were selected and subjected to co-expression analysis to identify trans-regulatory networks of lncRNAs. Subsequently, functional enrichment analysis was performed for the genes of each group (co-expressed module), respectively, and then the enriched functions could be used to predict the functions of lncRNAs that were co-expressed with these DE genes.
In total, 11 groups (M1-M11) were identified based on their different expression patterns (Fig. 4a). The genes/lncRNAs from group M1 to M4 were highly expressed in FL but exhibited different emphases. For examples, group M1 included genes/lncRNAs that were greatly induced at 3 h but decreased at 24 h compared with the expression levels at 0 h, while group M2 included genes/lncRNAs that were gradually decreased at 3 h and 24 h in both FL and RT. These genes from group M1 and M2 were significantly enriched in protein synthesis and RNA regulation of transcription, respectively (Fig. 4b). However, no lncRNAs were included in these two groups. There were 7 lncRNAs in group M3. The genes/lncRNAs included in this group were gradually decreased from 0 h to 24 h in FL, and they were significantly enriched in cell cycle, cell division, cell wall, DNA repair, and signaling of receptor kinases (Fig. 4b). There were 6 lncRNAs in group M4. The genes/lncRNAs from this group were gradually induced from 0 h to 24 h in FL upon PEG treatment, however, no enriched categories were found in this group.
The genes/lncRNAs from group M5 to M6 were highly expressed in FEL (Fig. 4a). The expression of the former group was greatly suppressed, while that of the latter group was dramatically induced at 24 h of PEG treatment. There were 5 lncRNAs in group M5, of which the genes were significantly enriched in lipid metabolism, degradation of major carbohydrate (CHO) metabolism, calvin cycle and light reaction, secondary metabolism, light signaling, and tetrapyrrole synthesis. The genes included in group M6 were significantly enriched in amino acid synthesis, lipid metabolism, secondary metabolism of wax, and abiotic stress, but none of lncRNAs were included in this group (Fig. 4b).
The genes/lncRNAs from group M7 to M8 were highly expressed in BL (Fig. 4a). Similar to group M5 and M6, the former group was greatly decreased whereas the latter group was significantly increased at 24 h of PEG treatment. There were 13 lncRNAs in group M7, of which the enriched categories included synthesis of major CHO metabolism, trehalose metabolism, nitrogen (N)-metabolism, photosynthesis, redox, secondary metabolism of flavonoids, and light signaling. There was only one lncRNA in group M8, and this group was significantly enriched in protein assembly and cofactor ligation (Fig. 4b).
The genes/lncRNAs from group M9 to M11 were highly expressed in RT (Fig. 4a). It was clearly observed that the expression was dramatically decreased from 0 h to 24 h upon PEG treatment in group M9, which contained only one lncRNA. The enriched categories in this group included hormone metabolisms such as gibberellin and jasmonate, mitochondrial electron transport/ATP synthesis, calcium signaling, and receptor kinases signaling. There were 4 and 9 lncRNAs in group M10 and M11, respectively. The genes/lncRNAs from group M10 were greatly induced at 3 h but suppressed at 24 h, and they were significantly enriched in abscisic acid (ABA), degradation of major CHO metabolism, secondary metabolisms, and abiotic stress. On the contrary, the genes/lncRNAs from group M11 were dramatically induced at 24 h after PEG treatment, and they were significantly enriched in raffinose metabolism, protein folding, and abiotic stress (Fig. 4b).
Taken together, these results revealed that the genes/lncRNAs were exhibited in a tissue-specific manner in response to PEG treatment in cassava, and also suggested that the genes/lncRNAs preferred to function differently in distinct tissues: e.g., cell-related metabolism, cell wall, RNA regulation of transcription in FL; degradation of major CHO metabolism, calvin cycle and light reaction, light signaling, and tetrapyrrole synthesis in FEL; synthesis of major CHO metabolism, N-metabolism, photosynthesis, and redox in BL; and hormone metabolism, secondary metabolism, calcium signaling, and abiotic stress in RT.
Functional characterization of DE lncRNAs in cis-regulation
To further explore the potential functions of drought-responsive DE lncRNAs, protein-coding genes, which were spaced 10 k/100 k upstream and downstream of these lncRNAs, were selected and subjected to co-expression analysis. The lncRNA-mRNA pairs that were highly correlated and closely located were specifically attractive in a cis-acting regulatory relationship.
In total, 27 lncRNA-mRNA pairs involved in cis-acting regulation were identified (Additional file 2: Table S1). Of which, TCONS_00033864 was located 3798 bp upstream of Manes.05G018500 encoding a SAUR-like auxin-responsive gene, TCONS_00060863 was located 2447 bp downstream of Manes.10G067700 encoding 8-hydroxylase involved in ABA catabolism (Fig. 5a), TCONS_00097416 was located 6655 bp upstream of Manes.16G102700 involved in ethylene signaling, suggesting these lncRNAs were involved in hormone regulation in response to drought stress. TCONS_00040721 was spaced 6652 bp upstream of Manes.06G036900 encoding an AP2/EREBP transcription factor (Fig. 5b), and TCONS_00069665 was spaced 91,056 bp upstream of Manes.11G134100 encoding a C2H2 zinc finger transcription factor, indicating that these two lncRNAs participated in RNA regulation of transcription. In addition, several lncRNAs involved in signaling of receptor kinase were also found, e.g., TCONS_00065400 was located 37,640 bp upstream of Manes.11G042500 encoding a member of proline-rich extensin-like receptor kinase (Fig. 5c), and TCONS_00099153 was spaced 5973 bp upstream of Manes.17G056800 encoded a leucine-rich repeat protein kinase and required for root hair elongation.
Together, these results suggested that these DE lncRNAs, which might act as regulators in cis-acting in response to PEG treatment, regulated the expression of their neighboring genes mainly through hormone metabolism, RNA regulation of transcription, and signaling of receptor kinase.
Functional prediction of lncRNAs acting as miRNA target mimics
lncRNAs have been demonstrated to function through miRNAs for transcriptional, post-transcriptional, and epigenetic gene regulation, therefore, it’s of great importance to investigate the crosstalk between lncRNAs and miRNAs by exploring the lncRNAs acting as target mimic of known miRNAs in cassava.
In total, 11 lncRNAs were identified acting as target mimics of known miRNAs, such as miR156, miR164, miR169, and miR172 (Additional file 3: Table S2). miR156 is stress-induced and it targets SPL genes (e.g., SPL9) in plant development and abiotic stress tolerance [21, 22]. As a target mimic of miR156k, TCONS_00068353 was greatly suppressed in FL, and consistently, a homolog of SPL9 (Manes.09G032800) exhibited similar expression trend of TCONS_00068353 under PEG treatment (Fig. 5d and Additional file 3: Table S2). miR172 participated in water deficit and salt stress through the expression regulation of AP2-like transcription factors [23], and its expression was promoted by SPL genes [24]. Further studies revealed that SPL/miR156 module can interact with the AP2/miR172 unit in barely [25]. It’s worthy to note that, in our study, TCONS_00068353 was bound with miR172c, coordinated with the decreased expression of miR172-targeted AP2-like gene (Manes.05G184000) in FL under PEG treatment. In addition, TCONS_00068353/Manes.09G032800 and TCONS_00068353/Manes.05G184000 showed similar expression patterns in response to PEG treatment, supporting the interactions between SPL/miR156 module and AP2/miR172 unit [25].
Besides the miRNA-mRNA interactions consistent with the previously reporters, some different and currently unknown interactions were identified. For examples, MYC2 and CSD2 were the targets of miR169 and miR398, respectively, but they were predicted as the targets of miR164a and miR171g in our study, in accordance with the similar expression patterns of TCONS_00068353 and TCONS_00072359 (Fig. 5e and Additional file 3: Table S2) which acted as the target mimics of miR164a and miR171g, respectively.
Together, these results strongly suggested that lncRNAs might function through miRNAs in the response of drought stress in cassava.
Validation of lncRNAs and genes by qRT-PCR
To validate the expression results of ssRNA-seq data, a total of five crucial lncRNAs, which were involved in trans-acting, cis-acting, or miRNA target mimics, and 13 co-expressed genes were tested by qRT-PCR method. Overall, high correlation coefficients (R = 0.78–0.99) were revealed between these two independent measurements (Fig. 6 and Additional file 4: Table S3), indicating that the expression patterns of lncRNAs and genes based on ssRNA-seq data are reliable.