Genome-wide analysis of long non-coding RNAs affecting roots development at an early stage in the rice response to cadmium stress

Long non-coding RNAs (lncRNAs) have been found to play a vital role in several gene regulatory networks involved in the various biological processes in plants related to stress response. However, systematic analyses of lncRNAs expressed in rice Cadmium (Cd) stress are seldom studied. Thus, we presented the characterization and expression of lncRNAs in rice root development at an early stage in response to Cd stress. The lncRNA deep sequencing revealed differentially expressed lncRNAs among Cd stress and normal condition. In the Cd stress group, 69 lncRNAs were up-regulated and 75 lncRNAs were down-regulated. Furthermore, 386 matched lncRNA-mRNA pairs were detected for 120 differentially expressed lncRNAs and 362 differentially expressed genes in cis, and target gene-related pathway analyses exhibited significant variations in cysteine and methionine metabolism pathway-related genes. For the genes in trans, overall, 28,276 interaction relationships for 144 lncRNAs and differentially expressed protein-coding genes were detected. The pathway analyses found that secondary metabolites, such as phenylpropanoids and phenylalanine, and photosynthesis pathway-related genes were significantly altered by Cd stress. All of these results indicate that lncRNAs may regulate genes of cysteine-rich peptide metabolism in cis, as well as secondary metabolites and photosynthesis in trans, to activate various physiological and biochemical reactions to respond to excessive Cd. The present study could provide a valuable resource for lncRNA studies in response to Cd treatment in rice. It also expands our knowledge about lncRNA biological function and contributes to the annotation of the rice genome.


Background
Long noncoding RNAs (lncRNAs) are a diverse class of molecules derived from RNA polymerases and arise as important regulators in the regulation of gene expression and transcription in animals and plants [1]. LncRNAs are also involved in the regulation of targeted genes via epigenetic, transcriptional and post-transcriptional methods [2]. LncRNAs are clearly distinguishable from mRNAs in their sequence structure, expression level, positional characteristics and evolution conservation [3][4][5]. Furthermore, their subspecies have been characterized and categorized in humans [3,6], Caenorhabditis elegans [7] and zebrafish [8]. In recent years, studies on lncRNAs representing different classes of transcripts longer than 200 nucleotides have extended our understanding of the eukaryotic transcriptome [9]. LncRNAs can help recruit the PHD-PRC2 complex to enable histone modifications of FLC via epigenetic regulation [10]. Nuclear speckle RNA-binding proteins (NSRs), together with AS competitor long noncoding RNAs, also called ASCO-lncRNAs, can interfere with AtNSRs in alternative splicing of auxinresponsive genes downstream and affect the growth of lateral roots [11]. In plants, numerous molecular functions and biological processes have been determined by lncRNAs; for example, vernalization, photomorphogenesis, fertility, protein re-localization, alternative splicing, phosphate homeostasis, modulation of chromatin loop dynamics etc. [12] Over the last decade, the role and function of small non-coding RNAs in plants have been widely studied [13]. However, the functional mechanisms of lncRNAs in several plant species remain unexplored and only a few of lncRNAs have been fully characterized until now. In Arabidopsis, lncRNAs such as cold-assisted intronic noncoding RNA (COLDAIR) and cold induced long antisense intragenic RNA (COOLAIR) have been confirmed to facilitate chromatin altering activities in transcriptional silencing of FLC during vernalization [10,14]. LncRNAs have been recognized to play an important role in many gene regulatory networks tangled in several biological processes of plant stress responses [2]. Moreover, a large number of putative stress lncRNAs have been categorized and characterized in Arabidopsis [15], maize [16], wheat [17], Populus trichocarpa [18], cucumber [19], tomato [20], cotton [21] and other plant species [22][23][24]. Recently, thousands of lncRNAs associated with rice development have been identified [25,26], and could provide a guidance that rice lncRNAs function cannot be ignored. However, the elusive role of lncRNAs in rice stress responses is still not fully described or understood. Therefore, it is necessary to investigate the function of lncRNAs in rice stress responses. Cadmium (Cd) is known to be an unnecessary metal element for plants and extensively spread Cd pollution has significantly affected human health in terms of its direct effects on crop production and its high increase in the edible part of crops such as rice [27]. When lncRNA function is well understood, scientists may realize that Cd-regulated lncRNAs may be involved in heavy metal stress responses, and some Cd-regulated lncRNAs have recently been detected [28]. Although previous studies have provided useful information on the mechanisms for rice lncRNA in Cd stress response, the regulatory mechanisms involved are mostly unknown. To improve our understanding of the likely functional roles of lncRNAs in rice Cd stress response, further studies are necessary to understand the functional genetics of lncRNAs in detail, and to determine which specific lncRNAs target selective sites for interaction in the rice genome.
Inclusive identification of plant lncRNAs at the genomic level is mainly dependent on and determined by advances in technical platforms [2]. Genome-wide screening by high-throughput RNA sequencing and computational applications for inclusive identification of lncRNAs would be a better choice for detailed mapping and structural studies to understand the RNA-protein and RNA-DNA interactions [29]. In plants, genome-wide analysis of lncRNAs with RNA sequencing transcriptomic data has been executed in only a limited plant species such as Arabidopsis thaliana [30][31][32], Triticum aestivum [22], Oryza sativa [25], tomato [33] and Zea mays [16,34].
In the current study, we used a deep RNA sequencing strategy to clarify the lncRNAs profiled associated with Cd stress using the Cd response rice genotype which could provide more insights into the regulatory role of more lncRNAs in the rice Cd stress response. The results obtained in our study provided a valuable resource to study the lncRNAs involved in Cd stress response and will increase the knowledge for a better understanding of the biological processes of rice stress response.

Plant material and growth conditions
One rice genotype (DX142) was selected from 82 different screened rice genotypes to measure phenotypic traits and for gene expression in this study. DX142 is a pure line and shows the highest sensitivity to Cd stress.
Full seeds of DX142 were surface sterilized with 0.5% NaClO solution for 30 min, rinsed five times with distilled water and maintained for 2 days at 25-30°C in the dark, thereby inducing germination. Seedlings were then exposed to treatments without 100 mg/L CdCl 2 (as a control) and with 100 mg/L CdCl 2 . Growth conditions were as follows: 16/8 h day/night photoperiod under 28/ 25°C day/night temperatures. Roots were collected for gene expression analysis on the 5th day after treatments.

RNA extraction, sequencing and database access
Total RNA was extracted from 5-day-old rice roots with Trizol reagent (Invitrogen, Carlsbad, CA, USA). Six samples (three replicated samples for each treatment; CK_R1, CK_R2 and CK_R3 for the normal condition: SY_R1, SY_R2 and SY_R3 for the stress condition) were used for sequencing, and differentially expressed lncRNAs and mRNAs were obtained.
RNA isolation, quantification and library preparation for lncRNA sequencing, clustering and sequencing and quality control were analysed as described by Ren et al. (2016) [35]. After these analyses, the purified data with high quality were mapped to the reference genome of Japonica variety Nipponbare [36] using Bowtie v2.0.6 and TopHat v2.0.9 software [37]. The clean data were uploaded into the NCBI Sequence Read Archive under the accession number SRP099996. The mapped reads of each sample were assembled by both Scripture (beta2) [38] and Cufflinks (v2.1.1) [39] in a reference-based approach. Cuffdiff (v2.1.1) was used to calculate FPKMs (fragments per kb per million reads) of lncRNAs in each sample [39]. Furthermore, the negative binomial distribution method was used to detect the lncRNAs obtained from the three biological replicates of each treatment. Finally, the lncRNAs developed were referred to as Cd stress (SY_R) and control (CK_R)-related lncRNAs.

lncRNA identification pipeline
To attain putative lncRNAs, we initially filtered the transcripts according to the class code annotated by Cuffcompare; only the transcripts that occurred in at least two samples were retained. To acquire lncRNAs, we only retained novel (not overlapping with known genes in sense), large (longer than 200 nucleotides), expressed (for multiple-exon transcripts FPKM ≥0.5, for single-exon transcripts FPKM ≥2) transcripts. Then, to obtain high-quality data, CPC (0.9-r2) [40] and Pfam-scan [41] were used to identify the candidate lncRNAs. Transcripts with coding potential predicted by any of the two tools previously described were filtered out, and those without coding potential were retained. Finally, we selected those shared by the two tools as the final candidate lncRNAs and used them for further analysis.

Identification of differentially expressed lncRNAs
The fragment per kb per million reads (FPKM) for lncRNAs was calculated by using Cuffdiff (v2.1.1) software [42]. For biological replicates, transcripts or genes with an adjusted p<0.05 (q<0.05) were designated differentially expressed among the two groups of rice roots.

LncRNAs target gene prediction and functional enrichment analysis
We searched coding genes 10 kb/100 kb upstream and downstream of lncRNAs as the cis target gene, and then analysed their function. The trans role of lncRNAs was identified by the expression level. We calculated the expressed correlation between lncRNAs and coding genes with custom scripts; further, we clustered the genes from different samples with WGCNA [43] to search for common expression modules and then analyzed their function through functional enrichment analysis.
To understand the functional roles of the target genes of lncRNAs, we used the Gene Ontology (GO) seq [44] R package to implement enrichment analysis, in which gene length bias was corrected. In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis on target genes was performed on KOBAS software [45] using a hypergeometric test. GO terms and KEGG pathways with a corrected p<0.05 (q<0.05) were considered significantly enriched.

Real-time quantitative PCR
Total RNA from 6 sequenced samples (include 3 biological duplication 2 treatment) were extracted using a PrimeScript™ RT reagent Kit with gDNA Eraser. SYBRbased qRT-PCR reactions (SYBR Green I, Osaka, Japan) were performed on a ABI VIIA@7 using the following reaction conditions: 95°C for 10 min followed by 40 cycles at 95°C for 15 s and 60°C for 30s. All qRT-PCR reactions were performed in triplicate samples, and the results were analyzed with the system's relative quantification software (ver.1.5) based on the (ΔΔCT) method. The detection of threshold cycle for each reaction was normalized against the expression level of the rice ACTB gene.

Morphology under cd stress and control conditions
First, 82 different rice genotypes were used in this study to check their Cd stress sensitivity for acquiring additional insights into the rice transcriptomic response to environmental Cd stress. One genotype DX142, showed the highest sensitivity and was used for further analyses.
The average root length of DX142 under Cd stress and control conditions is significantly different. The average root length of DX142 under the control condition was significantly higher than under the Cd stress condition from the second day (Fig. 1a), indicating that the rice root length was obviously inhibited by Cd. Furthermore, the root length of DX142 under control condition increased slowly after six days. Therefore, to obtain as much as lncRNA information as possible regarding rice Cd stress, the roots in five-day-old roots were harvested to detect lncRNAs and for further analysis (Fig. 1b).

Identification of lncRNAs
A total of 679,838,650 raw reads were generated using the Illumina HiSeq 2000 platform. Low-quality paired-end sequences and adapter sequences were trimmed off, and 660 million clean reads (99 Gb) were obtained with an average of 110 million reads (16.5 Gb) per sample (Table 1). Subsequently, we mapped the clean reads to the Nipponbare reference genome [36] to identify the transcripts.
Considering the characteristics of lncRNA sequences (≥200 nt) and their differences from other classes of RNA (mRNA, tRNA, rRNA, snRNA, snoRNA, pre-miRNA, and pseudogenes), we used Scripture (beta2) and Cufflinks (v.1.1) software to classify transcripts into different subtypes. Overall, 3558 transcripts out of all the 46,933 identified transcripts were predicted to be lncRNAs. We performed coding potential analysis using the software CPC and Pfam-scan to confirm these 3558 lncRNAs. After screening using harsh criteria and two analytic tools, a total of 2580 lncRNAs from Cd stress and control conditions in rice were identified and subjected for further analysis (Fig. 2).

Identification of differentially expressed lncRNAs
Genome-wide analysis of lncRNA expression under Cd stress and control conditions was performed to profile differentially expressed lncRNAs associated with Cd stress. We first assessed the lncRNA expression profiles in two different conditions (Cd stress vs control condition), and 144 differentially expressed lncRNAs from 143 lncRNA genes were identified between Cd stress (SY_R) and control (CK_R) (Additional file 1: Table S1). The 144 lncRNAs consisted of 120 large intergenic non-coding RNAs (lincRNAs), 1 intronic lncRNA, and 23 anti-sense_lncRNAs (Additional file 1: Table S1). Among them, 69 lncRNAs were up-regulated and 75 lncRNAs were down-regulated ( Fig. 3 and Additional file 1: Table S1).

The cis role of differentially expressed lncRNAs in target genes
To examine the lncRNA function, we predicted the potential targets of lncRNAs in cis. We examined proteincoding genes 10 and 100 kb upstream and downstream of the lncRNAs, respectively. In total 386 matched lncRNA-mRNA pairs for 120 differentially expressed lncRNA genes and 362 differentially expressed mRNAs were found (Additional file 2: Table S2).
GO analysis predicted that there was no significant enrichment in GO terms targeted by lncRNAs. The pathway analyses revealed 17 different pathways corresponding to the target genes (Table 2 and Additional file 3: Table S6); one of them is the cysteine and methionine metabolism pathway, which was significant in the Cd stress condition compared with the normal condition with an enrichment (q < 0.05, Fig. 4; Table 2).

The trans role of lncRNAs in target genes
On the other hand, we examined the trans role of 143 lncRNAs genes on the basis of its expressed correlation coefficient (Pearson correlation ≥0.95 or ≤ − 0.95). In a total, 28,276 interaction relationships were identified in trans between 144 lncRNAs and the protein-coding genes (Additional file 4: Table S3 and Additional file 5: Table S7).
GO analysis showed that the highly enriched GO terms targeted by lncRNAs are single-organism metabolic process, photosynthesis and response to stimulus (Fig. 5). The pathway analyses revealed 118 different pathways corresponding to the target genes, and four of them are biosynthesis of secondary metabolites, phenylpropanoid biosynthesis, phenylalanine metabolism and photosynthesis signalling pathways in the Cd stress condition  compared with the normal condition (q < 0.05) (Additional file 6: Figure S1 and Table 3).

Validation of gene expression by quantitative real-time PCR
To validate the findings from sequencing data, 10 genes correlated with mRNAs, including lncRNAs were selected randomly and analyzed by quantitative real-time polymerase chain reaction (qRT-PCR) (Fig. 6). We randomly selected 10 genes in the lncRNA-seq data. According to the random criteria, we have selected genes that were significantly different and genes that were not significantly different to enhance the reliability of the results by lncRNA-seq. The primer sequences are listed in Additional file 7: Table S4. The results showed that the expression trends were consistent for all transcripts in both analyses, with a correlation coefficient of R 2 of 0.9643 (Additional file 8: Figure S2).
The study of differentially expressed lncRNAs controlling gene expression in the rice stress response at the whole transcriptome level especially in Cd stress, is reported to be limited. Thus, our data systemically predict the lncRNAs at the whole transcriptome level and showed that which specific lncRNAs seek out selective sites in the genome for interaction in rice Cd stress. Compared to the previous studies, which were based on two or three replicated samples for each treatment for RNA sequencing, our sequencing data were obtained from three individually replicated samples for each treatment. Then, the negative binomial distribution method was used to detect the RNA information for each treatment, which could minimize the experimental error. Thus, our results could provide more comprehensive information.
Here, 143 lncRNA genes were detected that showed differential expression in the Cd stress response in the root, and 83.9 and 100% of them were identified that contained at least one differentially expressed mRNA in cis and trans, respectively, which suggests that these lncRNAs may play an important role in the rice Cd stress response. Furthermore, lncRNAs showed a higher ratio than those of the other two subtypes of all the differentially expressed Fig. 2 Screening of the candidate lncRNAs. Venn diagrams of coding potential analysis by using stringent criteria. Two tools (CPC and PFAM) were employed to analyse the coding potential of lncRNAs. Those simultaneously shared by two analytical tools were designated as candidate lncRNAs and used in subsequent analyses lncRNAs, indicating that lncRNAs may be the main form of lncRNAs in the Cd stress response in rice. Previous studies confirmed that altered splicing of lncRNA genes could quantitatively modulate, gene expression in development and other physiological processes through co-transcriptional coupling mechanisms [50][51][52], but there is less information available on the effects of heavy metal stress on altered splicing regulation particularly when global lncRNA profiles are induced by treatment with certain heavy metals. In this study, 144 lncRNA transcripts from 143 lncRNA genes were identified, only one lncRNA coding gene (XLOC_057621) was alternatively spliced and yielded two different lncRNAs transcripts, and therefore it can be predicted that alternative splicing might not be the major form of regulation in the Cd stress process for rice. In addition, we compared the lncRNA data with the study of Fei et al. [28] and found that XLOC_003991 and XLOC_044902 shared the same changed pattern (Additional file 1: Table S1).
The role of lncRNAs in stress processes creates an insistence to understand the mechanisms by which these RNAs seek their targets [15][16][17][18][19][20][22][23][24]. To improve the accuracy of target prediction, a detailed co-expression network between differentially expressed mRNA and differentially expressed lncRNAs was constructed, which showed that one lncRNA could target one or more coding genes. The result indicated that the regulation of mRNA by lncRNAs is involved in Cd-induced root development. Therefore, lncRNAs should be given more attention in heavy metal stress response studies in the future.
To gain more insight into the function of targets of lncRNAs, GO term and KEGG pathway annotations were applied to their target gene pool. KEGG analysis showed a significant change in the cysteine and methionine metabolism pathway in the Cd stress compared to normal condition. Plants produce cysteine-rich (Cys-rich) peptides that Fig. 4 Scatter plot of KEGG pathway enrichment statistics for differentially expressed target genes in cis in rice roots. Rich Factor is the ratio of differentially expressed gene numbers annotated in this pathway term to all gene numbers annotated in this pathway term. Greater Rich Factor means greater intensiveness. q-value is corrected p-value ranging from 0~1, and its less value means greater intensiveness. We displayed KEGG pathways significantly enriched due to exposure to Cd in our experiments chelate Cd to procedure non-toxic complexes which are then sequestered into the vacuole to avoid high levels of free cytotoxic Cd in the cytosol in response to Cd stress [53]. In this study, we observed that OS03G0196600, which is involved in the cysteine and methionine metabolism pathways, was clearly up-regulated (Additional file 9: Table  S5) and might contribute to the production of cysteine-rich (Cys-rich) peptides. It is interesting that XLOC_086307 (the lncRNA targeted OS03G0196600 in cis) was also up-regulated significantly, which suggests that XLOC_086307 likely participates in Cd response processes in rice by regulating the cysteine-rich peptide metabolism-related gene OS03G0196600. Previous studies showed that exposure to Cd stress will lead to impairment of the photosynthetic function in many plant species. Both chlorophylls and carotenoid contents decrease when exposed to Cd [54][55][56]. It was noticed that OS03G0184000 (the target of XLOC_086119 and XLOC_066284 in cis), which is involved in carotenoid biosynthesis, is up-regulated (Additional file 9: Table S5), which may result in increased carotene content. Oxidative cleavage of carotenoids will produce apocarotenoids, while the phytohormone ABA is an apocarotenoid derivative [57]. With the increase in apocarotenoids, ABA content was increased, and the signalling pathway was activated. Furthermore, ABA was found to be involved in the regulation of antioxidative defence systems and Cd-induced oxidative stress in mung bean seedlings [58]. Therefore, XLOC_086119 and XLOC_066284 might be involved in carotenoid biosynthesis associated with cadmium stress in rice. In trans, GO and pathway analyses predicted that the regulated transcripts of lncRNAs are mainly associated with metabolic process (ontology: biological process), intracellular part (ontology: cellular component) and catalytic activity (ontology: molecular function), which are associated with four significant gene pathways that correspond to the transcripts. Among these pathways, we found that secondary metabolites such as phenylpropanoids and phenylalanine-related genes were significantly altered by Cd stress, and our results were consistent with a previous study [53]. In the current study, the trans role of lncRNAs, including XLOC_058523, XLOC_104363 and XLOC_059778, targeted phenylpropanoids and the phenylalanine related-gene OS11G0552000, which indicated that lncRNAs may regulate the genes of the secondary metabolites in far distance and then activate the various transporters to successively guide removal of excessive Cd from the cell. Furthermore, our analysis revealed that differentially expressed mRNA OS07G0148900 (Additional file 9: Table S5) with the trans targets of differentially expressed lncRNAs in trans such as XLOC_122123, XLOC_125848 and XLOC_098316, is highly enriched in photosynthesis. The results are consistent with the fact that Cd can damage the photosynthetic apparatus and lead to impairment of photosynthetic function [59,60]. Because our samples were  grown in solution, the roots may have been passively exposed to light, which could strongly activate photosynthesis in root tissues. A similar result can be seen in the study of Zhai et al. [61]. These results suggest that the photosynthesis pathway-related genes are involved in the Cd stress response in rice and may also be regulated by lncRNAs in trans. In addition, the further experiments of the interaction among these lncRNAs and their targets both in cis and trans are underway.

Conclusions
Takn all together, our study systematically determines the genome-wide lncRNA expression profile in Cd-induced rice roots by deep sequencing. Our results showed that some lncRNAs are aberrantly expressed in Cd-treated rice roots when compared with untreated roots. In addition, after pathway analyses of the target genes of these differentially expressed lncRNAs, cysteine and methionine metabolism pathway, carotenoid biosynthesis, ABA signalling pathway (in cis), and secondary metabolites and photosynthesis (in trans) were enriched, which indicated that lncRNAs may play an important role in these pathways in response to Cd stress. Therefore, further studies are necessary needed to fully understand these lncRNAs to effectively control rice Cd pollution in the future.

Availability of data and materials
The data supporting the conclusions of this article are included within the article and its additional files. The sequence data generated during the current study are available in the NCBI Sequence Read Archive under the accession number of SRP099996 (https://www.ncbi.nlm.nih.gov/sra/SRP099996).
Authors' contributions HH conceived and designed the experiments. JB conceived and designed the experiments, and wrote the manuscript. LC, SS, and NJ performed the experiments. CZ, XP, QY, XH, JF, XC, LH, LO, XS, and JX analyzed the data. HK and GMW revised the manuscript for the language. All authors read and approved the final manuscript.

Ethics approval and consent to participate
The rice line DX142 (including the seeds) obtained from the native cultival line and all plants were grown in the test fields of Jiangxi Agriculture University. Sampling of plant materials were performed in compliance with institutional, national, and international guidelines.