Characterization of the small RNA component of the transcriptome from grain and sweet sorghum stems
© Calviño et al; licensee BioMed Central Ltd. 2011
Received: 4 May 2011
Accepted: 8 July 2011
Published: 8 July 2011
Skip to main content
© Calviño et al; licensee BioMed Central Ltd. 2011
Received: 4 May 2011
Accepted: 8 July 2011
Published: 8 July 2011
Sorghum belongs to the tribe of the Andropogoneae that includes potential biofuel crops like switchgrass, Miscanthus and successful biofuel crops like corn and sugarcane. However, from a genomics point of view sorghum has compared to these other species a simpler genome because it lacks the additional rounds of whole genome duplication events. Therefore, it has become possible to generate a high-quality genome sequence. Furthermore, cultivars exists that rival sugarcane in levels of stem sugar so that a genetic approach can be used to investigate which genes are differentially expressed to achieve high levels of stem sugar.
Here, we characterized the small RNA component of the transcriptome from grain and sweet sorghum stems, and from F2 plants derived from their cross that segregated for sugar content and flowering time. We found that variation in miR172 and miR395 expression correlated with flowering time whereas variation in miR169 expression correlated with sugar content in stems. Interestingly, genotypic differences in the ratio of miR395 to miR395* were identified, with miR395* species expressed as abundantly as miR395 in sweet sorghum but not in grain sorghum. Finally, we provided experimental evidence for previously annotated miRNAs detecting the expression of 25 miRNA families from the 27 known and discovered 9 new miRNAs candidates in the sorghum genome.
Sequencing the small RNA component of sorghum stem tissue provides us with experimental evidence for previously predicted microRNAs in the sorghum genome and microRNAs with a potential role in stem sugar accumulation and flowering time.
Small RNAs (18-25 nt) regulate many developmental and physiological processes in plants through the regulation of gene expression at either the transcriptional or post-transcriptional level [1–3]. They can be subdivided into short-interfering RNAs (siRNAs) and microRNAs (miRNAs) [3–5].
MicroRNAs are derived from capped and polyadenylated primary (pri)-miRNA transcripts that are transcribed by RNA polymerase II and can form a hairpin-loop structure by intramolecular pairing [4, 6]. Two sequential cleavages mediated by DICER LIKE 1 (DCL1) are required to produce a mature miRNA [4, 7]. In the first cleavage, DCL1 cleaves near the base of the hairpin-loop stem of the pri-miRNA to produce a miRNA precursor (pre-miRNA). The second cleavage takes place near the loop of the pre-miRNA to produce a miRNA/miRNA* duplex. The mature miRNA is then loaded into the RNA-induced silencing complex (RISC) and can guide the sequence-specific cleavage or translational inhibition of target mRNAs [2, 4, 7, 8], as well as gene silencing through DNA methylation [9, 10], whereas the non-incorporated miRNA* strand is usually degraded.
Through the use of next-generation sequencing, the small RNA component of the Arabidopsis and rice transcriptomes has been well characterized, more than in any other plant species [11, 12]. This is reflected in the miRBase database (http://www.mirbase.org, release 16: September 2010), where 213 miRNAs are described for Arabidopsis whereas 462 miRNAs are described for rice. Besides rice, the identification of miRNAs through deep sequencing in other grasses including maize, wheat, and Brachypodium have been described [13–15]. The identification of rice, maize and wheat miRNAs from different tissues, developmental stages and stress-treatments [12, 13, 15–20], provides an opportunity to understand how miRNAs regulate the expression of genes influencing traits of agronomic importance. Currently, a trait of particular relevance for biofuel production is that of sugar accumulation in the stem of sorghum [Sorghum bicolor (L.) Moench] and sugarcane (Saccharum spp.), two closely related C4 grasses that diverged from each other about 8-9 million years ago .
In both species, sucrose is the main type of sugar and accumulates in the parenchyma tissue of the juicy stems [22, 23]. High sucrose content is a highly desirable trait since the accumulated sugar can be fermented to produce bioethanol as a source of renewable energy . Although sugarcane has been extensively used as a source of biofuel, its use as a model system to understand the genetics of sugar accumulation is hampered by its complex genome, with several cultivars differing greatly in their ploidy levels . Sorghum instead, is a diploid species and its genome has been recently sequenced . In addition, the intra-species variation for sugar content is much more pronounced in sorghum than in sugarcane , with sorghum cultivars known as sweet sorghums accumulating high levels of sugars relative to grain sorghums . This makes sorghum a more suitable system to study the genetic basis of sugar accumulation. Still, the gene repertoire involved in sugar accumulation is not well characterized in sorghum due to the low heritability of the trait and its quantitative inheritance. In addition, previous reports have suggested the existence of trade-offs between sugar content and other plant traits such as flowering time [28, 29].
We also observed that sugar accumulation (measured as Brix degree and referred herein as Brix) in the stem of grain sorghum BTx623 and sweet sorghum Rio cultivars differed at the time of flowering. Interestingly, 80% of the differentially expressed genes in stem tissue between the two cultivars had orthologous counterparts in syntenic positions in rice [30, 31]. This suggested that the ability of sorghum to accumulate soluble sugars relative to rice could not be explained by differences in their gene content but rather due to gene regulation at either the transcriptional or post-transcriptional level. To address the latter possibility, we characterized the small RNA portion of transcriptomes derived from stem tissues of grain and sweet sorghum in order to investigate the microRNA-mediated regulation of genes involved in sugar accumulation and flowering time. Using the SOLiD next generation sequencing system, we sequenced with an unprecedented depth small RNAs libraries from BTx623 and Rio, and from a pool of selected F2 plants derived from their cross that differed in sugar content and flowering time. We also reasoned that plant stems would provide us with a representative tissue to experimentally validate the previously predicted miRNAs of the sorghum genome . Indeed, we not only detected the expression of 25 miRNA families from the 27 predicted families in the sorghum genome but also discovered 9 new miRNA candidates. Furthermore, we could correlate genotypic variation of miRNA expression with the sugar and flowering phenotypes. In addition, we found that the size distribution of small RNAs in sorghum stems was quite heterogeneous, characterized by RNAs with at least 25 nt in length that were mainly derived from ribosomal and transfer RNAs not annotated in the sorghum genome.
Deep sequencing statistics of stem-derived small RNAs
# raw sequences
# perfect matches
The frequency and size distribution of small RNAs from sorghum stems revealed two interesting aspects: a peak of 25 nt small RNAs with similar abundance as the 24 nt class, and a second peak of small RNAs with 22 nt that were more abundant than the 20 and 21 nt classes, respectively (Figure 2b). This finding contrasted with the size distribution of small RNAs described for several monocot species (including small RNAs from sorghum inflorescence), in which the most abundant small RNAs were 21 and 24 nt in length, with maize being the exception, showing a larger 22 nt peak relative to the 21 nt peak . This led to the hypothesis that the 22 nt class of small RNAs are specific to maize . However, we have shown here that a 22 nt peak is also present in sorghum stem tissue. Furthermore, we found that a high proportion of the 22 nt reads were derived from miR172c, accounting for approximately 15% of all the 22 nt reads in the BTx623 library (Figure 2c). Our results differ from the predicted length of 20 nt for miR172c annotated in the miRBase database. Interestingly, MIR172c is located within the third intron of the Sb04g037375 gene.
The finding of small RNAs of 25 nt in length with such high abundance was unexpected. This prompted us to investigate whether they could be derived from ribosomal and/or transfer RNA genes that had not yet been annotated in the sorghum genome. Furthermore, since the sequencing read length of the SOLiD system at the time of our experiment was limited to a maximum of 25 nucleotides, it is possible that these RNAs are longer. In order to address this question, we analyzed several loci in the genome that accumulated more than thousand reads (defined as 25 nt hotspots) and found indeed that they were derived from non-annotated rRNA and tRNA genes (Additional File 1, Table S1).
In summary, we showed that the small RNA component from the stem transcriptome of sorghum is characterized by small RNAs of 22 nt in length that are mainly derived from miR172c, and by a size class of RNAs with at least 25 nt in length that are predominantly derived from rRNAs and tRNAs genes that had not been annotated in the sorghum genome.
The sequencing consortium of the sorghum genome identified 149 predicted miRNAs belonging to 27 miRNA families , and we could detect the expression of miRNA members from 25 families based on the following criteria: a miRNA family was considered expressed only if its sequencing reads were detected in at least three libraries and with a frequency of 10 reads or more for the sum of the five libraries. A list with the reads count for each known miRNA family is provided in Additional file 2, Table S2.
Although miR169 and miR395 have known roles in drought stress and sulphur starvation, respectively [37, 38], our data suggested a possible function for these miRNAs in sugar accumulation and flowering time. Because the pool of F2 plants used for library construction were selected based on both phenotypes, it was not possible to assign the expression inheritance pattern of both miRNAs to either sugar accumulation or flowering time alone. For this reason, additional plants from the same F2 population differing in sugar content but with similar flowering time were selected and the expression of a representative member from each miRNA family, miR169d and miR395f respectively, was quantified using the TaqMan assay. We found that high expression of miR169d in BTx623 correlated with low Brix (Figure 3d). This suggested that high expression levels of miR169 might lead to a reduction in stem sugar content regardless of flowering time. Surprisingly, high expression of miR395f in Rio relative to that in BTx623 did not correlate with sugar content in F2 plants (Figure 3e). This might indicate that high expression of miR395 would be required for flowering regardless of sugar content in the stem. Consistent with the role of miR172 in flowering, we did not observe any difference in the expression of miR172a in F2 plants with the same flowering time but different Brix (Figure 3f).
In summary, high expression of miR172 in BTx623 correlated with early flowering in the F2, whereas the opposite was true for miR395, high expression of this miRNA in Rio correlated with late flowering in the F2 plants selected. Regarding sugar content in the stem, high expression of miR169 in BTx623 correlated with low Brix in the F2 plants selected.
Although our data might suggest a possible function of miR169 in sugar content and miR395 in flowering time, we could not detect any predicted target related to carbohydrate metabolism and flowering time respectively (Additional file 3, Table S3 and Additional file 4, Figure S1). Thus, the expression of miR169 and miR395 target genes, and their correlation with Brix and flowering phenotypes remains to be elucidated. Regarding the miR172-predicted targets, we detected cleavage products for the genes INDETERMINATE SPIKELET 1 (IDS1) and an AP2 transcription factor (Additional file 3, Table S3; Additional file 4, Figure S1; and Additional file 5, Figure S2). Furthermore, when the expression of these two miR172 target genes was tested, we found that they were expressed higher in Rio compared with BTx623 as expected. However, we could not find a correlation between their expression levels with the flowering phenotype in the F2 pools of plants selected (data not shown).
A FRIGIDA-like 2 (FRL2) and a TYPE A RESPONSE REGULATOR 3 (RR3) were predicted as new targets of miR172 with the cleavage product of FRL2 experimentally validated in this study (Additional file 5, Figure S2). The FRIGIDA-related genes are a major determinant of natural variation in the winter-annual habit between Arabidopsis accessions [39, 40], whereas the TYPE A RESPONSE REGULATOR 3 (ARR3) has a function in the circadian clock . Although sorghum is a crop from semi-arid regions , the miR172-mediated post-transcriptional regulation of FRL2 might have a role in the adaptation of sorghum to temperate climates. Consistent with this, a role of miR172 in the regulation of flowering time by ambient temperature in Arabidopsis has been recently described .
List of new predicted MIR genes in the sorghum genome
MIR gene ID
miRNA sequence 5'-3'
miRNA* sequence 5'-3'
From the newly identified miRNAs, miR5386, and miR5388 displayed allelic variation in expression between BTx623 and Rio that was inherited in the F2 offspring (Figure 5). However, the predicted target genes for miR5386 did not include any transcript involved neither in flowering nor in carbohydrate metabolism. This was a similar case with miR5388, with no predicted targets involved in flowering but with two genes involved in carbohydrate metabolism as predicted targets, encoding the beta subunit 1 and 2 of the Snf-1 related protein kinase (SnRK1) respectively  (Additional file 8, Figure S5).
We next attempted to experimentally validate the miRNA-mediated cleavage of predicted targets. Potential miRNA target sites were scored from 0 to 8 (see Methods), with higher scores indicating less confidence in the predictions. We tested 14 predicted targets with scores less than 4 but we could not detect the miRNA-mediated cleavage for any of them. A low rate in target validation has also been observed for newly predicted miRNAs in tomato, with three targets validated from 65 predicted targets that were tested . Recently, a similar case of very low rate in target validation was reported for predicted targets of new miRNAs identified in Arabidopsis lyrata .
Here we have described the first characterization of the small RNA component of the transcriptome from sorghum stems. The choice of stems as plant material is interesting not only because it is the tissue were fermentable sugars do accumulate, but it is also the venue for the movement of small RNA duplexes (siRNAs and miRNAs) from source to sink tissues, as have been recently demonstrated [48, 49]. Thus, one could expect the small RNA component of the stem to be quite diverse or heterogeneous. Indeed, the unexpected finding of a high abundance peak of RNAs with 25 nt or more in length lead us to the finding of rRNA and tRNA genes that have not been annotated yet in the sorghum genome. We have also shown that the abundance of the 22 nt small RNAs in sorghum stem tissue was greater than the 20 and 21 nt small RNAs respectively. Our results contrast the recently proposed notion that the 22 nt peak of small RNAs is exclusive of maize . Furthermore, we found that up to 15% of all the 22 nt small RNAs in the BTx623 library were derived from miR172c, which has been previously predicted to have a length of 20 nt (Paterson et al. 2009). Recently, 22 nt miRNAs have been described to trigger siRNA biogenesis from target transcripts in Arabidopsis [50, 51]. Thus, it would be interesting to test if miR172c can also trigger siRNA biogenesis in sorghum.
As expected, the specific genetic material, tissue sample and developmental stage used in our study, allowed us to capture a broad spectrum of the small RNA component of the sorghum transcriptome. On the other hand, the specificity of the material also permitted us to gain new insights into how complex traits like sugar accumulation and flowering time might be regulated at the post-transcriptional level. Such regulation of gene expression could provide an opportunity to manipulate biofuel traits, where stem sugar rather than cellulose and increased biomass because of delayed flowering could be enhanced . By taking a genetic approach in conjunction with deep-sequencing of stem-derived small RNAs, we were able to correlate variation in miRNA expression between grain and sweet sorghum, with the sugar and flowering phenotypes of selected F2 plants derived from their cross. However, analysis of the differential accumulation of potential target genes did not exhibit a simple correlation with miRNA levels. Therefore, further studies will be required to unveil the underlying mechanisms between genotype and phenotype.
In the case of miR395, it is interesting to note that there was genotypic variation in the miR395/miR395* ratio, with the Rio genotype expressing both strands at equal proportions in contrast to a clear predominance of miR395 abundance over miR395* in BTx623 (Figure 4b). This is reminiscent of the recently proposed "arm switching" model of miRNA evolution described for nematodes species , in which the mature miRNA is produced from the 5' arm of the miRNA hairpin in a particular species but in a different nematode species the 5' arm of the same MIR gene gives rise to the miRNA* instead. Interestingly, it has been shown recently that miRNA* species have physiological relevance in Drosophila, since a significant number of them are well conserved, can be loaded into the RISC complex through their preferential association with ARGONAUTE2 (AGO2) rather that AGO1, and can also regulate the expression of target genes . Furthermore, the regulatory potential of miRNA* species in vertebrates has been recently demonstrated as well .
Based on the above, several interesting questions can be formulated. First, does miR395* have any regulatory potential? Second, what is the mechanism behind the genotypic difference in miR395/miR395* ratio? Third, is this ratio altered in a developmental and/or tissue dependent manner? Fourth, is this an example of a general phenomenon? If this is the case, we would envision that other miRNAs families as well will display differences in their miRNA/miRNA* ratio dependent on the genotype and/or condition. Future work will be required to provide a better understanding of miR395's involvement in processes other than its previously described role in sulfur metabolism.
The grain (BTx623) and sweet (Rio) sorghum cultivars together with F2 plants derived from their cross were grown in the field of the Waksman Institute during the summer of 2008. The juice from three internodes of the main stem was harvested at the time of flowering and the Brix degree measured as previously described [30, 31]. The average Brix degree from three internodes per plant was used. Flowering time was measured as the number of leaves in the main stem at the time of anthesis.
In total, 15 plants for each parent and 553 F2 plants were scored for Brix degree and flowering time. The F2 plants selected for sequencing had either low Brix (Brix ≤ 5)/early flowering (N0 leaves ≤ 9) or high Brix (Brix ≥ 13)/late flowering (NO leaves ≥ 14).
Total RNA from internode tissue was extracted at the time of flowering with the mirVana miRNA isolation kit (Ambion). RNA extraction was performed in 5 independent plants for each BTx623 and Rio, and 11 independent plants for each low Brix/early flowering and high Brix/late flowering F2 plants respectively. The total RNA (1 μg per sample) was pooled and then fractionated with the flashPage fractionator (Ambion) to isolate RNAs smaller that 40 nt in length. The isolated small RNAs were used to construct small RNA cDNA libraries with the SOLiD small RNA library construction kit (Ambion). The sequencing was carried out at the Waksman genomics laboratory on the SOLiD 3 platform, which has a read length limit of 25 nt http://solid.rutgers.edu.
We mapped the 25 nt long reads to the sorghum genome using the SHRiMP program version 1.0.5 , with default parameter settings except that the number of matches was limited to 10. SHRiMP allowed us to perform the alignments in SOLiD's colorspace. For the further analyses we used only alignments that matched perfectly to the genome starting from the first position in the read up to the sequencing primer. Because the SOLiD 3 platform had a read length limit of 25 nucleotides, adaptor sequences did not have to be trimmed prior mapping to the genome. As a consequence, we could estimate the length of an individual sequence read by one base with a probability of 0.25. These reads were then clustered with Vmatch http://vmatch.de/ to reduce the number of identical reads for downstream analyses. We required 100% identity among the sequences of a cluster. We have further filtered the clustered reads against the repetitive elements of sorghum and used the remaining sequences for de novo prediction of miRNA using miRDeep.
We defined a 25 nt "hotspot" as those loci in the genome that displayed a high coverage of 25 nt reads, in our case thousand reads. The length of the hotspot was determined as each consecutive interrogated base that had more than thousand 25 nt reads mapped to it.
The TaqMan MicroRNA Assays (Applied Biosystems) was used to quantify the expression of miR172a, and the Custom TaqMan Small RNA Assays (Applied Biosystems) was used to quantify the expression of miR169d and miR395f respectively. The qRT-PCR reaction was done using the MyiQ Real-Time PCR Detection System (BIO-RAD Laboratories, Inc.). A relative quantification normalized against unit mass (10 ng total RNA) was performed as previously described .
For de novo prediction of potential miRNAs, we have used the miRDeep package . As miRDeep does not take colorspace alignment as input, we had to reshape the output to miRDeep's blastparse format. Moreover, the SHRiMP alignment scores and the score used had to be recalculated to fit miRDeep's blastparse format. We used the same formula and method as described . At this point, we also had to translate the color space two base encoding sequences into standard nucleotide base space sequences. As we considered only perfectly matching reads after the initial alignment to the genome, we could easily translate from color space to base space sequence format. The subsequent de novo calling of miRNAs was carried out as described [43, 57].
Finally, the coordinates of de novo miRNAs that were predicted on the minus strand were corrected as miRDeep refers the coordinates to the 5' end of the minus strand. Though, conventionally the coordinates refer always to the 5' end of the plus strand. The GenBank accession numbers for the new miRNAs are sbi-MIR538.sqn sbi-MIR5381 JN205291; sbi-MIR538.sqn sbi-MIR5382 JN205292; sbi-MIR538.sqn sbi-MIR5383 JN205293; sbi-MIR538.sqn sbi-MIR5384 JN205294; sbi-MIR538.sqn sbi-MIR5385 JN205295; sbi-MIR538.sqn sbi-MIR5386 JN205296; sbi-MIR538.sqn sbi-MIR5387 JN205297; sbi-MIR538.sqn sbi-MIR5388 JN205298; sbi-MIR538.sqn sbi-MIR5389 JN205299.
We have also validated all potential new miRNAs according to the annotation criteria proposed by .
We have used the novel miRNAs for a target prediction. Firstly, we compared the sequences to the unspliced transcripts of sorghum , with BLASTN using these parameters: -F F -W 7 -e 1 -q -2 -G -1. We scored each base of the alignment according to these criteria: match as 0; GU pairs as 0.5; gaps as 2; all other pairs were scored as 1. We doubled the score within the first 13 bases of the miRNA/alignment. We considered the gene as a potential target if the total score of the alignment was equal to or less than 8.
The miRNA-mediated cleavage of mRNAs was performed through a modified procedure of the RLM-RACE protocol from Invitrogen. The sequences of the reverse primers used in the modified RACE are: Sb01g044240 (5' GCCCATATGGACGGAAGATA 3'); Sb02g007000 (5' CTGGTAGCCGGAGAACAACT 3') and Sb06g030670 (5' TTTCATCAGTGCTTGCCAAT 3'). The validation of predicted targets was performed in BTx623 or Rio cultivars only. Annotation of the miRNA gene targets was based on the Phytozome database http://www.phytozome.net.
We thank Randy Kerstetter and the Waksman Genomic Laboratory for the service of SOLiD sequencing production and Marc Probasco for his technical assistance with greenhouse and field activities. This work was supported by Selman A. Waksman Chair in Molecular Genetics to JM and in part by the sponsorship from the Institute of International Education (IIE) and the Fulbright Commission in Uruguay to MC.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.