Characterization of the small RNA component of the transcriptome from grain and sweet sorghum stems

Background Sorghum belongs to the tribe of the Andropogoneae that includes potential biofuel crops like switchgrass, Miscanthus and successful biofuel crops like corn and sugarcane. However, from a genomics point of view sorghum has compared to these other species a simpler genome because it lacks the additional rounds of whole genome duplication events. Therefore, it has become possible to generate a high-quality genome sequence. Furthermore, cultivars exists that rival sugarcane in levels of stem sugar so that a genetic approach can be used to investigate which genes are differentially expressed to achieve high levels of stem sugar. Results Here, we characterized the small RNA component of the transcriptome from grain and sweet sorghum stems, and from F2 plants derived from their cross that segregated for sugar content and flowering time. We found that variation in miR172 and miR395 expression correlated with flowering time whereas variation in miR169 expression correlated with sugar content in stems. Interestingly, genotypic differences in the ratio of miR395 to miR395* were identified, with miR395* species expressed as abundantly as miR395 in sweet sorghum but not in grain sorghum. Finally, we provided experimental evidence for previously annotated miRNAs detecting the expression of 25 miRNA families from the 27 known and discovered 9 new miRNAs candidates in the sorghum genome. Conclusions Sequencing the small RNA component of sorghum stem tissue provides us with experimental evidence for previously predicted microRNAs in the sorghum genome and microRNAs with a potential role in stem sugar accumulation and flowering time.

MicroRNAs are derived from capped and polyadenylated primary (pri)-miRNA transcripts that are transcribed by RNA polymerase II and can form a hairpinloop structure by intramolecular pairing [4,6]. Two sequential cleavages mediated by DICER LIKE 1 (DCL1) are required to produce a mature miRNA [4,7]. In the first cleavage, DCL1 cleaves near the base of the hairpin-loop stem of the pri-miRNA to produce a miRNA precursor (pre-miRNA). The second cleavage takes place near the loop of the pre-miRNA to produce a miRNA/miRNA* duplex. The mature miRNA is then loaded into the RNA-induced silencing complex (RISC) and can guide the sequence-specific cleavage or translational inhibition of target mRNAs [2,4,7,8], as well as gene silencing through DNA methylation [9,10], whereas the non-incorporated miRNA* strand is usually degraded.
Through the use of next-generation sequencing, the small RNA component of the Arabidopsis and rice transcriptomes has been well characterized, more than in any other plant species [11,12]. This is reflected in the miRBase database (http://www.mirbase.org, release 16: September 2010), where 213 miRNAs are described for Arabidopsis whereas 462 miRNAs are described for rice. Besides rice, the identification of miRNAs through deep sequencing in other grasses including maize, wheat, and Brachypodium have been described [13][14][15]. The identification of rice, maize and wheat miRNAs from different tissues, developmental stages and stress-treatments [12,13,[15][16][17][18][19][20], provides an opportunity to understand how miRNAs regulate the expression of genes influencing traits of agronomic importance. Currently, a trait of particular relevance for biofuel production is that of sugar accumulation in the stem of sorghum [Sorghum bicolor (L.) Moench] and sugarcane (Saccharum spp.), two closely related C4 grasses that diverged from each other about 8-9 million years ago [21].
In both species, sucrose is the main type of sugar and accumulates in the parenchyma tissue of the juicy stems [22,23]. High sucrose content is a highly desirable trait since the accumulated sugar can be fermented to produce bioethanol as a source of renewable energy [24]. Although sugarcane has been extensively used as a source of biofuel, its use as a model system to understand the genetics of sugar accumulation is hampered by its complex genome, with several cultivars differing greatly in their ploidy levels [25]. Sorghum instead, is a diploid species and its genome has been recently sequenced [26]. In addition, the intra-species variation for sugar content is much more pronounced in sorghum than in sugarcane [27], with sorghum cultivars known as sweet sorghums accumulating high levels of sugars relative to grain sorghums [28]. This makes sorghum a more suitable system to study the genetic basis of sugar accumulation. Still, the gene repertoire involved in sugar accumulation is not well characterized in sorghum due to the low heritability of the trait and its quantitative inheritance. In addition, previous reports have suggested the existence of trade-offs between sugar content and other plant traits such as flowering time [28,29].
We also observed that sugar accumulation (measured as Brix degree and referred herein as Brix) in the stem of grain sorghum BTx623 and sweet sorghum Rio cultivars differed at the time of flowering. Interestingly, 80% of the differentially expressed genes in stem tissue between the two cultivars had orthologous counterparts in syntenic positions in rice [30,31]. This suggested that the ability of sorghum to accumulate soluble sugars relative to rice could not be explained by differences in their gene content but rather due to gene regulation at either the transcriptional or post-transcriptional level. To address the latter possibility, we characterized the small RNA portion of transcriptomes derived from stem tissues of grain and sweet sorghum in order to investigate the microRNAmediated regulation of genes involved in sugar accumulation and flowering time. Using the SOLiD next generation sequencing system, we sequenced with an unprecedented depth small RNAs libraries from BTx623 and Rio, and from a pool of selected F2 plants derived from their cross that differed in sugar content and flowering time. We also reasoned that plant stems would provide us with a representative tissue to experimentally validate the previously predicted miRNAs of the sorghum genome [26]. Indeed, we not only detected the expression of 25 miRNA families from the 27 predicted families in the sorghum genome but also discovered 9 new miRNA candidates. Furthermore, we could correlate genotypic variation of miRNA expression with the sugar and flowering phenotypes. In addition, we found that the size distribution of small RNAs in sorghum stems was quite heterogeneous, characterized by RNAs with at least 25 nt in length that were mainly derived from ribosomal and transfer RNAs not annotated in the sorghum genome.

Results
Deep-sequencing of small RNAs from grain and sweet sorghum stems We constructed five small RNAs libraries from sorghum stem tissue at the time of flowering and sequenced them using the SOLiD platform. The libraries comprised samples from BTx623, Rio, low Brix and early flowering F2 plants (LB/EF F2s), high Brix and late flowering F2 plants (HB/LF F2s), and a "mixed library" (Mix), where small RNAs from the previous four libraries were mixed in equal proportions ( Figure 1).
We obtained a total of 38,336,769 sequence reads, from which 23,008,945 (60%) matched perfectly to the BTx623 reference genome ( Table 1). The reads with perfect matches that derived from repeats constituted 74 to 77% of the total reads depending on the library (Figure 2a). The non-redundant set of reads comprised 2,539,403 sequences, and the reads that were sequenced only once (termed here "singlets") comprised 2,167,946 sequences, corresponding only to 9% of the perfect matches ( Table 1), suggesting that our sequencing reached a high level of saturation. If we define a cluster as two or more reads with identical sequences, the number of clusters found ranged from 20,056 in the BTx623 library to 164,623 in the HB/LF F2s library (Table 1).

Diversity in the small RNA content of sorghum stems
The frequency and size distribution of small RNAs from sorghum stems revealed two interesting aspects: a peak of 25 nt small RNAs with similar abundance as the 24 nt class, and a second peak of small RNAs with 22 nt that were more abundant than the 20 and 21 nt classes, respectively ( Figure 2b). This finding contrasted with the size distribution of small RNAs described for several monocot species (including small RNAs from sorghum inflorescence), in which the most abundant small RNAs were 21 and 24 nt in length, with maize being the exception, showing a larger 22 nt peak relative to the 21 nt peak [13]. This led to the hypothesis that the 22 nt class of small RNAs are specific to maize [13]. However, we have shown here that a 22 nt peak is also present in sorghum stem tissue. Furthermore, we found that a high proportion of the 22 nt reads were derived from miR172c, accounting for approximately 15% of all the 22 nt reads in the BTx623 library ( Figure 2c). Our results differ from the predicted length of 20 nt for miR172c annotated in the miRBase database. Interestingly, MIR172c is located within the third intron of the Sb04g037375 gene.
The finding of small RNAs of 25 nt in length with such high abundance was unexpected. This prompted us to investigate whether they could be derived from ribosomal and/or transfer RNA genes that had not yet been annotated in the sorghum genome. Furthermore, since the sequencing read length of the SOLiD system at the time of our experiment was limited to a maximum of 25 nucleotides, it is possible that these RNAs are longer. In order to address this question, we analyzed several loci in the genome that accumulated more than thousand reads (defined as 25 nt hotspots) and found indeed that they were derived from non-annotated rRNA and tRNA genes (Additional File 1, Table S1).
In summary, we showed that the small RNA component from the stem transcriptome of sorghum is characterized by small RNAs of 22 nt in length that are mainly derived from miR172c, and by a size class of RNAs with at least 25 nt in length that are predominantly derived from rRNAs and tRNAs genes that had not been annotated in the sorghum genome.
Genotypic variation in the expression of known miRNAs between grain and sweet sorghum correlated with sugar content and flowering time in the F2 population The sequencing consortium of the sorghum genome identified 149 predicted miRNAs belonging to 27 miRNA families [26], and we could detect the expression of miRNA members from 25 families based on the following criteria: a miRNA family was considered expressed only if its sequencing reads were detected in at least three libraries and with a frequency of 10 reads or more for the sum of the five libraries. A list with the reads count for each known miRNA family is provided in Additional file 2, Table S2.
The most abundantly expressed miRNA family was miR172 (Figure 3a), comprising almost 6% of the total reads with perfect match to the BTx623 genome. The rest of the known miRNAs had abundances below 0.5% (Figure 3b). When the ratio of miRNA abundances between the BTx623 and Rio libraries was compared to the ratio between the LB/EF F2s and HB/LF F2s libraries, we could identify miRNA families whose expression differences between the parents were  inherited in the F2 plants (Figure 3c). Considering a cutoff level of two-fold change in miRNA expression, we found that miR169 and miR172 were expressed higher in BTx623 relative to Rio, and higher in LB/EF F2s compared to HB/LF F2s. This means that high expression of these miRNAs in BTx623 correlated with low Brix and early flowering in the F2 plants selected, and the opposite was true for miR395 ( Figure 3c). Although the expression difference of miR160, miR164 and miR319 between BTx623 and Rio was inherited in the F2, and thus of interest for further analysis, it was less than two fold; so we decided to focus on miR169, miR172 and miR395 instead. The observation that high expression of miR172 correlated with early flowering is consistent with the reported role of this miRNA in the promotion of flowering [32][33][34][35][36]. Although miR169 and miR395 have known roles in drought stress and sulphur starvation, respectively [37,38], our data suggested a possible function for these miRNAs in sugar accumulation and flowering time. Because the pool of F2 plants used for library construction were selected based on both phenotypes, it was not possible to assign the expression inheritance pattern of both miRNAs to either sugar accumulation or flowering time alone. For this reason, additional plants from the same F2 population differing in sugar content but with similar flowering time were selected and the expression of a representative member from each miRNA family, miR169d and miR395f respectively, was quantified using the TaqMan assay. We found that high expression of miR169d in BTx623 correlated with low Brix ( Figure  3d). This suggested that high expression levels of miR169 might lead to a reduction in stem sugar content regardless of flowering time. Surprisingly, high expression of miR395f in Rio relative to that in BTx623 did not correlate with sugar content in F2 plants ( Figure  3e). This might indicate that high expression of miR395 would be required for flowering regardless of sugar content in the stem. Consistent with the role of miR172 in flowering, we did not observe any difference in the expression of miR172a in F2 plants with the same flowering time but different Brix (Figure 3f).
In summary, high expression of miR172 in BTx623 correlated with early flowering in the F2, whereas the opposite was true for miR395, high expression of this miRNA in Rio correlated with late flowering in the F2 plants selected. Regarding sugar content in the stem, high expression of miR169 in BTx623 correlated with low Brix in the F2 plants selected.

Genotypic variation in the miR395/miR395* ratio
We detected the expression of the miRNA* for all MIR395 gene copies and this was more evident in Rio compared to BTx623, and in some instances the abundance of miR395* was even higher than that of miR395 such as the case of miR395l* for instance ( Figure 4a). Indeed, when the miR395/miR395* ratio was calculated for each library, we found that miR395 reads were approximately 6 times more abundant than miR395* reads in the BTx623 library (Additional file 2, Table S2). By contrast, the abundance of miR395 relative to miR395* was in equal proportions in the Rio library. Our data highlighted a genotypic difference in the ratio between miR395 and miR395*, with a switch in strand abundance from BTx623 to Rio (Figure 4b).
The FRL2 and RR3 genes are novel targets of miR172 Although our data might suggest a possible function of miR169 in sugar content and miR395 in flowering time, we could not detect any predicted target related to carbohydrate metabolism and flowering time respectively (Additional file 3, Table S3 and Additional file 4, Figure  S1). Thus, the expression of miR169 and miR395 target genes, and their correlation with Brix and flowering phenotypes remains to be elucidated. Regarding the miR172-predicted targets, we detected cleavage products for the genes INDETERMINATE SPIKELET 1 (IDS1) and an AP2 transcription factor (Additional file 3, Table  S3; Additional file 4, Figure S1; and Additional file 5, Figure S2). Furthermore, when the expression of these two miR172 target genes was tested, we found that they were expressed higher in Rio compared with BTx623 as expected. However, we could not find a correlation between their expression levels with the flowering phenotype in the F2 pools of plants selected (data not shown).
A FRIGIDA-like 2 (FRL2) and a TYPE A RESPONSE REGULATOR 3 (RR3) were predicted as new targets of miR172 with the cleavage product of FRL2 experimentally validated in this study (Additional file 5, Figure S2). The FRIGIDA-related genes are a major determinant of natural variation in the winter-annual habit between Arabidopsis accessions [39,40], whereas the TYPE A RESPONSE REGULATOR 3 (ARR3) has a function in the circadian clock [41]. Although sorghum is a crop from semi-arid regions [26], the miR172-mediated posttranscriptional regulation of FRL2 might have a role in the adaptation of sorghum to temperate climates. Consistent with this, a role of miR172 in the regulation of flowering time by ambient temperature in Arabidopsis has been recently described [42].

Identification of new miRNAs
The miRDeep pipeline [43] was adapted for de novo detection of miRNAs in sorghum (Additional file 6, Figure S3). From an original set of 223 predicted hairpins in the sorghum genome, 9 met the miRNA annotation criteria previously established [44], (Table 2 and Additional file 7, Figure S4). All the new miRNAs have predicted genes as targets except miR5389 (Additional file 8, Figure S5). All predicted 9 miRNAs met the expression criteria used above for known miRNAs ( Figure 5 and Additional file 9, Table S4). From all miRNAs whose expression could be detected in sorghum stems, two of them were found to be within introns of protein coding genes, these included miR172c and miR437g. From the newly identified miRNAs, miR5386, and miR5388 displayed allelic variation in expression between BTx623 and Rio that was inherited in the F2 offspring ( Figure 5). However, the predicted target genes for miR5386 did not include any transcript involved neither in flowering nor in carbohydrate metabolism. This was a similar case with miR5388, with no predicted targets involved in flowering but with two genes involved in carbohydrate metabolism as predicted targets, encoding the beta subunit 1 and 2 of the Snf-1 related protein kinase (SnRK1) respectively [45] (Additional file 8, Figure S5).
We next attempted to experimentally validate the miRNA-mediated cleavage of predicted targets. Potential miRNA target sites were scored from 0 to 8 (see Methods), with higher scores indicating less confidence in the predictions. We tested 14 predicted targets with scores less than 4 but we could not detect the miRNAmediated cleavage for any of them. A low rate in target validation has also been observed for newly predicted miRNAs in tomato, with three targets validated from 65 predicted targets that were tested [46]. Recently, a similar case of very low rate in target validation was reported for predicted targets of new miRNAs identified in Arabidopsis lyrata [47].   The rest of the known miRNAs were expressed at very low abundance (less that 0.5% of the total reads in the library) in stem tissue. (c) The miRNA abundances were used to calculate their relative fold change in expression between BTx623 and Rio, and between the LB/EF F2s and HB/LF F2s libraries, respectively. Positive values in the y-axis of the graph denote fold changes in miRNA expression that are higher in BTx623 relative to Rio and higher in LB/EF F2s relative to HB/LF F2s libraries, respectively; the opposite is true for negative values. The expression of miR169 and miR172 was at least twice as high in BTx623 relative to that in Rio and this difference was inherited in the F2. The opposite was true for miR395 expression.

Discussion
Here we have described the first characterization of the small RNA component of the transcriptome from sorghum stems. The choice of stems as plant material is interesting not only because it is the tissue were fermentable sugars do accumulate, but it is also the venue for the movement of small RNA duplexes (siR-NAs and miRNAs) from source to sink tissues, as have been recently demonstrated [48,49]. Thus, one could expect the small RNA component of the stem to be Sulfur metabolism Sulfur metabolism Any functional role? Figure 4 miR395* is highly abundant in Rio. (a) Small RNA reads derived from MIR395l are depicted. The miR395l strand sequence is shown in red whereas the miR395l* strand sequence is in orange color. In green and blue color are small RNA reads sequenced from BTx623 and Rio libraries respectively. The designation next to the small RNA reads refer to the library (bc01: Mix; bc02: BTx623; bc03: Rio; bc04: LB/EF F2s and bc05: HB/LF F2s), followed by the number of times the small RNA read was sequenced. In the BTx623 library, only reads derived from miR395l were detected whereas in the Rio library, most of the reads where derived from miR395l* instead. (b) Model depicting the genotypic variation in miR395/miR395* ratio where in Rio a switch towards miR395* strand production has occurred relative to BTx623. Based on miR395* high abundance in Rio, we postulate here the hypothesis that miR395* species could have a functional role in the regulation biological processes other than the sulfur metabolism previously described for miR395.
quite diverse or heterogeneous. Indeed, the unexpected finding of a high abundance peak of RNAs with 25 nt or more in length lead us to the finding of rRNA and tRNA genes that have not been annotated yet in the sorghum genome. We have also shown that the abundance of the 22 nt small RNAs in sorghum stem tissue was greater than the 20 and 21 nt small RNAs respectively. Our results contrast the recently proposed notion that the 22 nt peak of small RNAs is exclusive of maize [13]. Furthermore, we found that up to 15% of all the 22 nt small RNAs in the BTx623 library were derived from miR172c, which has been previously predicted to have a length of 20 nt (Paterson et al. 2009). Recently, 22 nt miRNAs have been described to trigger siRNA biogenesis from target transcripts in Arabidopsis [50,51]. Thus, it would be interesting to test if miR172c can also trigger siRNA biogenesis in sorghum. As expected, the specific genetic material, tissue sample and developmental stage used in our study, allowed us to capture a broad spectrum of the small RNA component of the sorghum transcriptome. On the other hand, the specificity of the material also permitted us to gain new insights into how complex traits like sugar accumulation and flowering time might be regulated at the post-transcriptional level. Such regulation of gene expression could provide an opportunity to manipulate biofuel traits, where stem sugar rather than cellulose and increased biomass because of delayed flowering could be enhanced [52]. By taking a genetic approach in conjunction with deep-sequencing of stem-derived small RNAs, we were able to correlate variation in miRNA expression between grain and sweet sorghum, with the sugar and flowering phenotypes of selected F2 plants derived from their cross. However, analysis of the differential accumulation of potential target genes did not exhibit a simple correlation with miRNA levels. Therefore, further studies will be required to unveil the underlying mechanisms between genotype and phenotype.
In the case of miR395, it is interesting to note that there was genotypic variation in the miR395/miR395* ratio, with the Rio genotype expressing both strands at equal proportions in contrast to a clear predominance of miR395 abundance over miR395* in BTx623 (Figure 4b). This is reminiscent of the recently proposed "arm switching" model of miRNA evolution described for nematodes species [53], in which the mature miRNA is produced from the 5' arm of the miRNA hairpin in a particular species but in a different nematode species the 5' arm of the same MIR gene gives rise to the miRNA* instead. Interestingly, it has been shown recently that miRNA* species have physiological relevance in Drosophila, since a significant number of them are well conserved, can be loaded into the RISC complex through their preferential association with ARGONAUTE2 (AGO2) rather that AGO1, and can also regulate the expression of target genes [54]. Furthermore, the regulatory potential of miRNA* species in vertebrates has been recently demonstrated as well [55].

Conclusions
Based on the above, several interesting questions can be formulated. First, does miR395* have any regulatory potential? Second, what is the mechanism behind the genotypic difference in miR395/miR395* ratio? Third, is this ratio altered in a developmental and/or tissue dependent manner? Fourth, is this an example of a general phenomenon? If this is the case, we would envision that other miRNAs families as well will display differences in their miRNA/miRNA* ratio dependent on the genotype and/or condition. Future work will be required to provide a better understanding of miR395's involvement in processes other than its previously described role in sulfur metabolism.

Plant material
The grain (BTx623) and sweet (Rio) sorghum cultivars together with F2 plants derived from their cross were grown in the field of the Waksman Institute during the summer of 2008. The juice from three internodes of the main stem was harvested at the time of flowering and the Brix degree measured as previously described [30,31]. The average Brix degree from three internodes per plant was used. Flowering time was measured as the number of leaves in the main stem at the time of anthesis.
In total, 15 plants for each parent and 553 F2 plants were scored for Brix degree and flowering time. The F2 plants selected for sequencing had either low Brix (Brix ≤ 5)/early flowering (N0 leaves ≤ 9) or high Brix (Brix ≥ 13)/late flowering (NO leaves ≥ 14).

Construction of small RNA libraries
Total RNA from internode tissue was extracted at the time of flowering with the mirVana miRNA isolation kit (Ambion). RNA extraction was performed in 5 independent plants for each BTx623 and Rio, and 11 independent plants for each low Brix/early flowering and high Brix/late flowering F2 plants respectively. The total RNA (1 μg per sample) was pooled and then fractionated with the flash-Page fractionator (Ambion) to isolate RNAs smaller that 40 nt in length. The isolated small RNAs were used to construct small RNA cDNA libraries with the SOLiD small RNA library construction kit (Ambion). The sequencing was carried out at the Waksman genomics laboratory on the SOLiD 3 platform, which has a read length limit of 25 nt http://solid.rutgers.edu.

Bioinformatic analysis
We mapped the 25 nt long reads to the sorghum genome using the SHRiMP program version 1.0.5 [56], with default parameter settings except that the number of matches was limited to 10. SHRiMP allowed us to perform the alignments in SOLiD's colorspace. For the further analyses we used only alignments that matched perfectly to the genome starting from the first position in the read up to the sequencing primer. Because the SOLiD 3 platform had a read length limit of 25 nucleotides, adaptor sequences did not have to be trimmed prior mapping to the genome. As a consequence, we could estimate the length of an individual sequence read by one base with a probability of 0.25. These reads were then clustered with Vmatch http://vmatch.de/ to reduce the number of identical reads for downstream analyses. We required 100% identity among the sequences of a cluster. We have further filtered the clustered reads against the repetitive elements of sorghum and used the remaining sequences for de novo prediction of miRNA using miRDeep.
We defined a 25 nt "hotspot" as those loci in the genome that displayed a high coverage of 25 nt reads, in our case thousand reads. The length of the hotspot was determined as each consecutive interrogated base that had more than thousand 25 nt reads mapped to it.

Quantification of miRNA expression
The TaqMan MicroRNA Assays (Applied Biosystems) was used to quantify the expression of miR172a, and the Custom TaqMan Small RNA Assays (Applied Biosystems) was used to quantify the expression of miR169d and miR395f respectively. The qRT-PCR reaction was done using the MyiQ Real-Time PCR Detection System (BIO-RAD Laboratories, Inc.). A relative quantification normalized against unit mass (10 ng total RNA) was performed as previously described [30].

De novo discovery of sorghum miRNAs
For de novo prediction of potential miRNAs, we have used the miRDeep package [43]. As miRDeep does not take colorspace alignment as input, we had to reshape the output to miRDeep's blastparse format. Moreover, the SHRiMP alignment scores and the score used had to be recalculated to fit miRDeep's blastparse format. We used the same formula and method as described [57]. At this point, we also had to translate the color space two base encoding sequences into standard nucleotide base space sequences. As we considered only perfectly matching reads after the initial alignment to the genome, we could easily translate from color space to base space sequence format. The subsequent de novo calling of miRNAs was carried out as described [43,57].
Finally, the coordinates of de novo miRNAs that were predicted on the minus strand were corrected as miR-Deep refers the coordinates to the 5' end of the minus strand. Though, conventionally the coordinates refer always to the 5' end of the plus strand. We have also validated all potential new miRNAs according to the annotation criteria proposed by [44].

Target prediction and validation
We have used the novel miRNAs for a target prediction. Firstly, we compared the sequences to the unspliced transcripts of sorghum [26], with BLASTN using these parameters: -F F -W 7 -e 1 -q -2 -G -1. We scored each base of the alignment according to these criteria: match as 0; GU pairs as 0.5; gaps as 2; all other pairs were scored as 1. We doubled the score within the first 13 bases of the miRNA/alignment. We considered the gene as a potential target if the total score of the alignment was equal to or less than 8. The miRNA-mediated cleavage of mRNAs was performed through a modified procedure of the RLM-RACE protocol from Invitrogen. The sequences of the reverse primers used in the modified RACE are: Sb01g044240 (5' GCCCATATGGACGGAAGATA 3'); Sb02g007000 (5' CTGGTAGCCGGAGAACAACT 3') and Sb06g030670 (5' TTTCATCAGTGCTTGCCAAT 3'). The validation of predicted targets was performed in BTx623 or Rio cultivars only. Annotation of the miRNA gene targets was based on the Phytozome database http://www.phytozome.net.

Additional material
Additional File 1: Table S1 -25 nt hotspots in the sorghum genome.
Additional file 2: Frequency counts of small RNA reads for known microRNA families.  The miRNA abundances were used to calculate their relative fold change in expression between BTx623 and Rio, and between the LB/EF F2s and HB/LF F2s libraries, respectively. Positive values in the y-axis of the graph denote fold changes in miRNA expression that are higher in BTx623 relative to Rio and higher in LB/EF F2s relative to HB/LF F2s libraries, respectively; the opposite is true for negative values. The miRNA miR5383 was not included in the graph because it was not detected in the Rio library (see Additional file 9, Table S4).
Additional file 3: Predicted targets of miR169, miR172, and miR395. Table S3 provides a list of predicted target genes of miR169, miR172, and miR395.
Additional file 4: Targets of predicted for miR169, miR172 and miR395 microRNAs. Figure S1 displays an alignment between miR169, miR172 and miR395 microRNAs and their target sequences.
Additional file 5: Mapping of miR172-guided cleavage sites in predicted target genes. Figure S2 displays an alignment of miR172 with its target sequences and cleavage sites. The locations of the miRNAcleavage sites are indicated with downward arrows and the frequency of the cleavages are indicated as the number of clones for each RACE product with respect to the total clones sequenced.
Additional file 6: Pipeline for the de novo miRNA detection. Figure  S3 presents a diagram of computational steps involved in de novo miRNA detection. All reads from SOLiD sequencing were mapped in colorspace to the sorghum genome using SHRiMP. Perfect matching reads were clustered with Vmatch then filtered against the sorghum repeat sequences and compared with know sorghum miRNAs to classify them. The remaining sequences were taken for de novo miRNA prediction using miRDeep. Additional file 7: Hairpin structures of the newly discovered miRNAs. Figure S4 presents a collection of hairpin structures from newly discovered miRNAs. Sequences are depicted together with the frequency distribution of the small RNA reads aligned to the hairpin. The 2D hairpin structure produced by the miRDeep software is also shown.
Additional file 8: Predicted targets for the newly discovered miRNAs in sorghum. Figure S5 presents a list of alignments between new discovered miRNAs and their predicted targets in sorghum.
Additional file 9: Frequency counts of small RNA reads for new MIR genes. Table S4 displays a quantitative analysis of new MIR genes.