High-throughput sequencing of Medicago truncatula short RNAs identifies eight new miRNA families

Background High-throughput sequencing technology is capable to identify novel short RNAs in plant species. We used Solexa sequencing to find new microRNAs in one of the model legume species, barrel medic (Medicago truncatula). Results 3,948,871 reads were obtained from two separate short RNA libraries generated from total RNA extracted from M. truncatula leaves, representing 1,563,959 distinct sequences. 2,168,937 reads were mapped to the available M. truncatula genome corresponding to 619,175 distinct sequences. 174,504 reads representing 25 conserved miRNA families showed perfect matches to known miRNAs. We also identified 26 novel miRNA candidates that were potentially generated from 32 loci. Nine of these loci produced eight distinct sequences, for which the miRNA* sequences were also sequenced. These sequences were not described in other plant species and accumulation of these eight novel miRNAs was confirmed by Northern blot analysis. Potential target genes were predicted for most conserved and novel miRNAs. Conclusion Deep sequencing of short RNAs from M. truncatula leaves identified eight new miRNAs indicating that specific miRNAs exist in legume species.


Background
Gene expression is regulated at several layers in plants to ensure optimal temporal and spatial accumulation of proteins. One of the latest discovered regulatory layers involves short RNA (sRNA) molecules 21-24 nucleotides in length that act post-transcriptionally [1]. There are surprisingly many different sRNAs in plant cells indicating an extensive role for these molecules [2]. Plant sRNAs are produced from double stranded RNA (dsRNA) by one of the four Dicer-like proteins (DCL1-4). The different DCL proteins process dsRNAs generated by diverse pathways [1]. MicroRNAs (miRNAs) are produced from partially complementary dsRNA precursor molecules (pre-miRNA) [3]. Pre-miRNAs are originally single stranded RNAs with hairpin structures and recognized by DCL1 [4]. The other large class of plant sRNAs is small interfering RNAs (siR-NAs). siRNAs are processed from dsRNAs usually generated by one of the RNA Dependent RNA Polymerases (RDRs). Trans-acting siRNA (ta-siRNAs) precursors are generated by RDR6 [5,6] and heterochromatin siRNA pre-cursors are made by RDR2 [7]. Another group of siRNAs, the natural antisense siRNAs (nat-siRNAs), are processed from dsRNA produced by overlapping antisense mRNAs [8].
MiRNAs are the best characterized sRNAs in plants [9]. The primary transcript (pri-miRNA) is transcribed by RNA polymerase II and contains an imperfect hairpin structure. DCL1 trims this hairpin structure producing the pre-miRNA and then a second cleavage by DCL1 produces the miRNA/miRNA* duplex [10]. This molecule has a two nucleotide 3' overhang at each side of the duplex and contains a few mismatches [9]. One of the strands of the miRNA/miRNA* duplex is integrated into RISC (RNA induced silencing complex). This strand is called mature miRNA and the partially complementary miRNA* strand gets degraded, although in most cases the miRNA* strand also accumulates at a lower level [9]. RISC finds specific mRNAs because the incorporated mature miRNA can anneal to partially complementary target sites [3]. Target sites show near perfect matches to plant miRNA sequences and initially it was thought that all target mRNAs are cleaved by RISC [3]. Recently it was shown that the translation of plant mRNAs is also suppressed without a cleavage [11].
Most plant miRNA families have been identified by traditional Sanger sequencing method in model species with known genome sequences (Arabidopsis, rice and poplar) and most miRNAs are conserved across plant families [12]. However, some miRNAs are species/family specific and Allen et al. [13] suggested that these "young" miRNAs have evolved recently, in contrary to the conserved miR-NAs ("old" miRNAs). Since non-conserved miRNAs are often accumulated at a lower level than conserved miR-NAs, traditional small-scale sequencing primarily reveals conserved miRNAs. Establishment of high-throughput technologies has allowed the identification of several non-conserved or lowly expressed miRNAs through deep sequencing, e.g. in Arabidopsis, wheat and tomato [14][15][16][17]. Here we describe the deep sequencing of short RNAs extracted from M. truncatula leaves and the experimental validation of eight novel miRNAs.

Deep sequencing of M. truncatula short RNAs
Two separate cDNA libraries of short RNAs were generated from Medicago truncatula leaves and both libraries were sequenced by Solexa (Illumina). The first PCR product was quantified by nanodrop and 5 pM was loaded to Solexa, which yielded 5942 clusters and 872,048 sequence reads. The second sample was also quantified on polyacrylamide gel because the capacity of Solexa is higher than this. Since the nanodrop gave an approximately 50 times higher concentration than quantification on a gel we concluded that the nanodrop overestimates the concentration of the PCR product. To obtain more sequences, we loaded four times of the amount that we would have loaded based only on the nanodrop reading and this yielded 23301 clusters and 3,076,823 sequence reads. It is worth mentioning that loading based on only quantification on gel would have led to overloading the Solexa and would have produced many but unreliable sequences.
The two sets of reads were combined and analysed together ( Table 1) using the miRCat pipeline we have developed earlier [18]. The size distribution of sequence reads showed that the 24 nt class was the most abundant group of sRNAs followed by the 21 nt sequences and then the 23, 22 and 20 nt reads ( Figure 1). The almost four million reads represented 1,563,959 distinct sequences suggesting that the library is still not saturated. Out of the four million reads 2,168,937 matched to the genome without any mismatches, representing 619,175 sequences.

Conserved miRNAs
First we looked for known miRNAs by comparing our library to known miRNAs from other plant species. 174,504 reads corresponding to 25 conserved miRNA families showed perfect matches to known miRNAs. We analysed the number of reads for conserved miRNAs and miR156, 159 and 166 were represented most frequently in the library ( Table 2). Allowing one or two mismatches between sequences in our library and sequences in miR-Base increased the number of conserved miRNA families in M. truncatula to 31 ( Table 2). Next we predicted target genes and putative targets were identified for 27 out of 31 conserved families (Additional File 1). No targets were found in M. truncatula for miR163, miR169, miR394 and miR894. We found homologs of known miRNA target genes for several conserved M. truncatula miRNAs, such as SBP for miR156, NAC for miR160, AGO1 for miR168, bZIP for miR165, GRAS for miR171, AP2 for miR172 and low affinity sulphur transporter for miR395. On the other hand we predicted many genes with unknown function and hypothetical genes for miRNA targeting (Additional File 1) and careful analysis of these potential targets will contribute to our understanding of the role of miRNAs in legume plants.

Novel miRNAs
The flanking regions of the 2,168,937 sequence reads matching the genome were subjected to secondary structure analysis. We used the strict criteria suggested by Jones-Rhoades et al. [9] to identify potential miRNA loci. 32 short RNA producing loci corresponding to 26 sequence reads could be folded into step-loop structures (Additional File 2). Nine of these loci produced eight dis-tinct sequences, for which we also sequenced the miRNA* sequences (Table 3). These eight sequences were considered as novel legume specific miRNAs based on Rajagopalan et al. [15], who demonstrated that loci producing sequenced mature miRNAs and miRNA* sequences are miRNA loci. The other 23 loci producing 18 sRNAs were considered as putative new miRNA genes since at the moment there is not sufficient evidence to classify them as miRNA genes. Accumulation of all eight new miRNAs and most miRNA*s were validated by Northern blot analysis ( Figure 2).
We also predicted target genes for putative miRNAs (Additional file 3) and the new miRNAs with sequenced miRNA* (Additional file 4). Five out of the eight new miRNAs have potential targets in the available M. trunca-Size distribution of sequenced short RNAs Figure 1 Size distribution of sequenced short RNAs.  generates shorter reads (up to 35 bp) but yields 1-3 million reads per sample [22]. The other very high-throughput technique, massively parallel sequencing (MPSS), gives more reads than Solexa but the reads are even shorter, only 17 bp [23]. miRNAs are only 21-23 nt sequences therefore even MPSS can identify them, although this technology generates shorter than 21 bp reads. Because of this it is not the ideal choice for miRNA discovery but it is useful for profiling conserved miRNAs. We decided to use the Solexa platform to test whether new miRNAs can be found in the model legume M. truncatula.
We found the quantification of PCR product problematic because the nanodrop overestimated the concentration and using quantitative markers on polyacrylamide gel seemed to underestimate the concentration. The difference between the two approaches was 50 fold. Loading 5 Figure 2 Expression of new miRNAs. Total RNA from stems and leaves of M. truncatula was analysed by Northern blot. The numbers correspond to ID numbers in Table 3. Size markers (19 and 24 nt RNA oligonucleotides) are shown on the left. The membranes were also hybridised with a probe specific to U6 [17]. Number of reads are shown for sequences that are similar to conserved miRNAs identified in other species. Figures are shown separately for sequences with exact matches and length, exact matches but shorter or longer than known miRNAs and for sequences with one or two mismatches to known miRNAs.   048 reads), well below the capacity of Solexa. However, loading 50 times more would have given poor quality sequences due to overloading. We decided to load four times of the amount suggested by the nanodrop and this approach yielded sufficient number (more than 3 million) and good quality reads. Many short sequences did not map to the genome but it is not known what the explanation is for that. One possibility is that the rate of sequencing error is high and many short reads contains mistakes. The other possibility is that there are polymorphisms between the species we used and the one that was used for the genome sequencing. It is also possible that plants used in these experiments were infected symptomless pathogens and genomes of those organisms contaminated our libraries.

Expression of new miRNAs
First we searched for conserved miRNAs in the combined, almost four million reads from the two separate sRNA libraries. MiRNAs showing perfect matches to 25 conserved families were found in our combined library, which is more than was identified using bioinformatics approaches [24][25][26]. Allowing up to two mismatches further increased this number to 31 families. This work experimentally verified all conserved miRNAs predicted by [ [24,25] and [26]]. However, none of the predicted new miRNAs [26] were found and the eight new miRNAs we found were not predicted by any of the three studies [ [24,25] and [26]] illustrating the benefit of the sequencing approach. Target prediction for conserved miRNAs found several genes showing homology to genes validated in other species. However, several predicted targets are new without known function indicating that more work is required to elucidate the role of miRNAs in legumes.

New miRNAs
Validation of non-conserved miRNAs can be achieved through several ways. One possibility is providing evidence of biogenesis characteristic of miRNAs. This involves the identification of stem-loop structure of the potential pre-miRNAs and sequencing of the miRNA* molecules [9]. In the absence of evidence for biogenesis, functional data can complement the predictions based on structural analysis. MiRNA mediated cleavage is located in a specific site within the target site and this can be identified by 5'RACE analysis. We have found eight new nonconserved miRNAs supported by biogenesis data since the miRNA* strands were sequenced and detected by Northern blot analysis. We also found 18 other putative miR-NAs based on structural data. However, we did not sequence the miRNA* strands for these and we do not have functional proof therefore at the moment these are only putative miRNAs.
The role of miRNAs in development and abiotic stress response is well documented. Several transcription factors involved in development are targeted by miRNAs [27].
Other miRNAs play a role in nutrient assimilation and responses to drought, cold and other abiotic stresses [1,28]. Reports of miRNAs involved in biotic stress response are less common. Navarro et al [29] showed that bacterial flagellin expression induced miR393 and F-box auxin receptor genes were regulated by miR393. Two of the new miRNAs are predicted to target disease resistance genes suggesting a more extensive role for miRNAs in biotic stress response. Targets of the other new miRNAs also include beta-glucan-binding protein, peptidyl-prolyl cis-trans isomerise and a hypothetical protein. The biological importance of the potential regulation of these genes by miRNAs needs further investigation.

Conclusion
High-throughput sequencing analysis of sRNAs from M. truncatula leaves identified eight new miRNAs, which were not found in other species. Sequencing sRNAs from specific tissues of legume plants by deep sequencing is expected to reveal more new miRNAs. The first column (sRNA) shows the ID number of the sequences. The second and third columns show the genomic location of the putative miRNAs (Chr: chromosome; Start: starting position on the chromosome). The number of reads are indicated in the fourth column and the sequences themselves are listed in the fifth column. The results of the Northern blot analysis are shown in the sixth column (+: detected; -: not detected). Whether the miRNA* sequence was sequenced or not is indicated in the last column. riod of 25/18°C day/night and humidity of 40%. Small RNA fraction between 19-24 nt was isolated from 15% denaturing polyacrylamide gel and 15 μg was ligated to adaptors without de-phosphorylating and re-phosphorylating [30]. The short RNAs were converted to DNA by RT-PCR and the DNA was sequenced on a Solexa machine (Illumina). 15 μg of RNA extracted from M. truncatula leaves was analysed by Northern blot as described by Pall et al. [31].
The sequences can be found in GEO under GPL7704 platform (High-throughput sequencing of small RNAs in Medicago truncatula), series GSE13761, samples GSM346592 and GSM346593.