A computational-based update on microRNAs and their targets in barley (Hordeum vulgare L.)

Background Many plant species have been investigated in the last years for the identification and characterization of the corresponding miRNAs, nevertheless extensive studies are not yet available on barley (at the time of this writing). To extend and to update information on miRNAs and their targets in barley and to identify candidate polymorphisms at miRNA target sites, the features of previously known plant miRNAs have been used to systematically search for barley miRNA homologues and targets in the publicly available ESTs database. Matching sequences have then been related to Unigene clusters on which most of this study was based. Results One hundred-fifty-six microRNA mature sequences belonging to 50 miRNA families have been found to significantly match at least one EST sequence in barley. As expected on the basis of phylogenetic relations, miRNAs putatively orthologous to those of Triticum are significantly over-represented inside the set of identified barley microRNA mature sequences. Many previously known and several putatively new miRNA/target pairs have been identified. When the predicted microRNA targets were grouped into functional categories, biological processes previously known to be regulated by miRNAs, such as development and response to biotic and abiotic stress, have been highlighted and most of the target molecular functions were related to transcription regulation. Candidate microRNA coding genes have been reported and genetic variation (SNPs/indels) both in functional regions of putative miRNAs (mature sequence) and at miRNA target sites has been found. Conclusions This study has provided an update of the information on barley miRNAs and their targets representing a foundation for future studies. Many of previously known plant microRNAs have homologues in barley with expected important roles during development, nutrient deprivation, biotic and abiotic stress response and other important physiological processes. Putative polymorphisms at miRNA target sites have been identified and they can represent an interesting source for the identification of functional genetic variability.


Background
MicroRNAs (miRNAs) are a class of non-coding small RNAs with fundamental roles in key plant biological processes such as development, signal transduction and environmental stress response [1]. miRNAs act on gene regulation at post-transcriptional level, a phenomenon known in plants as PTGS (Post Transcriptional Gene Silencing), through sequence-based interaction with target mRNAs.
Many plant species have been investigated during recent years for miRNAs identification and characterization. The current information available on barley refers to two papers [2,3]. In particular, the paper of Dryanova et al. reports detailed information on both targets and miRNA coding sequences from Hordeum vulgare and for other members of Triticeae tribe, to which barley belongs [2]. However, extensive studies describing the organization of miRNA families, specifically in barley, are not yet available (at the time of this writing) and no miRNAs have been deposited in the publicly available miRNA database (miRBase, http://www.mirbase.org), this despite the economic importance of barley and its role as model species for Triticeae [4].
The conservation of miRNA sequences across species provides a powerful tool for the identification of novel miRNA genes based on homology with miRNAs previously described in other species. Search based on evolutionary conservation has allowed the identification of miRNA families in many plant species, including those where the complete genome sequence is not available, as it is currently the case of barley. Without genome sequence information a powerful alternative data source comes from ESTs (Expressed Sequence Tags): currently 501,616 ESTs are available in barley http://www.ncbi.nlm.nih.gov/ dbEST/dbEST_summary.html [5].
The identification of target genes is a fundamental step for the determination of the biological function of microRNAs, besides being an indirect evidence for their existence. Evolutionary conserved targets have proven very helpful to test the effectiveness of miRNA target detection. The perfect or near perfect complementarity between a miRNA and its target mRNA, that is a peculiar feature of plant miRNAs, gives a powerful tool for the identification of target genes through BLAST analysis of miRNA mature sequences vs EST/genomic sequences. A large part of the "in silico" predicted targets have then been confirmed as bona fide targets by experimental approaches including Northern, 5'-RACE and, more recently, degradome analysis via NGS (Next Generation Sequencing) [6,7].
The correct binding of miRNA to its cognate mRNA is critical for regulating the mRNA level and protein expression. This binding can be affected by singlenucleotide polymorphisms or indels in the miRNA target site leading to the suppression of existing binding sites or the generation of illegitimate ones. Therefore, small polymorphisms in miRNA targets can have a relevant effect on gene and protein expression and represent a type of genetic variability that can influence agronomical traits. As an example, overexpression of miR156b and miR156h in rice results in severe dwarfism, strongly reduced panicle size and delayed heading date [8].
To extend and to update information about miRNAs and their targets in barley and to identify candidate polymorphisms at microRNA target sites, barley EST sequences have been screened and related to Unigene clusters. UniGene is an experimental system for partitioning transcript sequences into a non-redundant set of gene-oriented clusters. Thus each UniGene cluster contains sequences that appear to come from the same transcription locus (gene or expressed pseudogene) http://www.ncbi.nlm.nih.gov/UniGene/index.html. Mining SNPs from ESTs allows the exploitation of genetic variability based on published sequences and the analysis of Unigene clusters can be very helpful for this purpose [9].

Barley miRNAs
Since only mature miRNA sequences rather than precursor sequences are conserved among plant species, mature miRNA sequences have been used as queries for BLAST search against Hordeum vulgare ESTs [10]. One hundred-fifty-six microRNA mature sequences belonging to 50 miRNA families have been found to significantly match at least one EST sequence in barley (the total number of matching ESTs was 855 -as reported in additional files 1 and 2) and could actually be related both to target or miRNA sequences, even if the probability is lower for the latter. Indeed the estimated frequency of pri-miRNAs in T. aestivum EST collection is as low as 0.003% [2].
The results illustrated above have been compared with those reported by Dryanova et al. where miRNAs and their targets have been searched in the Triticeae tribe [2]. Among the 33 miRNA families identified by Dryanova et al. in at least one species of the Triticeae tribe, 22 families were found in barley and 17 of them overlap with the present findings. Regarding barley, some miRNA families were found in just one of the two papers. Dryanova et al. found evidences for 5 additional miRNA families while the present work has found evidences in barley for miR390 and miR396 previously reported only in T. aestivum, and for additional 31 families not found by Dryanova et al. in anyone of the investigated species (i.e. miR442, miR529). The reasons for these discrepancies can be ascribed to the different miRBase release used (miRBase Release 8.0 for Dryanova et al., 2008 and miRBase Release 13.0 in the present work) and partially to differences in the BLAST settings adopted. Monocot-specific miRNAs (i.e. miR444) have also been found in both works [11].
Statistical analysis was employed to identify over and under-represented plant species from which the corresponding barley miRNA comes from. As reported in table 1 and 2, barley miRNA sequences putative orthologous to those of Triticum are significantly overrepresented in our data also when very stringent pvalue, e.g. 0.001, was used. Hordeum and Triticum genera are both members of the Poaceae family, Pooideae subfamily, Triticeae tribe. H. vulgare is often used as a model species for Triticeae, thanks to its diploid genome that could facilitate genome-wide searches of miRNAs.
Zea mays is also closely related to barley being part of monocot group and Poaceae family. Oryza sativa although is part of Poaceae family is under-represented, when a low stringent p-value (0.05) was used.
Some ESTs have matched to more than one miRNAs belonging either to the same family or to different families (additional file 3). The first case can be due to the high level of similarity among mature sequences from different members of the same family, while ESTs matching to different miRNA families could represent examples of multi-microRNA based control.
Transcripts targeted by more than one miRNA have also been found also in other plant species such as rice [12]. These findings are common in animals where many different miRNAs recognize the same target mRNA, usually at the 3'UTR [13].
To identify and annotate potential microRNA-regulated genes in barley, the 855 matching ESTs were related to Unigene clusters. Clusters annotated as protein-coding sequences were then selected for subsequent analysis and listed in tables 3 and 4. A total of 121 different Unigene clusters putatively representing the targets for 37 miRNA families has been found. Similar results (e.g. on average more than 1 putative target/ miRNA family) were reported by Zhang et al. in maize (115 target for 26 miRNA families) [14]. Sometimes different targets for a specific miRNA are members of the same gene family (e.g. miR156-SBP family), while in other cases there is no evident relationship among the putative targets of a given miRNA (e.g. miR1121). Previous studies report six targets or fewer for most Arabidopsis miRNAs, a number significantly lower than in animals, for example, in Drosophila each miRNA has on average over 50 predicted targets [13,15].
Although several of the candidate miRNA/target pairs here identified have the same functional annotation reported in previously studied species (table 3) and specifically in barley some putative novel microRNA/target pairs have been discovered (table 4) [2]. Actually, some of these novel targets were reported by literature as regulated by a different microRNA. Most of the novel miRNA/target pairs refer to miRNAs recently discovered and thus probably less studied (i.e. miR1120, miR1122, miR1134). The Argonaute-like protein found as a novel target for miR408 in H.vulgare by Dryanova et al. has been confirmed also in the present work.
Transcription factor families comprise most of the highly conserved miRNA targets (see table 3) such as SBP family for miRNA 156, AP2 family for miR172, GRAS family for miR171, myb family for miR159, GRF family for miR396 and ARF family for miR160. These results confirmed what previously observed in Triticeae and in other species [2]. In rice about 70% of conserved miRNA targets are transcription factors, while in wheat one-third of the predicted targets was found to encode for transcription factors [11,12]. Conserved miRNAs also target genes involved in their own biogenesis and function: as an example miR168 targets AGO1 which is part of the RISC complex responsible for the miRNAmediated mRNA cleavage [15]. miRNA regulate gene expression also by targeting enzymes of the ubiquitination pathway (ubiquitin conjugating enzyme E2 and TIR1/ubiquitin ligase): barley miR393, miR399, miR1128, miR1133, miR1135 can be considered putative regulators of gene expression at protein level. The number of target genes identified as different Unigene clusters (tables 3-4) is very different among the miRNA families. In rice Zhou et al. have found a high number of targets for miR156 and miR396 and a low number for miR162, miR167, miR395, miR398 and miR399 [12]. This finding could indicate that the former miRNAs are nodes in gene regulation networks, while the latter could act on specialized pathways.
The predicted targets have been grouped into functional categories and reported in figures 1 and 2 where the target annotations based on GO terms are shown. Biological processes known to be regulated by miRNAs, such as development and response to biotic and abiotic stress, have been highlighted both in known (figure 1a) and in novel targets (figure 2a). Moreover, most of the molecular functions are related to transcriptional regulation and DNA/nucleotide binding in both groups (figures 1b-2b). These findings suggest that the predicted target genes can be considered a reliable dataset to be used in subsequent analysis.
For some Unigene clusters the annotation was related to transcribed genes rather than protein coding sequences. These Unigenes could represent miRNAcoding genes as shown by other authors [16,17]. Table 5 reports the Unigene clusters candidate to encode miRNA coding genes on the basis of the precursor sequence secondary structure (MFEI >0.85, see Materials and Methods) and of the presence of the miRNA* (miRNA passenger sequence). It cannot be excluded that the clusters unable to fold with a miRNA-like structure (e.g. Hv.8579, Hv.11623) are false negatives for several reasons, such as truncated precursor sequences in EST database. Putative microRNA sequences have also been BLASTed against previously known precursors available from mirBASE: the analysis found similarities with 6 different miRNA families. The secondary structures of the putative microRNA precursors are reported in the additional file 4. Linking together sequences containing miRNA precursors from Dryanova et al. and from the present work, information on several micro-RNA putative secondary structures, belonging to 10 miRNA families are now available [2]. The mature miR-NAs predicted from these data are 18 to 24 nt long, with a higher frequency for 20 and 21 nt.

Genetic variation at miRNA target sites
A single nucleotide change in the sequence of a target site can affect miRNA regulation: as a consequence naturally occurring SNPs in target sites are candidates for relevant functional variations. Nair et al. established  a perfect association between a SNP at the miR172 targeting site and cleistogamy in barley [18]. Overall few papers have been published to date describing variations among plant genotypes at miRNAs and their target sites, while plenty of information is available for humans [19][20][21][22][23]. Genome-wide studies in humans have shown that the levels of polymorphism at miRNA and miRNA target sites are lower than at coding or neutral regions, however beneficial miRNA-target site polymorphisms also exist [19]. In this study, publicly available SNP data have been analyzed in context with miRNAs and their target sites. EST-derived SNPs can provide a rich source of biologically useful genetic variation due to the redundancy of gene sequence, the diversity of genotypes present in the databases and the fact that each putative polymorphism is associated with an expressed gene. Variations both in functional regions of putative miRNAs (mature sequence) and at miRNA target sites have been found. Previous works in human have highlighted a relatively low level of variation in functional microRNA regions and an appreciable level of variation at target sites [21].
Hv.5064, the candidate for miR1137 coding sequence, has been tested for modifications of pre-miRNA structure due to a base substitution in position 13 (C/G, table 6, figure 3). To evaluate the possible impact of this SNP on pre-miRNA secondary structure, Gibbs free energy (ΔG) and MFEI from each version of pre-miRNA were calculated using mfold program. Data in figure 3 show the structural variation obtained when moving from "C variant" to "G variant" with a higher MFEI for the second one and thus a greater stability of the molecule (miRNA-miRNA* pairing enhanced in the G variant). Difference in ΔG moving from C to G and vice versa were calculated according to Ehrenreich and Purugganan [19]. ΔΔG was +1.3 for the former change and -1.3 for the latter suggesting that some SNPs can stabilize/destabilize pre-miRNA structure. No target gene has been reported in literature for miR1137.
In plants most of the miRNA-based regulation relies on the cleavage of target mRNAs that normally occurs at the tenth nucleotide of the complementary region and numerous studies on miRNA-target interaction have highlighted the importance of positions 2 to 12, more frequently 10 and 11 [24]. Although most of the putative polymorphisms highlighted in this work are outside those critical positions, several examples of putative functionally relevant polymorphisms have been detected. Table 6 reports the putative polymorphisms detected after comparison among EST sequences inside Unigene clusters, without any selection against false positives. Some of these nucleotide variation could be due to sequencing errors or related to very similar genes belonging to a specific family, nevertheless when the SNPs/indels rely on two or more copies of independent sequences it can be considered a good candidate for a true positive polymorphic target site [25]. For example, a polymorphism in miRNA 408 target site detected by AutoSNP in contig 2094 (coding for a plastocyanin) is based on sequences from two different cultivars reporting the same allelic variant as part of a haplotype where a SSR (Simple Sequence Repeat) polymorphism is located upstream the target sequence ( figure 4). Some polymorphisms also showed an evolutionary conserved position, the nucleotide variation identified in Hv.2498 (targeted by miR393) has also been found in the orthologous gene of Arabidopsis in the same position by Ehrenreich and Purugganan [19].
The Squamosa-promoter Binding Protein (SBP) is a known target family for miR156. Many plant transcription Oryza sativa [39] 408 miR408 Hv.10831 ARPN (PLANTACYANIN), copper ion binding (Cu-bindlike superfamily) Medicago truncatula [43] Populus trichocarpa [44] miR408 Hv.24052 Plastocyanin-like domain-containing protein (Cu-bindlike superfamily) Oryza sativa [45] Hordeum vulgare [2] Triticum aestivum [2] miR408   factors involved in the regulation of the transition from the vegetative to the reproductive phase belong to this family and it has been shown that overexpressing SBP genes can lead to increased leaf initiation, decreased apical dominance and delayed flowering time [15]. The increase of the activity of some miRNAs (among which miR156) is part of the infection strategy performed by the Turnip mosaic virus in Arabidopsis [26,27]. miR156 performs a critical function in mediating developmental processes and it is also related to the response to biotic stress. The  screening of barley databases has identified two SBP genes targeted by miR156 for which two nucleotide variations occur in critical positions (11)(12). If these SNPs will be experimentally confirmed, they could have the effect of destabilizing the interaction between the miRNA and the mRNA, which could consequently avoids cleavage and lead to phenotypical variations in developmental features or in the resistance to viral infection. A SNP also occurs in a crucial point of the experimentally confirmed NAC1 target for miR164. NAC1 is a transcription factor involved in shoot apical meristem formation and auxin-mediated lateral root formation. Guo et al. showed that the overexpression of miR164 leads to reduced lateral rooting; conversely the disruption of the regulation mediated by this miRNA increases the number of lateral roots [28]. The authors have reported that miR164 directs cleavage in vivo at a position complementary to the 10 th nucleotide from the 5' end of the mature sequence [28]. The SNP found in barley is in the 11 th position, therefore it is likely to prevent the cleavage and produce phenotypic effects on root development.
SNPs have been identified also in other two conserved miRNA targets, TIR1 and AGO4, targeted respectively by miR393 and miR408. TIR1 is an auxin-receptor negatively regulated by miRNAs in response to bacterial flagellin, as a defence mechanism against Pseudomonas syringae [29]. AGO4 is a protein involved in the siRNA mediated gene silencing, and it is required for the resistance to the same pathogen [30]. Therefore, miR393 and miR408 are likely to work in a coupled manner during P. syringae infection. The two SNPs identified are in the 12 th position and could potentially alter the levels of pathogens resistance. SNPs were also found in previously not reported miRNA targets, such as the AWPM-19-like protein matching to the miRNA 1134. AWPM-19 accumulates in wheat plasma membrane during cold acclimation in response to abscisic acid [31]. If this miRNA really controls the synthesis of this protein, a deleterious SNP in the 11 th position could then change resistance to cold stress.

Conclusions
This study has thus provided an update of the information on barley miRNAs and their targets representing a foundation for future studies.
Novel putative target genes have been identified and most of them are involved in stress and hormone response. Indeed, the role of plant miRNAs in abiotic and biotic stress response as well as in auxin signalling is well known [32,33]. In particular, protein kinases such as protein kinase C and serine/threonine kinase, known to be important regulator on abiotic stress resistance, are largely present in novel microRNA/target pairs identified.
The results have also shown that microRNA target sites can be an interesting source for the identification of functional genetic variability, representing an interesting source of candidate molecular markers for application in barley breeding. Putative polymorphisms have now to be verified by amplification and sequencing of the target sequences on a larger set of genotypes.
Sequence analysis based on known miRNAs can obviously give insights only on conserved mRNAs and related targets. Future work will thus be based on the construction of a degradome library for parallel analysis of RNA end (PARE), a powerful approach for high-throughput identification/validation of conserved and non conserved targets.

miRNA reference dataset
The initial miRNA dataset has been obtained by extracting the mature sequences (1929 entries) of the Viridiplantae group from the miRBase release 13 http:// www.mirbase.org [34]. By removing identical mature sequences, the size of this dataset has been subsequently reduced to 1014 non-redundant sequences related to 468 miRNA families.

Searching for mature miRNAs matching sequences in barley
The full collection of non-redundant mature miRNA sequences was used in a BLASTn search against dbEST http://www.ncbi.nlm.nih.gov, accepting a number of mismatch lower than 4.
The set of miRNA mature sequences (including the identical sequences removed at the first step of the work) with at least one matching EST have been classified on the basis of the species of origin. The binomial distribution was used to assess the statistical significance for the represented plant species; this allowed identifying those species chosen from the initial dataset more or less frequently than random. Four different thresholds for the p-values were applied (0.05, 0.01, 0.005, 0.001).
Matching ESTs have then been related to Unigene clusters and the corresponding annotations were recorded (if available). The GO slimmer tool available on the Gene For each cluster, the table shows details about the putative precursors: the free energy ΔG, the minimal folding free energy index (MFEI), the number of mismatches in miRNA/miRNA* duplex (NM), the mature length (ML), the precursor length (PL) and the location of mature miRNA (3' or 5'). Moreover, it is also reported the more similar known precursor in miRBase, with the alignment score and p-value.   protein coding # C(T) AACATGATATCAGAGCCA UGGCUCUGAUAUCAUGUUG (19) Letters in bold refer to SNPs represented by at least two independent sequences, while the numbers in brackets refer to the position of the SNP in the sequence. Plus means insertion, minus means deletion. Figure 3 Predicted secondary structures of the two variants of the miR1137 precursor identified in the Unigene cluster Hv.5064. The variants with a C and a G in the 13 th position are respectively reported in the left and in the right side of the figure. The table shows for each variant: the free energy ΔG, the length of the precursor, the GC content, the MFEI (Minimal Folding Energy Index), the number of mismatches between the mature sequence and the paired miRNA* passenger and the arm of the hairpin where the mature sequence is located.