What do all the (human) micro-RNAs do?

Background: Micro-RNAs (miRNAs) are attributed a systems-biological role as regulators of the expression of protein-coding genes. Research has identified miRNA dysregulation in several distinct pathophysiological processes, which hints at distinct systems-biology functions of miRNAs. The present analysis approached the role of miRNAs from a genomics perspective and assessed the biological roles of 2954 genes and 788 human miRNAs that can be considered to interact, based on empirical evidence and computational predictions of miRNA versus gene interactions.
Results: From a genomics perspective, the biological processes in which the miRNA-influenced genes are involved comprise six major topics: biological regulation, cellular metabolism, information processing, development, gene expression and tissue homeostasis. The use of this knowledge as guidance for further research is sketched for two genetically defined functional areas: cell death and gene expression. Results for the latter point to a fundamental role of miRNAs consisting of hyper-regulation of gene expression, i.e., control of the expression of those genes that specifically control the expression of genes.
Conclusions: Laboratory research identified contributions of miRNA regulation to several distinct biological processes. The present analysis transferred this knowledge to a systems-biology level. A comprehensible and precise description of the biological processes in which the miRNA-influenced genes are notably involved could be made. This knowledge can be employed to guide future research concerning the biological role of miRNA (dys-)regulation. The analysis also suggests that miRNAs especially control the expression of genes that control the expression of genes.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-976) contains supplementary material, which is available to authorized users.


Background
Micro ribonucleic acids (miRNAs), first described in 1993 [1], have been recognized as major players in cellular regulation by conferring RNA interference [2]. MiRNAs are initially transcribed from host genes as longer primary transcripts (pri-miRNAs), from which shorter, approximately 70-nucleotide-long pre-miRNAs are excised by the RNase III enzyme "Drosha" [3]; a single pri-miRNA transcript may code for more than one miRNA [4]. Pre-miRNAs are exported from the nucleus to the cytoplasm by the RNA-binding protein exportin 5 [5]. There, they are cleaved into ~22-nucleotide-long mature miRNAs by the endoribonuclease "Dicer" [6]. Mature miRNAs impede translation by binding to complementary messenger RNA (mRNA) sequences, thereby initiating mRNA cleavage or obstructing the incorporation of the mRNA into ribosomes.
More than 2000 human miRNAs have been identified [7,8], potentially regulating the expression of the 21,000 human protein-encoding genes [9]. Research during the last decade [10,11] identified miRNA dysregulation in several pathophysiological processes [12] such as cancer [13], cardiovascular diseases [14], viral infections [15] and pain [16]. In these and other contexts, miRNAs have repeatedly been found to modulate a wide range of physiological functions such as cellular differentiation, proliferation and apoptosis [17]. This suggests that miRNA-mediated control targets a range of typical biological processes, hinting at a distinct systems-biology function of miRNAs.
The present analysis approached the role of human miRNAs from a genomics perspective and assessed the biological roles of those genes that can be considered to interact with miRNAs, based on empirical evidence [18,19] or computational prediction [20]. Computational methods, publicly available databases and data mining tools (Table 1) were used to combine the knowledge about miRNA versus gene interactions with the acquired knowledge about the higher-level organization of gene products into biological pathways [21], for which the Gene Ontology (GO) knowledge base [22] is the gold standard.

Empirically validated miRNA/gene interactions
The genes likely to be regulated by miRNAs were identified by connecting several lines of evidence using publicly available computational methods, databases and data mining tools (Table 1). A first source of miRNA-regulated genes consisted of empirically shown interactions of miRNAs with genes. Most genes with empirical evidence for interaction with a miRNA were identified from the miRTarBase database [18], which hosts the currently largest number of experimentally validated miRNA versus target interactions. From this database, only the miRNA versus gene interactions with strong experimental evidence were used, which this database defines as evidence provided by reporter assays or western blots (file: miRTarBase_SE_WR.xls, Release 4.5 from http://mirtarbase.mbc.nctu.edu.tw/php/download.php). This gave a set of n = 360 different miRNAs acting on n = 1472 different genes. Additional miRNA-regulated genes were queried from the TarBase database [19], which hosts further experimentally validated miRNA/gene interactions. In that database, experimentally validated or supported interactions are derived from specific as well as high-throughput experiments, such as microarrays and proteomics (for full details, see http://diana.cslab.ece.ntua.gr/?sec=home). From this database, the reported direct interactions were used. This gave a set of n = 136 different miRNAs acting on n = 798 different genes. The sizes of the unions and intersections of these gene sets are given in Figure 1.
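The evidence filter and the set arithmetic behind a three-source Venn diagram can be sketched as follows. This is a minimal illustration, not the pipeline used in the analysis: records are in-memory tuples, and the record layout and evidence labels ("reporter assay", "western blot") are assumptions standing in for the columns of the downloaded miRTarBase and TarBase files. All miRNA and gene names in the example are hypothetical.

```python
from typing import Iterable

# Hypothetical record layout: (miRNA, gene, evidence label).
Record = tuple[str, str, str]

# Evidence labels treated as "strong"; assumed names, not the files' actual vocabulary.
STRONG_EVIDENCE = {"reporter assay", "western blot"}


def strong_interactions(records: Iterable[Record]) -> set[tuple[str, str]]:
    """Keep (miRNA, gene) pairs backed by at least one strong-evidence method."""
    return {(mirna, gene) for mirna, gene, evidence in records
            if evidence.lower() in STRONG_EVIDENCE}


def genes_of(pairs: set[tuple[str, str]]) -> set[str]:
    """Distinct target genes in a set of (miRNA, gene) pairs."""
    return {gene for _, gene in pairs}


def venn_sizes(a: set[str], b: set[str], c: set[str]) -> dict[str, int]:
    """Sizes of the seven disjoint regions of a three-set Venn diagram, plus the union."""
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_b": len((a & b) - c),
        "a_c": len((a & c) - b),
        "b_c": len((b & c) - a),
        "a_b_c": len(a & b & c),
        "union": len(a | b | c),
    }


if __name__ == "__main__":
    # Toy data standing in for miRTarBase records, TarBase genes and predicted genes.
    records = [
        ("miR-A", "GENE1", "Reporter assay"),
        ("miR-A", "GENE5", "Microarray"),  # dropped: not strong evidence
        ("miR-B", "GENE2", "Western blot"),
    ]
    a = genes_of(strong_interactions(records))
    print(venn_sizes(a, {"GENE1", "GENE3"}, {"GENE2", "GENE3", "GENE4"}))
```

Working on sets of genes (rather than of pairs) matches the Venn diagram, which counts genes; the seven disjoint regions always sum to the size of the union.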

Computational prediction of miRNA/gene interactions
To reduce the impact of a possible research bias on the results, a second source of miRNA-regulated genes was added from a computational prediction of miRNA-regulated genes. A sufficiently credible prediction was obtained by querying the TargetScan Human software (version 6.2 [20]) for all human miRNAs known to this database. To obtain valid predictions, an intensive correction against false positive predictions was performed. Considering the complexity of the computational identification of miRNA targets [4,27], the distribution of the TargetScan output, the so-called "Total Context+ scores" (TCP scores) [28,29], was analyzed. To minimize the risk of false positive predictions, this distribution was compared with the scores of empirically validated miRNA targets, and only those interactions were kept for which a probability of more than 98% for a valid interaction could be derived (Additional file 1). This filtering reduced the n = 14610 unique genes and n = 1539 human miRNAs for which TargetScan predicted a miRNA interaction to n = 1355 genes and n = 548 human miRNAs for which the computational prediction is sufficiently reliable. The union of the miRNAs from empirical evidence and from filtered computational predictions resulted in n = 788 different human miRNAs with interactions on n = 2954 different genes (Additional file 2: Table S1).
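The final filter-and-merge step can be sketched as below. The derivation of the score cutoff from the comparison with validated targets is described in Additional file 1 and is not reproduced here; the cutoff is simply a parameter, and the value in the example is hypothetical. The sketch relies on TargetScan's convention that more negative context+ scores indicate stronger predicted repression.

```python
from typing import Iterable

Pair = tuple[str, str]  # (miRNA, gene)


def filter_predictions(predictions: Iterable[tuple[str, str, float]],
                       cutoff: float) -> set[Pair]:
    """Keep predicted (miRNA, gene) pairs whose total context+ score is <= cutoff.

    More negative context+ scores mean stronger predicted repression, so a
    stricter filter uses a more negative cutoff.
    """
    return {(mirna, gene) for mirna, gene, score in predictions if score <= cutoff}


def merge_sources(*sources: set[Pair]) -> tuple[set[str], set[str]]:
    """Union the interaction pairs of all sources; return distinct miRNAs and genes."""
    pairs: set[Pair] = set().union(*sources)
    mirnas = {mirna for mirna, _ in pairs}
    genes = {gene for _, gene in pairs}
    return mirnas, genes


if __name__ == "__main__":
    predicted = [("miR-A", "GENE1", -0.9), ("miR-A", "GENE2", -0.1), ("miR-B", "GENE3", -0.7)]
    kept = filter_predictions(predicted, cutoff=-0.5)  # hypothetical cutoff
    empirical = {("miR-A", "GENE1"), ("miR-C", "GENE4")}
    mirnas, genes = merge_sources(empirical, kept)
    print(len(mirnas), len(genes))  # 3 miRNAs, 3 genes
```

Merging at the level of (miRNA, gene) pairs means an interaction supported by both evidence and prediction is counted once, which is how distinct miRNA and gene totals are obtained from overlapping sources.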

Biological roles of miRNA regulated genes
To assess the role of miRNA regulation, the biological roles of the genes were identified based on the Gene Ontology (GO) knowledge base [22], in which the knowledge about genes is formulated using a controlled vocabulary of GO terms (categories) to which the genes [30] are annotated [31]. GO terms are related to each other by "is-a", "part-of" and "regulates" relationships, forming a polyhierarchy, i.e., a directed acyclic graph (DAG [32]) that serves as a knowledge representation graph. Particular biological processes, cellular localizations or molecular functions annotated to the miRNA-regulated genes were found by means of an over-representation analysis (ORA [33]) using the web-based GeneTrail [24] tool. For each GO term, this tool calculated the significance of the occurrence of the miRNA-regulated genes at that term relative to the occurrence expected from all GO annotations. Statistical significance (p-values) was calculated by GeneTrail using Fisher's exact test with Bonferroni α correction [34]. The result was a representation of the complete knowledge about the biological roles of the miRNA-regulated genes (complete DAG).
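For a single GO term, the one-sided Fisher's exact test for over-representation reduces to an upper tail of the hypergeometric distribution. The sketch below computes it with the standard library and applies a Bonferroni correction over all tested terms. It is an illustrative re-implementation that assumes the gene universe is the set of all annotated genes; it is not GeneTrail's code, and GeneTrail's handling of the reference set may differ. Term and gene names in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_set: int, k_term: int, n_total: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_total, k_term, n_set).

    n_total: genes in the annotated universe
    k_term:  universe genes annotated to the GO term
    n_set:   genes in the test set (e.g. miRNA-regulated genes)
    k:       test-set genes annotated to the term
    This equals the one-sided (over-representation) Fisher's exact test p-value.
    """
    hi = min(n_set, k_term)
    numer = sum(comb(k_term, i) * comb(n_total - k_term, n_set - i)
                for i in range(k, hi + 1))
    return numer / comb(n_total, n_set)


def ora(gene_set: set[str], annotations: dict[str, set[str]],
        universe: set[str]) -> dict[str, tuple[int, float, float]]:
    """Return {term: (observed, expected, Bonferroni-adjusted p)} for every tested term."""
    test = gene_set & universe
    n_total, n_set = len(universe), len(test)
    m = len(annotations)  # number of tests for the Bonferroni correction
    results = {}
    for term, members in annotations.items():
        members = members & universe
        observed = len(test & members)
        expected = n_set * len(members) / n_total
        p = hypergeom_upper_tail(observed, n_set, len(members), n_total)
        results[term] = (observed, expected, min(1.0, p * m))
    return results


if __name__ == "__main__":
    universe = {f"g{i}" for i in range(20)}
    annotations = {"termA": {f"g{i}" for i in range(5)},
                   "termB": {f"g{i}" for i in range(10, 20)}}
    for term, (obs, exp, p_adj) in ora({f"g{i}" for i in range(5)}, annotations, universe).items():
        print(term, obs, exp, p_adj)
```

The observed and expected counts returned here correspond to the quantities the DAG figures print inside each term's ellipse; the adjusted p-value is capped at 1 as usual for Bonferroni.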
To make this information more intelligible, functional abstraction [35] was applied, identifying a special set of GO terms, the "functional areas", that represent the knowledge contained in the complete DAG with maximum coverage, certainty, information value and conciseness [35]. Finally, for GO terms describing biological processes, the functional areas were subsumed under topics to further enhance the conciseness of the description.
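Functional abstraction [35] is a dedicated method and is not reproduced here. A simpler operation on the same DAG, used in this article's DAG figures to mark terms in blue, is picking the most specific significant terms: significant terms that are not an ancestor of any other significant term. The sketch assumes the DAG is given as a hypothetical child-to-parents mapping; the toy hierarchy in the example is simplified and only illustrative.

```python
def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All terms reachable from `term` by following parent links (excluding `term`)."""
    seen: set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def most_specific(significant: set[str], parents: dict[str, set[str]]) -> set[str]:
    """Significant terms that are not an ancestor of any other significant term."""
    covered: set[str] = set()
    for term in significant:
        covered |= ancestors(term, parents)
    return significant - covered


if __name__ == "__main__":
    # Simplified toy hierarchy (child -> parents), not the real GO structure.
    parents = {
        "apoptotic process": {"programmed cell death"},
        "programmed cell death": {"cell death"},
        "cell death": {"biological_process"},
        "gene expression": {"biological_process"},
    }
    significant = {"cell death", "programmed cell death", "apoptotic process", "gene expression"}
    print(sorted(most_specific(significant, parents)))
```

Because GO is a polyhierarchy, a term can have several parents; the iterative traversal with a `seen` set visits each shared ancestor only once.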
To assess the validity of the GO over-representation analysis (ORA), n = 3000 genes were randomly drawn, in ten repetitions, from the set of all n = 17794 genes for which GeneTrail contained annotations. For a p-value threshold of t_p = 0.05 and Bonferroni α correction, none of these random gene sets produced any significant GO term. The interactions were unevenly distributed across miRNAs: a small subset of miRNAs interacts with many genes (up to 229), whereas a large subset (n = 304 of the n = 788 miRNAs) interacts with only one gene. To address a potential bias from this unequal distribution, the set of n = 788 miRNAs was split into two separate subsets, A and B. Set A contained 23% (n = 181) of the miRNAs, which interact with 75% of the n = 2954 genes. Set B contained the remaining miRNAs, which interacted with only a few (n < 6) genes. Set A produced the same set of functional areas as the set of all n = 2954 genes, with a median p-value of 1.0 · 10^−38. Set B reproduced the functional areas of the set of all genes (median p-value of 1.0 · 10^−13), with the exception of "biological adhesion" and "response to stimulus" (details given in the supplement).
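The split of the miRNAs by their number of target genes can be sketched as follows. The threshold of six genes is inferred from the description of set B (fewer than 6 genes) and is an assumption about how the split was drawn; the function returns the two subsets and the share of all genes reached by set A, the quantity reported as 75% in the text. The genes of each subset would then be submitted separately to the ORA. miRNA and gene names in the example are hypothetical.

```python
from collections import defaultdict


def split_by_degree(pairs: set[tuple[str, str]],
                    min_genes: int = 6) -> tuple[set[str], set[str], float]:
    """Split miRNAs into set A (>= min_genes targets) and set B (fewer targets).

    Returns (set_a, set_b, fraction of all genes targeted by at least one miRNA in A).
    """
    targets: dict[str, set[str]] = defaultdict(set)
    for mirna, gene in pairs:
        targets[mirna].add(gene)
    set_a = {m for m, genes in targets.items() if len(genes) >= min_genes}
    set_b = set(targets) - set_a
    all_genes = {gene for _, gene in pairs}
    genes_a = set().union(*(targets[m] for m in set_a))
    coverage = len(genes_a) / len(all_genes) if all_genes else 0.0
    return set_a, set_b, coverage


if __name__ == "__main__":
    pairs = {("m1", f"g{i}") for i in range(1, 7)} | {("m2", "g7"), ("m3", "g1"), ("m3", "g8")}
    set_a, set_b, coverage = split_by_degree(pairs)
    print(sorted(set_a), sorted(set_b), coverage)
```

Coverage is computed on the union of set A's targets, so a gene shared by a set-A and a set-B miRNA counts toward set A.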

Results
The analysis of the biological roles of the human miRNA-regulated genes was based on a total of 2954 genes obtained by unifying (Figure 1) the evidence-based sets of miRNA-interacting genes, n = 1472 queried from the miRTarBase database [18] and n = 898 queried from the TarBase database [19]. This set of empirical evidence was augmented by n = 1355 genes obtained by computational prediction on the basis of the output of TargetScan [20].

Figure 1
Venn diagram [26] visualizing the sets of genes and the sizes of their intersections. The present analysis was based on the union of the three sources, i.e., evidence-based miRNA-interacting genes from the miRTarBase database [18], evidence-based miRNA-interacting genes from the TarBase database [19], and computationally predicted miRNA-regulated genes based on an analysis using the TargetScan Human [20] software (for details of the prediction method, see appendix).

Discussion
Published literature attributes to miRNAs a systems-biological role as direct regulators of classic protein-coding genes, mainly by RNA interference that impedes translation via destabilization of messenger RNA transcripts [36]. Considering that a miRNA may target many different genes and, vice versa, a gene may be targeted by several different miRNAs [37,38], the ~2000 miRNAs identified for Homo sapiens [7,8] may potentially regulate the expression of all 21,000 human protein-encoding genes [9] and thus be involved in any biological process known to the GO database. However, the present analysis suggested that only about one seventh of the human genes seem to be miRNA regulated. Moreover, although the analysis also suggested that this regulation might be involved in any biological process, which is supported by the absence of under-represented GO terms in the ORA, the significant over-representation of particular GO terms clearly indicates that miRNAs play distinct biological roles beyond a general, evenly distributed function in gene regulation. In the present work, a precise and comprehensive view of the systems-biological role of miRNAs was obtained by analyzing the functions of a set of genes supported by published evidence for direct miRNA interaction [18,19], combined with a conservatively filtered computational prediction of miRNA interactions. Using the knowledge about the biological processes, cellular localizations and molecular functions of genes in the Gene Ontology (GO) knowledge base, the analysis provided a complete and precise description of the involvement of miRNAs in particular physiological and pathophysiological processes. The identification of these distinct roles, represented by functional areas (Tables 2 and 3), was a major finding of this analysis. These functional areas can be considered a primary answer, from a genomics point of view, to the question "What do all those miRNAs do?" A further finding was that, although miRNAs are exported from the nucleus as pre-miRNAs and processed in the cytosol into mature miRNAs, where they exert their RNA-interfering function, they notably regulate genes whose products act in the nucleus.

Table 2
Functional areas (GO terms of the category "biological process"), topically sorted (left column), of the genes interacting with miRNAs, i.e., for which a gene versus miRNA interaction has been experimentally shown, sorted by the number of genes included. Significant and remarkable gene ontology (GO) terms (for definition see the AmiGO search tool for GO [23]) resulted from over-representation analysis (ORA) of the 2945 genes with experimentally shown or computationally predicted miRNA interaction that were annotated to the GO category "biological process".

Figure 2
Directed acyclic graph (DAG [32]) representing the nested Gene Ontology (GO) classification, showing the polyhierarchy of functional annotations (GO terms) assigned in the GO category "biological process" to the 2954 genes (Figure 1) that interact with miRNAs, as supported by empirical evidence from the miRTarBase [18] or TarBase [19] databases or as computationally predicted using the TargetScan Human [20] software. The figure is based on the GeneTrail web-based analysis tool [24] and represents the results of an over-representation analysis with a p-value threshold of t_p = 1.0 · 10^−5 and Bonferroni α correction. The figure shows one aspect of the polyhierarchy, namely the ORA for the n = 243 genes (8% of all miRNA-regulated genes) that were annotated with the GO term "cell death". Significant terms are shown as red-colored or framed ellipses, with the number of member genes indicated in line three, the expected number of genes in line five, and the significance of the deviation between the two numbers given as −log10 p. The functional area (Table 2) is indicated in yellow, and the leaves of this polyhierarchy at the selected p-value threshold, i.e., the most specific significant GO terms, are shown in blue. The vertical succession reflects the height of the terms in the GO polyhierarchy.

Figure 3
Directed acyclic graph (DAG [32]) representing the nested Gene Ontology (GO) classification, showing the polyhierarchy of functional annotations (GO terms) assigned in the GO category "biological process" to the 2954 genes (Figure 1) that interact with miRNAs, as supported by empirical evidence from the miRTarBase [18] or TarBase [19] databases or as computationally predicted using the TargetScan Human [20] software. The figure is based on the GeneTrail web-based analysis tool [24] and represents the results of an over-representation analysis with a p-value threshold of t_p = 1.0 · 10^−5 and Bonferroni α correction. The figure shows one aspect of the polyhierarchy, namely the ORA for the n = 554 genes (one fifth of all miRNA-regulated genes) that were annotated with the GO term "gene expression". Significant terms are shown as red-colored or framed ellipses, with the number of member genes indicated in line three, the expected number of genes in line five, and the significance of the deviation between the two numbers given as −log10 p. The functional area (Table 2) is indicated in yellow, and the leaves of this polyhierarchy at the selected p-value threshold, i.e., the most specific significant GO terms, are shown in blue. The vertical succession reflects the height of the terms in the GO polyhierarchy.
Table 3
Functional areas (GO terms of the categories "cellular component" and "molecular function") of the genes interacting with miRNAs, i.e., for which a gene versus miRNA interaction has been experimentally shown, sorted by the number of genes included. Significant and remarkable gene ontology (GO) terms (for definition see the AmiGO search tool for GO [23]) resulted from over-representation analysis (ORA) of the 2945 genes with experimentally shown or computationally predicted miRNA interaction.

Figure 4
Directed acyclic graphs (DAG [32]) representing the nested Gene Ontology (GO) classification, showing the polyhierarchy of functional annotations (GO terms) assigned in the GO categories "cellular component" (left) and "molecular function" (right) to the 2954 genes (Figure 1) that interact with miRNAs, as supported by empirical evidence from the miRTarBase [18] or TarBase [19] databases or as computationally predicted using the TargetScan Human [20] software. The figure is based on the GeneTrail web-based analysis tool [24] and represents the results of an over-representation analysis with a p-value threshold of t_p = 1.0 · 10^−5 and Bonferroni α correction. Significant terms are shown as red-colored or framed ellipses, with the number of member genes indicated in line three, the expected number of genes in line five, and the significance of the deviation between the two numbers given as −log10 p. The GO category is indicated in yellow, and the leaves of this polyhierarchy at the selected p-value threshold, i.e., the most specific significant GO terms, are shown in blue.
regulation and DNA binding. The cellular components where the products of the miRNA-regulated genes are located were found in the nucleus more often than expected. GO defines "gene expression" (GO:0010467) as the process by which a gene's genomic sequence is converted into a mature gene product or products (proteins or RNA), comprising the production of an RNA transcript, its processing toward a mature RNA, and its translation into protein [23]; miRNA regulation covers this process completely. Thus, miRNA control applies in particular to the expression of genes that control the expression of genes, which we propose to call "hyper-regulation" (Figure 5). The accepted role of miRNAs is the steering (inhibition) of the abundance of gene products, which is exerted mechanistically mainly in the cytoplasm. Hyper-regulation adds to the known mechanisms of gene expression control. It points to a so far unappreciated complexity of gene expression control that exceeds current paradigms.

Figure 5
Proposed "hyper-regulation" of gene expression by miRNAs. The figure shows the role of miRNAs in the complex transcriptional network (blue arrow). By regulating (blue arrow) the expression of genes that are involved in the regulation of the expression of genes, a miRNA-dependent regulatory mechanism of gene regulation is formed on top of the miRNA-independent regulation of gene expression (green arrow). Through this regulatory mechanism, proposed as "hyper-regulation" of gene expression (blue arrow), miRNAs interfere with the whole transcriptome, mainly via intranuclear mechanisms besides the well-known extranuclear (red arrow) mechanisms. Hyper-regulation accommodates observations of global gene down-regulation in the absence of miRNAs [47], as miRNAs down-regulate gene products that reduce gene transcription, such as DNA methyltransferases (Additional file 2: Table S1, RegulatedGenes_vs_miRNAs_Matrix.xlsx).
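The proposed hyper-regulation can be illustrated with a toy directed network in which edges point from a regulator to the genes whose expression it controls. The sketch is an illustration of the concept, not an analysis performed in this work: it separates the genes a miRNA targets directly from those it can only affect through a directly targeted regulator of gene expression. The function name, miRNA and gene names, and all edges are hypothetical.

```python
def direct_and_indirect_targets(mirna: str,
                                mirna_targets: dict[str, set[str]],
                                gene_regulates: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (direct, indirect) targets of `mirna`.

    direct:   genes whose transcripts the miRNA targets
    indirect: genes controlled by a directly targeted regulator of gene
              expression, excluding genes that are already direct targets
    """
    direct = set(mirna_targets.get(mirna, set()))
    indirect: set[str] = set()
    for gene in direct:
        indirect |= gene_regulates.get(gene, set())
    return direct, indirect - direct


if __name__ == "__main__":
    mirna_targets = {"miR-X": {"REG1", "GENE_A"}}
    # REG1 stands for a hypothetical regulator of gene expression (e.g. a transcription factor).
    gene_regulates = {"REG1": {"GENE_A", "GENE_B", "GENE_C"}}
    direct, indirect = direct_and_indirect_targets("miR-X", mirna_targets, gene_regulates)
    print(sorted(direct), sorted(indirect))
```

Only one regulatory level below the miRNA is followed; extending the idea to deeper cascades would turn the loop into a graph traversal such as breadth-first search.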
It can be hypothesized that miRNA-mediated control represents an ancient major mechanism of cellular control, providing small, versatile molecules whose synthesis requires less metabolic effort than protein translation. Such control systems are found at all levels of gene expression: from transcriptional fine-tuning, as shown for the transcription activator Ets-1, where variable phosphorylation fine-tunes transcription at the level of DNA binding [44], to the increasingly populated system of non-protein-coding regulatory RNAs that diversifies the control of genome dynamics and developmental programming [45], and to the tight control of p53, the "guardian of the genome", which has been shown to be closely regulated by miR-34a [46]. When considering that regulatory mechanisms may also repress genes that repress gene expression, such as all three DNA methyltransferases (DNMT 1, 3a and 3b; Additional file 2: Table S1, RegulatedGenes_vs_miRNAs_Matrix.xlsx), the present findings also accommodate observations of genes being down-regulated following the deletion of Dicer, which abolishes the presence of miRNAs [47].
Building on the broad base of current knowledge, the present data mining and computer-science-based approach extends laboratory approaches to the role of miRNAs in human biology. However, the analyses relied on external information and therefore depended crucially on the accuracy and completeness of the empirical evidence entered into the queried databases. Possible research or publication bias was addressed by adding computationally predicted miRNA/gene interactions (TargetScan), which were conservatively filtered to reduce false positives. The consequences of this have been discussed above: a research bias cannot be excluded for parts of the results, whereas other parts, such as the hyper-regulation of gene expression, hold regardless of the source of the miRNA versus gene interactions. The intention to exclude false positives required conservative statistics throughout all analyses; this procedure may have led to an underestimation of the number of miRNA versus gene interactions, which could affect the results.
Finally, the present computational approach to the role of miRNAs emphasizes the increasing use of bioinformatics in the interpretation of miRNA functions. This accommodates the vast complexity of the acquired information about the role of miRNAs in biology and pathophysiology, which probably exceeds unaided human comprehension. Therefore, advances in research increasingly require computer science. This has been shown, for example, in two recent reports in which current knowledge from databases was included in generating the research results by computational means. Specifically, the biological role of miRNAs found by array analyses in regenerating lungs was approached using integrative systems biology assessments including a GO analysis [48]. Interestingly, although that research was aimed at the role of miRNAs in lung injury and tissue regeneration, one of its results was that the GO term "gene expression" appeared as an important functional area of the genes influenced by the miRNAs identified in those experiments (see Figure 6 in [48]). Thus, the present result that miRNAs seem to preferentially regulate genes that regulate gene expression also appears in another analysis based on a completely independent data set. This supports its generality and makes it improbable that it merely reflects a bias in the presently queried evidence-based miRNA versus gene interactions, a conclusion further supported by the above-mentioned persistence of this result among the computationally predicted miRNA-regulated genes. A further recent example of the utility of computational biology is the successful prediction of survival of glioblastoma patients by analyzing the inter-relation between miRNA and gene expression [49].

Conclusions
Laboratory research identified contributions of miRNA regulation to several distinct biological processes. The present analysis transferred this knowledge to a systems-biology level. A comprehensible and precise description of the biological processes in which the genes influenced by miRNAs are notably involved was obtained. This identified six different topics subsuming 17 functional areas for the genetic role of miRNA regulation: biological regulation, cellular metabolism, information processing, development, gene expression and tissue homeostasis. The present analysis explicitly intended to exploit all current knowledge about miRNA versus gene interactions and about the function of genes, including the knowledge gathered in databases and the computational means to make predictions. Where the sources were analyzed separately, as for the regulation of genes that regulate gene expression, empirical and predicted interactions agreed; however, combining knowledge from different sources bears the potential for disagreements, which need to be addressed in the laboratory. The knowledge that has emerged from the present analysis can therefore be employed to guide future research concerning the biological role of miRNA (dys-)regulation.

Additional files
Additional file 1: An appendix with the detailed description of computational prediction of miRNA versus gene interactions.