SNOntology: Myriads of novel snoRNAs or just a mirage?
© Makarova and Kramerov; licensee BioMed Central Ltd. 2011
Received: 23 March 2011
Accepted: 3 November 2011
Published: 3 November 2011
Small nucleolar RNAs (snoRNAs) are a large group of non-coding RNAs (ncRNAs) that mainly guide 2'-O-methylation (C/D RNAs) and pseudouridylation (H/ACA RNAs) of ribosomal RNAs. The pattern of rRNA modifications and the set of snoRNAs that guide these modifications are conserved in vertebrates. Nearly all snoRNA genes in vertebrates are localized in introns of other genes and are processed from pre-mRNAs. Thus, the same promoter is used for the transcription of snoRNAs and host genes.
The series of studies by Dahai Zhu and coworkers on snoRNAs and their genes were critically considered. We present evidence that dozens of species-specific snoRNAs that they described in vertebrates are experimental artifacts resulting from the improper use of Northern hybridization. The snoRNA genes with putative intrinsic promoters that were supposed to be transcribed independently proved to contain numerous substitutions and are, most likely, pseudogenes. In some cases, they are localized within introns of overlooked host genes. Finally, an increased number of snoRNA genes in mammalian genomes described by Zhu and coworkers is also an artifact resulting from two mistakes. First, numerous mammalian snoRNA pseudogenes were considered as genes, whereas most of them are localized outside of host genes and contain substitutions that question their functionality. Second, Zhu and coworkers failed to identify many snoRNA genes in non-mammalian species. As an illustration, we present 1352 C/D snoRNA genes that we have identified and annotated in vertebrates.
Our results demonstrate that conclusions based only on databases with automatically annotated ncRNAs can be erroneous. Special investigations aimed to distinguish true RNA genes from their pseudogenes should be done. Zhu and coworkers, as well as most other groups studying vertebrate snoRNAs, give new names to newly described homologs of human snoRNAs, which significantly complicates comparison between different species. It seems necessary to develop a uniform nomenclature for homologs of human snoRNAs in other vertebrates, e.g., human gene names prefixed with several-letter code denoting the vertebrate species.
Small nucleolar RNAs constitute one of the largest groups of ncRNAs. They guide 2'-O-methylation and pseudouridylation of target RNAs, mainly rRNAs. SnoRNAs are divided into two groups according to the modification type: C/D box snoRNAs guide 2'-O-methylation, while H/ACA box snoRNAs guide pseudouridylation [1, 2]. To date, ~200 RNAs of both groups have been described. C/D box snoRNAs contain conserved C (UGAUGA) and D (CUGA) boxes brought together by complementary interactions between the snoRNA termini. In addition, their (often imperfect) copies C' and D' are located internally. Four core proteins bind these boxes: NOP56, NOP58, the 15.5 kDa protein, and fibrillarin, which catalyzes 2'-O-methylation. Upstream of the D and/or D' box there is an antisense element of 9-20 nucleotides that is complementary to one of the cellular RNAs and is able to interact with it. A nucleotide in the cellular RNA located four nucleotides from the D/D' box in the resulting RNA/RNA duplex is 2'-O-methylated [2, 7]. H/ACA box snoRNAs carry boxes H (ANANNA) and ACA (ACA) located at the base of two hairpins. The hairpins contain the antisense elements that are complementary to the target RNAs and are able to interact with them. Four core proteins bind the H and ACA boxes: NHP2, NOP10, Gar1, and dyskerin; the latter catalyzes pseudouridylation [1, 8]. Some C/D and H/ACA RNAs, called scaRNAs, are localized to Cajal bodies rather than to the nucleolus and guide modification of snRNAs. According to the new nomenclature accepted for human snoRNAs and scaRNAs, C/D snoRNAs, H/ACA snoRNAs, and scaRNAs are designated as SNORD, SNORA, and SCARNA, respectively.
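The box consensus sequences given above can be turned into a simple sequence scan. The following Python sketch is only an illustration and is not the procedure used in this study: it assumes exact matches to the consensus (real boxes, especially C' and D', are often imperfect), and the function names are hypothetical.

```python
import re

C_BOX = "UGAUGA"  # C box consensus quoted in the text
D_BOX = "CUGA"    # D box consensus quoted in the text
H_BOX = re.compile(r"A[ACGU]A[ACGU]{2}A")  # H box consensus ANANNA


def to_rna(seq):
    """Upper-case the sequence and convert DNA notation (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def find_cd_boxes(seq):
    """Return (c_start, d_start) for the first exact C box and the last
    exact D box downstream of it, or None if either is missing.
    Exact matching is a simplifying assumption of this sketch."""
    rna = to_rna(seq)
    c_start = rna.find(C_BOX)
    d_start = rna.rfind(D_BOX)
    if c_start == -1 or d_start < c_start + len(C_BOX):
        return None
    return c_start, d_start


def upstream_window(seq, box_start, length):
    """Return up to `length` nucleotides immediately upstream of a box,
    i.e. the region where an antisense element would be expected."""
    rna = to_rna(seq)
    return rna[max(0, box_start - length):box_start]


def has_h_box(seq):
    """True if the sequence contains a match to the H box consensus."""
    return H_BOX.search(to_rna(seq)) is not None
```

For a toy sequence such as `GGTGATGAACGTACGTACGTAAAACTGACC`, `find_cd_boxes` locates the two boxes and `upstream_window` returns the stretch directly upstream of the D box. A realistic search would also tolerate mismatches and check the terminal stem.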
Nearly all snoRNA and scaRNA genes in vertebrates are located within introns of other genes, called host genes. The small RNAs are processed from the pre-mRNAs of host genes [6, 11]. Only SNORD3, SNORD13, SNORD118, SCARNA2, and SCARNA17 are transcribed from intrinsic promoters. Most snoRNAs guide rRNA modifications. These modifications are essential for ribosome function and probably contribute to rRNA folding, maturation, and stability [12, 13]. The modification pattern is conserved in vertebrates: most 2'-O-methylation sites are identical between Xenopus laevis and human. Homologous snoRNAs in different vertebrate species share the same antisense elements.
Recently, vertebrate snoRNAs have attracted the attention of several research groups [15–18]. In particular, our study of C/D snoRNAs in vertebrates demonstrated a trend towards low copy numbers of C/D snoRNA genes in placental mammals . We have also demonstrated that the set of C/D snoRNAs is well conserved among vertebrates and that species-specific snoRNAs guiding rRNA modifications are extremely rare. Shortly after this publication, Zhu and coworkers reported opposite results [18, 19]. Here, we demonstrate that their conclusions are incorrect due to a number of technical errors. We have mainly focused our criticism on their paper in BMC Genomics ; however, we also considered two other recent publications from the same group which are based on the same erroneous approaches [19, 20].
Zhang et al. cloned 64 rhesus monkey snoRNAs encoded by 80 genes . All of them were homologs of known human snoRNAs. Expression of these RNAs was tested by Northern hybridization in the muscle of several vertebrate species. Based on the results, Zhang et al. claimed that most of the cloned snoRNAs are not expressed in chicken, and some were not detected even in human and mouse (Table one in Zhang et al. ). Stated differently, they claimed lineage- or species-specific expression pattern for most of the cloned snoRNAs (59 out of 64).
This statement is contrary to the following. First, all snoRNAs cloned from rhesus monkey have been previously found in human (which allowed Zhang et al. to identify them) . Second, the pattern of rRNA modifications as well as the set of snoRNAs guiding these modifications are conserved in vertebrates [14–17, 21].
Table 1. Examples of similarity variation between mammalian and avian snoRNAs (columns: human snoRNA identity to mouse snoRNA, %; human snoRNA identity to chicken snoRNA, %)
The failure to detect snoRNA expression in human and mouse can be explained similarly. As one would expect, the closer the genomic sequences, the more snoRNAs can be detected. Rhesus snoRNA probes detected more snoRNAs in human than in mouse, and more snoRNAs in mouse than in chicken (Table one in Zhang et al.). Note that some snoRNAs whose expression was not detected in mouse (7 out of 17) had been described before (Additional file 1) [23–25]. For the same reason, the attempt of Zhang et al. to detect, in other human and mouse tissues, the snoRNAs that were not detected in muscle also failed, since the same rhesus probes were used.
The cases when snoRNA expression was not detected in human look particularly odd considering that all these snoRNAs have been initially described in human (Additional file 1). Moreover, the names specified, SNORA and SNORD, correspond to the new nomenclature specifically designed for human snoRNAs , a fact that alone indicates their expression in human. Thus, the lineage-specific and species-specific expression patterns of rhesus snoRNAs reported by Zhang et al. are experimental artifacts.
A similar mistake was made by Zhang et al. in their publication describing chicken snoRNAs . They cloned 125 chicken ncRNAs, mainly snoRNAs, and attempted to detect these RNAs in chicken, mouse, and human tissues by Northern hybridization. Similarly to the results discussed above, positive signal was largely observed in chicken only.
Table 2. Chicken ncRNAs cloned and presented as novel RNAs by Zhang et al. are homologs of well-known human ncRNAs (column groups: identifiable by Rfam search; identifiable by snoRNAbase search; cloned and properly identified by Shao et al.; surviving entries: fragment of SNORA84, fragment of SCARNA11, fragment of SNORD46B)
Zhang et al. performed a computer search for the targets of rhesus C/D snoRNAs (Additional file three in Zhang et al.). However, the targets for these snoRNAs were identified long ago, and the methylation of most of them was demonstrated . For instance, SNORD87 RNA can guide modification of G-3723 in 28S rRNA, and this nucleotide is actually 2'-O-methylated [14, 22] (Figure 2). With a few exceptions, the targets identified by Zhang et al. do not correspond to the confirmed ones. For example, the nucleotide in rhesus U6 RNA putatively modified by SNORD87 RNP is not methylated in human RNA  and, considering the conserved pattern of RNA modifications, is almost surely unmethylated in rhesus monkey (Figure 2). Zhang et al. identified methylation targets in 5S rRNA, whereas it has no 2'-O-methylated nucleotides in eukaryotes . In addition, due to a small size of antisense elements, hundreds of potential targets can be proposed; and presenting some of them without experimental verification of their methylation status is unsubstantiated.
It was shown that a modified base is located four nucleotides upstream of the D/D' box in the C/D snoRNA/target RNA duplex [2, 7]. In many cases presented by Zhang et al., e.g., in the putative SNORD87 target in SSU rRNA (Figure 2), a complementary sequence is more than four nucleotides away from the D/D' box, which makes the modification of these putative target RNAs by the proposed snoRNAs impossible.
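The positional rule above can be expressed as a small check. The Python sketch below is an illustration only, under stated assumptions: it requires a perfect Watson-Crick match (no G-U pairs or mismatches), it assumes the antisense element ends immediately upstream of the D/D' box, and how the distance from the box is counted is left as an explicit parameter rather than fixed. The function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def predicted_site(guide, target, pairs_from_box):
    """Return the 0-based index in `target` of the nucleotide paired with
    the guide nucleotide that is `pairs_from_box` positions upstream of
    the D/D' box (1 = the nucleotide adjacent to the box).

    `guide` is the antisense element, 5'->3', assumed to end directly
    upstream of the box. Returns None if `target` contains no perfect
    complement of the guide."""
    guide = guide.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    if not 1 <= pairs_from_box <= len(guide):
        raise ValueError("pairs_from_box must lie within the guide")
    start = target.find(reverse_complement(guide))
    if start == -1:
        return None
    # The box-adjacent guide nucleotide pairs with target[start] because
    # the two strands of the duplex are antiparallel.
    return start + pairs_from_box - 1
```

A putative target whose complementary stretch does not reach the required position next to the box yields no valid site under this rule, which is the situation described for several of the proposed targets.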
In contrast to the consecutive increase in the number of snoRNAs from fish to mammals alleged by Zhang et al., we found that most mammalian C/D snoRNA genes have homologs in the genomes of other vertebrate classes (Figures 4, 5 and 6). This is not surprising considering that most snoRNAs are involved in rRNA modifications, and that the pattern of rRNA 2'-O-methylation and, likely, pseudouridylation is rather conserved in vertebrates . The cases when some snoRNA gene is not found in a particular species can be attributed to the gaps in the genome sequences (which are abundant in the genomes of vertebrates excluding human and mouse). A minor fraction of snoRNA genes can be missing in some vertebrate classes considering some variations in the pattern of rRNA modifications between vertebrates. For instance, differential rRNA 2'-O-methylation between human and frog is observed in 9 out of ~100 sites . It is of interest that about a half of missing snoRNA genes is observed in fishes (Figures 4, 5 and 6), which can point to a specific pattern of their rRNA methylation relative to other vertebrate classes.
Zhang et al. stated that the number of snoRNA genes steadily increases in the series from fish to mammals, and that there is a burst in their number in mammals . Again, ENSEMBL annotations based on the Rfam database were used rather than their own data. For each ncRNA, Rfam specifies all homologs in different species without specifying if a particular sequence is a gene or a pseudogene. This problem requires detailed examination of both the proper sequence and its genomic environment which is not covered by Rfam. Accordingly, Rfam records do not necessarily represent ncRNA genes, but may represent their pseudogenes as well, and this is clearly indicated in the Help section of the database . However, Zhang et al. considered all corresponding Rfam and ENSEMBL entries as snoRNA genes: they reported the identification of 744 snoRNA genes in rhesus monkey, 922 genes in mouse, more than 1000 genes in human, and ~2200 genes in platypus. The problem of snoRNA gene copy numbers in mammals is discussed in several publications by different groups (see review  and references therein). All these data agree with each other, as well as with our data : while the number of known mammalian snoRNAs is about 200, the total number of their genes does not exceed ~450 (i.e., some snoRNAs are encoded by single genes, and others are encoded by two, three, or more). This is substantially less than proposed by Zhang et al. Most mammalian-specific snoRNA genes found by them reside in intergenic regions rather than in introns. It is generally accepted that nearly all snoRNA genes of vertebrates are localized in introns of host genes, and only SNORD3 (U3), SNORD118 (U8), SNORD13 (U13), SCARNA2, and SCARNA17 are transcribed from their own promoters. It has been well documented that expression of the intronic snoRNAs requires transcription of the host genes (e.g., review  and references therein). 
That is why any sequence similar to an intronic snoRNA gene outside of introns is most likely a nonfunctional pseudogene. Only full-length copies with intact conserved regions and specific secondary structure can be considered as putative snoRNA genes. In addition, a search for a host gene, which may remain unannotated, should be done. Zhang et al. made no such analysis for the intergenic sequences annotated by ENSEMBL as snoRNA genes. Screening the human genome for snoRNA-like sequences revealed that most of them proved to be nonfunctional retrogenes with substitutions in the conserved regions [16, 30]. Clearly, Zhang et al. considered such pseudogenes as snoRNA genes. We have demonstrated that the number of C/D snoRNA pseudogenes is much higher in mammals than in other vertebrates . Therefore, the burst in mammalian snoRNA gene numbers alleged by Zhang et al. most likely represents the burst in the number of their pseudogenes.
Thus, Zhang et al. overestimated the number of snoRNA genes in mammals but underestimated the numbers of snoRNAs and their genes in other vertebrates. This led to a false conclusion that the numbers of snoRNAs and their genes increase in the series from fish to mammals.
SnoRNA pseudogenes with intact conserved regions could, in theory, be functional even when located outside of host gene introns, i.e., in intergenic regions. For that to happen, they should possess their own promoters that would allow independent transcription. Li et al. attempted to find such promoters for intergenic snoRNA-like sequences as well as independent promoters for snoRNA genes located within introns of the host genes. They selected 745 putative human snoRNA genes, 326 of which were located in intergenic regions. This is a much higher number than the generally accepted estimate of the number of snoRNA genes (~450, see above). Again, Li et al. used ENSEMBL annotations, thus combining snoRNA genes and pseudogenes. The search for snoRNA promoters using the CoreBoost_HM program identified them in 179 out of 745 loci: 155 intronic loci and 24 intergenic ones (Table two in Li et al.).
Based on these results, Li et al. proposed five models of snoRNA transcription. The first model assumes that transcription of a snoRNA and a host gene occurs from a common promoter and is generally accepted. This model describes most of the snoRNAs studied. Other models assume that transcription of a snoRNA gene occurs from an independent promoter.
The second model suggests an intronic snoRNA gene with its own promoter independent of the host gene promoter. This model was exemplified by one of the SNORD3 (U3) genes located in an intron of the TEX14 gene on chromosome 17 (Model I, Figure one in Li et al.). However, it is well known that SNORD3 always possesses its own promoter and requires no host gene for its transcription. Therefore, SNORD3 cannot be used as an illustration of the proposed model. Moreover, the sequence on chromosome 17 has numerous substitutions in the functional regions and, hence, is a nonfunctional SNORD3 pseudogene (Additional file 6).
Other genes identified by Li et al. as independently transcribed snoRNA genes are presented in Additional file 6. In each case, there is either an unnoticed host gene harboring snoRNA genes in its introns or a snoRNA pseudogene with substitutions questioning its functionality. A few exceptions are SNORA26-like sequence with intact functional regions and seven SNORD115 genes. However, there are no ESTs confirming independent transcription of these genes, whereas for all independently transcribed human snoRNAs ESTs marking their transcription can be found.
Thus, all examples of snoRNA independent transcription presented by Li et al. (possibly, excluding SNORA26-like sequence and SNORD115 genes) are inadequate.
Studies by Zhu and coworkers attracted our attention since their results were at variance with our data. The main contradiction was the estimated number of snoRNA genes in vertebrates. Our estimation of the number of mammalian C/D snoRNA genes  agrees with the data obtained by other groups: the total number of mammalian snoRNA genes known to date does not exceed ~450 (review  and references therein). In addition, we have shown a lower number of C/D snoRNA genes guiding rRNA modifications in mammals relative to other vertebrate classes . Conversely, Zhang et al. stated that the number of mammalian snoRNA genes sharply increased to ~1000 compared to other vertebrate classes . Here we demonstrated inadequacy of their techniques, which invalidates their conclusions. In particular, they considered numerous pseudogenes as snoRNA genes in mammals and failed to detect many snoRNA genes in other vertebrate classes.
The possible existence of species-specific ncRNAs is extremely interesting, and it is being explored by many groups. Zhang et al. reported numerous lineage-specific and species-specific snoRNAs in chicken and in rhesus monkey. Here we demonstrated that their conclusions were based on a systematic error: Zhang et al. tried to detect snoRNA homologs in one vertebrate species using a probe for the snoRNA of another vertebrate species, while the sequence identity of such homologs can fall below 60% (Table 1). Under these conditions, the standard Northern hybridization technique cannot be used to detect homologs.
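The identity figures referred to here are computed from pairwise alignments. As a minimal sketch, assuming two sequences that have already been aligned (for example with Clustal) and use "-" for gaps, percent identity can be calculated as below. The choice of denominator (all columns that are not gap-only) is one of several conventions and is an assumption of this example, as is the function name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared
```

Comparing the identity of a probe to its homolog in each species before hybridization gives a quick indication of whether a negative Northern result is informative.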
While application of genomic and EST sequence collections has become routine in bioinformatic studies, using automatic annotations of genes, especially ncRNA genes, requires great caution. For instance, ENSEMBL ncRNA annotations based on the Rfam data are excellent landmarks for genome researchers. However, the rates of false positives and missed genes in these annotations, at least in snoRNA annotations, make their application unacceptable for studies specifically designed to identify new ncRNA genes. For example, Rfam makes no distinction between snoRNA genes and pseudogenes, but Zhang et al. considered all annotated snoRNA sequences as snoRNA genes, which led them to erroneous conclusions [18, 20]. In addition, existing automatically generated databases still do not include all ncRNA homologs in different species. Therefore, special studies are needed to prevent underestimation of ncRNA number. E.g., Rfam lacks many snoRNA sequences presented here (Additional file 4) or available in the snoRNABase . Zhang et al. made no attempt to overcome this problem, and, as a result, missed many snoRNA genes in different vertebrates. Thus, relying only on automatic annotations can lead to erroneous conclusions. Actually, most researchers pursue their own way through the genomic thicket to succeed in snoRNA studies [25, 32–34].
We especially focused on this issue since at least one more publication reported questionable conclusions concerning vertebrate snoRNAs based on the Rfam and ENSEMBL annotations as well as multispecies whole-genome alignments . Again, the fact that snoRNA genes and pseudogenes are not distinguished in the Rfam entries was not taken into account.
Lots of snoRNAs have been described in different vertebrates to date, which necessitates the unification of their nomenclature. Zhang et al. gave a new name to each chicken homolog of human snoRNA . This practice is not exclusive to Zhang et al. but is common in almost all publications describing snoRNAs in vertebrates apart from human. This was justified during the period of time when novel snoRNAs rather than homologs of known ones were being identified (e.g., ). Presently, a convenient nomenclature has been developed for human snoRNAs , and identification of novel snoRNAs has become extremely rare. In this context, giving new names to snoRNAs, whose homologs have been identified in other vertebrates, is highly confusing. It gives an erroneous impression that novel snoRNAs have actually been found and confuses the overall picture. For instance, a special investigation should be conducted to understand that the GGgCD37b snoRNA identified in chicken by Shao et al.  corresponds to Ggn109 found by Zhang et al. in chicken, too , and is a homolog of human SNORD38. The analysis of the whole set of data presented in these papers becomes hardly practicable. Finally, it is very hard to recognize the rare cases of a truly novel RNA identification. A positive practice in the field can be exemplified by the Rfam database specifying all homologs of human snoRNAs by the human RNA name. Since new publications describing snoRNAs in vertebrates can be expected, we propose to develop a nomenclature convention for the homologs. The human snoRNA names can be used with prefixes denoting the vertebrate species, e.g., mmusSNORD87 for the mouse homolog of human SNORD87. We propose to use four-letter prefixes to distinguish species such as Mus musculus (mmus) and Microcebus murinus (mmur).
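The proposed convention is simple enough to automate. The Python sketch below assumes that the four-letter prefix is built from the first letter of the genus and the first three letters of the species epithet, which is consistent with the two examples above (mmus, mmur) but is otherwise our reading of the proposal; the function names are hypothetical.

```python
def species_prefix(binomial):
    """Four-letter lower-case prefix from a binomial species name:
    first letter of the genus plus first three letters of the epithet
    (an assumed rule, inferred from the examples mmus and mmur)."""
    parts = binomial.split()
    if len(parts) < 2:
        raise ValueError("expected a binomial name such as 'Mus musculus'")
    genus, epithet = parts[0], parts[1]
    return (genus[0] + epithet[:3]).lower()


def homolog_name(binomial, human_name):
    """Name for the homolog of a human snoRNA in another species."""
    return species_prefix(binomial) + human_name
```

With this rule, `homolog_name("Mus musculus", "SNORD87")` gives `mmusSNORD87`. Prefix collisions between species remain possible and would have to be resolved by hand.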
Recent data indicate that many miRNA genes located within introns of host genes have their own promoters. This interesting and unexpected finding invites a test for a similar pattern in snoRNAs, nearly all of which are encoded within introns in vertebrates. Notably, no experimental data supporting the hypothesis that intronic snoRNAs are transcribed from their own promoters are available to date. At the same time, their transcription within the host gene pre-mRNA from the host gene promoter has been well documented dozens of times (e.g., review  and references therein). Thus, the idea of transcription of intronic snoRNAs from their own promoters is at variance with our current knowledge about their expression, and identification of such promoters should have solid experimental support. Preliminary bioinformatic analysis can be beneficial, but it should be adequate and thorough, which was not the case with Li et al.
Currently, the discovery of species-specific ncRNAs is generally anticipated, which may lead to less critical peer review of publications reporting such RNAs. Here we show that the result can be harmful to the field. Even more importantly, such publications have begun to distort our understanding of ncRNAs: one of the papers criticized here has already been cited in a recent review.
Vertebrate genomes may actually contain many not yet identified snoRNAs. This idea is supported by the data from several groups [32, 33, 38]. However, publications like the ones considered here only add confusion to the problem rather than contribute to the solution. Thus, it is very important to prevent a false start in this exciting field.
Homologs of human C/D box snoRNA genes in vertebrate genomes were searched as follows. First, homologs of human host genes were found in vertebrate genomes using the Comparative Genomics panel of the UCSC Genome Browser (http://genome.ucsc.edu). Then, the introns of the host genes were manually searched for the presence of snoRNA genes. If unsuccessful, snoRNA sequences were searched by WU-BLAST 2.0 (http://www.ensembl.org/Multi/blastview) with increased sensitivity parameters: high sensitivity (search for distant homologies) was chosen; W (word size for seeding alignments) = 3 and Q (cost of first gap character) = 1 were set. The intronic location of the search hits was checked using the mRNA and EST databases integrated into the UCSC Genome Browser. The hits with intact C, D/D' boxes, and the antisense element, flanked by short inverted repeats and located within introns of host genes were considered as snoRNA genes. Finally, extra copies of snoRNA genes were searched in the host gene introns.
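The acceptance criteria listed above amount to a conjunction of four checks per search hit. The following Python sketch only illustrates that decision rule; the search itself was manual, and the record type, field names, and example hits are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One candidate snoRNA locus; all fields are hypothetical labels
    for the criteria described in the text."""
    name: str
    intact_boxes: bool               # C and D/D' boxes intact
    intact_antisense: bool           # antisense element intact
    flanking_inverted_repeats: bool  # short inverted repeats at the ends
    in_host_intron: bool             # located within a host gene intron


def is_snorna_gene(hit):
    """A hit counts as a gene only if every criterion is met; anything
    else is treated as a possible pseudogene."""
    return (hit.intact_boxes and hit.intact_antisense
            and hit.flanking_inverted_repeats and hit.in_host_intron)


def count_genes(hits):
    """Number of hits that satisfy all criteria."""
    return sum(1 for hit in hits if is_snorna_gene(hit))
```

Counting database entries without such a filter treats every similar sequence as a gene, which is the source of the inflated gene numbers discussed in the text.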
NcRNAs discussed in [18–20] were analyzed using the UCSC Genome Browser and snoRNABase and Rfam databases [3, 21]. Pairwise and multiple alignments were generated by Clustal V and Clustal W [40, 41]. RNA secondary structures were analyzed using the mfold program [42, 43].
Table 3. Summary of C/D box snoRNA numbers predicted by M&K in 16 vertebrate genomes (data from Additional file five of M&K; column: predicted snoRNA number)
Table 4. Numbers of C/D box snoRNAs in the human genome reported by different groups (column: number of C/D box snoRNAs; surviving row labels and notes: The HUGO Gene Nomenclature Committee; Rfam (release 10.0); ENSEMBL (release 63); ENSEMBL (release 50)**; Reported by M&K; Reported by M&K; Additional file four of M&K)
The work was supported by the Molecular and Cellular Biology Program of the Russian Academy of Sciences and the Russian Foundation for Basic Research (project no. 11-04-00439-a).
Response to: SNOntology: Myriads of Novel SnoRNAs or Just a Mirage?
By Dahai Zhu
Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, 5 Dong Dan San Tiao, 100005, Beijing, China
The work presented by Makarova and Kramerov (M&K) examined our previous studies on chicken and monkey snoRNAs, as well as our work on snoRNA promoter analysis [18–20], and raises some questions. We appreciate the attention given to our work. However, although some of the points raised are reasonable, many of the conclusions are based on biased information, misinterpretation of our results, or analysis of inconsistent datasets.
First, many basic concepts on snoRNAs presented in the M&K manuscript are outdated. For example, in the background section, the authors claim that "To date, ~200 RNAs of both groups have been described", but the reference cited was published in 2006. The current non-coding RNA collection (in Rfam, release version 10.0) includes 519 snoRNA families and a total of 108,332 snoRNAs. The authors state that "nearly all snoRNAs and scaRNAs genes in vertebrates are located within introns of other genes. In fact, there are only five exceptions". This point also serves as support for the criticisms of our analysis of independently transcribed snoRNAs. However, this statement must be updated, because the reported number of human intergenic snoRNAs has far exceeded that given by the authors, and some are indeed independently transcribed, even if intronically encoded, as reviewed in . The recently discovered regulatory functions of snoRNAs [45, 46] are also overlooked.
The authors criticize our analysis of lineage- or species-specific snoRNAs, and give the following reasons. First, "all snoRNAs cloned from rhesus monkey have been previously found in human"; second, "the pattern of rRNA modifications as well as the set of snoRNAs guiding these modifications are conserved in vertebrates"; and third, "the failure to detect the expression of some snoRNAs is due to the sequence divergence among species". Our answers to these questions follow. In terms of the first statement, as we mentioned in our paper, we indeed identified homologous snoRNA genes or pseudogenes for all the rhesus monkey snoRNAs that we cloned. However, as the human snoRNAs used in our study, as well as those to which M&K refer , have been identified by both cloning and computational prediction methods, the presence of a monkey snoRNA homologous sequence in the human genome does not directly indicate that those snoRNAs are expressed in human cells. In terms of the second statement, we do not understand why functional conservation of rRNAs within a large family can be used to support the notion that lineage- or species-specific snoRNAs are absent, especially given the increasing body of evidence indicating the regulatory roles played by snoRNAs in humans [6, 7]. In terms of the third statement, it is possible that the lack of detectable signals from some snoRNAs in the chicken is attributable to sequence divergence. However, we speculate that this may not be the major reason as we were able to obtain positive northern blot hybridization signals for some sequences with as low as 12% conservation, but failed to obtain signals for some sequences with 100% conservation. We plan to gather further experimental data using species-specific probes to update our conclusion.
We think that the authors' criticism of our 'novel' chicken ncRNA work is very misleading. In the cited report, we identified 125 chicken ncRNAs, including 102 snoRNAs, using a direct cloning method. Compared with the chicken snoRNAs predicted by Rfam, we found 25 snoRNAs that had not been reported in chicken, and termed these molecules "novel snoRNA candidates". We also mentioned that 12 of the novel snoRNA candidates that we cloned had also been independently identified by Qu's group. Although the snoRNAs identified by us in chicken have homologs in other vertebrates (Supplemental File 1 of our original work), the majority of them have very low levels of sequence similarity to human snoRNAs. When we conducted the analysis just mentioned, the human snoRNA homologs listed in Table two of M&K were not included in the ENSEMBL and Rfam datasets. Therefore, we could not find human homologs of those snoRNAs. Similarly, the snoRNA homologs listed in Figure six of M&K were also not included in the versions of the ENSEMBL datasets that we used for monkey snoRNA analysis, but are indeed included in the current release. As it is well known that the human genome annotation is constantly being updated, we think it is inappropriate and misleading to compare results obtained using different datasets.
We admit that our snoRNA target prediction methods may not be perfect; we were aware of this possibility when we conducted our work, but no better snoRNA target prediction software was available at that time. Thus, in our paper, we reported only the comparative conservation of putative snoRNA target sites between human and rhesus monkey. To render comparisons consistent among snoRNAs, we did not refine our predictive results using known targets, because correction in one species may lead to biased results in the conservation analysis. We did emphasize that the target sites that we listed were all putative.
The authors question the accuracy of the numbers of snoRNAs in different species contained in the ENSEMBL and Rfam databases. They have designed a snoRNA prediction tool based on refined sequence similarity search and have identified 1,352 C/D box snoRNAs in 16 vertebrate species (Additional File five of M&K). Based on that result, they claim that the copy number of C/D box snoRNA genes is lower in mammals than in other vertebrates. We have analyzed the 1,352 C/D box snoRNAs used in their study (Table 3). To our surprise, only 20 human snoRNAs were included in the list, and the numbers of snoRNAs of other mammals were also very low. However, the current numbers of recorded human C/D box snoRNAs deposited in several major databases range between 230~460 (Table 4), and at least 270 such predictions are supported by EST evidence (data not shown). Therefore, the number of snoRNAs predicted (by M&K) in vertebrate genomes is obviously far less than the numbers of known snoRNAs supported by experimental evidence.
The authors use SNORD87 as an example to demonstrate the presence of 'a trend towards low copy numbers of C/D snoRNA genes in placental mammals'. However, many opposing examples could be given. One such is the SNORD115 and SNORD116 C/D box snoRNA families which are absent in non-eutherian vertebrate genomes but present as 30~50 tandem repeat copies on human chromosome 15q11-13. Mutations in these snoRNA clusters have been shown to be the cause of autism spectrum disorder and Prader-Willi syndrome [47, 48]. However, these clusters were omitted from the M&K analysis.
The authors suggest that the numbers of snoRNAs obtained in our analysis are overestimates, given that some mammalian snoRNAs may be pseudogenes. We mentioned the possible existence of pseudogenes in our original work. However, as we reported (Figure 4A & B of our original paper), the numbers of snoRNAs and snoRNA families can be seen to have increased during evolution even when only intronic snoRNAs are considered. In addition, the expansion of snoRNA pseudogenes could also be considered to reflect snoRNA duplication.
M&K also question our snoRNA promoter prediction results . In that work, we integrated the manual snoRNA dataset of Dieci et al.  with the Ensembl dataset (Release 53)  to perform promoter predictions for human snoRNAs. As a result, we proposed five transcriptional models for human snoRNAs. M&K challenge our models II and III by arguing that several snoRNA loci with putative independent promoters reported in our study might be pseudogenes because of the presence of short sequence deletions or sequence variations. However, their claim of SNORD3 as a pseudogene for the lack of 100% sequence conservation at functional regions is not convincing. As shown in our earlier work , the detected DNase I-hypersensitive sites and the Pol II binding site are all located within 500 bp of the predicted TSS of SNORD3, strongly supporting the idea that the SNORD3 locus is transcriptionally active.
Although snoRNAs function mainly as modulators of ribosomal RNAs, snoRNAs may have broader functions than previously appreciated. One possibility is that snoRNAs may serve as precursors of microRNAs and may possess microRNA-like functions [46, 50]. Some snoRNAs are known to regulate alternative splicing of their target mRNAs [45, 51, 52]. Therefore, genomic loci harboring snoRNA variants might have non-canonical functions different from those of typical snoRNAs, although transcriptional activity must be experimentally proven. Moreover, active transcription of pseudogenes actually plays an important role in gene expansion during genome evolution. Overall, it is inadequate and illogical for M&K to point to potential pseudogenes to challenge snoRNA transcription models II and III.
M&K argue that some intergenic snoRNA examples used by us in our snoRNA promoter study were in fact of intronic origin. As illustrated in Figure four b of M&K, SNORD60 lies in the intronic region of some ESTs; however, many unspliced ESTs were omitted from their figure (Figure 8). Similar cases are SNORD104 and SNORA76, shown in Additional file six of M&K. Previous studies have demonstrated that SNORD104 and SNORA76 are independently transcribed, which is in agreement with our results. Another example is SNORD93, which is located within an intergenic region according to the RefSeq and UCSC gene models (hg18) used in our previous work but was reannotated as an intronic snoRNA in the hg19 release. Such information updates should not be classified as analysis errors.
In summary, because of the nature of computational prediction work, it is very unlikely that bioinformatic analysis data will ever be error-free. We welcome updated analysis of our data using improved methods and enriched reference sources. However, the work presented in the report by M&K is characterized by conclusions drawn from biased information and by misinterpretation of both their own and our results, which may add more confusion to the field.