Recent studies have demonstrated that the functions of non-protein-coding RNAs may encompass almost every aspect of biological activity in normal development and disease biogenesis [21, 25, 42–45]. Rhesus macaques are a suitable primate model for basic and applied biomedical research [38, 39]. However, in contrast to the considerable literature on human and mouse ncRNAs, rhesus monkey ncRNAs have not previously been systematically characterized. Here, we performed a detailed screening of the rhesus monkey intermediate-size ncRNA transcriptome and cloned 117 rhesus monkey ncRNAs, including 80 snoRNAs, eight unclassified ncRNAs, and 29 known RNAs (snRNAs, Y RNA, and others). By comparative genomics analysis, we found several lineage- or species-specific snoRNAs. Genomic organization analysis showed that the majority of rhesus monkey snoRNAs have many paralogs in the rhesus monkey genome. By flanking sequence analysis, we found that SINE-like retroposon-mediated trans-duplication may have been an important mechanism in expansion of novel snoRNAs in the rhesus monkey genome.
Among the 117 identified rhesus monkey ncRNAs, eight unclassified ncRNA candidates could not be assigned to any known class of ncRNA. These eight unclassified ncRNAs were ubiquitously expressed in the six rhesus monkey tissues tested. Recently, we also identified nine unclassified ncRNAs from the chicken. Previous reports also showed that some ncRNAs obtained from cDNA library sequencing did not belong to any known ncRNA family, and these ncRNAs were designated as unclassified or unknown [13, 31, 47]. Hüttenhofer and coworkers found 57 such unclassified ncRNAs, of length 50~500 nt, in mouse brain cDNA libraries. Deng et al. reported 14 unclassified ncRNAs of length 70~200 nt in a C. elegans cDNA library. Yuan identified 29 unclassified ncRNAs by constructing cDNA libraries from four developmental stages of Drosophila melanogaster. These unclassified ncRNAs often show little sequence conservation and are less prevalent than known snoRNAs. However, these observations do not mean that the unclassified ncRNAs are non-functional [13, 31, 48]. The increasing number of newly identified unclassified ncRNAs suggests that other types or classes of intermediate-size ncRNA (50~500 nt) remain to be identified, and that novel ncRNA families will likely become amenable to classification through enhanced bioinformatic comparisons and extensive functional studies of the roles played by such ncRNAs.
Previous reports showed that the majority of known snoRNAs were conserved between human and mouse, at a level of 80~90%. Most rhesus monkey snoRNAs identified in the present study show high homology to those of the human and mouse. However, 13 snoRNAs had a conservation score below 0.6 (Table 1), suggesting that some snoRNAs are less conserved between primates and rodents. Using comparative genomics analysis, we found several lineage- or species-specific snoRNAs. Fifteen snoRNA families were ancient, being present at an early stage of vertebrate evolution, whereas 11 snoRNA families appeared after the divergence of birds and mammals. Fourteen young snoRNA families arose during mammalian evolution, and one of these (SNORA15) developed only after primates had arisen. Our findings are in line with recent studies in other species. Previously, we found 30 chicken/bird-specific ncRNAs, and Schmitz reported 49 platypus-specific snoRNAs. Computer analysis of human genomic tiling array data revealed 300 putative candidates for classification as primate-specific ncRNAs. Together with previous reports, our data suggest that ncRNAs may play important roles in lineage development, or speciation, during evolution.
Although homologs of some rhesus monkey snoRNAs could be found in the mouse and human genomes, the expression of several snoRNAs was not detectable by northern blotting, suggesting that some snoRNA homologs might be pseudogenes without transcriptional potential in the human and/or mouse. Thus, we found only 14 potential primate-specific and eight rhesus monkey (or non-human primate)-specific transcripts (Table 1 and Figure 2). However, it remains possible that the lack of detectable expression in the human or mouse reflects transcriptional regulation by spatio-temporal, physiological, or pathological stimuli or stresses that were absent under the normal conditions in which our tissue samples were taken. In support of this hypothesis, several examples of tissue-specific expression of ncRNAs have been reported in previous studies describing brain-specific snoRNAs or snoRNAs involved in neuronal development. By analogy, some microRNAs and piRNAs display specific spatio-temporal expression patterns, and play functional roles in cell differentiation and organogenesis during development [11, 12, 52, 53]. In the present study, we also found that SNORA71, which is ubiquitously expressed in human and rhesus monkey tissues, is predominantly expressed in the mouse brain.
In vertebrates, most snoRNAs are located within introns of protein-coding or non-protein-coding genes [21, 54]. Some snoRNAs are present as several copies, either in different introns of the same gene or within introns of different genes [32, 55]. Genomic organization analysis showed that the majority of the rhesus monkey snoRNAs identified in this study have multiple paralogs in the rhesus genome, suggesting redundancy arising from duplication, including transposition. Diverse molecular mechanisms, such as gene duplication and retroposition, may be involved in the creation of protein-coding genes. To investigate the mechanisms of rhesus monkey snoRNA expansion, we analyzed the flanking sequences of each snoRNA paralog and found that the sequences flanking some rhesus monkey snoRNAs show typical SINE-like retroposon features, namely a poly(A) end and target site duplications (TSDs), suggesting that some rhesus monkey snoRNA paralogs are retrogenes formed by autonomous retroposon-mediated retroposition. In addition, the 5' flanking sequences of rhesus monkey SNORA76b and SNORA24b possess T2A4 motifs, which are preferentially recognized by the L1 retroposon-encoded nicking endonuclease, suggesting that SNORA76b and SNORA24b were generated from a parental copy by L1 integration machinery-mediated retroposition. Significantly, we found that six paralogs of SNORA25 also possess typical SINE-like retroposon characteristics and contain multiple poly(A) sequences, indicating that SNORA25 underwent multiple duplication events during evolution. Thus, we propose a model involving retroposition for SNORA25 duplication. Mechanisms of snoRNA gene expansion have recently been reported in other species. In nematodes, some snoRNA paralogs were generated by cis- or trans-duplication.
Other data suggest that mammalian snoRNA genes are SINE-like retroposons (snoRTs/snoRTEs), and that retroposition mediated by snoRTs may have played an important role in snoRNA expansion during evolution of the mammalian genome [33–35]. The extensive expansion of snoRNA-encoding genes during mammalian evolution might ensure the presence of a functional copy when a parental gene loses function because of mutation. On the other hand, novel paralogs could independently evolve to generate isoforms with different targets or functions, for example through the acquisition of new sites complementary to modification regions of rRNAs.
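The flanking-sequence criteria discussed above (a poly(A) tract downstream of the snoRNA paralog, followed by a direct repeat that matches the upstream flank) can be illustrated with a minimal sketch. This is not the procedure used in the present study; the function names, the exact-match requirement, and the length thresholds (a poly(A) run of at least 8 nt and a TSD of 8 to 20 nt) are assumptions chosen only for illustration.

```python
import re


def find_poly_a(downstream, min_len=8):
    """Return (start, end) of the first run of >= min_len A's, or None.

    min_len is an illustrative threshold, not a value from the study.
    """
    match = re.search(r"A{%d,}" % min_len, downstream.upper())
    return (match.start(), match.end()) if match else None


def find_tsd(upstream, downstream, min_len=8, max_len=20):
    """Return the longest exact repeat shared by both flanks, or None.

    Real TSDs may contain mismatches; exact matching is a simplification.
    """
    up, down = upstream.upper(), downstream.upper()
    longest = min(max_len, len(up), len(down))
    for k in range(longest, min_len - 1, -1):
        for i in range(len(up) - k + 1):
            kmer = up[i:i + k]
            if kmer in down:
                return kmer
    return None


def looks_retroposed(upstream, downstream):
    """True if the downstream flank has a poly(A) tract followed by a
    sequence that also occurs in the upstream flank (a candidate TSD)."""
    poly_a = find_poly_a(downstream)
    if poly_a is None:
        return False
    # Search for the repeat only in the region 3' of the poly(A) tract.
    return find_tsd(upstream, downstream[poly_a[1]:]) is not None


# Hypothetical flanks: a 12-nt repeat brackets the insertion.
tsd = "GTCTGATTCCTG"
upstream_flank = "GGCT" + tsd
downstream_flank = "TTG" + "A" * 12 + tsd + "GCGT"
candidate = looks_retroposed(upstream_flank, downstream_flank)
```

In this toy input the upstream flank ends with the same 12-nt sequence that follows the poly(A) tract, so the paralog is flagged as a candidate retrogene. A practical screen would also need to tolerate mismatches in the repeat and to check the distance between the snoRNA and the poly(A) tract.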