Research article | Open access
Systematic identification and evolutionary features of rhesus monkey small nucleolar RNAs
BMC Genomics volume 11, Article number: 61 (2010)
Recent studies have demonstrated that non-protein-coding RNAs (npcRNAs/ncRNAs) play important roles during eukaryotic development, species evolution, and in the etiology of disease. Rhesus macaques are the most widely used primate model in both biomedical research and primate evolutionary studies. However, most reports on these animals focus on the functional roles of protein-coding sequences, whereas very little is known about macaque ncRNAs.
In the present study, we performed the first systematic profiling of intermediate-size ncRNAs (50 to 500 nt) from the rhesus monkey by constructing a cDNA library. We identified 117 rhesus monkey ncRNAs, including 80 small nucleolar RNAs (snoRNAs), 29 other types of known RNAs (snRNAs, Y RNA, and others), and eight unclassified ncRNAs. Comparative genomic analysis and northern blot hybridizations demonstrated that some snoRNAs were lineage- or species-specific. Paralogous sequences were found for most rhesus monkey snoRNAs, which may be attributable to extensive duplication within the rhesus monkey genome. Further investigation of snoRNA flanking sequences showed that some rhesus monkey snoRNAs are retrogenes derived from L1-mediated integration. Finally, phylogenetic analysis demonstrated that birds and primates share some snoRNAs and their host genes, suggesting that both the host genes and the snoRNAs they contain may have been inherited from a common ancestor. However, some rhesus monkey snoRNAs hosted by non-ribosome-related genes appeared after the evolutionary divergence between birds and mammals.
We provide the first experimentally derived catalog of rhesus monkey ncRNAs and uncover several interesting genomic and evolutionary features. These findings provide important information for future functional characterization of snoRNAs during primate evolution.
It is widely accepted that up to 90% of the human genome is transcribed into various types of RNAs [1–4]. However, only a very small proportion of transcripts (~2-3%) encode proteins. Although many transcripts may simply be transcriptional noise, a considerable number of non-protein-coding RNAs (npcRNAs/ncRNAs) are produced [1–4]. The growing number of ncRNAs found by systematic genome-wide screening has also demonstrated the widespread existence of ncRNAs in nature [6–9]. By length, ncRNAs can be categorized into small ncRNAs of 19~35 nt, such as miRNAs and piRNAs [10–12]; intermediate-size ncRNAs of 50 to 500 nt, such as small nucleolar RNAs (snoRNAs); and long mRNA-like ncRNAs longer than 500 nt [14–18].
snoRNAs act mainly as guides for the modification of ribosomal RNAs (rRNAs) and represent the largest group of functional ncRNAs. Based on sequence and structural features, snoRNAs can be classified into two families, box C/D snoRNAs and box H/ACA snoRNAs, which guide site-specific 2'-O-ribose methylation and pseudouridylation of rRNA, respectively [20, 21]. The spectrum of snoRNA targets continues to grow. Some snoRNAs control methylation of tRNAs [22, 23]. Small Cajal body-specific RNAs (scaRNAs), a subset of snoRNAs with box C/D and/or box H/ACA motifs, guide post-transcriptional modification of RNA polymerase II-transcribed snRNAs. Recent findings have shown that snoRNAs can also target mRNA and regulate alternative splicing. Another interesting discovery is that snoRNAs may be precursors of microRNAs and possess microRNA-like functions [26, 27]. Together, the available evidence suggests that snoRNAs may have broader functions than previously appreciated.
The genomic organization of snoRNA genes varies greatly among organisms. In yeast and plants, snoRNAs are usually transcribed from independent polymerase II transcription units with dedicated promoters; in contrast, most vertebrate snoRNAs reside in the introns of protein-coding or non-protein-coding genes and are generated by splicing-dependent processing [29, 30]. Intron-encoded snoRNAs may also have their own promoters that drive snoRNA transcription. Many snoRNA genes have multiple paralogs derived from one or more duplications. In nematodes, the paralogs of intron-encoded snoRNA genes were probably generated by cis- and trans-duplication mechanisms. Luo and Li demonstrated that most human box H/ACA snoRNAs were retrogenes produced by L1 integration. Weber reported that many mammalian snoRNAs were mobile genetic elements, designated snoRNA/scaRNA retroposons (snoRTs, scaRTs). Recently, Schmitz and colleagues discovered a platypus-specific snoRNA retroposon with strong transposable activity that amplified a single snoRNA into about 40,000 paralogs across the genome. Retroposition of snoRNA genes may therefore have played an important role in the evolution of mammalian genomes.
These recent findings suggest that ncRNAs have important functions in almost every aspect of eukaryotic growth regulation. However, only a limited number and limited classes of ncRNAs have been discovered to date. Systematic identification of ncRNAs from various organisms is therefore a critical first step toward a road map for their functional study. The rhesus macaque (Macaca mulatta) is the most thoroughly studied primate apart from humans. Whereas the human and rodent lineages have been separated by more than 70 million years of evolution [36, 37], rhesus macaques and humans are closely related and share a common ancestor that lived about 25 million years ago [36, 38]. Studies of rhesus monkeys therefore support both primate evolutionary research and modern biomedical programs [38, 39]. The ENSEMBL genome annotation group has identified a total of 21,905 protein-coding genes and 5,253 non-protein-coding genes (including 715 predicted snoRNA loci) in the rhesus monkey genome. Although the expression patterns and possible functions of many protein-coding genes have been reported, identification of rhesus monkey non-protein-coding genes has relied solely on computational predictions that search for sequences similar to known ncRNAs from other species. Such homology-based searches cannot identify novel ncRNAs that lack known homologs. Here, we conducted a systematic experimental identification of rhesus monkey ncRNAs by constructing a cDNA library from RNA fragments of 50 to 500 nt. We identified 117 non-redundant ncRNAs, including 80 snoRNAs and eight unclassified ncRNAs. Some of these ncRNAs were lineage- or species-specific. Further analysis of their genomic organization showed that most of the snoRNAs have multiple paralogs in the rhesus monkey genome.
Detailed analysis of the flanking sequences of each of the snoRNA paralogs revealed that some snoRNAs were retrogenes generated through L1-mediated integration machinery, suggesting that retroposon-mediated trans-duplication may have been a driving force for expansion of novel snoRNAs in the rhesus monkey genome.
Systematic identification of rhesus monkey ncRNAs by analysis of a cDNA library
Full-length libraries enriched for intermediate-size ncRNAs (50~500 nt) were constructed using a previously described method with minor modifications. This approach yields libraries containing a substantial proportion of full-length ncRNA clones with defined 5' and 3' termini. RNA for library construction was extracted from rhesus monkey heart and skeletal muscle. In total, 4,844 clones from two full-length cDNA libraries were sequenced. After sequences matching tRNAs, rRNAs, and mRNAs were discarded, the remaining 835 sequences were considered putative ncRNAs and analyzed further. After redundant sequences were merged and the sequences and secondary structures of the putative ncRNAs were compared with known ncRNAs annotated in the ENSEMBL and Rfam databases, the 835 clones were classified into 117 ncRNAs: 80 snoRNAs (32 C/D box and 48 H/ACA box snoRNAs) representing 64 snoRNA families, 17 snRNAs, one 7SK RNA, six Y RNAs, two 7SL RNAs (SRP RNA), one vault RNA, one ribonuclease P RNA component H1 (RPPH1), one RNA component of mitochondrial RNA processing endoribonuclease (RMRP), and eight unclassified ncRNA candidates (Figure 1 and Additional File 1).
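As a quick consistency check on the classification above, the category counts can be tallied in a few lines of Python. The dictionary simply transcribes the numbers stated in this paragraph; the variable names are illustrative only.

```python
# Non-redundant ncRNA classes reported for the two rhesus monkey libraries.
ncrna_counts = {
    "C/D box snoRNA": 32,
    "H/ACA box snoRNA": 48,
    "snRNA": 17,
    "7SK RNA": 1,
    "Y RNA": 6,
    "7SL RNA (SRP RNA)": 2,
    "vault RNA": 1,
    "RPPH1": 1,
    "RMRP": 1,
    "unclassified": 8,
}

snorna_total = ncrna_counts["C/D box snoRNA"] + ncrna_counts["H/ACA box snoRNA"]
unclassified = ncrna_counts["unclassified"]
grand_total = sum(ncrna_counts.values())
other_known = grand_total - snorna_total - unclassified

print(f"snoRNAs: {snorna_total}")          # snoRNAs: 80
print(f"other known RNAs: {other_known}")  # other known RNAs: 29
print(f"unclassified: {unclassified}")     # unclassified: 8
print(f"total: {grand_total}")             # total: 117
```

The 29 "other known RNAs" correspond to the figure quoted in the abstract.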
All rhesus monkey snoRNAs identified in this study have known human homologs. Of the 80 rhesus snoRNAs, 68 match their human homologs perfectly, and the remaining 12 are also highly conserved between monkey and human, with conservation scores above 0.96 (Table 1). In addition to sharing sequence and/or secondary-structure homology with known human snoRNAs, all of the cloned snoRNAs contained the conserved snoRNA motifs. In the 32 C/D box snoRNAs, we identified 52 C/D or C'/D' box pairs (Additional File 1). An H box and an ACA box were also found in the secondary structures of all H/ACA snoRNAs (Additional File 2). We further compared the guide sequences of each rhesus monkey snoRNA and its human homolog, together with their target sites; both were highly conserved between rhesus monkey and human (Additional Files 3 and 4).
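The motif check described above can be sketched with regular expressions. This is a simplification under stated assumptions, not the method used in the study: box C is matched to the consensus RUGAUGA and box D to CUGA, and each box is searched only within a fixed window at the 5' or 3' end (20 nt here, an arbitrary choice). Dedicated snoRNA finders also model secondary structure and guide/target pairing, so this sketch only flags candidates. The example sequence is synthetic.

```python
import re
from typing import Optional, Tuple

BOX_C = re.compile(r"[AG]UGAUGA")  # box C consensus RUGAUGA (R = A or G)
BOX_D = re.compile(r"CUGA")        # box D consensus CUGA


def find_cd_boxes(seq: str, window: int = 20) -> Optional[Tuple[int, int]]:
    """Return 0-based start positions (box C, box D) in a candidate C/D snoRNA.

    Box C is searched in the first `window` nt and box D in the last `window` nt;
    the window size is an illustrative assumption. Returns None if either box is missing.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    c_match = BOX_C.search(rna[:window])
    tail_start = max(0, len(rna) - window)
    d_starts = [m.start() + tail_start for m in BOX_D.finditer(rna[tail_start:])]
    if c_match is None or not d_starts:
        return None
    return c_match.start(), d_starts[-1]  # keep the box D closest to the 3' end


# Synthetic example: box C near the 5' end, box D near the 3' end.
toy = "GG" + "AUGAUGA" + "C" * 30 + "CUGA" + "GC"
print(find_cd_boxes(toy))  # (2, 39)
```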
Expression patterns of the 117 ncRNAs in tissues of rhesus monkey and other species
The expression of the 117 ncRNAs was confirmed by northern blotting (Additional File 5). All tested ncRNAs were expressed in the six rhesus monkey tissues examined (spleen, brain, kidney, liver, heart, and skeletal muscle). Expression levels differed quantitatively among tissues, but no tissue-specific expression pattern was observed. We also examined expression of the 117 monkey ncRNAs in human, mouse, and chicken skeletal muscle by northern blotting (Additional File 5). Based on the observed expression patterns, the 117 ncRNAs could be classified into six groups (Figure 2). Group 1 ncRNAs were expressed in chicken, mouse, human, and all examined rhesus monkey tissues; group 2 ncRNAs were detected in monkey, human, and mouse, but not in chicken; group 3 ncRNAs were expressed only in monkey and human; and group 4 ncRNAs were detected only in rhesus monkey tissues. Interestingly, SNORD45 was expressed only in mouse and monkey (group 5), whereas SNORD50 was detected in chicken, human, and rhesus monkey but not in mouse (group 6). To rule out the possibility that the absence of northern signals reflected tissue-specific expression of lineage- or species-specific ncRNAs, we examined their expression in nine human and mouse tissues, but no signals were detected (Additional File 5).
Conservation analysis of the rhesus ncRNAs using BLAST against human, mouse, and chicken genomic sequences showed that most (96/117) of the observed lineage/species expression patterns were supported by sequence homology, although expression of some highly conserved sequences was not detected (Table 1). The expression patterns of ten ncRNAs in groups 4, 5, and 6 were inconsistent with the conservation scores of their genomic sequences across species. For example, eight group 4 ncRNAs had conserved sequences but no detectable expression in human tissues (Table 1 and Additional File 5). The homologs of these ncRNAs may be pseudogenes, or may be expressed below the detection limit of northern blotting. Alternatively, they may be transcriptionally regulated in a spatio-temporal manner, or by physiological or pathological stimuli or stresses, and would therefore not be constitutively expressed under normal conditions.
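The grouping in Figure 2 is essentially a lookup on which species gave a northern signal, and the inconsistencies noted above arise when a species carries a conserved homolog but gives no signal. A minimal sketch of both steps follows; the 0.9 conservation cut-off and the function names are assumptions for illustration, not thresholds used in this study.

```python
from typing import Dict, List, Optional

# Northern detection pattern (chicken, mouse, human, rhesus) -> group in Figure 2.
GROUPS = {
    (True, True, True, True): 1,
    (False, True, True, True): 2,
    (False, False, True, True): 3,
    (False, False, False, True): 4,
    (False, True, False, True): 5,   # e.g. SNORD45
    (True, False, True, True): 6,    # e.g. SNORD50
}


def expression_group(chicken: bool, mouse: bool, human: bool, rhesus: bool) -> Optional[int]:
    """Return the expression group for a detection pattern, or None if it fits no group."""
    return GROUPS.get((chicken, mouse, human, rhesus))


def conserved_but_undetected(detected: Dict[str, bool],
                             conservation: Dict[str, float],
                             threshold: float = 0.9) -> List[str]:
    """Species with a conserved genomic homolog (score >= threshold) but no northern signal."""
    return sorted(species for species, score in conservation.items()
                  if score >= threshold and not detected.get(species, False))


print(expression_group(False, True, False, True))  # 5
print(conserved_but_undetected({"human": False, "mouse": False},
                               {"human": 0.98, "mouse": 0.55}))  # ['human']
```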
Comparative genomic analysis of rhesus monkey snoRNAs
The secondary structures and functional boxes of snoRNAs are highly conserved, but the nucleotide sequences outside the hallmark boxes and antisense regions changed during vertebrate evolution. To investigate snoRNA sequence conservation over the course of primate evolution, we aligned the sequences of the 64 rhesus monkey snoRNA families against eight other primate genomes. Because the genome sequences of some species are incomplete, only 25 snoRNA families had identifiable homologs in all eight primate species examined. The alignments showed that some snoRNA sequences have diverged even among closely related primates. Alignments of the five most divergent snoRNAs are shown in Additional File 6.
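The article does not state how divergence between primate homologs was scored. One common measure, used here as an assumption, is percent identity over the columns of a pairwise alignment in which neither sequence has a gap. The aligned strings and species labels below are synthetic.

```python
from typing import Dict, List


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from the denominator
        compared += 1
        matches += int(a == b)
    return 100.0 * matches / compared if compared else 0.0


def rank_by_divergence(reference: str, homologs: Dict[str, str]) -> List[str]:
    """Sort species from most to least divergent relative to the reference alignment row."""
    return sorted(homologs, key=lambda sp: percent_identity(reference, homologs[sp]))


print(percent_identity("ACGU-A", "ACGUUC"))  # 80.0
print(rank_by_divergence("ACGUACGU", {"species_A": "ACGUACGU",
                                      "species_B": "ACGAACGA"}))  # ['species_B', 'species_A']
```

Excluding gapped columns is one design choice among several; counting gaps as mismatches would give lower identities for indel-rich homologs.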
To determine when rhesus monkey snoRNAs appeared during vertebrate evolution, we searched for homologs of 58 rhesus monkey snoRNA families (six families were excluded because they lacked annotation in the human and/or mouse datasets) in seven other representative vertebrates, based on annotations in ENSEMBL Release 50 (Additional File 7). Of the 58 families, 15 had homologs even in zebrafish and medaka (Figure 3, Group 1), indicating that these snoRNAs appeared early in vertebrate evolution. Eight snoRNA families were detected in reptiles and evolutionarily later species (Figure 3, Group 2), and 10 families were first detected in birds (Figure 3, Group 3). The remaining 25 families were present in mammals but clearly absent from birds and other non-mammalian species (Figure 3, Groups 4-6). Thirteen of these 25 mammalian families had no homologs in the platypus genome (Group 5), suggesting that they emerged later in mammalian evolution. Finally, one snoRNA without a homolog in the mouse may be primate-specific (Group 6).
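The dating logic above amounts to finding the most distantly related species in which a homolog is annotated. The sketch below assumes parsimony (a family is at least as old as the last common ancestor of the rhesus monkey and the most distant species carrying it, and later absences are losses) and uses the seven comparison genomes named in this article. The species ordering and function names are our illustration, not the authors' code.

```python
from collections import Counter
from typing import Iterable, Optional, Set

# Comparison genomes ordered from most distant to closest relative of the rhesus monkey.
# Zebrafish and medaka are equally distant (both teleosts), so their relative order is arbitrary.
SPECIES_BY_DISTANCE = ["zebrafish", "medaka", "frog", "chicken",
                       "platypus", "mouse", "human"]


def earliest_homolog(species_with_homolog: Set[str]) -> Optional[str]:
    """Most distantly related species with a homolog; None if the family is rhesus-only."""
    for species in SPECIES_BY_DISTANCE:
        if species in species_with_homolog:
            return species
    return None


def tally_strata(families: Iterable[Set[str]]) -> Counter:
    """Count snoRNA families by the most distant species in which they are found."""
    return Counter(earliest_homolog(hits) for hits in families)


print(earliest_homolog({"chicken", "platypus", "mouse", "human"}))  # chicken
print(earliest_homolog({"human"}))  # human (no mouse homolog: possibly primate-specific)
```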
SnoRNA expansion during vertebrate evolution
Data from the ENSEMBL genome annotation project show that the total number of snoRNA-encoding genes increased during vertebrate evolution. We asked whether this increase was attributable to the generation of multiple paralogs by duplication, to de novo origin through the accumulation of nucleotide mutations, or to other mechanisms; these possibilities are not mutually exclusive. To address this question, we collected all predicted and validated snoRNA sequences from eight representative vertebrate species in the ENSEMBL database (zebrafish, medaka, frog, chicken, platypus, mouse, rhesus monkey, and human) and calculated the total number of snoRNA genes and the number of snoRNA families (a family may consist of a single-copy snoRNA or of multiple paralogs in the genome). As shown in Figure 4A, the number of snoRNA families increased during vertebrate evolution, suggesting that new snoRNA genes arose de novo. In addition, the number of intron-encoded snoRNAs rose substantially in birds and subsequently in mammals, contributing extensively to the expansion of snoRNA families (Figure 4A). The total number of snoRNA-encoding genes increased abruptly in mammals after their divergence from birds. This mammalian expansion mainly involved intergenic snoRNAs and resulted chiefly from the production of many members within individual snoRNA families (Figure 4B and 4D). The number of predicted snoRNA genes is fewer than 200 in medaka, zebrafish, frog, and birds, but rises to 2,217 in the platypus, 992 in the mouse, and 744 in the rhesus monkey genome.
As shown in Figure 4C, in contrast to Caenorhabditis elegans, in which nearly all snoRNAs exist as single copies (singletons), 30~60% of vertebrate snoRNA families have multiple paralogs, suggesting that large-scale duplications of particular snoRNA families occurred during vertebrate evolution (Figure 4D). Of the 58 rhesus monkey snoRNA families with annotated orthologs in human and mouse, 14 are singletons, and the remaining 44 families have 315 paralogs in the rhesus monkey genome (Additional File 7).
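Counting genes, families, and singletons as described in this subsection needs only a mapping from each annotated snoRNA gene to its family. A minimal sketch, assuming such a mapping has already been extracted (for example, from ENSEMBL annotations); the gene identifiers in the example are hypothetical.

```python
from collections import Counter
from typing import Dict


def family_summary(gene_to_family: Dict[str, str]) -> dict:
    """Summarize snoRNA copy numbers from a {gene_id: family} mapping."""
    copies = Counter(gene_to_family.values())
    n_families = len(copies)
    singletons = sum(1 for n in copies.values() if n == 1)
    return {
        "genes": len(gene_to_family),
        "families": n_families,
        "singletons": singletons,
        "multi_copy_fraction": (n_families - singletons) / n_families if n_families else 0.0,
        "largest_family": copies.most_common(1)[0] if copies else None,
    }


# Hypothetical gene IDs, for illustration only.
example = {
    "gene1": "SNORA25", "gene2": "SNORA25", "gene3": "SNORA25",
    "gene4": "SNORA76", "gene5": "SNORA76",
    "gene6": "SNORD50",
}
print(family_summary(example))
```

Applied to the 58 rhesus monkey families discussed above (14 singletons), the same calculation gives a multi-copy fraction of 44/58, or about 76%.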
The mode of snoRNA expansion differed among the species examined. For example, the rhesus monkey, mouse, and platypus genomes each contain no more than three copies of SNORAU13, whereas 439 copies are annotated in the human genome. Conversely, the SNORA17 family has no more than three copies in the rhesus and human genomes but 354 members in the mouse genome.
Duplication mechanisms of rhesus monkey snoRNAs
According to ENSEMBL annotations, eight rhesus monkey snoRNA families are predicted to have more than ten paralogs. As shown in Table 2, most high-copy snoRNAs are present in all three mammalian species examined, and most are duplicated in a species-specific fashion. This suggests that most high-copy snoRNAs were duplicated relatively recently, after these mammalian lineages diverged. To explore the forces driving the high duplication rate of snoRNAs in mammals, we analyzed the flanking sequences of each paralog within individual snoRNA families, searching for transposable elements that might mediate snoRNA expansion. The paralogs of SNORA70 in the rhesus monkey and mouse genomes shared a ~490 bp consensus sequence in their 3' flanking regions (Figure 5). To investigate whether a particular transposable element (TE) mediated the duplication of SNORA70 in the monkey and mouse genomes, we first searched the flanking sequences of SNORA70 for known TEs using RepeatMasker, but none were identified. A genomic BLAST search with the consensus sequence did not reveal a high copy number in either the rhesus monkey or the mouse genome, suggesting that the consensus sequence does not contain a novel TE either. Thus, the SNORA70 paralogs most likely arose via a non-TE-mediated mechanism. The SNORA25 family includes 16 paralogs that are apparently randomly distributed in the rhesus monkey genome. Each duplication unit has the typical structural features of a SINE-like retroposon: a poly(A) tail and a target site duplication (TSD). The 3'-flanking sequences of eleven rhesus monkey SNORA25 paralogs are shown in Figure 6A. Interestingly, six SNORA25 paralogs have multiple poly(A) sequences (Figure 6A), suggesting that some rhesus monkey SNORA25 sequences may have undergone several rounds of duplication to create the variant paralogs (Figure 6B).
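The two hallmarks used here, a poly(A) tract downstream of the snoRNA and a TSD repeated on both sides of the inserted unit, can be screened for with simple string operations. The sketch below is an illustration under assumptions, not the authors' pipeline: it requires at least 10 consecutive A residues for a poly(A) tract and looks for an exact direct repeat of 7 to 20 nt between the end of the 5' flank and the sequence just after the last poly(A) tract. Real TSDs are often imperfect, so a practical screen would tolerate mismatches. The sequences are synthetic.

```python
import re
from typing import List, Optional, Tuple


def find_poly_a(flank3: str, min_len: int = 10) -> List[Tuple[int, int]]:
    """(start, end) spans of runs of >= min_len consecutive A in a 3' flank."""
    return [m.span() for m in re.finditer(rf"A{{{min_len},}}", flank3.upper())]


def find_tsd(flank5: str, after_poly_a: str,
             min_len: int = 7, max_len: int = 20) -> Optional[str]:
    """Longest exact direct repeat that ends the 5' flank and starts right after the poly(A)."""
    f5, f3 = flank5.upper(), after_poly_a.upper()
    for k in range(min(max_len, len(f5), len(f3)), min_len - 1, -1):
        if f5[-k:] == f3[:k]:
            return f5[-k:]
    return None


def sine_like_features(flank5: str, flank3: str) -> dict:
    """Report the number of poly(A) tracts and a candidate TSD for one snoRNA paralog."""
    tracts = find_poly_a(flank3)
    # The TSD should bracket the whole inserted unit, i.e. follow the last poly(A) tract.
    tsd = find_tsd(flank5, flank3[tracts[-1][1]:]) if tracts else None
    return {"poly_a_tracts": len(tracts), "tsd": tsd}


flank5 = "GCGTACGT" + "TTCAGGCTAAG"
flank3 = "CCGT" + "A" * 15 + "TTCAGGCTAAG" + "CGCG"
print(sine_like_features(flank5, flank3))  # {'poly_a_tracts': 1, 'tsd': 'TTCAGGCTAAG'}
```

A paralog with more than one poly(A) tract, as reported for six SNORA25 copies, would show `poly_a_tracts` greater than 1.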
We also examined two paralogs of rhesus monkey SNORA76. One (designated SNORA76a) lies in an intergenic region on chromosome 16; the other (designated SNORA76b) lies on chromosome 2 within an intron of NKIRAS1 (NF-kappa-B inhibitor-interacting Ras-like protein 1). The mouse genome contains one copy of SNORA76. Based on analysis of syntenic regions between mouse and rhesus monkey, SNORA76a is likely to be the parental copy in the rhesus monkey genome, and SNORA76b is probably a newer progeny copy that arose after the divergence of rodents and primates. SNORA76b appears to be rhesus monkey-specific, as it is absent from the syntenic regions of the marmoset, orangutan, chimpanzee, and human genomes. The 3'-flanking sequences of SNORA76b and SNORA76a share about 1,200 nt, suggesting that a copy of SNORA76a, together with its 3' flanking sequence, was transferred from chromosome 16 to chromosome 2 to create the novel SNORA76b paralog (Figure 7A).
SINE-like expansion was also observed in some snoRNA families. The flanking sequences of SNORA76b contain a terminal poly(A), a TSD, and T2A4 derivatives preferentially recognized by the L1 nicking endonuclease, all of which are features of SINE family transposons. We therefore hypothesize that SNORA76b is a SINE-like retrogene generated by the L1 integration machinery. Figure 7B shows another example of snoRNA trans-duplication in the rhesus monkey genome, which contains six copies of the SNORA24 gene. One copy (SNORA24a) on chromosome 5 is located in the first intron of the rhesus homolog of human SNHG8 (small nucleolar RNA host gene 8). SNORA24b on chromosome 1 has the typical characteristics of a SINE-like retrogene (a TSD and a poly(A) structure), and the region immediately downstream of rhesus SNORA24b consists of three segments that align, respectively, to the 3' region of the first intron and to the entire exon 2 and exon 3 of human SNHG8. This composition of the SNORA24b flanking region suggests that the locus was generated by an RNA-mediated retrotransposition event in which the transposed unit originated from a partially processed SNHG8 precursor transcript (hnRNA). In this scenario, SNORA24, together with the 3' segment of the SNHG8 transcript and its poly(A) end, was retroposed to a new locus on chromosome 1, the SNORA24b locus (Figure 7B). In addition to these two examples, we identified another 22 potential rhesus monkey snoRNA retrogenes (Additional File 8). Together, our data suggest that SINE-like retroposon-mediated retroposition may be a driving force for rhesus monkey snoRNA expansion.
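Two of the signatures discussed for SNORA76b and SNORA24b can also be scanned for directly. The sketch below (a) lists occurrences of the T2A4 motif (TTAAAA, or TTTTAA on the opposite strand) in a flanking sequence, and (b) tests whether a downstream region contains an exon-exon junction with the intervening intron removed, as would be expected if the retroposed unit came from a spliced or partially spliced host-gene transcript. The 15-nt junction half-width, the function names, and all sequences are assumptions made up for illustration.

```python
from typing import List, Tuple

T2A4 = "TTAAAA"
T2A4_RC = "TTTTAA"  # reverse complement of TTAAAA


def find_t2a4(seq: str) -> List[Tuple[int, str]]:
    """0-based start positions of the T2A4 motif, with '+' or '-' for the strand."""
    s = seq.upper()
    hits = []
    for i in range(len(s) - len(T2A4) + 1):
        window = s[i:i + len(T2A4)]
        if window == T2A4:
            hits.append((i, "+"))
        elif window == T2A4_RC:
            hits.append((i, "-"))
    return hits


def has_spliced_junction(downstream: str, exon_a: str, exon_b: str, half: int = 15) -> bool:
    """True if `downstream` contains the end of exon_a joined directly to the start of exon_b."""
    junction = exon_a[-half:] + exon_b[:half]
    return junction.upper() in downstream.upper()


# Hypothetical exon sequences, for illustration only.
exon2 = "GATTACAGATTACAGGC"
exon3 = "CCTGGAAGCTTGCATGC"
processed = "CGT" + exon2 + exon3 + "A" * 12                     # intron removed
unspliced = "CGT" + exon2 + "GTAAGTCCTTTCAG" + exon3 + "A" * 12  # intron retained

print(find_t2a4("GCTTAAAAGC"))                        # [(2, '+')]
print(has_spliced_junction(processed, exon2, exon3))  # True
print(has_spliced_junction(unspliced, exon2, exon3))  # False
```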
Analysis of snoRNA host genes
A large proportion of vertebrate snoRNAs are encoded in the introns of protein-coding or non-protein-coding genes. Although the first snoRNA host genes to be reported had ribosome- or translation-related functions, some snoRNAs are hosted by genes unrelated to ribosomes or translation. Here, we systematically analyzed the functional spectrum of host genes for all intronic snoRNAs predicted in four representative vertebrates (medaka, frog, chicken, and rhesus monkey; data from ENSEMBL release 50). As shown in Figure 8A, more than 80% of snoRNA host genes in medaka are ribosome-related protein-coding genes, whereas this percentage falls to 30% in the rhesus monkey. A similar pattern was evident in the functional distribution of experimentally validated snoRNA host genes when the chicken and rhesus monkey were compared (Figure 8B). These data suggest that snoRNA-encoding genes expanded into the introns of non-ribosomal and non-translational protein-coding genes during vertebrate evolution.
We also searched for orthologs of snoRNA host genes in eight additional species: C. elegans, fruit fly, medaka, zebrafish, frog, platypus, mouse, and human. Interestingly, almost all chicken snoRNAs and their host genes had orthologs in human and rhesus monkey (Figure 8C and Figure 8D), suggesting that birds and primates inherited not only these snoRNAs but also their host genes from a common ancestor that lived more than 310 million years ago. A large proportion (about 80%) of rhesus monkey snoRNA host genes with ribosome- and translation-related functions have orthologs in the chicken genome that also host snoRNAs (Figure 8E). In contrast, only 37% of the chicken orthologs of non-ribosome- and non-translation-related rhesus monkey snoRNA host genes carried snoRNAs (Figure 8F), indicating that most monkey snoRNAs encoded in introns of non-ribosome-related genes appeared after the divergence of birds and mammals.
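The 80% and 37% figures above are simple proportions over host-gene records. A sketch of the bookkeeping, assuming each rhesus host gene has been annotated with a functional class, whether a chicken ortholog exists, and whether that ortholog also hosts a snoRNA. Because the two statements above are phrased with different denominators (all host genes versus host genes with an ortholog), the function reports both. The records in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, bool, bool]  # (functional class, has chicken ortholog, ortholog hosts a snoRNA)


def host_sharing(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Per functional class, the share of host genes whose chicken ortholog also hosts a snoRNA."""
    counts = defaultdict(lambda: [0, 0, 0])  # [host genes, with ortholog, ortholog hosts snoRNA]
    for cls, has_ortholog, ortholog_hosts in records:
        c = counts[cls]
        c[0] += 1
        if has_ortholog:
            c[1] += 1
            if ortholog_hosts:
                c[2] += 1
    return {
        cls: {
            "of_all_host_genes": c[2] / c[0],
            "of_genes_with_ortholog": c[2] / c[1] if c[1] else 0.0,
        }
        for cls, c in counts.items()
    }


example = [
    ("ribosome/translation", True, True),
    ("ribosome/translation", True, True),
    ("ribosome/translation", False, False),
    ("other", True, False),
    ("other", True, True),
    ("other", False, False),
]
print(host_sharing(example))
```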
Recent studies have shown that the functions of non-protein-coding RNAs may encompass almost every aspect of biological activity in normal development and disease pathogenesis [21, 25, 42–45]. Rhesus macaques are a suitable primate model for basic and applied biomedical research [38, 39]. However, in contrast to the considerable literature on human and mouse ncRNAs, rhesus monkey ncRNAs had not previously been systematically characterized. Here, we performed a detailed screen of the rhesus monkey intermediate-size ncRNA transcriptome and cloned 117 rhesus monkey ncRNAs, including 80 snoRNAs, eight unclassified ncRNAs, and 29 known RNAs (snRNAs, Y RNA, and others). Comparative genomic analysis revealed several lineage- or species-specific snoRNAs. Genomic organization analysis showed that most rhesus monkey snoRNAs have many paralogs in the rhesus monkey genome, and flanking sequence analysis suggested that SINE-like retroposon-mediated trans-duplication may have been an important mechanism in the expansion of novel snoRNAs in this genome.
Among the 117 identified rhesus monkey ncRNAs, eight candidates could not be assigned to any known ncRNA class. These eight unclassified ncRNAs were ubiquitously expressed in the six rhesus monkey tissues tested. We recently also identified nine unclassified ncRNAs from the chicken. Previous reports have likewise found that some ncRNAs obtained by cDNA library sequencing did not belong to any known ncRNA family; these were designated unclassified or unknown [13, 31, 47]. Hüttenhofer and coworkers found 57 such unclassified ncRNAs of 50~500 nt in mouse brain cDNA libraries. Deng et al. reported 14 unclassified ncRNAs of 70~200 nt in a C. elegans cDNA library. Yuan identified 29 unclassified ncRNAs from cDNA libraries representing four developmental stages of Drosophila melanogaster. These unclassified ncRNAs often show little sequence conservation and are less abundant than known snoRNAs. However, this does not mean that they are non-functional [13, 31, 48]. The increasing number of newly identified unclassified ncRNAs suggests that other classes of intermediate-size ncRNA (50~500 nt) remain to be identified, and that novel ncRNA families may become classifiable through improved bioinformatic comparisons and extensive functional studies.
Previous reports showed that most known snoRNAs are conserved between human and mouse, at a level of 80~90%. Most rhesus monkey snoRNAs identified in the present study are highly homologous to their human and mouse counterparts. However, 13 snoRNAs had conservation scores below 0.6 (Table 1), suggesting that some snoRNAs are less conserved between primates and rodents. Comparative genomic analysis identified several lineage- or species-specific snoRNAs. Fifteen snoRNA families are ancient, having been present at an early stage of vertebrate evolution, whereas 11 snoRNA families appeared after the divergence of birds and mammals. Fourteen young snoRNA families arose during mammalian evolution, and one of these (SNORA15) emerged only after primates had arisen. These findings are in line with recent studies in other species: we previously found 30 chicken/bird-specific ncRNAs, Schmitz reported 49 platypus-specific snoRNAs, and computational analysis of human genomic tiling array data revealed 300 putative primate-specific ncRNA candidates. Together with these previous reports, our data suggest that ncRNAs may play important roles in lineage development, or speciation, during evolution.
Although homologs of some rhesus monkey snoRNAs were present in the mouse and human genomes, expression of several of these snoRNAs was not detectable by northern blotting, suggesting that some snoRNA homologs may be pseudogenes without transcriptional potential in human and/or mouse. Accordingly, we found only 14 potential primate-specific and eight rhesus monkey (or non-human primate)-specific transcripts (Table 1 and Figure 2). However, the lack of detectable expression in human or mouse may instead reflect transcriptional regulation by spatio-temporal, physiological, or pathological stimuli or stresses that were absent under the normal conditions in which our tissue samples were collected. Consistent with this hypothesis, tissue-specific expression of ncRNAs has been reported previously, for example brain-specific snoRNAs and snoRNAs involved in neuronal development. Similarly, some microRNAs and piRNAs display specific spatio-temporal expression patterns and play roles in cell differentiation and organogenesis during development [11, 12, 52, 53]. In the present study, we also found that SNORA71, which is ubiquitously expressed in human and rhesus monkey tissues, is predominantly expressed in mouse brain.
In vertebrates, most snoRNAs are located within introns of protein-coding or non-protein-coding genes [21, 54]. Some snoRNAs are present in several copies, either in different introns of the same gene or in introns of different genes [32, 55]. Genomic organization analysis showed that most of the rhesus monkey snoRNAs identified in this study have multiple paralogs in the rhesus genome, suggesting redundancy arising from duplication, including transposition. Diverse molecular mechanisms, such as gene duplication and retroposition, may be involved in the creation of new protein-coding genes. To investigate the mechanisms of rhesus monkey snoRNA expansion, we analyzed the flanking sequences of each snoRNA paralog and found that the sequences adjacent to some rhesus monkey snoRNAs have the typical features of a SINE-like retroposon, a poly(A) tail and TSDs, suggesting that some rhesus monkey snoRNA paralogs are retrogenes formed by retroposition mediated by autonomous retroposons. In addition, the 5' flanking sequences of rhesus monkey SNORA76b and SNORA24b contain T2A4 motifs, which are preferentially recognized by the L1-encoded nicking endonuclease, suggesting that SNORA76b and SNORA24b were generated from a parental copy by retroposition mediated by the L1 integration machinery. Notably, six SNORA25 paralogs also have typical SINE-like retroposon features and contain multiple poly(A) sequences, indicating that SNORA25 underwent multiple duplication events during evolution; we therefore propose a retroposition model for SNORA25 duplication. The mechanisms of snoRNA gene expansion have recently been reported in other species as well. In nematodes, some snoRNA paralogs were generated by cis- or trans-duplication.
Other data suggest that mammalian snoRNA genes are SINE-like retroposons (snoRTs/snoRTEs), and that retroposition mediated by snoRTs may have played an important role in snoRNA expansion during the evolution of the mammalian genome [33–35]. The extensive expansion of snoRNA-encoding genes during mammalian evolution might ensure that a functional copy remains when a parental gene loses function through mutation. Alternatively, novel paralogs could evolve independently into isoforms with different targets or functions, for example by acquiring new sequences complementary to modification sites in rRNAs.
In the present study, we provide the first experimentally derived catalog of rhesus monkey ncRNAs. Small nucleolar RNAs (snoRNAs) comprise one of the largest groups of functionally diverse ncRNAs currently known in eukaryotic cells. Northern blotting and comparative genomic analysis of rhesus monkey snoRNAs revealed several notable features. First, we identified several lineage- or species-specific snoRNAs. Second, most snoRNAs have multiple paralogs in the rhesus monkey genome. Third, data from the ENSEMBL genome annotation project show that the total number of snoRNA-encoding genes increased during vertebrate evolution. Finally, our results suggest that SINE-like retroposon-mediated trans-duplication may have been a driving force for the expansion of novel snoRNAs in the rhesus monkey genome.
Animals and Ethics statement
Two-year-old rhesus macaques (Macaca mulatta) were used in this study. For tissue sampling, monkeys were anesthetized with ketamine (25 mg/kg) and pentobarbital (30 mg/kg) and euthanized; tissues were removed, cut into blocks, and immediately frozen in liquid nitrogen for RNA isolation. Mouse tissues were collected from six-month-old C57BL/6 mice. All experimental procedures were conducted in accordance with the protocols of the Chinese Academy of Medical Sciences and the Institutional Animal Care and Use Committee of Peking Union Medical College. Chicken tissues were collected from four-week-old meat-type broilers (bred by a commercial company, Arbor Acres) in accordance with the policies of the Animal Care and Use Committee of China Agricultural University. Total RNA from human tissues was purchased from Shang Hai Haoran Biological Technology Co. Ltd., Shanghai, China.
Construction of rhesus monkey libraries enriched in ncRNAcDNA
Total RNA was isolated from mixed heart and skeletal muscle tissue of rhesus macaques. Full-length ncRNA-specific libraries of both capped and uncapped transcripts were generated according to a previously described method, with modifications. Total RNA was fractionated on Qiagen-tips by NaCl gradient elution (0.6 to 1.0 M) in QRW2 buffer, following the protocol in the Qiagen RNA/DNA handbook. Highly abundant rRNAs (5.8S and 5S rRNAs) and snRNAs (U1, U2, U4, and U5 snRNAs) were removed from the small RNA fraction (50 to 500 nt) with an Ambion MicrobExpress kit. The remaining RNAs were dephosphorylated with calf intestinal alkaline phosphatase (Fermentas) and ligated to a 3' adaptor with T4 RNA ligase (Fermentas). After removal of excess 3' adaptor, the ligation products were split into two aliquots: one was treated with polynucleotide kinase (PNK, Fermentas) to phosphorylate the 5' ends of uncapped RNAs, and the other was incubated with tobacco acid pyrophosphatase (TAP, Epicentre) to remove 5' methylguanosine caps from capped RNAs. Both samples were then ligated to the 5' adaptor and reverse transcribed with ThermoScript reverse transcriptase (RT; Invitrogen) using oligo 3RT as the RT primer. The cDNA was amplified by 13 cycles of PCR using Platinum Taq (Invitrogen) with the 3RT and 5AD primers, cloned into the pGEM-T vector (Promega), and sequenced. All primer sequences used in this study are listed in Additional File 9.
Northern blot hybridization
Total RNA from six rhesus monkey tissues (heart, liver, brain, kidney, spleen, and skeletal muscle) and from human, mouse, and chicken skeletal muscle was separated by 8% (w/v) PAGE with 7 M urea and transferred to nylon membranes (N+, Amersham). Probes for specific ncRNAs were labelled with digoxigenin (DIG)-11-UTP by in vitro transcription with T7 and SP6 RNA polymerases. The blots were hybridized in ULTRAhyb (Ambion) at 68°C overnight, washed twice for 5 min each with 2 × SSC/0.1% (w/v) SDS at 68°C, and then washed stringently twice for 30 min each with 0.1 × SSC/0.1% (w/v) SDS at 68°C. The blots were then blocked in blocking buffer for 30 to 60 min at room temperature and incubated for 30 min with anti-DIG alkaline phosphatase (AP)-conjugated antibody (1:10,000 in blocking buffer). Hybridization signals were visualized with the chemiluminescent substrate CDP-Star (Roche) and recorded on X-ray film.
Rhesus monkey ncRNA annotation
A total of 4,844 clones were sequenced from the rhesus monkey ncRNA libraries. Vector and adaptor sequences were trimmed with the Staden package using default parameters, yielding 4,059 insert sequences for further analysis. After removal of redundant sequences, the remaining 2,164 unique sequences were annotated by BLASTN (version 2.2.17) according to their similarity to the NCBI nt database (2008-06 release), Rfam ncRNA sequences (release 8.1), ENSEMBL rhesus monkey ncRNA and cDNA sequences (release 49), and NCBI rhesus monkey RefSeq mRNAs (release 2008-05). Only alignments with plus/plus strand matches and e-values below 1e-20 were retained. Annotations from these alignments were combined in the following order of priority: Rfam ncRNAs, NCBI nt sequences, ENSEMBL ncRNAs, NCBI RefSeq mRNAs, and ENSEMBL cDNA sequences. Structural alignment with known snoRNAs was performed using INFERNAL, and SnoReport was used to recognize the two major classes of snoRNAs (H/ACA box- and C/D box-containing snoRNAs).
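The hit filtering and priority rules above can be expressed compactly. The Python sketch below assumes hits in the 12-column tabular BLAST format (legacy `blastall -m 8` or BLAST+ `-outfmt 6`), reads the cutoff as "e-value no larger than 1e-20", and uses hypothetical database labels (`rfam`, `nt`, and so on); it illustrates the logic only and is not the authors' pipeline.

```python
# Sketch only: database labels and the tabular input format are assumptions.
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Priority order from the text; the short labels are hypothetical.
PRIORITY = ["rfam", "nt", "ensembl_ncrna", "refseq_mrna", "ensembl_cdna"]
EVALUE_CUTOFF = 1e-20


@dataclass
class Hit:
    query: str
    subject: str
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tabular BLAST output (qseqid sseqid ... evalue bitscore)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], int(f[6]), int(f[7]),
                        int(f[8]), int(f[9]), float(f[10])))
    return hits


def keep(h: Hit) -> bool:
    """Plus/plus match (ascending coordinates on both sequences) passing the cutoff."""
    plus_plus = h.qstart <= h.qend and h.sstart <= h.send
    return plus_plus and h.evalue <= EVALUE_CUTOFF


def annotate(hits_by_db: Dict[str, List[Hit]]) -> Dict[str, str]:
    """Label each query with its best hit from the highest-priority database."""
    best: Dict[str, Dict[str, Hit]] = {}
    for db, hits in hits_by_db.items():
        for h in filter(keep, hits):
            per_db = best.setdefault(h.query, {})
            if db not in per_db or h.evalue < per_db[db].evalue:
                per_db[db] = h  # keep the lowest e-value hit per database
    labels = {}
    for query, per_db in best.items():
        for db in PRIORITY:
            if db in per_db:
                labels[query] = f"{db}:{per_db[db].subject}"
                break
    return labels


if __name__ == "__main__":
    rfam = parse_tabular(["q1\tRF00012\t98.0\t120\t2\t0\t1\t120\t1\t120\t1e-50\t220"])
    nt = parse_tabular(["q1\tNR_001\t99.0\t120\t1\t0\t1\t120\t1\t120\t1e-55\t230",
                        "q2\tNR_002\t99.0\t80\t1\t0\t1\t80\t80\t1\t1e-30\t150"])
    print(annotate({"rfam": rfam, "nt": nt}))  # {'q1': 'rfam:RF00012'}
```

Keeping the best hit per database before applying the priority order means a weaker Rfam hit still outranks a stronger nt hit, which mirrors the stated priority; the minus-strand hit for `q2` is discarded by the plus/plus rule.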
Target prediction of rhesus monkey snoRNAs
We downloaded sequences and annotations of rhesus monkey tRNAs, rRNAs, snRNAs, and snoRNAs from the GtRNAdb and ENSEMBL databases [6, 59]. The guide sequences of C/D box snoRNAs were defined as the regions lying between the C (C') box and the D (D') box. Alignments between snoRNAs and the above RNA sequences were obtained using a modified BLASTN program, and for each guide sequence of a C/D box snoRNA we selected the single best-aligned target. The secondary structures of H/ACA box snoRNAs were predicted using Mfold, and their guide sequences were identified as the sequences within the internal loop of one or both snoRNA hairpins. Target RNAs of H/ACA box snoRNAs were predicted using two criteria: first, the target RNA had to be complementary over at least seven nucleotides to the sequences flanking the junctions between the stem and the internal loop of the snoRNA guide; second, the predicted pseudouridylation site in the target RNA, corresponding to the 5' nucleotide of the junction site in the guide sequence, had to be a uridine.
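The two pairing rules can be illustrated with a simple ungapped complementarity model. The Python sketch below is a stand-in for, not a reconstruction of, the modified BLASTN program: it assumes RNA strings (A, C, G, U), counts G-U wobble pairs as complementary, and leaves it to the caller to extract the guide stretches from the Mfold structure. The parameter names `upstream_guide` and `downstream_guide`, and the one-nucleotide `spacer` after the candidate site, are hypothetical choices made for illustration.

```python
# Sketch only: ungapped pairing model; G-U wobble counted as a pair.
from typing import Tuple

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def duplex_score(guide: str, segment: str) -> int:
    """Base pairs formed when guide and an equal-length target segment
    (both written 5'->3') pair antiparallel: guide[i] faces segment[-1 - i]."""
    return sum((g, t) in PAIRS for g, t in zip(guide, reversed(segment)))


def best_cd_target(guide: str, target: str) -> Tuple[int, int]:
    """Slide a C/D guide along the target; return (start, pairs) of the best window."""
    n = len(guide)
    best = (-1, -1)
    for start in range(len(target) - n + 1):
        score = duplex_score(guide, target[start:start + n])
        if score > best[1]:
            best = (start, score)
    return best


def haca_site_ok(target: str, site: int, upstream_guide: str,
                 downstream_guide: str, spacer: int = 1, min_pairs: int = 7) -> bool:
    """Check the two H/ACA criteria for a candidate site in the target.

    upstream_guide pairs with the target just 5' of the site; downstream_guide
    pairs with the target starting `spacer` unpaired nucleotides 3' of the site.
    """
    up_start = site - len(upstream_guide)
    down_start = site + 1 + spacer
    down_end = down_start + len(downstream_guide)
    if up_start < 0 or down_end > len(target) or target[site] != "U":
        return False  # out of range, or criterion 2 fails: site must be a uridine
    pairs = (duplex_score(upstream_guide, target[up_start:site])
             + duplex_score(downstream_guide, target[down_start:down_end]))
    return pairs >= min_pairs  # criterion 1: at least seven complementary nucleotides


if __name__ == "__main__":
    print(best_cd_target("GCAUCG", "AAACGAUGCAAA"))  # (3, 6)
    print(haca_site_ok("AUGCUAUCGG", 4, "GCAU", "CCGA"))  # True
```

An ungapped window is the simplest possible model; it ignores bulges and the energetic difference between G-U and Watson-Crick pairs, which a BLAST-style or thermodynamic search would account for.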
Comparative genomic analysis of rhesus monkey snoRNAs
Genomic sequences of all examined species were downloaded from the UCSC Genome Browser, together with the genome annotations of ENSEMBL release 50. The sequences, annotations, and genomic loci of vertebrate snoRNAs were originally predicted by INFERNAL using the Rfam database and subsequently integrated into ENSEMBL. Conservation of rhesus monkey snoRNAs in the human, mouse, and chicken genomes was examined using BLAST, and conservation scores were calculated from the maximal alignment length and the identity of the BLAST hits in each genome. Multiple-alignment patterns for comparing snoRNA sequences among primates were extracted from the UCSC hg18 alignment data after rhesus monkey snoRNA coordinates were converted to human genome positions with the UCSC liftOver tool. The genomic context and annotations of protein-coding genes and their orthologs in other species were retrieved with BioMart, using the ENSEMBL annotation version described above. RepeatMasker and CENSOR were used to search for simple repeats and known transposons. To locate low-copy-number snoRNAs, we wrote Perl scripts to search for 5 to 50 bp repeats in the sequences flanking rhesus monkey snoRNAs. To find interspersed high-copy-number snoRNAs, we used ClustalW and MEGA to search for consensus sequences in the flanking regions within a 10 kb window of each gene of interest.
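The flank search for low-copy-number snoRNAs can be sketched as a search for exact direct repeats shared by the upstream and downstream flanks, the kind of signature expected when a copy is inserted by retroposition with a target-site duplication. The Python below is a hypothetical stand-in for the Perl scripts, which are not described in detail; the `conservation_score` helper is likewise only one plausible way to combine alignment length and identity, since the exact formula is not given.

```python
# Hypothetical sketch; not the authors' Perl scripts or scoring formula.
from typing import List, Tuple


def shared_direct_repeats(upstream: str, downstream: str,
                          kmin: int = 5, kmax: int = 50) -> List[Tuple[str, int, int]]:
    """Exact k-mers (kmin..kmax bp) found in both flanks, longest first.

    Returns (repeat, position in upstream, position in downstream); a repeat
    that is a substring of an already reported longer repeat is skipped.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    found: List[Tuple[str, int, int]] = []
    longest = min(kmax, len(upstream), len(downstream))
    for k in range(longest, kmin - 1, -1):
        first_seen = {}  # k-mer -> first position in the upstream flank
        for i in range(len(upstream) - k + 1):
            first_seen.setdefault(upstream[i:i + k], i)
        for j in range(len(downstream) - k + 1):
            kmer = downstream[j:j + k]
            if kmer in first_seen and not any(kmer in r for r, _, _ in found):
                found.append((kmer, first_seen[kmer], j))
    return found


def conservation_score(aln_len: int, pct_identity: float, snorna_len: int) -> float:
    """Hypothetical score: aligned fraction of the snoRNA times fractional identity."""
    return min(aln_len / snorna_len, 1.0) * pct_identity / 100.0


if __name__ == "__main__":
    up = "TTTTGAAGCTTACCCC"
    down = "GGGGAAGCTTAGGGG"
    print(shared_direct_repeats(up, down))  # [('GAAGCTTA', 4, 3)]
    print(conservation_score(100, 90.0, 100))  # 0.9
```

Suppressing repeats contained in a longer hit keeps the report short; a realistic scan would also need to tolerate mismatches, since older duplications accumulate substitutions.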
Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR: Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002, 296 (5569): 916-919. 10.1126/science.1068597.
Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S: Global identification of human transcribed sequences with genome tiling arrays. Science. 2004, 306 (5705): 2242-2246. 10.1126/science.1103388.
Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005, 308 (5725): 1149-1154. 10.1126/science.1108625.
Johnson JM, Edwards S, Shoemaker D, Schadt EE: Dark matter in the genome: evidence of widespread transcription detected by microarray tiling experiments. Trends Genet. 2005, 21 (2): 93-102. 10.1016/j.tig.2004.12.009.
Brosius J: Waste not, want not--transcript excess in multicellular eukaryotes. Trends Genet. 2005, 21 (5): 287-288. 10.1016/j.tig.2005.02.014.
Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L: Ensembl 2009. Nucleic Acids Res. 2009, 37 (Database issue): D690-697. 10.1093/nar/gkn828.
Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR: Rfam: updates to the RNA families database. Nucleic Acids Res. 2009, 37 (Database issue): D136-140. 10.1093/nar/gkn766.
Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C: The transcriptional landscape of the mammalian genome. Science. 2005, 309 (5740): 1559-1563. 10.1126/science.1112014.
Pang KC, Stephen S, Dinger ME, Engstrom PG, Lenhard B, Mattick JS: RNAdb 2.0--an expanded database of mammalian non-coding RNAs. Nucleic Acids Res. 2007, 35 (Database issue): D178-182. 10.1093/nar/gkl926.
Ruby JG, Jan C, Player C, Axtell MJ, Lee W, Nusbaum C, Ge H, Bartel DP: Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell. 2006, 127 (6): 1193-1207. 10.1016/j.cell.2006.10.040.
Girard A, Sachidanandam R, Hannon GJ, Carmell MA: A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature. 2006, 442 (7099): 199-202.
Aravin A, Gaidatzis D, Pfeffer S, Lagos-Quintana M, Landgraf P, Iovino N, Morris P, Brownstein MJ, Kuramochi-Miyagawa S, Nakano T: A novel class of small RNAs bind to MILI protein in mouse testes. Nature. 2006, 442 (7099): 203-207.
Huttenhofer A, Kiefmann M, Meier-Ewert S, O'Brien J, Lehrach H, Bachellerie JP, Brosius J: RNomics: an experimental approach that identifies 201 candidates for novel, small, non-messenger RNAs in mouse. EMBO J. 2001, 20 (11): 2943-2953. 10.1093/emboj/20.11.2943.
Numata K, Kanai A, Saito R, Kondo S, Adachi J, Wilming LG, Hume DA, Hayashizaki Y, Tomita M: Identification of putative noncoding RNAs among the RIKEN mouse full-length cDNA collection. Genome Res. 2003, 13 (6B): 1301-1306. 10.1101/gr.1011603.
Mercer TR, Dinger ME, Sunkin SM, Mehler MF, Mattick JS: Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci USA. 2008, 105 (2): 716-721. 10.1073/pnas.0706729105.
Ravasi T, Suzuki H, Pang KC, Katayama S, Furuno M, Okunishi R, Fukuda S, Ru K, Frith MC, Gongora MM: Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res. 2006, 16 (1): 11-19. 10.1101/gr.4200206.
Mercer TR, Dinger ME, Mattick JS: Long non-coding RNAs: insights into functions. Nat Rev Genet. 2009, 10 (3): 155-159. 10.1038/nrg2521.
Wilusz JE, Sunwoo H, Spector DL: Long noncoding RNAs: functional surprises from the RNA world. Genes Dev. 2009, 23 (13): 1494-1504. 10.1101/gad.1800909.
Maxwell ES, Fournier MJ: The small nucleolar RNAs. Annu Rev Biochem. 1995, 64: 897-934. 10.1146/annurev.bi.64.070195.004341.
Balakin AG, Smith L, Fournier MJ: The RNA world of the nucleolus: two major families of small RNAs defined by different box elements with related functions. Cell. 1996, 86 (5): 823-834. 10.1016/S0092-8674(00)80156-7.
Kiss T: Small nucleolar RNA-guided post-transcriptional modification of cellular RNAs. EMBO J. 2001, 20 (14): 3617-3622. 10.1093/emboj/20.14.3617.
Clouet d'Orval B, Bortolin ML, Gaspin C, Bachellerie JP: Box C/D RNA guides for the ribose methylation of archaeal tRNAs. The tRNATrp intron guides the formation of two ribose-methylated nucleosides in the mature tRNATrp. Nucleic Acids Res. 2001, 29 (22): 4518-4529. 10.1093/nar/29.22.4518.
Zemann A, op de Bekke A, Kiefmann M, Brosius J, Schmitz J: Evolution of small nucleolar RNAs in nematodes. Nucleic Acids Res. 2006, 34 (9): 2676-2685. 10.1093/nar/gkl359.
Darzacq X, Jady BE, Verheggen C, Kiss AM, Bertrand E, Kiss T: Cajal body-specific small nuclear RNAs: a novel class of 2'-O-methylation and pseudouridylation guide RNAs. EMBO J. 2002, 21 (11): 2746-2756. 10.1093/emboj/21.11.2746.
Kishore S, Stamm S: The snoRNA HBII-52 regulates alternative splicing of the serotonin receptor 2C. Science. 2006, 311 (5758): 230-232. 10.1126/science.1118265.
Ender C, Krek A, Friedlander MR, Beitzinger M, Weinmann L, Chen W, Pfeffer S, Rajewsky N, Meister G: A human snoRNA with microRNA-like functions. Mol Cell. 2008, 32 (4): 519-528. 10.1016/j.molcel.2008.10.017.
Saraiya AA, Wang CC: snoRNA, a novel precursor of microRNA in Giardia lamblia. PLoS Pathog. 2008, 4 (11): e1000224. 10.1371/journal.ppat.1000224.
Tycowski KT, Aab A, Steitz JA: Guide RNAs with 5' caps and novel box C/D snoRNA-like domains for modification of snRNAs in metazoa. Curr Biol. 2004, 14 (22): 1985-1995. 10.1016/j.cub.2004.11.003.
Tycowski KT, Shu MD, Steitz JA: A small nucleolar RNA is processed from an intron of the human gene encoding ribosomal protein S3. Genes Dev. 1993, 7 (7A): 1176-1190. 10.1101/gad.7.7a.1176.
Kiss T, Filipowicz W: Exonucleolytic processing of small nucleolar RNAs from pre-mRNA introns. Genes Dev. 1995, 9 (11): 1411-1424. 10.1101/gad.9.11.1411.
Deng W, Zhu X, Skogerbo G, Zhao Y, Fu Z, Wang Y, He H, Cai L, Sun H, Liu C: Organization of the Caenorhabditis elegans small non-coding transcriptome: genomic features, biogenesis, and expression. Genome Res. 2006, 16 (1): 20-29. 10.1101/gr.4139206.
Kiss AM, Jady BE, Bertrand E, Kiss T: Human box H/ACA pseudouridylation guide RNA machinery. Mol Cell Biol. 2004, 24 (13): 5797-5807. 10.1128/MCB.24.13.5797-5807.2004.
Luo Y, Li S: Genome-wide analyses of retrogenes derived from the human box H/ACA snoRNAs. Nucleic Acids Res. 2007, 35 (2): 559-571.
Weber MJ: Mammalian small nucleolar RNAs are mobile genetic elements. PLoS Genet. 2006, 2 (12): e205. 10.1371/journal.pgen.0020205.
Schmitz J, Zemann A, Churakov G, Kuhl H, Grutzner F, Reinhardt R, Brosius J: Retroposed SNOfall--a mammalian-wide comparison of platypus snoRNAs. Genome Res. 2008, 18 (6): 1005-1010. 10.1101/gr.7177908.
Kumar S, Hedges SB: A molecular timescale for vertebrate evolution. Nature. 1998, 392 (6679): 917-920. 10.1038/31927.
Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004, 428 (6982): 493-521. 10.1038/nature02426.
Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK: Evolutionary and biomedical insights from the rhesus macaque genome. Science. 2007, 316 (5822): 222-234. 10.1126/science.1139247.
Hernandez RD, Hubisz MJ, Wheeler DA, Smith DG, Ferguson B, Rogers J, Nazareth L, Indap A, Bourquin T, McPherson J: Demographic histories and patterns of linkage disequilibrium in Chinese and Indian rhesus macaques. Science. 2007, 316 (5822): 240-243. 10.1126/science.1140462.
Chen N: Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2004, Chapter 4: Unit 4.10.
Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O: A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007, 8 (12): 973-982. 10.1038/nrg2165.
Storz G, Altuvia S, Wassarman KM: An abundance of RNA regulators. Annu Rev Biochem. 2005, 74: 199-217. 10.1146/annurev.biochem.74.082803.133136.
Plasterk RH: Micro RNAs in animal development. Cell. 2006, 124 (5): 877-881. 10.1016/j.cell.2006.02.030.
Prasanth KV, Spector DL: Eukaryotic regulatory RNAs: an answer to the 'genome complexity' conundrum. Genes Dev. 2007, 21 (1): 11-42. 10.1101/gad.1484207.
Couzin J: MicroRNAs make big impression in disease after disease. Science. 2008, 319 (5871): 1782-1784. 10.1126/science.319.5871.1782.
Zhang Y, Wang J, Huang S, Zhu X, Liu J, Yang N, Song D, Wu R, Deng W, Skogerbo G: Systematic identification and characterization of chicken (Gallus gallus) ncRNAs. Nucleic Acids Res. 2009, 37 (19): 6562-6574. 10.1093/nar/gkp704.
Vitali P, Royo H, Seitz H, Bachellerie JP, Huttenhofer A, Cavaille J: Identification of 13 novel human modification guide RNAs. Nucleic Acids Res. 2003, 31 (22): 6543-6551. 10.1093/nar/gkg849.
Yuan G, Klambt C, Bachellerie JP, Brosius J, Huttenhofer A: RNomics in Drosophila melanogaster: identification of 66 candidates for novel non-messenger RNAs. Nucleic Acids Res. 2003, 31 (10): 2495-2507. 10.1093/nar/gkg361.
Pang KC, Frith MC, Mattick JS: Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet. 2006, 22 (1): 1-5. 10.1016/j.tig.2005.10.003.
Zhang Z, Pang AW, Gerstein M: Comparative analysis of genome tiling array data reveals many novel primate-specific functional RNAs in human. BMC Evol Biol. 2007, 7 (Suppl 1): S14. 10.1186/1471-2148-7-S1-S14.
Cavaille J, Buiting K, Kiefmann M, Lalande M, Brannan CI, Horsthemke B, Bachellerie JP, Brosius J, Huttenhofer A: Identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an unusual genomic organization. Proc Natl Acad Sci USA. 2000, 97 (26): 14311-14316. 10.1073/pnas.250426397.
Zhao Y, Ransom JF, Li A, Vedantham V, von Drehle M, Muth AN, Tsuchihashi T, McManus MT, Schwartz RJ, Srivastava D: Dysregulation of cardiogenesis, cardiac conduction, and cell cycle in mice lacking miRNA-1-2. Cell. 2007, 129 (2): 303-317. 10.1016/j.cell.2007.03.030.
Chen JF, Mandel EM, Thomson JM, Wu Q, Callis TE, Hammond SM, Conlon FL, Wang DZ: The role of microRNA-1 and microRNA-133 in skeletal muscle proliferation and differentiation. Nat Genet. 2006, 38 (2): 228-233. 10.1038/ng1725.
Tanaka-Fujita R, Soeno Y, Satoh H, Nakamura Y, Mori S: Human and mouse protein-noncoding snoRNA host genes with dissimilar nucleotide sequences show chromosomal synteny. RNA. 2007, 13 (6): 811-816. 10.1261/rna.209707.
Pelczar P, Filipowicz W: The host gene for intronic U17 small nucleolar RNAs in mammals has no protein-coding potential and is a member of the 5'-terminal oligopyrimidine gene family. Mol Cell Biol. 1998, 18 (8): 4509-4518.
Long M, Betran E, Thornton K, Wang W: The origin of new genes: glimpses from the young and old. Nat Rev Genet. 2003, 4 (11): 865-875. 10.1038/nrg1204.
Eddy SR, Durbin R: RNA sequence analysis using covariance models. Nucleic Acids Res. 1994, 22 (11): 2079-2088. 10.1093/nar/22.11.2079.
Hertel J, Hofacker IL, Stadler PF: SnoReport: computational identification of snoRNAs with unknown targets. Bioinformatics. 2008, 24 (2): 158-164. 10.1093/bioinformatics/btm464.
Chan PP, Lowe TM: GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009, 37 (Database issue): D93-97. 10.1093/nar/gkn787.
Zuker M: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003, 31 (13): 3406-3415. 10.1093/nar/gkg595.
Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M: The UCSC Genome Browser Database: update 2009. Nucleic Acids Res. 2009, 37 (Database issue): D755-761. 10.1093/nar/gkn875.
Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A: BioMart--biological queries made easy. BMC Genomics. 2009, 10: 22. 10.1186/1471-2164-10-22.
Kohany O, Gentles AJ, Hankus L, Jurka J: Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics. 2006, 7: 474. 10.1186/1471-2105-7-474.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
Kumar S, Nei M, Dudley J, Tamura K: MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform. 2008, 9 (4): 299-306. 10.1093/bib/bbn017.
We thank Dr. Francesco Marincola for critical reading of the manuscript and Xueya Zhou for help in sequence analysis. This research was supported by the following grants: the National Basic Research Program of China (2007CB946903, 2007CB946901, 2005CB522405, 2009CB941602, and 2009CB825403), the National Natural Science Foundation of China (30721063 and 30871248), and the Chinese National Programs for High Technology Research and Development (2006AA10A121 and 2007AA02Z109). The funders played no role in study design, data collection, analysis, the decision to publish, or preparation of the manuscript.
YZ designed and performed the experiments and drafted the manuscript. JL carried out bioinformatics analysis and participated in manuscript preparation. CJ was responsible for animal care and tissue sampling. TL, JW, and YC participated in bioinformatics analysis. RW and XZ carried out experiments. RC, XJW, and DZ conceived of the study, participated in design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.
Yong Zhang and Jun Liu contributed equally to this work.