Computational prediction and validation of C/D, H/ACA and Eh_U3 snoRNAs of Entamoeba histolytica
BMC Genomics volume 13, Article number: 390 (2012)
Small nucleolar RNAs are a highly conserved group of small RNAs found in eukaryotic cells. Genes encoding these RNAs are diversely located throughout the genome. They are functionally conserved, performing post transcriptional modification (methylation and pseudouridylation) of rRNA and other nuclear RNAs. They belong to two major categories: the C/D box and H/ACA box containing snoRNAs. U3 snoRNA is an exceptional member of C/D box snoRNAs and is involved in early processing of pre-rRNA. An antisense sequence is present in each snoRNA which guides the modification or processing of target RNA. However, some snoRNAs lack this sequence and often they are called orphan snoRNAs.
We have searched snoRNAs of Entamoeba histolytica from the genome sequence using computational programmes (snoscan and snoSeeker) and we obtained 99 snoRNAs (C/D and H/ACA box snoRNAs) along with 5 copies of Eh_U3 snoRNAs. These are located diversely in the genome, mostly in intergenic regions, while some are found in ORFs of protein coding genes, intron and UTRs. The computationally predicted snoRNAs were validated by RT-PCR and northern blotting. The expected sizes were in agreement with the observed sizes for all C/D box snoRNAs tested, while for some of the H/ACA box there was indication of processing to generate shorter products.
Our results showed the presence of snoRNAs in E. histolytica, an early branching eukaryote, and the structural features of E. histolytica snoRNAs were well conserved when compared with yeast and human snoRNAs. This study will help in understanding the evolution of these conserved RNAs in diverse phylogenetic groups.
Small nucleolar RNAs (snoRNAs) are a special class of small non-coding RNAs localized to the nucleolus. They belong to two major categories, box C/D and box H/ACA snoRNAs, based on the presence of short consensus sequence motifs. H/ACA box snoRNAs guide pseudouridylation, while C/D box snoRNAs guide site-specific 2'-O-ribose methylation during post-transcriptional modification of pre-rRNA[2–4]. Such modification is accomplished by the small nucleolar ribonucleoprotein complex through complementary base pairing between specific regions of the snoRNA and the target RNA. Some snoRNAs are also known to perform functions other than the modification of ribosomal RNAs, e.g. U3, U17, U8, U14, and U22. The U3 snoRNA is an exceptional member of the box C/D class and is involved in early pre-rRNA cleavage in the 5’ external transcribed spacer (ETS) in yeast cells, mouse extracts, and Xenopus oocyte extracts. Depletion of this snoRNA impairs the formation of mature 18S rRNA. Other exceptions include the C/D snoRNAs U8 and U22 and the H/ACA snoRNA U17/snR30, which are required for pre-rRNA cleavage and are not involved in rRNA or nuclear RNA modification. Some snoRNAs, e.g. U14 (C/D) and snR10 (H/ACA), are involved in both pre-rRNA cleavage and modification. Several snoRNAs lack any known target site and are called orphan snoRNAs. These snoRNAs might have undiscovered functions, which may or may not concern rRNAs. One piece of evidence for this is the role of the orphan C/D box snoRNA SNORD115 in the regulation of alternative splicing.
Structural motifs are one of the important distinguishing features of snoRNAs. The characteristic structural motifs in C/D box snoRNAs are RUGAUGA for C box and CUGA for D box. In H/ACA box snoRNAs the H box is ANANNA and ACA box is ACA, arranged in a hairpin, hinge, hairpin, tail structure[14, 15]. C/D box snoRNAs are about 60–100 bases in size, while H/ACA snoRNAs are 120–160 bases. Vertebrate snoRNAs are typically encoded from introns of protein coding genes while in plants they are transcribed as polycistronic transcripts. In yeast most of them are transcribed from independent promoters. Amongst protozoan parasites, snoRNAs have been extensively studied in Trypanosoma brucei and Plasmodium falciparum[20–22]. In the latter it was shown for the first time that snoRNA genes may be located in UTRs. Strikingly, both organisms showed a much larger number of methylation sites compared with pseudouridylation sites.
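The consensus motifs listed above can be written down as simple patterns. The following is a minimal sketch, not the search procedure of snoscan or snoSeeker: it only locates the C box (RUGAUGA), D box (CUGA) and H box (ANANNA) consensus strings in a sequence. The function names and the toy ordering check are illustrative assumptions.

```python
import re

# Consensus motifs quoted in the text, written as regular expressions.
# R = A or G; N = any nucleotide.
C_BOX = re.compile(r"[AG]UGAUGA")          # RUGAUGA
D_BOX = re.compile(r"CUGA")                # CUGA
H_BOX = re.compile(r"A[ACGU]A[ACGU]{2}A")  # ANANNA

C_BOX_LEN = 7


def to_rna(seq):
    """Upper-case a DNA or RNA string and write T as U."""
    return seq.upper().replace("T", "U")


def find_motif(pattern, rna):
    """Return 0-based start positions of all matches, including overlapping ones."""
    lookahead = re.compile("(?=(" + pattern.pattern + "))")
    return [m.start() for m in lookahead.finditer(rna)]


def has_cd_box_order(seq):
    """Toy check (an assumption): some C box lies entirely upstream of some D box."""
    rna = to_rna(seq)
    c_hits = find_motif(C_BOX, rna)
    d_hits = find_motif(D_BOX, rna)
    return any(c + C_BOX_LEN <= d for c in c_hits for d in d_hits)
```

Because the motifs are so short, a scan of this kind on its own would be expected to return many spurious hits in a genome, which is why further filtering criteria are needed.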
A number of bioinformatic tools are available for the scanning of genomic sequences for snoRNAs. These include Snoscan and snoSeeker (CDSeeker and ACASeeker) for the search of C/D and H/ACA box snoRNAs. In this study, we have carried out a genome wide analysis of the early branching parasitic protist Entamoeba histolytica for identification of C/D and H/ACA box snoRNAs in this organism. A computational search for structural motifs gave hits out of which false positives having no identifiable target sites were removed. This was achieved by aligning the rRNA of E. histolytica with rRNAs of five eukaryotic organisms Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae and Homo sapiens separately, whose snoRNAs and target sites are already known[25–27]. The computational analysis was combined with experimental validation.
Results and discussion
Computational identification of putative snoRNAs from E. histolytica by snoscan and snoSeeker
Target site modifications by snoRNAs are commonly conserved amongst distant eukaryotes. We therefore selected five eukaryotic organisms (A. thaliana, C. elegans, D. melanogaster, S. cerevisiae and H. sapiens) whose methylation sites and pseudouridylation (psi) sites are known, and used these to find putative sites in E. histolytica rRNA by aligning its 5.8S, 28S and 18S rRNA sequences with the rRNAs of each selected organism separately (Additional file 1: Figure S1). Each of the mapped methylation and psi sites was picked as a putative modification site in E. histolytica. We could identify a total of 173 putative methylation sites and 126 putative psi sites in E. histolytica. A large fraction of these (53%) matched with yeast and human sites. 24 novel methylation sites were also found in E. histolytica. The programs snoscan and snoSeeker (CDSeeker) were used to identify putative C/D box snoRNA sequences, and snoSeeker (ACASeeker) to identify putative H/ACA box snoRNA sequences, in the E. histolytica whole genome. The initially predicted snoRNAs (41705 C/D box and 661 H/ACA box) were further analyzed to eliminate false positive candidates using the following criteria (Figure 1). Firstly, we selected snoRNAs that could target the putative modification sites obtained by aligning the rRNA of E. histolytica with the five organisms listed above. SnoRNAs that could potentially target 23 predicted methyl sites and 41 psi sites in E. histolytica were thus selected. Secondly, we set a threshold value for the final log-odds score, which incorporates information from each of the snoRNA features, and retained the snoRNAs with a final score equal to or greater than the threshold[24, 26]. The threshold values used are given in “Methods”. Thirdly, we looked at the genomic localization of these snoRNAs and selected those coming from intergenic regions and introns. We also selected snoRNAs from genic regions for which the log-odds score was well above the threshold (45 bits for H/ACA and 20 bits for C/D box snoRNAs)[24, 26]. Lastly, we did BLASTn analysis of the predicted snoRNAs against the EST database of E. histolytica. All snoRNAs giving hits with ESTs were discarded. Finally we obtained a total of 99 snoRNAs, of which 41 were C/D box (34 guide and 7 orphan snoRNAs) and 58 were H/ACA box (43 guide and 15 orphan snoRNAs). We have named the genes encoding the putative snoRNAs so as to indicate firstly the type of snoRNA (Me or ACA), followed by the species name (Eh) and the modification site in rRNA (where predicted) or orphan (where it is not known), e.g. ACA-Eh-SSU-1315 represents an H/ACA type snoRNA of E. histolytica which is predicted to modify SSU rRNA at position 1315 (Tables 1, 2, 3).
We compared the predicted E. histolytica snoRNAs with those of S. cerevisiae, H. sapiens and the two protozoan parasites (T. brucei and P. falciparum) on the basis of homology with conserved antisense sequences that guide the respective modifications for the two snoRNA classes (Table4). We found 9 C/D guide snoRNAs out of 34 which showed homology with P. falciparum snoRNAs, and 10/34 which showed homology with T. brucei snoRNAs, while in yeast and human this number was 14/34 (with yeast) and 11/34 (with human). Only 4 E. histolytica H/ACA box snoRNAs out of 43 showed homology with P. falciparum snoRNAs and 2/43 showed homology with T. brucei snoRNAs, while the homology with yeast was 14/43 and with human was 18/43. The conservation of modification sites between these organisms was as follows. Of the sites predicted to be modified in E. histolytica rRNAs (47 methylation sites and 41 pseudouridylation sites), 16 methylation sites and 21 pseudouridylation sites were conserved in at least one of the other four organisms (Table4). Taking the two modification sites together, 30 sites were conserved between E. histolytica and S. cerevisiae, 31 between E. histolytica and H. sapiens, 13 sites between E. histolytica and P. falciparum, and 12 sites were conserved between E. histolytica and T. brucei. Seven modification sites of E. histolytica were shared by all the four organisms. We also found 7 and 15 orphan snoRNAs in the C/D and H/ACA categories respectively. Orphan snoRNAs are important as they may act on RNA substrates other than mature rRNAs. As mentioned before, one of the roles of orphan snoRNAs is reported for human HBII-52 snoRNA, which is a C/D orphan snoRNA and regulates alternative splicing of the serotonin receptor 2 C. Similarly, some orphan H/ACA box snoRNAs may function in other aspects of RNA biogenesis. 
For example, the human U17 box H/ACA snoRNA and its yeast orthologue, snR30, play an essential role in the nucleolytic processing of 18S rRNA from pre-rRNA. We checked for sequence complementarity of the antisense elements in our predicted orphan snoRNAs with the E. histolytica database. For two C/D orphan snoRNAs (Additional file 2: Figure S2) the possible antisense element (upstream of the D' box and/or D box) showed complementary base pairing with mRNAs of the EHI_192630 and EHI_008070 genes in E. histolytica. Further, we checked whether the predicted orphan snoRNAs were found in the small RNA database of E. histolytica (generated in our lab by next generation sequencing). We found that 14 of 22 orphan snoRNAs were detected in this database.
All of the predicted E. histolytica snoRNAs possessed conserved structural motifs characteristic of each class. Secondary structure of the predicted H/ACA snoRNAs was determined by ACASeeker. All of the predicted 58 H/ACA snoRNAs adopted the consensus folding pattern as shown using VARNA: Visualization Applet for RNA. A representative of H/ACA snoRNA is shown in Additional file3: Figure S3 A. As expected the H/ACA box snoRNAs formed hairpin-hinge-hairpin-tail structure with H box lying in hinge region and ACA box at 3' tail region. Unlike ACASeeker, the C/D box prediction tool did not provide the secondary structure information. Therefore the secondary structure of C/D box was predicted with RNA fold (rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi) and structures were drawn using VARNA: Visualization Applet for RNA. Secondary structures obtained for C/D box snoRNAs were similar to the published structures for these RNAs (Additional file3: Figure S3 B).
The genome sequence of other Entamoeba species is now becoming available. We checked these data bases to look for close matches to the predicted snoRNAs of E. histolytica. Of the 58 predicted H/ACA snoRNAs we found 36 in E. dispar and 47 in E. nuttalli, while of the 41 predicted C/D box RNAs we found 33 in E. dispar and 36 in E. nuttalli. There was a high level of sequence similarity (77-100%), which was expected with E. dispar and E. nuttalli since they are very closely related to E. histolytica. However when the same analysis was done with a distant species E. invadens, which infects reptiles, we found only 1 H/ACA and 2 C/D snoRNAs matching with E. histolytica. Although this result could also be a reflection of the quality of sequence assembly, it shows that E. invadens has diverged significantly from E. histolytica. Sequence comparison of conserved genes, e.g. rRNA genes also shows high divergence between E. histolytica and E. invadens[33, 34].
Validation of computationally predicted snoRNAs by RT-PCR and northern hybridization
To demonstrate whether the predicted snoRNAs are indeed expressed in E. histolytica cells we selected 24 snoRNAs to represent different categories, namely guide/orphan; and gene location in genic/intergenic regions. Accordingly 8 C/D box guide and orphan snoRNAs were selected (5 intergenic, 1 intronic, 1 in UTR and 1 genic) as also the U3 snoRNA; and 15 H/ACA box guide and orphan snoRNAs were selected (8 intergenic, 7 genic). Expression analysis of these snoRNAs was performed by RT-PCR using total RNA from E. histolytica and specific primers for each snoRNA designed from the ends of the predicted snoRNA sequence (Additional file4: Table S1 for primer sequences). RT-PCR products were obtained for all snoRNAs tested (Figure2). Amplicons of predicted size (as obtained by genomic PCR with the same primers using total DNA of E. histolytica) were observed for all C/D box snoRNAs and most of the H/ACA box snoRNAs. For three of the H/ACA snoRNAs somewhat smaller size amplicons were observed (Figure2B, marked by asterisk). A possible explanation for this is provided later. To further validate the RT-PCR results northern blot analysis was performed with RNA enriched in small RNA species. DNA probes from four C/D box and nine H/ACA box snoRNAs tested by RT-PCR were used. Results showed detectable bands corresponding to all snoRNAs tested (Figure3), although intensities of bands were not the same for all, possibly reflecting differential expression levels. For the four C/D box snoRNAs and U3 snoRNA tested, the sizes of observed bands were consistent with the predicted sizes (Figure3C). However several of the H/ACA snoRNAs showed bands in addition to the predicted sizes. These bands may represent mature snoRNAs obtained after processing, as has been reported in other species. Some of these processing events may involve splicing of internal sequences, resulting in shorter size amplicons in RT-PCR. 
The multiple bands observed for some of the H/ACA snoRNAs indicate that these may be present as both single and double hairpin RNAs, as is known in other species. On the other hand, northern blot analysis of ACA-Eh-SSU626 indicates the existence of only the double hairpin H/ACA snoRNA in this case, while ACA-Eh-SSU1315, ACA-Eh-SSU1345, ACA-Eh-LSU2809 and ACAEhOrph13 seem to exist only as single hairpins. Thus, the experimental analysis using RT-PCR and northern blotting demonstrates that the snoRNA predictions by computational analysis are valid and correspond to authentic snoRNA genes.
Genomic organization of snoRNAs in E. histolytica
The genomic location of all snoRNAs (C/D-box, H/ACA-box and orphan) was determined (Tables 1, 2, 3). The majority (69%) of snoRNA genes mapped to intergenic regions, while 20% mapped to protein-coding regions where snoRNAs were encoded either from the opposite strand of the protein coding gene (12%) or from the same strand (8%). A small number of snoRNA genes were located in other parts of protein-coding genes, e.g. in the 5’-UTR (3%), 3’-UTR (3%), and intron (1%). 4% of the genes mapped to non-annotated regions (Additional file 5: Figure S4). We checked for proximity of snoRNA genes with protein-coding genes involved in ribosome biogenesis, e.g. ribosomal protein genes and genes encoding nucleolar-localized proteins. A gene was considered proximal if it was found within 1 kb of the snoRNA gene. Of the 68 intergenically-located snoRNA genes, 5 were found close to ribosomal protein genes. Of 20 genically-located snoRNA genes, 3 were found close to ribosomal protein genes and 1 was close to the gene for fibrillarin, a component of the C/D box snoRNP, while of 6 snoRNA genes located in UTRs, 1 was located close to a ribosomal protein gene (Tables 1, 2, 3). Me-Eh-LSU-U1176a was present close to an rRNA methyltransferase gene. Therefore a substantial number of snoRNA genes were physically close to genes of related function. The remaining snoRNA genes were located close to functionally diverse genes, e.g. genes involved in cellular signal transduction, the DNA (cytosine-5)-methyltransferase gene, heat shock genes etc. When the genomic location of E. histolytica snoRNA genes was compared with that of other organisms, some striking similarities were observed. For example, the H/ACA snoRNA ACA-Eh-SSU1216 is localized to the ORF of a hypothetical protein and encoded from its opposite strand. Interestingly, the yeast H/ACA snoRNA snR35, which is homologous to ACA-Eh-SSU1216, is also located in an ORF for a hypothetical protein and expressed from the opposite strand. As in E. histolytica, several of the Drosophila snoRNA genes are located in the coding strand of a host gene. It was proposed that in such cases alternative splicing may occur, giving rise to two different RNA species with different functions from the same pre-mRNA: an mRNA translated into a protein, and a small non-messenger RNA (snmRNA) functioning as the snoRNA. A striking feature in P. falciparum is that some of the snoRNA genes are located in the 3’-UTRs. This feature was found in E. histolytica also, where 3 snoRNA genes were localized to 3’-UTRs. Additionally, 3 snoRNA genes were found in 5’-UTRs, a feature not reported in any other system so far. Although we have not experimentally validated the assignment of snoRNA genes to UTRs, these assignments are likely to be correct since we found that snoRNA genes overlapped with the protein-coding region of the gene as well as the UTR. In one case (the Me-Eh-5.8 S-U84 snoRNA, which is transcribed from the opposite strand of the UTR region of the receptor protein kinase gene EHI_021310) we have validated the presence of this snoRNA by RT-PCR as well as northern blotting.
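The 1 kb proximity criterion used above for neighbouring genes can be expressed as a small interval check. This is only an illustrative sketch: the tuple layout (scaffold, start, end), the function names and the choice to measure distance between the nearest ends of the two features are assumptions, not details given in the text.

```python
PROXIMITY_BP = 1000  # "within 1 kb", as stated in the text


def gap_between(a_start, a_end, b_start, b_end):
    """Coordinate distance between two intervals; 0 if they overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def is_proximal(snorna, gene, max_gap=PROXIMITY_BP):
    """True if both features are on the same scaffold and at most max_gap apart.

    Each feature is a hypothetical (scaffold, start, end) tuple.
    """
    sno_scaffold, sno_start, sno_end = snorna
    gene_scaffold, gene_start, gene_end = gene
    if sno_scaffold != gene_scaffold:
        return False
    return gap_between(sno_start, sno_end, gene_start, gene_end) <= max_gap
```

For example, with made-up coordinates, a snoRNA at 1000 to 1100 and a gene starting at 2000 on the same scaffold are 900 bp apart and count as proximal, whereas a gene starting at 2200 does not.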
snoRNA genes in other organisms are known to be present both in single and multiple copies, and some may also be in clusters. In E. histolytica we found that 80% of the genes were single copy while the rest were in multiple copies. Our data shows that at least in two instances the snoRNA genes may be present in clusters and may be co-transcribed. 1) The snoRNA genes ACA-Eh-SSU1212 and ACA-Eh-5.8 S84 are 126 bp apart and are transcribed from the opposite strand of EHI_098580 gene. Due to their proximity and presence in the opposite strand of the same gene, it is likely that these two genes may be transcribed together and may exist in a cluster. 2) The four identical copies of ACA-Eh-LSU2997a snoRNA genes (located in Scaffold DS572347) are separated from one another by a sequence of 206–214 bp, which is also identical in the four copies. We tried to locate promoters in the 206–214 bp intergenic region of these snoRNA genes using bioinformatic tools (Promoter2.0 prediction server, neural network promoter prediction) but did not find any promoters. The upstream region of the very first copy of snoRNA may have a promoter but this could not be checked computationally as this region was right at the start of the scaffold. It is possible that these four genes may be co-transcribed as a single unit (polycistronic) and may constitute a cluster.
Structural features of E. histolytica box H/ACA and box C/D snoRNAs
H/ACA snoRNAs typically fold into a characteristic hairpin-hinge-hairpin-tail structure in which base-paired stems alternate with single-stranded regions (hinge and tail). The H box is located at the hinge and the ACA box is located at the 3' tail, 3 nt away from the 3' end of the snoRNA. The site for guiding uridine modification of the target RNA is always located 14–16 nts upstream of the H box and/or the ACA box[38, 39]. This guide site consists of 8–18 base stretch which is complementary to the target RNA. It is located in an internal bulge or recognition loop in each hairpin and contacts the target RNA containing the unpaired uridine to be modified. Each H/ACA snoRNA can guide the modification of one uridine or two uridines which may be located in the same or different target RNAs. Thus the H/ACA snoRNA may contain only one or both functional loops. In E. histolytica all the H/ACA snoRNAs (Table5) adopted the hairpin-hinge-hairpin-tail structure. Some variations were observed, e.g. in some cases the guide sequence may extend into the adjoining P1 and P2 stems flanking the recognition loop (Additional file3: Figure S3 A). Of 43 guide H/ACA snoRNAs in E. histolytica, 5 snoRNAs (ACA-Eh-LSU1107a, ACA-Eh-SSU631, ACA-Eh-LSU2288, ACA-Eh-LSU1159b, ACA-Eh-LSU1107b) possessed both the functional antisense regions which can either guide the same or different substrate rRNAs. For example, ACA-Eh-SSU631 is predicted to guide the modification of uridine in 18 S rRNA at 2 different positions, 631 and 1114; whereas, ACA-Eh-LSU2288 can guide the modification of uridine at position 1431 in 18 S and at position 2288 in 28 S rRNA (Table2). Three H/ACA snoRNAs show potential of directing two pseudouridylations by a single guide sequence (Additional file6: Figure S5), as has been reported in other organisms e.g. ACA19 in human. It is proposed that RNAs get folded into alternate structures thus targeting multiple sites. Overall we found 41 psi sites guided by 43 H/ACA guide snoRNAs. 
We also found some sites which may be subjected to both methylation as well as pseudouridylation. In human, U3797 position of 28 S rRNA is subjected to methylation as well as pseudouridylation. Similarly in E. histolytica, the residue LSU1176 could be guided by C/D box snoRNAs Me-Eh-LSU-U1176a, Me-Eh-LSU-U1176b and Me-Eh-LSU-U1176c as well as by an H/ACA box snoRNA: ACA-Eh-LSU1176. The target site corresponding to LSU1176 is known to get methylated in Arabidopsis thaliana (SnoR41Y C/D snoRNA modifying at 25 S:U1064) and pseudouridylated in S. cerevisiae (snR49 H/ACA snoRNA modifying at 25 S:U990)[25, 29]. Similarly the 5.8 S84 site could be guided by C/D box snoRNA Me-Eh-5.8 S-U84 as well as H/ACA box snoRNA ACA-Eh-5.8 S84.
The C/D box snoRNAs typically possess the conserved boxes C (RUGAUGA) and D (CUGA) near the 5' and 3' ends, respectively. A short region upstream of C box and downstream of D box usually shows base complementarity. Base-pairing in this region brings the C and D boxes close together. In addition to C and D boxes, some snoRNAs of this class also possess C' and D' boxes which are less conserved and form a folded structure in the order 5’-C/D'/C'/D-3’. The 2'-O-ribose methylation of the target RNA is guided by one or two 10-21nt antisense elements located upstream of the D and/or D' boxes in a manner such that the modified base is paired with the snoRNA nucleotide located precisely 5nts upstream of the D or D' box[3, 4]. All 41 C/D box snoRNAs in E. histolytica had the conserved motifs: C box and D box. The C box had the consensus sequence RUGA [U/g/c/a]G[A/u]. The sequence of D box in two of the C/D box snoRNA genes Me-Eh-LSU-U3580b and Me-Eh-SSU-U871 was AUGA. All of the other snoRNA genes possessed the consensus CUGA sequence in the D box. 71% of these RNAs possessed the D’ box as well (Table6). The D' box is much less conserved and it varied from CUGA to CAGA, UUGA, AUGA, ACCA and CCGA. All the C/D box snoRNAs possessed at least one antisense element upstream to either the D’ box or D box. Me-Eh-SSU-A1183 snoRNA gene had two antisense elements and was able to guide different target sites of the same or different rRNAs (Additional file7: Figure S6A) whereas Me-Eh-SSU-G1535 and Me-Eh-SSU-A790 had single antisense element upstream to D’ box which could guide multiple sites for methylation in different rRNAs (Additional file7: Figure S6B (i-ii)). Five C/D box snoRNAs with a single antisense stretch in each were predicted to target different sites in the same target RNA (Additional file7: Figure S6C (i-v)). From the predicted folding pattern 60% C/D box snoRNAs possessed the terminal stem while the rest either lacked it or had an external stem, or an internal stem.
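The "5 nt upstream of the D or D' box" rule described above can be illustrated with a short calculation. This is a simplified sketch under stated assumptions: the antisense element is taken to end immediately upstream of the box, only perfect Watson-Crick pairing is considered (no G-U pairs or mismatches), and the function names and default guide length are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def predict_methylation_site(snorna, d_box_start, rrna, guide_len=10):
    """Return the 1-based rRNA position predicted to be methylated, or None.

    d_box_start is the 0-based index of the first nucleotide of the D (or D') box.
    """
    if d_box_start < guide_len:
        return None
    # Antisense element: the guide_len nucleotides directly upstream of the box.
    guide = snorna[d_box_start - guide_len:d_box_start]
    hit = rrna.find(reverse_complement(guide))
    if hit == -1:
        return None
    # The duplex is antiparallel, so the snoRNA nucleotide 5 nt upstream of
    # the box pairs with the 5th nucleotide of the matching rRNA stretch.
    return hit + 5
```

With a toy snoRNA `"AUGAUGA" + "GUACGUACGU" + "CUGA"` (D box at index 17) and a toy rRNA `"UUUUU" + "ACGUACGUAC" + "UUUUU"`, the guide pairs with rRNA positions 6 to 15 and the predicted site is position 10.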
Computational identification and validation of multiple copies of U3 snoRNA in E. histolytica
U3 snoRNA belongs to the C/D box snoRNA category and performs the specialized function of site specific cleavage of rRNA during pre-rRNA processing. It is present in all eukaryotic organisms either as a single copy or in multiple copies. BLASTn analysis of yeast and human U3 snoRNA with E. histolytica whole genome revealed the presence of 5 copies of U3 snoRNA (Eh_U3a-e) in E. histolytica. These were 97-99% identical to each other and ranged in size from 209–225 nt. All copies were located in intergenic regions (Table7A) and their sequences are given in Table7B. The characteristic boxes- box GAC, A’, A, C, B, box C and box D of E. histolytica U3 snoRNA were conserved (Figure4) when compared with U3 snoRNAs of selected organisms (H. sapiens, Leishmania major and Leishmania tarentolae). The Eh_U3 snoRNA was well conserved with respect to T. brucei and T. cruzi. However, it showed poor homology with P. falciparum U3 snoRNA. Sequence conservation was greater at 5’ end up to central hinge domain, with less conservation in the 3’ hairpin region. We checked for the conservation of U3 snoRNA among Entamoeba species and found 6 copies of U3 snoRNA with 91% identity in E. dispar (Table7A) and 1 copy with 96% identity in E. nuttalli. No homology was observed for E. invadens. To validate the predicted U3 snoRNA in E. histolytica we did RT-PCR and northern blotting with total RNA (Figure2A,3A). RT-PCR was performed using specific primers for U3 snoRNAs (Additional file4: Table S1). The predicted and the observed sizes as obtained by both RT-PCR and northern were the same. The sequencing of one of the clones of the RT-PCR product confirmed the presence of Eh_U3e copy of U3 snoRNA.
Ribosome biogenesis in eukaryotic cells requires the activity of a highly conserved set of small RNAs, the snoRNAs. In this study we show that the parasitic protist E. histolytica, thought to be an early branching eukaryote, possesses the major classes of snoRNAs as judged by sequence conservation with yeast and human. These RNAs are expressed at fairly high levels, as they are readily detectable by northern blots. It is relevant to ask whether E. histolytica, being a human parasite, has evolved any snoRNA features uniquely shared with other parasitic protozoa infecting humans. Amongst these organisms, studies on snoRNAs have mainly been reported for P. falciparum and T. brucei. When the features of E. histolytica snoRNAs are compared with these organisms, the following points emerge. In both P. falciparum and E. histolytica some snoRNA genes are located in the 3’-UTR, a property not reported in any other organism except Drosophila, where an H/ACA-like snoRNA is reported to be present in a 3’-UTR. In addition, some E. histolytica snoRNA genes are also found in the 5'-UTR, which is unique to this organism so far. In both P. falciparum and E. histolytica most (80%) snoRNA genes are present in single copy, whereas in T. brucei most of the snoRNA clusters are repeated in the genome, with few clusters carrying single copy genes. The clustering of snoRNA genes is frequent in P. falciparum and T. brucei. We have reported two instances in E. histolytica where these genes may be clustered. Unlike P. falciparum, where 9 snoRNA genes are found in introns, we could locate only one snoRNA gene in an intron in E. histolytica, while the majority were in intergenic regions; no intronic snoRNA has been reported in T. brucei so far. Like T. brucei, E. histolytica also possesses single hairpin H/ACA snoRNAs, which are likely to be processed from a double hairpin pre-H/ACA snoRNA, whereas in P. falciparum single hairpin H/ACA snoRNAs have not been reported. Unlike T. brucei, which possesses the H/AGA box, both P. falciparum and E. histolytica contain the highly conserved H/ACA box. In contrast to P. falciparum and T. brucei, where the number of methylation sites is much larger than that of psi sites, in E. histolytica we find almost equal numbers of both kinds of modification: 47 methylation sites and 41 psi sites. In overall sequence, E. histolytica snoRNAs are much more homologous to yeast and human snoRNAs than to those of P. falciparum and T. brucei.
The greater sequence homology of E. histolytica snoRNAs with yeast and human compared with the two parasite species, and the lack of any particular snoRNA features unique to all three parasite species shows that this highly conserved RNA modification machinery is unlikely to be linked to pathogenesis and each parasite species has evolved its own distinct snoRNA features. This study will help to further understand the evolution of these conserved RNAs in diverse phylogenetic groups and will be very useful in future studies on pre rRNA processing in E. histolytica.
Extraction of putative methylation and pseudouridylation sites in rRNA of E. histolytica
We used the known methylation and psi sites of five different eukaryotic organisms (A. thaliana, C. elegans, D. melanogaster, S. cerevisiae and H. sapiens) to find putative methylation and psi sites in E. histolytica rRNA (5.8S, 18S and 28S). The rRNA of E. histolytica was aligned separately with that of each of the five selected organisms using the EMBOSS pairwise alignment tool (Additional file 1: Figure S1). This gave us 173 putative methylation sites and 126 putative psi sites.
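Transferring known sites through a pairwise alignment amounts to walking along the two gapped sequences and tracking the ungapped coordinate in each. The sketch below illustrates this idea only; the function name and the decision to drop sites that fall opposite a gap are assumptions, and the alignment strings are expected to come from an external aligner.

```python
def map_sites(aligned_ref, aligned_query, ref_sites):
    """Map 1-based reference positions onto 1-based query positions.

    aligned_ref and aligned_query are the two rows of a gapped pairwise
    alignment ('-' marks a gap). Reference sites that align to a gap in
    the query are omitted from the result.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(ref_sites)
    mapping = {}
    ref_pos = 0
    query_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_pos += 1
        if q != "-":
            query_pos += 1
        if r != "-" and q != "-" and ref_pos in wanted:
            mapping[ref_pos] = query_pos
    return mapping
```

For the toy alignment `AC-GUA` over `ACGG-A`, reference positions 1, 3 and 5 map to query positions 1, 4 and 5, while reference position 4 faces a gap and is not transferred.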
Search for E. histolytica C/D box snoRNAs
Snoscan and CDSeeker were used to score potential guide and orphan C/D box snoRNAs, respectively, from the whole genome sequence (WGS) of E. histolytica. The WGS was downloaded from NCBI [NCBI:AAFB00000000] (updated on April 17, 2008). The tools were initially used with this file and the results obtained were checked periodically online with the updated genome file. Snoscan is based on a greedy search algorithm. It identifies six features in the genome: box C, box D, a region of sequence complementary to the target RNA, box D' if the rRNA complementary region is not adjacent to box D, the predicted methylation site based on the complementary region, and the terminal stem, if present. CDSeeker can be used to find both guide and orphan C/D box RNAs, but in the present study it was used to find orphan C/D box snoRNAs in E. histolytica. The CDSeeker program combines a probabilistic model with conserved primary and secondary structure motifs to search for orphan C/D snoRNAs in a whole genome sequence. It searches for the same features described for snoscan, but for orphan C/D box snoRNAs it looks for a predicted conserved functional region next to box D or D' (if D' is present). Both tools require a genomic DNA sequence and rRNA sequences as input (optional for CDSeeker). All hits that scored higher than 14 bits were selected as positive guide C/D box snoRNAs. For orphan C/D box snoRNAs, the threshold score was set to 18 bits. These threshold values are those used for S. cerevisiae (for guide snoRNAs) and the default value used in CDSeeker (for orphan snoRNAs). BLASTn analysis of the predicted snoRNAs against the EST database of E. histolytica was used to assess the authenticity of the predicted snoRNAs. To find homology with the related species E. dispar, E. nuttalli and E. invadens, we did BLASTn analysis of selected snoRNAs against the WGS of E. dispar SAW760 (NCBI: AANV02000000), E. nuttalli P19 (AGBL01000000) and E. invadens IP1 (NCBI: AANW02000000).
Search for E. histolytica H/ACA box snoRNAs
ACASeeker was used to screen potential guide and orphan H/ACA box snoRNAs in the same way as described above for CDSeeker. The ACASeeker program combines a probabilistic model with conserved primary and secondary structure motifs to search for orphan and guide H/ACA snoRNAs in a whole genome sequence. It identifies the following features common to both orphan and guide H/ACA box snoRNA genes: box H, box ACA, hairpin 1, hairpin 2, and hairpin-hinge-hairpin. For guide snoRNA genes, one further feature was taken into account: two regions of sequence complementary to the target RNA in a hairpin. This tool requires the WGS and a list of putative psi sites (optional) as input. We provided the list of putative psi sites (as obtained in the first Methods section); thus 186 guide H/ACA snoRNAs were predicted on the basis of putative sites, and 475 snoRNAs with no putative sites were predicted as orphan H/ACA snoRNAs. The threshold value was 40 bits and 27 bits for H/ACA guide and orphan snoRNAs respectively, which was the cutoff used to train the software SnoSeeker on vertebrate snoRNAs. The snoRNAs were further analyzed for genomic localization in introns, intergenic regions or the ORFs of protein coding genes. BLASTn analysis of the predicted snoRNAs against the EST database of E. histolytica was used to assess the authenticity of the predicted snoRNAs. To find homology with the related species E. dispar, E. nuttalli and E. invadens, we did BLASTn analysis of selected snoRNAs against the WGS of E. dispar SAW760 (NCBI: AANV02000000), E. nuttalli P19 (AGBL01000000) and E. invadens IP1 (NCBI: AANW02000000).
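The score cutoffs quoted in the Methods (14 and 18 bits for guide and orphan C/D box snoRNAs, 40 and 27 bits for guide and orphan H/ACA box snoRNAs) can be collected into a single lookup, as sketched below. The record layout and function names are hypothetical, and the comparison uses "greater than or equal to", following the wording in the Results; the Methods say "higher than" for C/D guides, so the exact handling of a score equal to the cutoff is an assumption.

```python
# Cutoffs in bits, as quoted in the text.
THRESHOLD_BITS = {
    ("C/D", "guide"): 14.0,
    ("C/D", "orphan"): 18.0,
    ("H/ACA", "guide"): 40.0,
    ("H/ACA", "orphan"): 27.0,
}


def passes_threshold(box_class, kind, score_bits):
    """True if the score meets the cutoff for this class and kind (assumed >=)."""
    return score_bits >= THRESHOLD_BITS[(box_class, kind)]


def filter_candidates(candidates):
    """Keep candidate records whose score meets the cutoff.

    Each candidate is a hypothetical dict with keys
    'name', 'box_class', 'kind' and 'score_bits'.
    """
    return [
        c for c in candidates
        if passes_threshold(c["box_class"], c["kind"], c["score_bits"])
    ]
```

This covers only the score filter; the target-site, genomic-location and EST criteria described in the Results would be applied as separate steps.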
Validation of snoRNAs by RT-PCR and northern hybridization
Total RNA was isolated from mid-log phase trophozoites (~5 × 10^6 cells) using TRIzol reagent (Invitrogen) as per the manufacturer's instructions. A DNase I (Roche)-treated RNA sample (5 μg) was reverse transcribed at 37°C using MMLV reverse transcriptase (USB) with specific reverse primers (Additional file 4: Table S1) as per the manufacturer's protocol, followed by PCR with forward primers. PCR with genomic DNA was used as a control. Oligonucleotides used for RT and RT-PCR reactions are listed in Additional file 4: Table S1. For northern analysis, total RNA and total RNA enriched in small RNA were isolated from ~5 × 10^6 cells using TRIzol (Invitrogen) and the miRNA isolation kit (Ambion), respectively, as per the manufacturers' instructions. 15 μg of total RNA enriched in small RNA was resolved on a 12% denaturing urea PAGE gel. For Eh_U3 snoRNA, 10 μg of total RNA was electrophoresed on a 1.2% denaturing agarose gel. RNA was transferred to GeneScreen Plus® membrane (Perkin Elmer). Probes were prepared by the random priming method (NEB blot kit). Hybridization was carried out in buffer (1 M NaCl and 0.5% SDS) at 42°C for 36 h. Post-hybridization washing of the membrane was done as per the manufacturer's instructions. The blot was exposed for 48 h to the imaging plate of a phosphorimager for autoradiography.
Balakin AG, Smith L, Fournier MJ: The RNA world of the nucleolus: two major families of small RNAs defined by different box elements with related functions. Cell. 1996, 86: 823-834. 10.1016/S0092-8674(00)80156-7.
Ganot P, Bortolin ML, Kiss T: Site-specific pseudouridine formation in preribosomal RNA is guided by small nucleolar RNAs. Cell. 1997, 89: 799-809. 10.1016/S0092-8674(00)80263-9.
Kiss-László Z, Henry Y, Bachellerie JP, Caizergues-Ferrer M, Kiss T: Site-Specific Ribose Methylation of Preribosomal RNA: A Novel Function for Small Nucleolar RNAs. Cell. 1996, 85: 1077-1088. 10.1016/S0092-8674(00)81308-2.
Cavaillé J, Nicoloso M, Bachellerie JP: Targeted ribose methylation of RNA in vivo directed by tailored antisense RNA guides. Nature. 1996, 383: 732-735. 10.1038/383732a0.
Hughes JM, Ares M: Depletion of U3 small nucleolar RNA inhibits cleavage in the 5’ external transcribed spacer of yeast preribosomal RNA and impairs formation of 18 S ribosomal RNA. EMBO J. 1991, 10: 4231-4239.
Kass S, Tyc K, Steitz JA, Sollner-Webb B: The U3 small nucleolar ribonucleoprotein functions in the first step of preribosomal RNA processing. Cell. 1990, 60: 897-908. 10.1016/0092-8674(90)90338-F.
Mougey EB, Pape LK, Sollner-Webb B: A U3 small nuclear ribonucleoprotein-requiring processing event in the 5’ external transcribed spacer of Xenopus precursor rRNA. Mol Cell Biol. 1993, 13: 5990-5998.
Peculis BA, Steitz JA: Disruption of U8 nucleolar snRNA inhibits 5.8 S and 28 S rRNA processing in the Xenopus oocyte. Cell. 1993, 73: 1233-1245. 10.1016/0092-8674(93)90651-6.
Tycowski KT, Shu MD, Steitz JA: Requirement for intron-encoded U22 small nucleolar RNA in 18 S ribosomal RNA maturation. Science. 1994, 266: 1558-1561. 10.1126/science.7985025.
Morrissey JP, Tollervey D: Yeast snR30 is a small nucleolar RNA required for 18 S rRNA synthesis. Mol Cell Biol. 1993, 13: 2469-2477.
Dunbar DA, Baserga SJ: The U14 snoRNA is required for 2'-O-methylation of the pre-18 S rRNA in Xenopus oocytes. RNA. 1998, 4: 195-204.
King TH, Liu B, McCully RR, Fournier MJ: Ribosome structure and activity are altered in cells lacking snoRNPs that form pseudouridines in the peptidyl transferase center. Mol Cell. 2003, 11: 425-435. 10.1016/S1097-2765(03)00040-6.
Kishore S, Stamm S: The snoRNA HBII-52 Regulates Alternative Splicing of the Serotonin Receptor 2 C. Science. 2006, 311: 230-232. 10.1126/science.1118265.
Kiss-László Z, Henry Y, Kiss T: Sequence and structural elements of methylation guide snoRNAs essential for site-specific ribose methylation of pre-rRNA. EMBO J. 1998, 17: 797-807. 10.1093/emboj/17.3.797.
Ganot P, Caizergues-Ferrer M, Kiss T: The family of box ACA small nucleolar RNAs is defined by an evolutionarily conserved secondary structure and ubiquitous sequence elements essential for RNA accumulation. Genes Dev. 1997, 11: 941-956. 10.1101/gad.11.7.941.
Filipowicz W, Pogacić V: Biogenesis of small nucleolar ribonucleoproteins. Curr Opin Cell Biol. 2002, 14: 319-327. 10.1016/S0955-0674(02)00334-4.
Leader DJ, Clark GP, Watters J, Beven AF, Shaw PJ, Brown JW: Clusters of multiple different small nucleolar RNA genes in plants are expressed as and processed from polycistronic pre-snoRNAs. EMBO J. 1997, 16: 5742-5751. 10.1093/emboj/16.18.5742.
Dieci G, Preti M, Montanini B: Eukaryotic snoRNAs: a paradigm for gene expression flexibility. Genomics. 2009, 94: 83-88. 10.1016/j.ygeno.2009.05.002.
Liang XH, Uliel S, Hury A, Barth S, Doniger T, Unger R, Michaeli S: A genome-wide analysis of C/D and H/ACA-like small nucleolar RNAs in Trypanosoma brucei reveals a trypanosome-specific pattern of rRNA modification. RNA. 2005, 11: 619-645. 10.1261/rna.7174805.
Mishra PC, Kumar A, Sharma A: Analysis of small nucleolar RNAs reveals unique genetic features in malaria parasites. BMC Genomics. 2009, 10: 68. 10.1186/1471-2164-10-68.
Chakrabarti K, Pearson M, Grate L, Sterne-Weiler T, Deans J, Donohue JP, Ares M: Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA. 2007, 13: 1923-1939. 10.1261/rna.751807.
Raabe CA, Sanchez CP, Randau G, Robeck T, Skryabin BV, Chinni SV, Kube M, Reinhardt R, Ng GH, Manickam R, Kuryshev VY, Lanzer M, Brosius J, Tang TH, Rozhdestvensky TS: A global view of the nonprotein-coding transcriptome in Plasmodium falciparum. Nucleic Acids Res. 2010, 38: 608-617. 10.1093/nar/gkp895.
Schattner P, Brooks AN, Lowe TM: The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2005, 33: W686-W689. 10.1093/nar/gki366.
Yang JH, Zhang XC, Huang ZP, Zhou H, Huang MB, Zhang S, Chen YQ, Qu LH: snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome. Nucleic Acids Res. 2006, 34: 5112-5123. 10.1093/nar/gkl672.
snoRNA orthological gene database. http://snoopy.med.miyazaki-u.ac.jp/
Lowe TM, Eddy SR: A computational screen for methylation guide snoRNAs in yeast. Science. 1999, 283: 1168-1171. 10.1126/science.283.5405.1168.
Eo HS, Jo KS, Lee SW, Kim CB, Kim W: A combined approach for locating box H/ACA snoRNAs in the human genome. Mol Cells. 2005, 20: 35-42.
Bachellerie JP, Cavaillé J, Hüttenhofer A: The expanding snoRNA world. Biochimie. 2002, 84: 775-790. 10.1016/S0300-9084(02)01402-5.
Piekna-Przybylska D, Decatur WA, Fournier MJ: New bioinformatic tools for analysis of nucleotide modifications in eukaryotic rRNA. RNA. 2007, 13: 305-312. 10.1261/rna.373107.
Lestrade L, Weber MJ: snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 2006, 34: D158-D162. 10.1093/nar/gkj002.
Darty K, Denise A, Ponty Y: VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics. 2009, 25: 1974-1975. 10.1093/bioinformatics/btp250.
Takano J, Tachibana H, Kato M, Narita T, Yanagi T, Yasutomi Y, Fujimoto K: DNA characterization of simian Entamoeba histolytica-like strains to differentiate them from Entamoeba histolytica. Parasitol Res. 2009, 105: 929-937. 10.1007/s00436-009-1480-3.
Wang Z, Samuelson J, Clark CG, Eichinger D, Paul J, Van Dellen K, Hall N, Anderson I, Loftus B: Gene discovery in the Entamoeba invadens genome. Mol Biochem Parasitol. 2003, 129: 23-31. 10.1016/S0166-6851(03)00073-2.
Bhattacharya A, Satish S, Bagchi A, Bhattacharya S: The genome of Entamoeba histolytica. Int J Parasitol. 2000, 30: 401-410. 10.1016/S0020-7519(99)00189-7.
Yuan G, Klämbt C, Bachellerie JP, Brosius J, Hüttenhofer A: RNomics in Drosophila melanogaster: Identification of 66 candidates for novel non-messenger RNAs. Nucleic Acids Res. 2003, 31: 2495-2507. 10.1093/nar/gkg361.
Liang XH, Liu L, Michaeli S: Identification of the first trypanosome H/ACA RNA that guides pseudouridine formation on rRNA. J Biol Chem. 2001, 276: 40313-40318.
Li SG, Zhou H, Luo YP, Zhang P, Qu LH: Identification and Functional Analysis of 20 Box H/ACA Small Nucleolar RNAs (snoRNAs) from Schizosaccharomyces pombe. J Biol Chem. 2005, 280: 16446-16455. 10.1074/jbc.M500326200.
Bortolin ML, Ganot P, Kiss T: Elements essential for accumulation and function of small nucleolar RNAs directing site-specific pseudouridylation of ribosomal RNAs. EMBO J. 1999, 18: 457-469. 10.1093/emboj/18.2.457.
Ni J, Tien AL, Fournier MJ: Small nucleolar RNAs direct site-specific synthesis of pseudouridine in ribosomal RNA. Cell. 1997, 89: 565-573. 10.1016/S0092-8674(00)80238-X.
Wu H, Feigon J: H/ACA small nucleolar RNA pseudouridylation pockets bind substrate RNA to form three-way junctions that position the target U for modification. Proc Natl Acad Sci USA. 2007, 104: 6655-6660. 10.1073/pnas.0701534104.
Xiao M, Yang C, Schattner P, Yu YT: Functionality and substrate specificity of human box H/ACA guide RNAs. RNA. 2009, 15: 176-186.
Darzacq X, Kiss T: Processing of intron-encoded box C/D small nucleolar RNAs lacking a 5', 3’-terminal stem structure. Mol Cell Biol. 2000, 20: 4522-4531. 10.1128/MCB.20.13.4522-4531.2000.
Charette JM, Gray MW: Comparative analysis of eukaryotic U3 snoRNA, U3 snoRNA genes are multi-copy and frequently linked to U5 snRNA genes in Euglena gracilis. BMC Genomics. 2009, 10: 528. 10.1186/1471-2164-10-528.
Huang ZP, Chen CJ, Zhou H, Li BB, Qu LH: A combined computational and experimental analysis of two families of snoRNA genes from Caenorhabditis elegans, revealing the expression and evolution pattern of snoRNAs in nematodes. Genomics. 2007, 89: 490-501. 10.1016/j.ygeno.2006.12.002.
This work was supported by a grant to SB from DST and DBT, by fellowships from DBT to DK and RS, and by fellowships from CSIR to VK and AKG. We gratefully acknowledge helpful discussions with Dr. P. C. Mishra.
The authors declare that they have no competing interests.
SB proposed and designed the research and drafted the final version of the manuscript. AB designed and analyzed the computational work. DK and RS performed the computational work. AKG and VK performed the RT-PCR and northern blotting experiments. All authors participated in preparing the manuscript. All authors have read and approved the final manuscript.
Electronic supplementary material
Additional file 1: Figure S1. Global alignment of lsu rRNA of S. cerevisiae and E. histolytica to predict the putative modification sites in E. histolytica. Red and yellow dots are already known methylation and pseudouridylation sites of S. cerevisiae respectively. Blue and green dots are the putative methylation and pseudouridylation sites of E. histolytica respectively. (PDF 114 KB)
Additional file 2: Figure S2. Orphan C/D box snoRNAs and putative antisense element in mRNAs: Two C/D orphan snoRNAs with a possible antisense element (upstream of the D' box and/or D box) showed complementary base pairing with mRNAs of the indicated genes in E. histolytica. (PDF 19 KB)
Additional file 3: Figure S3. Predicted secondary structure of E. histolytica snoRNA. Secondary structure of H/ACA box snoRNA (A) and C/D box snoRNA (B) drawn using VARNA visualization tool. Antisense elements are represented by bases colored in green and location of conserved boxes is indicated. (PDF 93 KB)
Additional file 5: Figure S4. Genomic distribution of predicted snoRNAs in E. histolytica. Pie chart representing localization of predicted snoRNAs in E. histolytica genome. (PDF 34 KB)
Additional file 6: Figure S5. H/ACA snoRNAs guiding two sites with a single guide sequence: Predicted pseudouridylation guide duplexes between snoRNA and rRNA are shown. The convention followed by has been adopted. snoRNA sequences in a 5’ to 3’ orientation are shown in the upper strands, whereas rRNA sequences in a 3’ to 5’ orientation are shown in the lower strands. The conserved motifs are in bold text. (PDF 79 KB)
Additional file 7: Figure S6. C/D box snoRNAs with predicted antisense element and target RNAs. C/D box snoRNA with two antisense sequence stretches present upstream of the D’ and D boxes (A). Single antisense stretch guiding two different target RNAs (B i-ii). Single antisense stretch guiding different sites in a single target RNA (C i-v). snoRNA sequences in a 3’ to 5’ orientation are shown in the lower strand, whereas rRNA sequences in a 5’ to 3’ orientation are shown in the upper strand. The conserved motifs are in bold text. (PDF 298 KB)
Cite this article
Kaur, D., Gupta, A.K., Kumari, V. et al. Computational prediction and validation of C/D, H/ACA and Eh_U3 snoRNAs of Entamoeba histolytica. BMC Genomics 13, 390 (2012). https://doi.org/10.1186/1471-2164-13-390
- U3 snoRNA
- Guide/ orphan snoRNAs
- Entamoeba histolytica