- Research article
- Open Access
A comparative genome-wide study of ncRNAs in trypanosomatids
© Doniger et al; licensee BioMed Central Ltd. 2010
- Received: 25 January 2010
- Accepted: 4 November 2010
- Published: 4 November 2010
Recent studies have provided extensive evidence for multitudes of non-coding RNA (ncRNA) transcripts in a wide range of eukaryotic genomes. ncRNAs are emerging as key players in multiple layers of cellular regulation. With the availability of many whole genome sequences, comparative analysis has become a powerful tool to identify ncRNA molecules. In this study, we performed a systematic genome-wide in silico screen to search for novel small ncRNAs in the genome of Trypanosoma brucei using techniques of comparative genomics.
In this study, we identified by comparative genomics, and validated by experimental analysis several novel ncRNAs that are conserved across multiple trypanosomatid genomes. When tested on known ncRNAs, our procedure was capable of finding almost half of the known repertoire through homology over six genomes, and about two-thirds of the known sequences were found in at least four genomes. After filtering, 72 conserved unannotated sequences in at least four genomes were found, 29 of which, ranging in size from 30 to 392 nts, were conserved in all six genomes. Fifty of the 72 candidates in the final set were chosen for experimental validation. Eighteen of the 50 (36%) were shown to be expressed, and for 11 of them a distinct expression product was detected, suggesting that they are short ncRNAs. Using functional experimental assays, five of the candidates were shown to be novel H/ACA and C/D snoRNAs; these included three sequences that appear as singletons in the genome, unlike previously identified snoRNA molecules that are found in clusters. The other candidates appear to be novel ncRNA molecules, and their function is, as yet, unknown.
Using comparative genomic techniques, we predicted 72 sequences as ncRNA candidates in T. brucei. The expression of 50 candidates was tested in laboratory experiments. This resulted in the discovery of 11 novel short ncRNAs in procyclic stage T. brucei, which have homologues in the other trypansomatids. A few of these molecules are snoRNAs, but most of them are novel ncRNA molecules. Based on this study, our analysis suggests that the total number of ncRNAs in trypanosomatids is in the range of several hundred.
- Small ncRNA
- Primer Extension Assay
- Triplet Periodicity
- ncRNA Family
- ncRNA Candidate
Non-coding RNA (ncRNA) genes produce functional RNA molecules, but these molecules do not encode for protein products; rather, these RNA molecules directly participate in various cellular processes. For many years, only a few such ncRNA molecules were known, mainly represented by transfer-RNA (tRNA), ribosomal-RNA (rRNA), small nuclear RNA (snRNA) and small nucleolar RNA (snoRNA). The possible existence of additional types of ncRNA molecules was given little consideration, as the fundamental biological principle was that almost all genes are translated into proteins. As a result, most studies have focused their efforts primarily on protein discovery. The appreciation for the role of untranslated RNAs in the cell has changed dramatically over the past decade. Recent work has shown that the incidence and importance of ncRNA molecules has been underestimated [1–3]. ncRNAs are emerging as key players in multiple layers of cellular regulation [4–7]. In addition, it has been speculated that there are many additional types of ncRNA that have yet to be discovered.
However, systematic computational and experimental identification of these molecules has been difficult. The challenge of predicting ncRNAs from primary sequence is that they lack the known signals, such as start and stop codons as well as the triplet periodicity, which are distinguishing features of protein coding genes. Furthermore, discriminating between ncRNAs and protein-coding mRNAs is not a trivial task. ncRNAs, especially long ones, may contain open reading frames [8, 9].
Over the years, several tools for identifying specific ncRNA family members have been developed. These programs generally exploit the fact that some ncRNA classes have relatively well-defined sequence and/or structural characteristics (i.e. tRNAs , snoRNAs (H/ACA [11–14] and C/D ) and miRNA [16, 17]). General non-family specific tools for identifying ncRNA genes have had more limited success. Many ncRNAs have conserved secondary structures, despite having primary sequences that are often highly variable. This resulted in compensatory changes during evolution that are consistent with the conservation of a consensus secondary structure, and can be detected by a stochastic context-free grammar (SCFG) or hidden Markov models (HMMs) that may be used in conjunction with thermodynamic stability (i.e. qRNA , RNAz ).
ncRNA molecules can be experimentally detected by selecting for small molecules and preparing a cDNA library as was demonstrated by . Most recently, the next generation sequencing technologies have become powerful tools for ncRNA discovery (see ). However, laboratory techniques for identifying RNA molecules are often expensive, time-consuming, and labor-intensive. In addition, these experimental methods have a bias toward highly abundant molecules and can miss RNAs that are only present under specific physiological conditions or during specific developmental stages. Thus, in silico methods for identifying RNA molecules have greatly complemented experimental work [22–24].
With the availability of many whole genome sequences, comparative analysis has become a powerful tool to study sequence similarities and differences between various organisms. Comparative genomics is an approach that has been used to aid in the discovery of genes, regulatory elements and gene structure [25–27]. It has also been shown as a powerful tool for identifying ncRNA [28–32].
Comparative genomics can serve as a powerful filter for ncRNA; it sifts genomic DNA and yields a subset of sequences that are enriched for ncRNA sequences. Comparative genome-wide studies for the purpose of detecting ncRNAs have been performed in a range of organisms from bacteria to humans. The number of predicted ncRNAs across the evolutionary scale varies widely. In human and higher vertebrates, computational [33, 34] and experimental studies [35, 36] indicate a number of putative ncRNAs in the range of tens of thousands. In contrast, in urochordates , nematodes , and drosophilids  the predicted numbers are lower, in the range of several thousand. Lower eukaryotes, such as yeast , and Plasmodium[40, 41] are predicted to have ncRNAs in the range of several hundred. Studies of ncRNAs in prokaryotes, such as E. coli and other bacteria [18, 24, 42, 43], suggest that the number of ncRNAs is in the low hundreds.
Trypanosomes are unicellular parasites, and are the cause of several devastating diseases affecting humans (e.g. Chagas disease and African sleeping sickness). Trypanosomatids are known for their non-conventional gene expression mechanisms, including RNA editing , and trans-splicing, a process that is required for the maturation of all mRNAs in these organisms whereby a small exon, encoded by a small RNA, the SL RNA, is donated to all pre-mRNA [45, 46]. Trypanosomes have also been used as model organisms to study ncRNA, and over the years the U snRNAs , 7SL RNA  and snoRNAs [48–52] were described. However, many ncRNAs that have been found in other eukaryotes have not been identified in trypanosomes, such as many snoRNAs involved in RNA processing, RNase P, and telomerase RNA. These molecules remain elusive despite the fact that computer programs (i.e. Snoscan ) exist that are specifically designed to search for some classes of ncRNA (i.e. C/D), and are appropriate for identifying trypanosome homologues in genome-scale searches . Based on experimental data from mapping of ribose methylation sites on ribosomal RNA in T. brucei, many C/D molecules that guide those modifications still remain to be discovered . Many of the undiscovered ncRNA may have weak or novel motifs that would be impossible to identify without the use of comparative genomics. There have been several in silico genome-wide studies in trypanosomes to search for snoRNAs [14, 51, 53]. Recently, a genome-wide computational study of functional RNA elements in T. brucei was published. The genomes of T. brucei and L. braziliensis were compared using a binomial-based model to assess conservation followed by a QRNA  analysis. After filtering by QRNA score and annotation, a total of 53 ncRNA candidates were reported.
Here, we describe a systematic in silico screen to identify conserved non-protein-coding genes across multiple trypanosomatid genomes, and prediction of 72 sequences as novel ncRNA candidates. The expression of 50 candidates was tested in laboratory experiments; 18 molecules were shown to be expressed, and for 11 of them there is strong evidence that they represent novel short ncRNAs in procyclic stage T. brucei, or their homologues in the other trypansomatids. The RNAs that do not belong to the previously described most abundant families of small RNAs, such as C/D and H/ACA snoRNAs or RNAs binding the Sm or Lsm proteins, were termed RNAs of Unknown Function (RUFs).
Assessment of performance
Assessment of performance on known ncRNA found in GeneDB
Type of ncRNA
Annotated in Genedb v4
All 6 genomes
4 of 6 genomes
During the analysis, all known and hypothetical protein coding genes were filtered by comparing the coordinates of the candidate sequences to those of the annotation. This filter left a pool of 125 potential ncRNA candidates that are conserved in a minimum of four of the six genomes. However, the initial filtering of annotated sequences was based on comparing the coordinates of the sequences as appears in GeneDB. This comparison is fast, but it may miss proteins because of coordinate annotation problems, which are quite common. Thus, we checked the 125 candidates further by direct sequence comparison to see if they match any annotated gene. As annotation in the T. brucei genome is incomplete, we compared our candidates against the annotated genes of both T. brucei and L. major. Using BLAST comparison versus the T. brucei and L. major annotated sequences, a significant number of candidates (47 of the 125) were found to be highly similar to known coding sequences. Most of these sequences were simply a result of incomplete genome annotation. For example, several of the ribosomal RNA proteins (LmjF28.2460 ribosomal protein S29, putative and LmjF36.3750 40S ribosomal protein S27), which are highly conserved, were not annotated in T. brucei. We also found six previously described RNAs that are not reported in GeneDB. For example, the screen identified selenocysteine-tRNA , whose sequence had been unannotated in the genome, while instead sRNA-76  was labeled as selenocysteine-tRNA, and was also identified in the final set (candidate 7). A list of the additional RNA genes that have been reported previously in the literature, but have not yet been incorporated or are misannotated in the GeneDB genome annotation is provided as Supplementary Material (Additional File 2). These include MRP RNA , snR30 , U5 , tRNA-sec , sRNA-76 , and several previously identified snoRNA clusters [14, 49, 51].
Number of candidates from the different subtypes of RNA
4 of 6 genomes
All 6 genomes
We checked for redundancy between the 72 candidates and found that Candidates # 85 and #90 shared 98% identity to each other and 70% identity with candidate #78. Candidates # 89 and #99 shared almost 100% identity to each other and 88% identity with candidate # 124. Candidates 68 and 70 shared 63% identity. Interestingly none of these candidates were among the molecules that we were able to validate experimentally.
Fifty of the 72 candidates in the final set were chosen for experimental validation. Fifteen were chosen from the sequences that were conserved in six of six genomes, and the rest were chosen randomly from the remaining candidates. The list of candidates sent for experimental verification appears in the comments to Additional File 3, and Additional File 4 provides the list of primers. Eighteen of the 50 candidates were shown to be expressed in cells. Expression was detected by primer extension assay that exactly determines the 5' end of the molecule. The strength of the signal reflected the abundance of the RNA, as the same amount of radio labelled primer and RNA were used in each experiment. Note that we did not use an internal control of very abundant RNA because it often affects the ability of non-abundant RNA to prime. Rather, we performed the primer extension using U3 snRNA as an internal control. This RNA was chosen because it is stable and tends not to degrade. However, the presence of the U3 oligo in the reaction reduced the efficiency of extension from our tested RNA (see Additional File 5).
Note that even some of the candidates that were not expressed may still be ncRNAs that are expressed in another part of the parasite's life cycle. We analyzed expression only in procyclic form, and it is possible that the other RNAs are stage specific and are expressed only at 37°C when the parasite lives in the mammalian host. Indeed, we previously identified snoRNAs that are expressed better in the bloodstream form . However, for the purposes of evaluating the performance of our procedure, we considered candidates that did not show a distinct band in our assay as false predictions.
Although Northern blot would be a better approach to show that the identified candidates are indeed small RNAs, the majority of the novel RNAs identified by this study were not abundant. There are only two that were abundant as determined by primer extension: tRNA-sec and candidate #28. tRNA-sec does appear abundant in the Northern blot, but candidate #28, while appearing strong by primer extension, gives a relatively weak band by Northern analysis (see Additional File 6); hence the remaining molecules, which were not abundant on the primer extension assay, are not likely to be clearly detected by Northern analysis. Note that in these two cases where we compared primer extension with Northern analysis, the sizes of the molecules were consistent.
In order to evaluate the performance of our prediction scheme we needed to estimate the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) rates of the ncRNA prediction. TP represents the predictions that turn out to be correct, and our analysis yielded 379 (the number of known ncRNA molecules that we "identified") plus the 11 molecules we confirmed experimentally. FN can be estimated by the number of known ncRNA that our methods missed and there are 180 such molecules (559 known ncRNA molecules minus the 379 detected). FP corresponds to the number of predictions that were shown to be wrong which is 39 (50-11). Calculating the TN values is meaningless since most of the genome is not comprised of ncRNA. Thus, the calculated TN value would be in the millions, and while this would make the performance measures that are dependent on TN (like Specificity which is defined as TN/(TN+FP)) seem to be extremely good, this doesn't reflect true performance characteristics.
However, even when ignoring TN, we can estimate the Sensitivity (defined as TP/(TP+FN) to be 0.68 and the Positive Predictive Value (PPV also known as Precision, defined by TP/(TP+FP)) to be 0.9. Note that if we consider the additional 18 molecules that show expression (although with multiple bands) as positive predictions, as well, the score would be even somewhat higher.
To examine if the novel RNAs belong to known families of RNA, we examined their level in T. brucei cells depleted of core RNA proteins by RNAi-silencing. NOP58 silenced cells (previously described ) were used to classify RNAs as C/D snoRNAs, and CBF5 silenced cells  were used to identify H/ACA RNAs. Five of the identified RNA species were assigned to their respective families (4 C/Ds, and 1 H/ACA), and the others remain RNAs of unknown function (RUFs). The level of the RUFs was examined in cells silenced for the C/D and H/ACA core proteins as described above, and in cells depleted for Lsm8 and SmD1, and their levels were unchanged, suggesting that these are novel small RNAs, not belonging to known classes of small RNAs, and have binding proteins that are yet to be discovered.
Novel RNA Candidates of unknown Function (RUFs)
We divide the Discussion into two sections; the first section deals with the technical aspects of the comparative genomics procedure, while the second will describe the implications of our findings on the repertoire of ncRNA molecules in Trypanosomes.
FASTA versus BLAST for RNA comparative genomics
For the purpose of RNA comparative genomics, one has to choose the most appropriate tool to efficiently compare the genomes with optimal sensitivity for detecting homologous ncRNA. BLAST  and FASTA  are the two popular heuristic programs for searching query sequences against a sequence database. Several papers have been published benchmarking the performance of BLAST and FASTA in protein-coding similarity searches [60, 61]. One study  evaluated the sensitivity and specificity for the detection of ncRNA based on a variety of homology methods including BLAST and FASTA. Overall, FASTA was found to be more sensitive in detecting ncRNA than BLAST. In addition, FASTA's performance in detecting ncRNA was found to be comparable to WU-BLAST , though FASTA's run- time was faster. Nonetheless in the ncRNA community at large, the most popular tool of choice has been and continues to be BLASTN (i.e. [12, 30, 39, 42, 64]). As a test case for the preferential homology search methodology, the detection of a known snoRNA cluster (LM25Cs1) in L. major was examined. BLASTn and FASTA searches were performed using as the query a 100 kb area in L. major which included the snoRNA cluster, versus the whole T. brucei genome as the database. Based on our results from this small sample, FASTA, using the default settings, is more sensitive at identifying ncRNA even when we used more sensitive parameters for BLASTN (-r 1, -q -1 instead of the default +1/-3, personal communication William Pearson). We also tested the sensitivity of performing the sequence comparison programs on the whole 100 kb, and on windows of 100 bps with 20% and 50% overlap. The result of these experiments, which is consistent with , is that FASTA should be preferred over BLAST for ncRNA searches.
Implications for the repertoire of ncRNA in Trypanosomes
In this study, a systematic in silico screen for conserved ncRNA among seven trypanosomatids is presented. In total, we found close to 100 candidates. One reason for the relatively low number of the additional ncRNAs that we found stems at least in part from the fact that studies from our labs, and those of others (i.e. [47, 49, 56, 57, 65, 66]), already characterized the repertoire of Trypanosome snoRNAs, snRNAs, and other ncRNA species. Many of the recent studies which utilized comparative genomics to identify ncRNAs examined organisms that had very little previous ncRNA annotation. For example, in a study of Plasmodium, Chakrabarti et al.  identified several snRNAs (U1-U5), telomerase RNA, and about 30 snoRNAs. We believe that the fact that we were able to detect a third to half of the known ncRNA in trypanosomes by the bioinformatic method used indicates that our computational procedure is thorough.
In a recent study, Mao et al.  evaluated the conservation between T. brucei and L. braziliensis using a binomial-based model. QRNA was then used to identify likely ncRNA candidates. A total of 378 sequences were found with a significant QRNA score. Among the 378 sequences, 117 sequences were found to be highly significant when compared to randomized versions of the same sequence. Of the 117, 53 were unannotated. We evaluated the overlap between our final set and Mao's set of 378. We found three common sequences. They were: VSG pseudogene (candidate #121), a retro-transposon hotspot (candidate #98), and a novel C/D snoRNA (candidate #29, named TB10Cs2"C4). Note that although Mao et al. reported a low false positive rate - their algorithm only detected about 50% of the tRNAs, 20% of the rRNAs and 0% of the known snoRNAs. Comparing the performance of our procedure with this work, we conclude that the procedure used in our study is efficient and can serve as a useful tool for other systems, as well.
We propose that our findings can be used to estimate the total number of small ncRNA molecules in Trypanosomes. Of the 50 candidates tested, 18 novel ncRNAs were validated in procyclic stage trypanosomes. The experimental validation of a sample of 50 candidates suggest that about 1/3 of the candidates exist as novel small RNAs. On the other hand, when we tested our procedure on the known ncRNA we found that about 2/3 of the molecules have sufficient sequence conservation to be discovered by comparative sequence methods. Assuming that the rest of the ncRNA repertoire has similar characteristics and combining the two observations above, we can suggest that the total number of ncRNA molecules yet to be discovered in trypanosomatids is unlikely to be more than a few hundred.
There are several caveats to this claim. First, in our search, we did not consider the large amount (about 60% of the genes) of conserved hypothetical proteins. Many hypothetical proteins have been annotated as such because their sequence is found in open reading frames. However, some of these sequences may actually harbor ncRNA molecules. Several snoRNAs have been found within open reading frames. For example, Tb03.30p12.690, labelled as a hypothetical protein, overlaps with a C/D snoRNA TB3Cs2C1.
In addition, it is possible that there are many ncRNAs that are organism specific and cannot be detected by comparative methods. We notice that our study failed to identify several RNAs that are expected to exist in trypanosomatids such as telomerase RNA and RNAse P. Interestingly, Piccinelli et al.  studied RNase P and MRP in a variety of eukaryotes, but were unable to identify them in trypanosomatids. This is likely due to the fact that these RNAs are highly divergent even among closely related trypanosomatids. An interesting finding in this context is the detection of snoRNAs (TB2Cs1C1, TB10Cs6C1 and TB9Cs7H1) that are present in the genome as singletons, and are not part of the usual cluster organization of snoRNA in trypanosomatids. While obviously these two molecules were conserved enough among the different trypanosomatid species to be detected, other singleton molecules may be more diverse and hence harder to detect, suggesting that more such snoRNAs may exist.
Third, our extrapolation was based on our observation that only about 1/3 of candidate molecules were shown to be expressed. We cannot rule out the possibility that these candidate molecules are expressed at different stages in the life cycle of the parasite or under ambient environmental conditions. In C. elegans, it was shown that many ncRNAs are developmentally regulated and exhibit stage-specific function.
Taken together these issues limit our ability to quantitatively estimate, the number of ncRNA molecules in trypanosomatids. However, even if each one of these factors are off by a factor of two, our overall estimate should be in error by less than a single order of magnitude. Thus, we believe that our results supply an "order of magnitude" qualitative argument suggesting that there are relatively few remaining small ncRNA to be identified. Since we found several dozen candidates, we estimate that not more than several hundred ncRNA molecules exist in each of the trypanosomatid genomes. Many of these molecules may be additional members of known ncRNA families, so that the expected number of novel families is limited.
It has been suggested that the genome of higher eukaryotes contain many thousands of as yet undiscovered ncRNA molecules. Washietl et al., , suggested that this repertoire includes short and long ncRNA molecules. Indeed, there is mounting evidence  that there are thousands of long ncRNA molecules (although their functional relevance is still under debate). However, we must note that there is no experimental evidence to support the claim of a large number of short ncRNA, except for the large variety of very short ncRNA (miRNA, piRNA) which are associated with the Dicer/Argonaut silencing system. Our findings support the view that at least for unicellular eukaryotes, the repertoire of small ncRNA is not likely to grow much beyond what is already known, and will remain in the hundreds and not thousands.
Genomic Data sources
Trypanosoma brucei (TB) genomic DNA and sequence annotation (version 4) was downloaded from GeneDB (http://www.genedb.org). GeneDB contains all available sequences from the 11 megabase chromsomes of T. brucei strain TREU927/4 GUTat10.1 generated by the T. brucei genome projects at The Institute for Genomic Research (TIGR's T. brucei project) and The Wellcome Trust Sanger Institute (Sanger's T. brucei project). Trypanosoma cruzi (version 4) (TC), Leishmania major (version 5.2) (LM), Leishmania infantum (version 2) (LI), and Leishmania braziliensis (version 1) (LB) genomic sequence data was also downloaded from GeneDB (ftp://ftp.sanger.ac.uk/pub/pathogens/). The nuclear genome of Trypanosoma cruzi CL Brener is being sequenced by the TIGR-Seattle Biomedical Research Institute-Karolinska Institute T. cruzi Sequencing Consortium (TSK-TSC) (http://www.jcvi.org/). The genome of L. major Friedlin, the reference strain (MHOM/IL/80/Friedlin, zymodeme MON-103), was sequenced as part of a multi-centre collaboration (Sanger Institute/EULEISH, Seattle Biomedical Research Institute, FMRP). The shotgun sequences of T. vivax (TV) and T. congolense (TCONG) were downloaded from GeneDB (ftp://ftp.sanger.ac.uk/pub/databases/). The Sanger Institute has also carried out a 5× coverage of the nuclear genome of T. vivax, as well as T. congolense. The sizes of the T. brucei, T. cruzi, L. major genomes are 25 Mb, 60 Mb, and 32 Mb respectively. These genomes have been published [70–72]. The L. major and T. brucei genomes are fully assembled. The T. cruzi genome has been fully sequenced, but its assembly is still in its preliminary stages. The T. cruzi genome is available as many large contigs.
Sequence similarity searches
The basis for our search strategy was to designate one trypanosomatid genome as a "reference" and to find all sequences in the other organisms that are similar to it. We chose to use T. brucei as the reference genome since it is fully sequenced, reasonably annotated, and because we have the experimental setup for candidate validation. While genome synteny maybe the preferable method to align genomes, we note that some of our genomes are not assembled and are still only available as shotgun sequences; thus we had to chose an alignment method that is based on relatively small windows. The reference genome, T. brucei, was divided into a window size of 100 bps with a sliding window of 50 bps. These sequences were searched for similarity against the other trypanosomatid genomes using FASTA . We found FASTA to be more useful for this project than BLAST (see the Discussion). A Bio-PERL/PERL  script was written to post-process the FASTA results. Sequence matches were further analyzed if they fit the following criteria: 25 bps or longer, an e-value less than or equal to 0.01, and percent identity equal or greater than 60%. FASTA matches that passed the filter were then mapped back to the T. brucei genome. Areas that were less than 10 bps apart were concatenated. Conservation was defined by the number of genomes that had matches to the same corresponding segment of the genome. We considered areas that were conserved in at least four of the six genomes and those that were conserved in all six genomes. Sequences annotated as protein coding or hypothetical protein coding, were then filtered out.
General ncRNA Detection tools
RNA was prepared from T. brucei cells using the TRI-Reagent (Sigma). Primer extension analysis was performed as described [75, 76] using 5'-end-labeled oligonucleotides specific to each target RNA. The extension products were analyzed on a 6% polyacrylamide/7 M urea gel and visualized by autoradiography. For examining the level of ncRNAs under silencing of the core RNA binding proteins, RNA was prepared from untreated cells and 3 days after the induction of silencing, as previously described [48, 49, 77, 78].
Sequence data from this study were deposited in GeneDB. Accession numbers can be found in Additional File 9.
This research was supported by a grant from the Israel-US Binational Science Foundation (BSF), and by an International Research Scholar's Grant from the Howard Hughes Foundation to S.M. S.M. holds the David and Inez Myers Chair in RNA silencing of diseases.
- Amaral PP, Dinger ME, Mercer TR, Mattick JS: The eukaryotic genome as an RNA machine. Science. 2008, 319: 1787-1789. 10.1126/science.1155472.PubMedView ArticleGoogle Scholar
- Costa FF: Non-coding RNAs: lost in translation?. Gene. 2007, 386: 1-10. 10.1016/j.gene.2006.09.028.PubMedView ArticleGoogle Scholar
- Dinger ME, Amaral PP, Mercer TR, Mattick JS: Pervasive transcription of the eukaryotic genome: functional indices and conceptual implications. Brief Funct Genomic Proteomic. 2009, 8: 407-423. 10.1093/bfgp/elp038.PubMedView ArticleGoogle Scholar
- Erdmann VA, Barciszewska MZ, Hochberg A, de GN, Barciszewski J: Regulatory RNAs. Cell Mol Life Sci. 2001, 58: 960-977. 10.1007/PL00000913.PubMedView ArticleGoogle Scholar
- Kiss T: Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions. Cell. 2002, 109: 145-148. 10.1016/S0092-8674(02)00718-3.PubMedView ArticleGoogle Scholar
- Amaral PP, Mattick JS: Noncoding RNA in development. Mamm Genome. 2008, 19: 454-492. 10.1007/s00335-008-9136-7.PubMedView ArticleGoogle Scholar
- Kugel JF, Goodrich JA: In new company: U1 snRNA associates with TAF15. EMBO Rep. 2009, 10: 454-456. 10.1038/embor.2009.65.PubMed CentralPubMedView ArticleGoogle Scholar
- Dinger ME, Pang KC, Mercer TR, Mattick JS: Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput Biol. 2008, 4: e1000176-10.1371/journal.pcbi.1000176.PubMed CentralPubMedView ArticleGoogle Scholar
- Frith MC, Bailey TL, Kasukawa T, Mignone F, Kummerfeld SK, Madera M, Sunkara S, Furuno M, Bult CJ, Quackenbush J, et al: Discrimination of non-protein-coding transcripts from protein-coding mRNA. RNA Biol. 2006, 3: 40-48.PubMedView ArticleGoogle Scholar
- Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25: 955-964. 10.1093/nar/25.5.955.PubMed CentralPubMedView ArticleGoogle Scholar
- Edvardsson S, Gardner PP, Poole AM, Hendy MD, Penny D, Moulton V: A search for H/ACA snoRNAs in yeast using MFE secondary structure prediction. Bioinformatics. 2003, 19: 865-873. 10.1093/bioinformatics/btg080.PubMedView ArticleGoogle Scholar
- Schattner P, Decatur WA, Davis CA, Ares M, Fournier MJ, Lowe TM: Genome-wide searching for pseudouridylation guide snoRNAs: analysis of the Saccharomyces cerevisiae genome. Nucleic Acids Res. 2004, 32: 4281-4296. 10.1093/nar/gkh768.PubMed CentralPubMedView ArticleGoogle Scholar
- Muller S, Charpentier B, Branlant C, Leclerc F: A dedicated computational approach for the identification of archaeal H/ACA sRNAs. Methods Enzymol. 2007, 425: 355-387. full_text.PubMedView ArticleGoogle Scholar
- Myslyuk I, Doniger T, Horesh Y, Hury A, Hoffer R, Ziporen Y, Michaeli S, Unger R: Psiscan: a computational approach to identify H/ACA-like and AGA-like non-coding RNA in trypanosomatid genomes. BMC Bioinformatics. 2008, 9: 471-10.1186/1471-2105-9-471.PubMed CentralPubMedView ArticleGoogle Scholar
- Lowe TM, Eddy SR: A computational screen for methylation guide snoRNAs in yeast. Science. 1999, 283: 1168-1171. 10.1126/science.283.5405.1168.PubMedView ArticleGoogle Scholar
- Lim LP, Glasner ME, Yekta S, Burge CB, Bartel DP: Vertebrate microRNA genes. Science. 2003, 299: 1540-10.1126/science.1080372.PubMedView ArticleGoogle Scholar
- Hertel J, Stadler PF: Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data. Bioinformatics. 2006, 22: e197-e202. 10.1093/bioinformatics/btl257.PubMedView ArticleGoogle Scholar
- Rivas E, Klein RJ, Jones TA, Eddy SR: Computational identification of noncoding RNAs in E. coli by comparative genomics. Curr Biol. 2001, 11: 1369-1373. 10.1016/S0960-9822(01)00401-8.PubMedView ArticleGoogle Scholar
- Washietl S, Hofacker IL, Stadler PF: Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci USA. 2005, 102: 2454-2459. 10.1073/pnas.0409169102.PubMed CentralPubMedView ArticleGoogle Scholar
- Huttenhofer A, Kiefmann M, Meier-Ewert S, O'Brien J, Lehrach H, Bachellerie JP, Brosius J: RNomics: an experimental approach that identifies 201 candidates for novel, small, non-messenger RNAs in mouse. EMBO J. 2001, 20: 2943-2953. 10.1093/emboj/20.11.2943.PubMed CentralPubMedView ArticleGoogle Scholar
- Mardis ER: The impact of next-generation sequencing technology on genetics. Trends Genet. 2008, 24: 133-141.PubMedView ArticleGoogle Scholar
- Brown JW, Clark GP, Leader DJ, Simpson CG, Lowe T: Multiple snoRNA gene clusters from Arabidopsis. RNA. 2001, 7: 1817-1832.PubMed CentralPubMedGoogle Scholar
- Huang ZP, Chen CJ, Zhou H, Li BB, Qu LH: A combined computational and experimental analysis of two families of snoRNA genes from Caenorhabditis elegans, revealing the expression and evolution pattern of snoRNAs in nematodes. Genomics. 2007, 89: 490-501. 10.1016/j.ygeno.2006.12.002.PubMedView ArticleGoogle Scholar
- Voss B, Georg J, Schon V, Ude S, Hess WR: Biocomputational prediction of non-coding RNAs in model cyanobacteria. BMC Genomics. 2009, 10: 123-10.1186/1471-2164-10-123.PubMed CentralPubMedView ArticleGoogle Scholar
- Liu Y, Liu XS, Wei L, Altman RB, Batzoglou S: Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res. 2004, 14: 451-458. 10.1101/gr.1327604.PubMed CentralPubMedView ArticleGoogle Scholar
- Wang X, Haberer G, Mayer KF: Discovery of cis-elements between sorghum and rice using co-expression and evolutionary conservation. BMC Genomics. 2009, 10: 284-10.1186/1471-2164-10-284.PubMed CentralPubMedView ArticleGoogle Scholar
- Sieglaff DH, Dunn WA, Xie XS, Megy K, Marinotti O, James AA: Comparative genomics allows the discovery of cis-regulatory elements in mosquitoes. Proc Natl Acad Sci USA. 2009, 106: 3053-3058. 10.1073/pnas.0813264106.PubMed CentralPubMedView ArticleGoogle Scholar
- McCutcheon JP, Eddy SR: Computational identification of non-coding RNAs in Saccharomyces cerevisiae by comparative genomics. Nucleic Acids Res. 2003, 31: 4119-4128. 10.1093/nar/gkg438.PubMed CentralPubMedView ArticleGoogle Scholar
- Steigele S, Huber W, Stocsits C, Stadler PF, Nieselt K: Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different functions. BMC Biol. 2007, 5: 25-10.1186/1741-7007-5-25.PubMed CentralPubMedView ArticleGoogle Scholar
- Song D, Yang Y, Yu B, Zheng B, Deng Z, Lu BL, Chen X, Jiang T: Computational prediction of novel non-coding RNAs in Arabidopsis thaliana. BMC Bioinformatics. 2009, 10 (Suppl 1): S36-10.1186/1471-2105-10-S1-S36.PubMed CentralPubMedView ArticleGoogle Scholar
- Chen CL, Zhou H, Liao JY, Qu LH, Amar L: Genome-wide evolutionary analysis of the noncoding RNA genes and noncoding DNA of Paramecium tetraurelia. RNA. 2009, 15: 503-514. 10.1261/rna.1306009.PubMed CentralPubMedView ArticleGoogle Scholar
- Kavanaugh LA, Dietrich FS: Non-coding RNA prediction and verification in Saccharomyces cerevisiae. PLoS Genet. 2009, 5: e1000321-10.1371/journal.pgen.1000321.PubMed CentralPubMedView ArticleGoogle Scholar
- Washietl S, Hofacker IL, Lukasser M, Huttenhofer A, Stadler PF: Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat Biotechnol. 2005, 23: 1383-1390. 10.1038/nbt1144.PubMedView ArticleGoogle Scholar
- Torarinsson E, Sawera M, Havgaard JH, Fredholm M, Gorodkin J: Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure. Genome Res. 2006, 16: 885-889. 10.1101/gr.5226606.PubMed CentralPubMedView ArticleGoogle Scholar
- Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermuller J, Hofacker IL, et al: RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007, 316: 1484-1488. 10.1126/science.1138341.PubMedView ArticleGoogle Scholar
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, et al: The transcriptional landscape of the mammalian genome. Science. 2005, 309: 1559-1563. 10.1126/science.1112014.PubMedView ArticleGoogle Scholar
- Missal K, Rose D, Stadler PF: Non-coding RNAs in Ciona intestinalis. Bioinformatics. 2005, 21 (Suppl 2): ii77-ii78. 10.1093/bioinformatics/bti1113.PubMedView ArticleGoogle Scholar
- Missal K, Zhu X, Rose D, Deng W, Skogerbo G, Chen R, Stadler PF: Prediction of structured non-coding RNAs in the genomes of the nematodes Caenorhabditis elegans and Caenorhabditis briggsae. J Exp Zool B Mol Dev Evol. 2006, 306: 379-392. 10.1002/jez.b.21086.PubMedView ArticleGoogle Scholar
- Rose D, Hackermuller J, Washietl S, Reiche K, Hertel J, Findeiss S, Stadler PF, Prohaska SJ: Computational RNomics of drosophilids. BMC Genomics. 2007, 8: 406-10.1186/1471-2164-8-406.PubMed CentralPubMedView ArticleGoogle Scholar
- Chakrabarti K, Pearson M, Grate L, Sterne-Weiler T, Deans J, Donohue JP, Ares M: Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA. 2007, 13: 1923-1939. 10.1261/rna.751807.PubMed CentralPubMedView ArticleGoogle Scholar
- Mourier T, Carret C, Kyes S, Christodoulou Z, Gardner PP, Jeffares DC, Pinches R, Barrell B, Berriman M, Griffiths-Jones S, et al: Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum. Genome Res. 2008, 18: 281-292. 10.1101/gr.6836108.PubMed CentralPubMedView ArticleGoogle Scholar
- Axmann IM, Kensche P, Vogel J, Kohl S, Herzel H, Hess WR: Identification of cyanobacterial non-coding RNAs by comparative genome analysis. Genome Biol. 2005, 6: R73-10.1186/gb-2005-6-9-r73.PubMed CentralPubMedView ArticleGoogle Scholar
- Coenye T, Drevinek P, Mahenthiralingam E, Shah SA, Gill RT, Vandamme P, Ussery DW: Identification of putative noncoding RNA genes in the Burkholderia cenocepacia J2315 genome. FEMS Microbiol Lett. 2007, 276: 83-92. 10.1111/j.1574-6968.2007.00916.x.PubMedView ArticleGoogle Scholar
- Stuart KD, Schnaufer A, Ernst NL, Panigrahi AK: Complex management: RNA editing in trypanosomes. Trends Biochem Sci. 2005, 30: 97-105. 10.1016/j.tibs.2004.12.006.PubMedView ArticleGoogle Scholar
- Agabian N: Trans splicing of nuclear pre-mRNAs. Cell. 1990, 61: 1157-1160. 10.1016/0092-8674(90)90674-4.PubMedView ArticleGoogle Scholar
- Liang XH, Haritan A, Uliel S, Michaeli S: trans and cis splicing in trypanosomatids: mechanism, factors, and regulation. Eukaryot Cell. 2003, 2: 830-840. 10.1128/EC.2.5.830-840.2003.PubMed CentralPubMedView ArticleGoogle Scholar
- Michaeli S, Podell D, Agabian N, Ullu E: The 7SL RNA homologue of Trypanosoma brucei is closely related to mammalian 7SL RNA. Mol Biochem Parasitol. 1992, 51: 55-64. 10.1016/0166-6851(92)90200-4.PubMedView ArticleGoogle Scholar
- Barth S, Hury A, Liang XH, Michaeli S: Elucidating the role of H/ACA-like RNAs in trans-splicing and rRNA processing via RNA interference silencing of the Trypanosoma brucei CBF5 pseudouridine synthase. J Biol Chem. 2005, 280: 34558-34568. 10.1074/jbc.M503465200.PubMedView ArticleGoogle Scholar
- Barth S, Shalem B, Hury A, Tkacz ID, Liang XH, Uliel S, Myslyuk I, Doniger T, Salmon-Divon M, Unger R, et al: Elucidating the role of C/D snoRNA in rRNA processing and modification in Trypanosoma brucei. Eukaryot Cell. 2008, 7: 86-101. 10.1128/EC.00215-07.PubMed CentralPubMedView ArticleGoogle Scholar
- Doniger T, Michaeli S, Unger R: Families of H/ACA ncRNA molecules in trypanosomatids. RNA Biol. 2009, 6: 370-374. 10.4161/rna.6.4.9270.PubMedView ArticleGoogle Scholar
- Liang XH, Uliel S, Hury A, Barth S, Doniger T, Unger R, Michaeli S: A genome-wide analysis of C/D and H/ACA-like small nucleolar RNAs in Trypanosoma brucei reveals a trypanosome-specific pattern of rRNA modification. RNA. 2005, 11: 619-645. 10.1261/rna.7174805.PubMed CentralPubMedView ArticleGoogle Scholar
- Uliel S, Liang XH, Unger R, Michaeli S: Small nucleolar RNAs that guide modification in trypanosomatids: repertoire, targets, genome organisation, and unique functions. Int J Parasitol. 2004, 34: 445-454. 10.1016/j.ijpara.2003.10.014.PubMedView ArticleGoogle Scholar
- Liang XH, Hury A, Hoze E, Uliel S, Myslyuk I, Apatoff A, Unger R, Michaeli S: Genome-wide analysis of C/D and H/ACA-like small nucleolar RNAs in Leishmania major indicates conservation among trypanosomatids in the repertoire and in their rRNA targets. Eukaryot Cell. 2007, 6: 361-377. 10.1128/EC.00296-06.PubMed CentralPubMedView ArticleGoogle Scholar
- Mao Y, Najafabadi HS, Salavati R: Genome-wide computational identification of functional RNA elements in Trypanosoma brucei. BMC Genomics. 2009, 10: 355-10.1186/1471-2164-10-355.PubMed CentralPubMedView ArticleGoogle Scholar
- Cassago A, Rodrigues EM, Prieto EL, Gaston KW, Alfonzo JD, Iribar MP, Berry MJ, Cruz AK, Thiemann OH: Identification of Leishmania selenoproteins and SECIS element. Mol Biochem Parasitol. 2006, 149: 128-134. 10.1016/j.molbiopara.2006.05.002.PubMedView ArticleGoogle Scholar
- Beja O, Ullu E, Michaeli S: Identification of a tRNA-like molecule that copurifies with the 7SL RNA of Trypanosoma brucei. Mol Biochem Parasitol. 1993, 57: 223-229. 10.1016/0166-6851(93)90198-7.PubMedView ArticleGoogle Scholar
- Dungan JM, Watkins KP, Agabian N: Evidence for the presence of a small U5-like RNA in active trans-spliceosomes of Trypanosoma brucei. EMBO J. 1996, 15: 4016-4029.PubMed CentralPubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.PubMedView ArticleGoogle Scholar
- Pearson WR, Lipman DJ: Improved tools for biological sequence comparison. Proc Natl Acad Sci USA. 1988, 85: 2444-2448. 10.1073/pnas.85.8.2444.PubMed CentralPubMedView ArticleGoogle Scholar
- Shpaer EG, Robinson M, Yee D, Candlin JD, Mines R, Hunkapiller T: Sensitivity and selectivity in protein similarity searches: a comparison of Smith-Waterman in hardware to BLAST and FASTA. Genomics. 1996, 38: 179-191. 10.1006/geno.1996.0614.PubMedView ArticleGoogle Scholar
- Brenner SE, Chothia C, Hubbard TJ: Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc Natl Acad Sci USA. 1998, 95: 6073-6078. 10.1073/pnas.95.11.6073.PubMed CentralPubMedView ArticleGoogle Scholar
- Freyhult EK, Bollback JP, Gardner PP: Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res. 2007, 17: 117-125. 10.1101/gr.5890907.PubMed CentralPubMedView ArticleGoogle Scholar
- Gish W: WU-blast. 1996Google Scholar
- Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, Chinwalla A, Clarke L, Clee C, Coghlan A, et al: The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol. 2003, 1: E45-10.1371/journal.pbio.0000045.PubMed CentralPubMedView ArticleGoogle Scholar
- Wieland B, Bindereif A: Unexpected diversity in U6 snRNA sequences from trypanosomatids. Gene. 1995, 161: 129-133. 10.1016/0378-1119(95)00247-4.PubMedView ArticleGoogle Scholar
- Xu Y, Ben-Shlomo H, Michaeli S: The U5 RNA of trypanosomes deviates from the canonical U5 RNA: the Leptomonas collosoma U5 RNA and its coding gene. Proc Natl Acad Sci USA. 1997, 94: 8473-8478. 10.1073/pnas.94.16.8473.PubMed CentralPubMedView ArticleGoogle Scholar
- Piccinelli P, Rosenblad MA, Samuelsson T: Identification and analysis of ribonuclease P and MRP RNA in a broad range of eukaryotes. Nucleic Acids Res. 2005, 33: 4485-4495. 10.1093/nar/gki756.PubMed CentralPubMedView ArticleGoogle Scholar
- He H, Cai L, Skogerbo G, Deng W, Liu T, Zhu X, Wang Y, Jia D, Zhang Z, Tao Y, et al: Profiling Caenorhabditis elegans non-coding RNA expression with a combined microarray. Nucleic Acids Res. 2006, 34: 2976-2983. 10.1093/nar/gkl371.PubMed CentralPubMedView ArticleGoogle Scholar
- Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, et al: Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009, 458: 223-227. 10.1038/nature07672.PubMed CentralPubMedView ArticleGoogle Scholar
- El-Sayed NM, Myler PJ, Bartholomeu DC, Nilsson D, Aggarwal G, Tran AN, Ghedin E, Worthey EA, Delcher AL, Blandin G, et al: The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science. 2005, 309: 409-415. 10.1126/science.1112631.PubMedView ArticleGoogle Scholar
- Ivens AC, Peacock CS, Worthey EA, Murphy L, Aggarwal G, Berriman M, Sisk E, Rajandream MA, Adlem E, Aert R, et al: The genome of the kinetoplastid parasite, Leishmania major. Science. 2005, 309: 436-442. 10.1126/science.1112680.PubMed CentralPubMedView ArticleGoogle Scholar
- Berriman M, Ghedin E, Hertz-Fowler C, Blandin G, Renauld H, Bartholomeu DC, Lennard NJ, Caler E, Hamlin NE, Haas B, et al: The genome of the African trypanosome Trypanosoma brucei. Science. 2005, 309: 416-422. 10.1126/science.1112642.PubMedView ArticleGoogle Scholar
- Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, et al: The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002, 12: 1611-1618. 10.1101/gr.361602.PubMed CentralPubMedView ArticleGoogle Scholar
- Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR: Rfam: an RNA family database. Nucleic Acids Res. 2003, 31: 439-441. 10.1093/nar/gkg006.PubMed CentralPubMedView ArticleGoogle Scholar
- Liang XH, Liu L, Michaeli S: Identification of the first trypanosome H/ACA RNA that guides pseudouridine formation on rRNA. J Biol Chem. 2001, 276: 40313-40318.PubMedView ArticleGoogle Scholar
- Xu Y, Liu L, Lopez-Estrano C, Michaeli S: Expression studies on clustered trypanosomatid box C/D small nucleolar RNAs. J Biol Chem. 2001, 276: 14289-14298.PubMedGoogle Scholar
- Mandelboim M, Barth S, Biton M, Liang XH, Michaeli S: Silencing of Sm proteins in Trypanosoma brucei by RNA interference captured a novel cytoplasmic intermediate in spliced leader RNA biogenesis. J Biol Chem. 2003, 278: 51469-51478. 10.1074/jbc.M308997200.PubMedView ArticleGoogle Scholar
- Liu Q, Liang XH, Uliel S, Belahcen M, Unger R, Michaeli S: Identification and functional characterization of lsm proteins in Trypanosoma brucei. J Biol Chem. 2004, 279: 18210-18219. 10.1074/jbc.M400678200.PubMedView ArticleGoogle Scholar
- Corpet F: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 1988, 16: 10881-10890. 10.1093/nar/16.22.10881.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.