Bcheck: a wrapper tool for detecting RNase P RNA genes
© Yusuf et al; licensee BioMed Central Ltd. 2010
Received: 19 February 2010
Accepted: 13 July 2010
Published: 13 July 2010
Effective bioinformatics solutions are needed to tackle challenges posed by industrial-scale genome annotation. We present Bcheck, a wrapper tool which predicts RNase P RNA genes by combining the speed of pattern matching and sensitivity of covariance models. The core of Bcheck is a library of subfamily specific descriptor models and covariance models.
Scanning all microbial genomes in GenBank identifies RNase P RNA genes in 98% of 1024 microbial chromosomal sequences within just 4 hours on single CPU. Comparing to existing annotations found in 387 of the GenBank files, Bcheck predictions have more intact structure and are automatically classified by subfamily membership. For eukaryotic chromosomes Bcheck could identify the known RNase P RNA genes in 84 out of 85 metazoan genomes and 19 out of 21 fungi genomes. Bcheck predicted 37 novel eukaryotic RNase P RNA genes, 32 of which are from fungi. Gene duplication events are observed in at least 20 metazoan organisms. Scanning of meta-genomic data from the Global Ocean Sampling Expedition, comprising over 10 million sample sequences (18 Gigabases), predicted 2909 unique genes, 98% of which fall into ancestral bacteria A type of RNase P RNA and 66% of which have no close homolog to known prokaryotic RNase P RNA.
The combination of efficient filtering by means of a descriptor-based search and subsequent construction of a high-quality gene model by means of a covariance model provides an efficient method for the detection of RNase P RNA genes in large-scale sequencing data.
Bcheck is implemented as webserver and can also be downloaded for local use from http://rna.tbi.univie.ac.at/bcheck
In recent years, biological sequence databases have grown exponentially. These data include a rapidly increasing number of completely sequenced genomes as well as large-scale metagenomic data sets that await annotation. For instance, the Global Ocean Sampling Expedition (GOS) deposited more than 18G metagenomic sequences already . The analysis of these data calls for new and more efficient methods of data analysis .
Non-protein-coding RNA (ncRNA) genes are abundant in genomic sequences, playing diverse important biological roles . The genomic annotation of ncRNA genes is attracting strong research focus, in particular in the context of genome annotation [4, 5] and metagenomics [6, 7]. Methods for homology-based annotation have dramatically improved over the last years. In particular, Infernal 1.0  outperforms previous methods by orders of magnitude in speed. Nevertheless, such general purpose approaches do not reach the performance levels of customized class-specific tools, in particular tRNAscan-SE in terms of both speed and quality. Manual strategies in some cases  reach superior results, but are too time-consuming for larger projects and in most cases are hard to generalize.
tRNAscan-SE is not a single algorithm but rather a wrapper tool that combines a series of increasingly complex and expensive filters. Similarly, the major searching strategy of Rfam is a combination of a blast-based filter followed by Infernal. The pre-filtering at sequence level with blast, however, is not ideal in particular in applications to distant homologs . Another common approach is to apply a descriptor of sequence and structural motif to predict ncRNA homologs. The descriptor construction is a manual process requiring expert knowledge. Several descriptor languages have been developed, e.g. RNAmot, PatScan, HyPaL, RNAMotif, and rnabob, which is also used here.
RNase P RNA, possibly a remnant of the RNA world , is an important ribozyme involved in the processing of pre-tRNAs . Its gene is usually designated as rnpB in bacteria. A variable number of protein components  facilitate substrate binding . RNase P RNA exists in almost all organisms, but is absent in human mitochondrions . So far, there is compelling evidence for the loss of RNase P RNA only in a single organism, the archaeon Nanoarchaeum equitans. It is not unlikely, however, that plants, red algae, and heterokonts , some bacteria (e.g. Aquifex aeolicus[24, 25]) and some additional archaea (e.g. Pyrobaculum aerophilum) have lost their RNase P RNA. The archaeon Methanothermobacter thermoautotrophicus may be a transition towards the loss of RNase P RNA, which is catalytically inactive in this organism but can be "repaired" by a few substitutions .
The RNase P RNA structures can be broadly assigned to five subfamilies: the bacterial types A and B (bacA and bacB), the archaeal types A and M (arcA and arcM), and a single eukaryote group (nucA) . In addition, two eukaryotic subtypes in fungi (fugA & fugB) can be identified . The types arcA and bacA, which have been identified as ancestral states , cover the majority of microbial rnpB genes, forming diverse sets in terms of both sequence and structure variation. The derived types arcM and bacB, in contrast, have more uniform members. The diversity is largest among eukaryotic RNase P RNA genes. In eukaryotes, RNase P RNA is transcribed by polymerase III . The human promoter elements were described recently  to contain TATA-box, PSE, Oct and SP1/SPH elements within 100 nt upstream of the transcription initiation site. A comparison of all eukaryotic promoter elements showed weak similarities only in the TATA box.
In this contribution we are concerned with the detection of RNase P RNA genes in genomic data from all domains of life. In , a pattern matching based pipeline for efficient rnpB gene prediction has been proposed. It is not applicable to large-scale database searches in practice, however. Here, we present Bcheck, a wrapper, to perform efficient rnpB gene prediction by combining the fast filtering with rnabob and the sensitive validation by Infernal. The construction of such a method entails two tasks: the design of an efficient yet sensitive descriptor model (DM) that acts as a filter, and the derivation of a sensitive statistics covariance model (CM). Both components are based on a careful analysis of published RNase P RNA sequences and structures. The success of Bcheck depends on the efficiency and predictive power of both models, as well as a sensible wrapping algorithm that optimizes the interplay of DM and CM.
2.1 Algorithm and models
The construction of effective models of RNase P RNA genes is a non-trivial task because of the lack of strong family-specific conservation. Our strategy was to first classify the training sequences into the seven sub-families identified in the literature: arcA, arcM, bacA, bacB, nucA, fugA and fugB. The training set consists of sequences from the RNase P RNA Database  with intact secondary structures annotated, and additional RNase P RNA sequences from the Rfam and from two recent publications [23, 36]. A set of randomized decoys as well as randomized genomic sequences were constructed using ushuffle in order to determine the false positive rates.
The training of both DM and CM requires structural alignments, whose quality is crucial for both the automatic learning procedures of the Infernal CMs and the manual construction of the DMs. We adopted a multi-step strategy: The RNase P RNA sequences were first divided into structural elements, then base-pairing regions were structure-aligned manually, and loop regions were sequence-aligned by means of MUSCLE. Local alignments were then recombined into a "raw" global alignment for each subfamily. These alignments contained errors, mainly caused by local foldings which were not fitting to the conservation patterns shared by the majority of members. We applied RNAfold[39, 40] to check the thermodynamic plausibility of local structure elements. Construction of DMs started with these alignments. In the course of DM construction, outliers were temporarily removed from the alignments, searched with the preliminary DMs, which provided additional information to guide the re-insertion of the outlier into the alignment.
Since efficiency was the major focus of DM training, we focused on selected features of local regions. To gain consensus sequence information, each alignment column was summarized and assigned with standard, "ambiguous" IUPAC nucleotides (taking into account every nucleotide appearing in the column) or the gap character (whenever the column contained at least one gap). The sequence was edited to take established structural knowledge into account. The resulting consensus sequence was then annotated in the alignment.
For the subfamilies arcA, bacA, and nuc with strong variation, we constructed two variants, "DM selective" and "DM general", with different parameter settings. The selective DMs miss a few aberrant RNase P RNA genes, while the "DM general" models have a larger false positive rate. For each subfamily, only one CM is needed and automatically generated based on global structural alignment using the tools of the Infernal package.
To distinguish functional copy and pseudogene of eukaryotes, we analyzed their promoter regions. For this purpose we aligned 100 nt upstream of Polymerase III transcripts of the same organism and compared the RNase P RNA gene predictions. While the absence of a recognizable promoter signal does not prove that the RNase P RNA gene is not functional, we have adopted a conservative policy and include only bcheck results for which the presence of a promoter signal provides additional confirmation.
2.2.1 Prokaryote rnpB genes in GenBank
Summary of predicted microbial rnpB for GenBank genome data set.
The GenBank files contained annotated rnpB genes for 365 bacteria and 22 archaea, all of which were among the Bcheck predictions. We then compared start-end positions of Bcheck predictions and GenBank annotations. Only 25% of the annotations agree within a discrepancy of 5 nt or less at both ends.
Inspection of sequences and predicted secondary structures shows that the published sequences are in general incomplete, lacking nucleotides at the 5' end of P4 and 3' end of P4: At the 5' end, 66% known annotations miss flanking regions of P4, ranging from 30 to 90 nucleotides. At the 3' end, 56% known annotations miss flanking regions of P4', ranging from 10 to 20 nucleotides. A few of the GenBank annotations, furthermore, have promoter or terminator sequences included. Bcheck thus provides a substantial improvement also of the existing annotations in most cases.
Evaluation of the five major discrepancies between GenBank annotation and Bcheck results.
R. typhi wilmington
P1, P3, P9, P10
The subfamily distribution of microbial rnpB. No hit was obtained with any of the eukaryotic DMs.
2.2.2 RNase P RNA genes in metagenomic sequences
The GOS metagenomic sequences were obtained from the CAMERA project . Due to the taxonomic uncertainty of the GOS data set, all models of archaea, bacteria and eukaryotes were applied to search over 10 million sequences comprising about 18 G. No hit was produced by any of the eukaryotic models.
In total, Bcheck predicted 4675 rnpB genes with median E-values of 10-78. In 211 cases two models overlap. In these cases there is a clear difference in E-values, so that the assignment to domains is unambiguous in all positive cases. After duplication removal, 2909 rnpB sequences are unique, 2857 of which belong to bacA, 49 to arcA, 3 to bacB, but none for arcM, see Table 3.
2.2.3 RNase P RNA genes in eukaryotic genomes
Summary of predicted RNase P RNA genes in eukaryotes. "Known" refers to organisms with annotated RNase P RNA genes, whereas "unknown" refers to organisms with no annotated RNase P RNA genes.
Strong promoter signals were identified for Meloidogyne hapla and Monosiga brevicollis, supporting that these candidates are functional copies. For 36 metazoan genomes, Bcheck made multiple predictions. In at least 16 cases, additional predictions are highly similar to the functional copy. In genome sequence assembly, the reads originating from different copies of a repeat appear identical and cause assembly errors.
Therefore these predictions seem to be due to assembly errors rather than constituting true paralogs. In the other 20 cases differences in the flanking regions and the RNase P RNA gene itself indicate that we see the results of gene duplications. Even though the promoter might be specific for an organism, it may differ from other polymerase III transcripts within one species. In each of these cases, a presumably functional RNase P RNA like promoter structure was found for only one of the copies. Similar duplication patterns are observed in closely related primate, fish and rodent. For instance, both Homo sapiens and Pan troglodytes have functional copies on chromosome 14 and a pseudogene on chromosome 4. Among teleosts, both Danio rerio and Gasterosteus aculeatus have their functional copies and pseudogenes on the chromosome 2. In rodent family, Rattus norvegicus and Mus musculus have the pseudogenes spreading on at least 4 different chromosomes.
Novel RNase P RNA genes were detected by Bcheck in many fungi, which had not been analyzed in much detail so far. Only eight sequences which were reported before were not recognized by Bcheck. In 119 sequences Bcheck found 37 novel RNase P RNA genes. The remaining 82 genomes are either unfinished drafts, so that the RNase P RNA is not contained in the data, or they belong to clades where RNase P RNA may be absent. In plants, red algae, and heterokonts RNase MRP RNA, an ancient paralog of RNase P RNA, is well described [23, 34]. One may speculate that it substitutes for RNase P RNA in these clades, in particular given that multiple copies of RNase MRP RNA are present in plant genomes. A recent, detailed overview of the evolution of RNase P RNA and its associated proteins can be found in [23, 44]. An incomplete genome assembly explains e.g. the deviant RNase P RNA in the genome of the elephant (Loxodonta africanus), which shows a canonical sequence interrupted by a run of Ns in the latest assembly (Loxafr3.0). On the other hand, we suspect that we missed the RNase P RNA in some fungi and in some of the basal eukaryotes because of highly divergent sequence and secondary structure.
2.3 Software, webserver, and database
Bcheck was written in Python (version 2.5.2). Input consists of DNA or RNA sequences in fasta format, predictions are output with fasta format or with secondary structure annotated. Besides the default search algorithm, Bcheck also gives the option for searching with CM only. However, CM-only search is at least 100 times slower.
We set up a Bcheck webserver to facilitate online RNase P RNA gene prediction. A searchable rnpB database was developed, including genes for 1005 microbial organisms, 147 eukaryote organisms and 4756 GOS sample sequences. The predicted pseudogenes for eukaryote organisms are also included. The "rnpB database" uses a hierarchical tree structure, consisting of 5 tables, implementing preorder tree traversal algorithm to process the query efficiently. blast is also offered in the server for homology search against the database compromising 777 unique rnpB genes. The server can be accessed at http://rna.tbi.univie.ac.at/bcheck. The Bcheck-pipeline can also be downloaded from the same location for local use in a Linux environment.
3 Discussion and conclusions
The rapidly increasing size of sequence databases requires efficient tools for data analysis. In particular, homology annotation of small ncRNAs, with their short and often poorly conserved sequences poses a severe problem for large-scale annotation. Here, we describe Bcheck, an efficient pipeline to determine RNase P RNA genes across all three domains of life. In order to deal with the high variability of the RNase P RNA sequences and structures, we employ descriptor-based models specific for sub-families instead of a single pattern to construct more efficient filters. In the second step, improved covariance models are used to validate the candidates from the DM step and to determine nearly exact gene boundaries.
With Bcheck, we were able to determine the RNase P RNA genes in 59 out of 68 archaea, 946 out of 956 bacteria, and 147 out of 237 eukaryotes. 61% of the prokaryotic sequences and 25% of eukaryotic results were not annotated previously. The quality of the predicted rnpB genes is much better than a large fraction of the - usually blast-based - annotation available through GenBank. The size and diversity of eukaryote genomes brings with it a particular challenge in finding RNase P RNA genes, because this diversity is reflected in many aberrant features of the RNase P RNA itself. Using the fungi-specific DMs, we uncovered 32 previously unannotated sequences, which are downloadable from the Bcheck server. As in previous studies, we did not find RNase P RNA candidates in plants and Heterokonta.
Since Bcheck is more than 100 times faster than the direct application of Infernal (version 1.0), it is suitable in particular as a tool to screen large high-throughput sequencing data. With only a handful of false negatives (10 out of 1005 prokaryotes), Bcheck provides a highly efficient way to annotate newly sequenced genomes. A particular strength of Bcheck is its applicability to metagenomics data.
At present, Bcheck models were built on the conserved sequence and secondary structure features of a large sample of RNase P RNA genes. Conceivably, the predictive power of the pipeline could be improved further by including additional information. For instance, promoter and terminator regions might be utilized. A recent survey for 7SK RNAs capitalized largely on the conserved features of the characteristic pol-III promoter signals of this ncRNA class . A similar strategy might allow a further relaxation of the DM pattern in favor of a second filter utilizing the promoter and terminator motifs.
We are grateful to Prof. Dr. Thomas Rattei (Department of Microbial Ecology, University of Vienna) for making the GOS dataset available to us in convenient form, and Dipl.-Inf. Henry Wirth (Interdisciplinary Centre for Bioinformatics, University of Leipzig) for producing the boxplot figure. This work has been funded by the Austrian GEN-AU projects "bioinformatics integration network III", and in part supported by Landesstipendium Sachsen.
- Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers YH, Falcón LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, Platt T, Bermingham E, Gallardo V, Tamayo-Castillo G, Ferrari MR, Strausberg RL, Nealson K, Friedman R, Frazier M, Venter JC: The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol. 2007, 5: e77-10.1371/journal.pbio.0050077.PubMed CentralPubMedView ArticleGoogle Scholar
- Rust AG, Mongin E, Birney E: Genome annotation techniques: new approaches and challenges. Drug Discov Today. 2002, 7: S70-76. 10.1016/S1359-6446(02)02289-4.PubMedView ArticleGoogle Scholar
- Eddy SR: Non-coding RNA genes and the modern RNA world. Nat Rev Genet. 2001, 2: 919-929. 10.1038/35103511.PubMedView ArticleGoogle Scholar
- The Athanasius F Bompfünewerer RNA Consortium, Backofen R, Flamm C, Fried C, Fritzsch G, Hackermüller J, Hertel J, Hofacker IL, Missal K, Axel Prohaska Mosig, Rose D, Stadler PF, Tanzer A, Washietl S, Sebastian W: RNAs Everywhere: Genome-Wide Annotation of Structured RNAs. J Exp Zool B: Mol Dev Evol. 2007, 308B: 1-25. 10.1002/jez.b.21130.View ArticleGoogle Scholar
- Hertel J, de Jong D, Marz M, Rose D, Tafer H, Tanzer A, Schierwater B, Stadler PF: Non-Coding RNA Annotation of the Genome of Trichoplax adhaerens. Nucleic Acids Res. 2009, 37: 1602-1615. 10.1093/nar/gkn1084.PubMed CentralPubMedView ArticleGoogle Scholar
- Weinberg Z, Perreault J, Meyer MM, Breaker RR: Exceptional structured noncoding RNAs revealed by bacterial metagenome analysis. Nature. 2009, 462: 656-659. 10.1038/nature08586.PubMed CentralPubMedView ArticleGoogle Scholar
- Meyer MM, Ames TD, Smith DP, Weinberg Z, Schwalbach MS, Giovannoni SJ, Breaker RR: Identification of candidate structured RNAs in the marine organism 'Candidatus Pelagibacter ubique'. BMC Genomics. 2009, 10: 268-10.1186/1471-2164-10-268.PubMed CentralPubMedView ArticleGoogle Scholar
- Nawrocki EP, Kolbe DL, Eddy SR: Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009, 25: 1335-1337. 10.1093/bioinformatics/btp157.PubMed CentralPubMedView ArticleGoogle Scholar
- Lowe TM, Eddy S: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucl Acids Res. 1997, 25: 955-964. 10.1093/nar/25.5.955.PubMed CentralPubMedView ArticleGoogle Scholar
- Mosig A, Zhu L, Stadler PF: Customized strategies for discovering distant ncRNA homologs. Brief Funct Genomics Proteomics. 2009, 8: 451-460. 10.1093/bfgp/elp035.View ArticleGoogle Scholar
- Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A: Rfam: updates to the RNA families database. Nucleic Acids Res. 2009, 136-140. 10.1093/nar/gkn766. 37 Database
- Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005, 33: D121-124. 10.1093/nar/gki081.PubMed CentralPubMedView ArticleGoogle Scholar
- Gautheret D, Major F, Cedergren R: Pattern searching/alignment with RNA primary and secondary structures: an effective descriptor for tRNA. Comput Appl Biosci. 1990, 6 (4): 325-331.PubMedGoogle Scholar
- Dsouza M, Larsen N, Overbeek R: Searching for patterns in genomic data. Trends Genet. 1997, 13: 497-498. 10.1016/S0168-9525(97)01347-4.PubMedView ArticleGoogle Scholar
- Gräf S, Strothmann D, Kurtz S, Steger G: HyPaLib: a database of RNAs and RNA structural elements defined by hybrid patterns. Nucl Acids Res. 2001, 29: 196-198. 10.1093/nar/29.1.196.PubMed CentralPubMedView ArticleGoogle Scholar
- Macke TJ, Ecker DJ, Gutell RR, Gautheret D, Case DA, Sampath R: RNAMotif, an RNA secondary structure definition and search algorithm. Nucl Acids Res. 2001, 29 (22): 4724-4735. 10.1093/nar/29.22.4724.PubMed CentralPubMedView ArticleGoogle Scholar
- Eddy SR: RNABOB: a program to search for RNA secondary structure motifs in sequence databases. 1992, [http://selab.janelia.org/software.html]Google Scholar
- Randau L, Schröder I, Söll D: Life without RNase P. Nature. 2008, 453: 120-123. 10.1038/nature06833.PubMedView ArticleGoogle Scholar
- Guerrier-Takada C, Gardiner K, Marsh T, Pace N, Altman S: The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell. 1983, 35: 849-857. 10.1016/0092-8674(83)90117-4.PubMedView ArticleGoogle Scholar
- Evans D, Marquez SM, Pace NR: RNase P: interface of the RNA and protein worlds. Trends Biochem Sci. 2006, 31: 333-341. 10.1016/j.tibs.2006.04.007.PubMedView ArticleGoogle Scholar
- Reich C, Olsen GJ, Pace B, Pace NR: Role of the protein moiety of ribonuclease P, a ribonucleoprotein enzyme. Science. 1988, 239: 178-181. 10.1126/science.3122322.PubMedView ArticleGoogle Scholar
- Walker SC, Engelke DR: A protein-only RNase P in human mitochondria. Cell. 2008, 135 (3): 412-414. 10.1016/j.cell.2008.10.010. [http://www.hubmed.org/display.cgi?uids=18984152]PubMed CentralPubMedView ArticleGoogle Scholar
- Piccinelli P, Rosenblad MA, Samuelsson T: Identification and analysis of ribonuclease P and MRP RNA in a broad range of eukaryotes. Nucleic Acids Res. 2005, 33: 4485-4495. 10.1093/nar/gki756.PubMed CentralPubMedView ArticleGoogle Scholar
- Li Y, Altman S: In search of RNase P RNA from microbial genomes. RNA. 2004, 10: 1533-1540. 10.1261/rna.7970404.PubMed CentralPubMedView ArticleGoogle Scholar
- Marszalkowski M, Willkomm DK, Hartmann RK: 5'-end maturation of tRNA in Aquifex aeolicus. Biol Chem. 2008, 389: 395-403. 10.1515/BC.2008.042.PubMedView ArticleGoogle Scholar
- Li D, Willkomm DK, Hartmann RK: Minor changes largely restore catalytic activity of archaeal RNase P RNA from Methanothermobacter thermoautotrophicus. Nucleic Acids Res. 2009, 37: 231-242. 10.1093/nar/gkn915.PubMed CentralPubMedView ArticleGoogle Scholar
- Loria A, Pan T: Domain structure of the ribozyme from eubacterial ribonuclease P. RNA. 1996, 2: 551-563.PubMed CentralPubMedGoogle Scholar
- Krasilnikov AS, Yang X, Pan T, Mondragón A: Crystal structure of the specificity domain of ribonuclease P. Nature. 2003, 421: 760-764. 10.1038/nature01386.PubMedView ArticleGoogle Scholar
- Li Y, Altman S: In search of RNase P RNA from microbial genomes. RNA. 2004, 10: 1533-1540. 10.1261/rna.7970404.PubMed CentralPubMedView ArticleGoogle Scholar
- Brown JW: The Ribonuclease P Database. Nucleic Acids Res. 1999, 27: 314-10.1093/nar/27.1.314.PubMed CentralPubMedView ArticleGoogle Scholar
- Frank DN, Adamidi C, Ehringer MA, Pitulle C, Pace NR: Phylogenetic-comparative analysis of the eukaryal ribonuclease P RNA. RNA. 2000, 6: 1895-1904. 10.1017/S1355838200001461.PubMed CentralPubMedView ArticleGoogle Scholar
- Haas ES, Williams D, Frank DN, Brown JW: New insight into RNase P RNA structure from comparative analysis of the archaeal RNA. RNA. 2001, 7: 220-232. 10.1017/S1355838201001777.PubMed CentralPubMedView ArticleGoogle Scholar
- Schramm L, Hernandez N: Recruitment of RNA polymerase III to its target promoters. Genes Dev. 2002, 16: 2593-2620. 10.1101/gad.1018902.PubMedView ArticleGoogle Scholar
- Woodhams MD, Stadler PF, Penny D, Collins LJ: RNAse MRP and the RNA Processing Cascade in the Eukaryotic Ancestor. BMC Evol Biol. 2007, 7: S13-10.1186/1471-2148-7-S1-S13.PubMed CentralPubMedView ArticleGoogle Scholar
- Brown JW: The Ribonuclease P Database. Nucleic Acids Res. 1999, 27: 314-314. 10.1093/nar/27.1.314.PubMed CentralPubMedView ArticleGoogle Scholar
- Marz M, Schoen A, Stadler P: RNase MRP and RNase P. in prep. 2010Google Scholar
- Jiang M, Anderson J, Gillespie J, Mayne M: uShuffle: a useful tool for shuffling biological sequences while preserving the k-let counts. BMC Bioinformatics. 2008, 9: 192-192. 10.1186/1471-2105-9-192.PubMed CentralPubMedView ArticleGoogle Scholar
- Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004, 5: 113-113. 10.1186/1471-2105-5-113.PubMed CentralPubMedView ArticleGoogle Scholar
- Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P: Fast Folding and Comparison of RNA Secondary Structures. Monatsh Chem. 1994, 125: 167-188. 10.1007/BF00818163.View ArticleGoogle Scholar
- Hofacker IL: Vienna RNA secondary structure server. Nucleic Acids Res. 2003, 31: 3429-3431. 10.1093/nar/gkg599.PubMed CentralPubMedView ArticleGoogle Scholar
- Griffiths-Jones S: RALEE--RNA ALignment editor in Emacs. Bioinformatics. 2005, 21: 257-259. 10.1093/bioinformatics/bth489.PubMedView ArticleGoogle Scholar
- Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M: CAMERA: a community resource for metagenomics. PLoS Biol. 2007, 5: e75-10.1371/journal.pbio.0050075.PubMed CentralPubMedView ArticleGoogle Scholar
- Foerstner KU, von Mering C, Hooper SD, Bork P: Environments shape the nucleotide composition of genomes. EMBO Rep. 2005, 6: 1208-1213. 10.1038/sj.embor.7400538.PubMed CentralPubMedView ArticleGoogle Scholar
- Rosenblad MA, López MD, Piccinelli P, Samuelsson T: Inventory and analysis of the protein subunits of the ribonucleases P and MRP provides further evidence of homology between the yeast and human enzymes. Nucleic Acids Res. 2006, 34 (18): 5145-5156. 10.1093/nar/gkl626. [http://www.hubmed.org/display.cgi?uids=16998185]PubMed CentralPubMedView ArticleGoogle Scholar
- Gruber A, Kilgus C, Mosig A, Hofacker IL, Hennig W, Stadler PF: Arthropod 7SK RNA. Mol Biol Evol. 2008, 1923-1930: 25-Google Scholar
- Marquez SM, Harris JK, Kelley ST, Brown JW, Dawson SC, Roberts EC, Pace NR: Structural implications of novel diversity in eucaryal RNase P RNA. RNA. 2005, 11: 739-751. 10.1261/rna.7211705.PubMed CentralPubMedView ArticleGoogle Scholar
- Walker SC, Engelke DR: Ribonuclease P: the evolution of an ancient RNA enzyme. Crit Rev Biochem Mol Biol. 2006, 41: 77-102. 10.1080/10409230600602634.PubMed CentralPubMedView ArticleGoogle Scholar
- Zhu Y, Pulukkunat DK, Li Y: Deciphering RNA structural diversity and systematic phylogeny from microbial metagenomes. Nucleic Acids Res. 2007, 35: 2283-2294. 10.1093/nar/gkm057.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.