RScan: fast searching structural similarities for structured RNAs in large databases
© Xue and Liu; licensee BioMed Central Ltd. 2007
Received: 17 April 2007
Accepted: 31 July 2007
Published: 31 July 2007
Many RNAs have evolutionarily conserved secondary structures rather than conserved primary sequences. Recently, an increasing number of methods have been developed that focus on structural alignment for finding conserved secondary structures and common structural motifs in pairs or sets of sequences. A challenging task is to search large genomic databases quickly for structures similar to that of a structured RNA, because existing methods are too slow to be used on large databases.
We propose RScan, an implementation of a fast structural alignment algorithm, to address this task. RScan leverages the advantages of both hashing algorithms and local alignment algorithms. In our experiments, the average times for searching for a tRNA and an rRNA in the randomized A. pernix genome were only 256 seconds and 832 seconds, respectively, with RScan, compared with 3,178 seconds and 8,951 seconds with the existing method RSEARCH. Notably, RScan can handle large database queries, taking less than 4 minutes to search for structures similar to a microRNA precursor in human chromosome 21.
These results indicate that RScan is a preferable choice for real-life applications that search for structural similarities of structured RNAs in large databases. RScan software is freely available at http://bioinfo.au.tsinghua.edu.cn/member/cxue/rscan/RScan.htm.
A wide range of RNA molecules form specific secondary structures by folding their primary sequences. RNA secondary structures play important roles in cellular processes, such as regulating gene expression and producing non-coding transcriptional products [1, 2]. In many non-coding RNAs (ncRNAs), such as tRNA with its cloverleaf structure and the microRNA precursor (pre-miRNA) with its hairpin structure, the secondary structure has been evolutionarily conserved rather than the primary sequence.
In recent years, several computational methods have been reported for finding these conserved secondary structures, as well as common local structural motifs, in pairs or sets of sequences. The programs QRNA, ddbRNA and MSARI were developed successively to detect functional ncRNAs with conserved structures. Washietl et al. implemented the program RNAz, which led to the mapping of thousands of conserved structural and functional RNAs in the human genome. In addition, local structures or structural motifs of RNA molecules are even more important in studies of RNA function. Macke et al. defined specific types of RNA motifs and developed RNAMotif to search for such structural elements. The program ERPIN is based on secondary structure profiles and uses RNA sequence alignments annotated with secondary structure information for motif definition and identification. Hoechsmann et al. used tree alignment and forest alignment to compare local similarities in RNA secondary structures. The Vienna RNA package is a comprehensive tool for RNA secondary structure prediction and comparison, and its RNALfold program predicts locally stable RNA structures in a single genome. Havgaard and co-workers focused on detecting common local structures between two RNA sequences with low sequence similarity. A similar procedure, RNAProfile, detects conserved structural motifs in unaligned RNA sequences. More recently, Liu et al. reported RSmatch for aligning RNA secondary structures and detecting motifs; it uses a tree model to organize the structure components. In a comprehensive comparison, Freyhult et al. assessed the effectiveness of 12 methods that can perform RNA homology search and found that most of them have low accuracy.
We now face a challenging task: given an RNA sequence with a secondary structure, how can its structural homologs be found efficiently in a large genome database? To address this task, Klein and Eddy developed RSEARCH, a pair-wise alignment algorithm based on profile stochastic context-free grammars. RSEARCH uses RIBOSUM, a substitution matrix for base pairs and single nucleotides, to find optimal structural alignments between an RNA sequence and a sequence database. It successfully searched archaeal, yeast and Arabidopsis thaliana databases. However, the time complexity of the algorithm is O(NM^3), where N is the length of the database sequence and M is the length of the query sequence, so RSEARCH is very slow on a personal computer (PC) [1, 18]. Subsequently, Weinberg and Ruzzo used a rigorous filter to eliminate database sequences that provably could not be annotated as homologs of a known ncRNA gene family [19, 20], which made annotating new members of known ncRNA families in a genome much faster. Using a similar strategy, Bafna and Zhang developed FastR, which is faster than RSEARCH because it divides the search into two steps. First, FastR filters out a large proportion of the database on the basis of the structural elements and sequence information of the query RNA; the search is then run on the remainder of the database. The query time of FastR decreases drastically because the filtering produces a smaller search database. However, because FastR must filter the database for each query, comparing query times alone is misleading unless the large overhead of the filtering step is included. Moreover, FastR also loses sensitivity because of the filtering.
In this paper we focus on finding structural similarities for a structured query RNA in a large database efficiently, and we propose an algorithm, RScan, for this task. The conserved structural similarities found are candidate functional homologs of the query RNA. RScan builds on the SSAHA and Smith-Waterman algorithms and carries out the search in two steps. First, it builds a hash table for the database; it then uses the hash table to search the database for structural similarities to a query. Because the hash table is generated only once and can be reused for any query against a given database, this drastically reduces the time required per search. Six ncRNA datasets were used as test cases to compare RScan and RSEARCH, and the results were encouraging. Moreover, RScan and RSEARCH achieved similar identification rates when searching for a tRNA and an rRNA query in a randomized genome, yet RScan took only 256 seconds (s), versus 3,178 s for RSEARCH, on the tRNA, and 832 s, versus 8,951 s, on the rRNA. Searching for structural similarities to a pre-miRNA in human chromosome 21 took less than 4 minutes. RScan is therefore a good search tool when the queried database is large.
Pre-processing and definitions
Search algorithm of RScan
RScan directly searches for optimal structural alignments between a structural query and a structural database. It is based on the SSAHA algorithm, which uses hashing to perform fast searches of large genome databases, and on the Smith-Waterman algorithm, a basic local alignment algorithm.
Construct the hash table for a structural database
Search the structural query
RScan uses a binary match/mismatch score function to score each alignment. The scores for a match, a mismatch and an insertion/deletion gap (indel) can be adjusted on demand. In this paper they are set to match = 1, mismatch = -2 and indel = -1.
The output reports three structural alignment hits. The first hit is an accurate alignment between the structural query and structure S1, containing only two mismatches and two indels in the whole alignment. Hit 2 is also a good structural alignment, although more indels occur and the lengths of the query (72 nucleotides) and of the hit S3 (87 nucleotides) differ, as shown in Figure 1. In hit 3, RScan finds only a local structural alignment between the query and S2. Two parameters, "Alignment_ratio" and "Identity", deserve a closer explanation. Taking hit 1 as an example, 72 symbols of the query are aligned and the query length is 72, so "Alignment_ratio" is 100% (72/72). There are 70 matches among the aligned query symbols, so "Identity" is 97.22% (70/72). Users can set thresholds for "Alignment_ratio" and "Identity", and RScan reports only the alignments that reach these thresholds. In the following experiments, we set the thresholds so as to find significant global alignments.
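To make the scoring concrete, here is a minimal Python sketch of Smith-Waterman local alignment with the linear scores quoted above (match = 1, mismatch = -2, indel = -1). It assumes the structural query and the database entries are plain strings of structure symbols; dot-bracket strings are used only as a stand-in, since RScan's own symbol alphabet is defined in its pre-processing step. The function name `smith_waterman` is ours, not RScan's, and it returns only the best score and its end positions, without a traceback.

```python
def smith_waterman(query: str, target: str,
                   match: int = 1, mismatch: int = -2, indel: int = -1):
    """Return (best_score, query_end, target_end) of the best local alignment.

    Linear gap penalty; end positions are 1-based, 0 means no positive hit.
    """
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] = best score of a local alignment ending at query[i-1], target[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0,                      # start a new local alignment
                          H[i - 1][j - 1] + sub,  # match or mismatch
                          H[i - 1][j] + indel,    # gap in the target
                          H[i][j - 1] + indel)    # gap in the query
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


print(smith_waterman("((((...))))", "..((((...)))).."))  # → (11, 11, 13)
```

The quadratic table is what gives the O(NM) cost per query discussed later; RScan avoids running it against the whole database by first locating candidate regions with its hash table.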
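The two reporting parameters reduce to simple ratios. A minimal sketch, reproducing hit 1 and the tRNA thresholds given in the Methods (Identity ≥ 90%, Alignment_ratio ≥ 95%); the function names are hypothetical. In hit 1 the number of aligned query symbols equals the query length, so the example cannot tell whether Identity divides by the aligned symbols or by the query length; we assume the former, following the wording "matches in the aligned symbols of the query".

```python
def alignment_stats(aligned_query_symbols: int, query_length: int, matches: int):
    """Return (Alignment_ratio, Identity) in percent.

    Assumption: Identity divides by the number of aligned query symbols.
    """
    alignment_ratio = 100.0 * aligned_query_symbols / query_length
    identity = 100.0 * matches / aligned_query_symbols
    return alignment_ratio, identity


def passes_thresholds(ar: float, identity: float,
                      min_ar: float, min_identity: float) -> bool:
    """Keep a hit only if both values reach the user's thresholds."""
    return ar >= min_ar and identity >= min_identity


ar, idt = alignment_stats(aligned_query_symbols=72, query_length=72, matches=70)
print(f"Alignment_ratio = {ar:.2f}%, Identity = {idt:.2f}%")
# → Alignment_ratio = 100.00%, Identity = 97.22%
print(passes_thresholds(ar, idt, min_ar=95.0, min_identity=90.0))  # → True
```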
Comparisons between RScan and three other methods: Blast, RSEARCH and RSmatch
Several datasets were used to evaluate the performance of RScan. They comprised six ncRNA families from Rfam (one tRNA, two rRNA and three RNase P families), whose member sequences were filtered using appropriate steps, for example eliminating sequences with high sequence similarity (see Methods for details). For each filtered dataset, one member was used as the query and the other members were used to build the database. RScan and RSEARCH performed searches on these datasets, and their accuracy and speed were compared.
First, it should be emphasized that Blast cannot find sequence similarity between the query and the sequences in the datasets, because of the filtering step (see Methods). By contrast, RScan and RSEARCH recognized numerous significant structural similarities to the query. This indicates that RScan is more sensitive than Blast for structural alignment.
Comparison of RScan and RSEARCH on the ncRNA family datasets
# of True
6 m 3 s
26 m 28 s
6 h 47 m 47 s
10 h 24 m 32 s
12 h 17 m 58 s
3 m 9 s
41 h 3 m 27 s
4 m 55 s
The last column of Table 1 lists the computing times of RScan and RSEARCH on the six datasets. RScan was remarkably faster than RSEARCH. RScan took only 5 seconds (s) and 6 s to search for a structural query in the structural RF00005 and RF00001 databases, respectively, whereas RSEARCH took 363 s and 1,588 s for the same searches. On the three ribozyme families, RSEARCH needed 6 to 12 hours to finish the searches, but RScan took only 10 seconds to 3 minutes. For the RF00177 dataset, the most time-consuming case, RSEARCH ran for more than 41 hours, whereas RScan ran for less than 5 minutes. In this case, the identification rate of RSEARCH (82%) was 1.26 times that of RScan, but its computing time was 501 times that of RScan.
The time complexity of RSEARCH is O(NM^3), where N is the length of the database sequence and M is the length of the query sequence [1, 26], so it becomes very slow when M or N is large. By comparison, RScan transforms the complicated problem of searching for similar RNA secondary structures into a sequence alignment problem between a structural query and a structural database. The time complexity of RScan is therefore O(NM), the same as that of sequence alignment algorithms.
We also tested another RNA-specific search method, RSmatch. It found only 19 accurate tRNA alignments on the tRNA dataset and missed all hits on the other five datasets (see supplementary materials).
Overall, in these experiments RScan showed higher sensitivity than Blast and RSmatch. Although RScan may lose some sensitivity compared with RSEARCH, it is much faster. In fact, some real-life tasks do not need to retrieve all hits for a query. For example, one may ask whether an unknown query belongs to a known ncRNA family; a structural similarity search can then be run on the entire Rfam database, and RScan will give a quick answer with sufficient sensitivity.
However, it should be noted that only examples whose predicted structures satisfied the filtering criteria (see Methods) were used in the tests shown in Table 1. For example, when the dataset was constructed, some tRNAs without the typical cloverleaf structure were filtered out. RScan was unable to accurately find a tRNA with two stem-loops in a database composed of tRNAs with three stem-loops. We therefore needed to study the impact of the accuracy of structure prediction.
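The 501-fold figure can be checked from the RF00177 times that survive in the flattened Table 1. The text says RSEARCH ran for more than 41 hours and RScan for less than 5 minutes on RF00177, so we read the cells "41 h 3 m 27 s" and "4 m 55 s" as those two times; this pairing is our inference from the flattened table, and the helper name is ours.

```python
import re


def to_seconds(t: str) -> int:
    """Convert a duration such as '41 h 3 m 27 s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)\s*([hms])", t))


rsearch = to_seconds("41 h 3 m 27 s")  # RSEARCH on RF00177 (inferred cell)
rscan = to_seconds("4 m 55 s")         # RScan on RF00177 (inferred cell)
print(rsearch, rscan, round(rsearch / rscan))  # → 147807 295 501

# An 82% RSEARCH rate that is 1.26 times RScan's implies roughly 65% for RScan.
print(round(82 / 1.26))  # → 65
```

The same helper confirms that the cells "6 m 3 s" and "26 m 28 s" equal the 363 s and 1,588 s quoted for RSEARCH on RF00005 and RF00001.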
Impact of the accuracy of structure prediction
% TP (Total)
1110 (excluding 4 queries)
Comparison between RScan and RSEARCH on a randomized genome
A randomized genome was created by shuffling the whole A. pernix genome while preserving dinucleotide frequencies. tRNAs or rRNAs were then inserted into the randomized genome. RScan and RSEARCH searched for the tRNA or rRNA query in the randomized genome in order to evaluate false positives and false negatives.
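The paper does not say which program performed the shuffle. One standard way to permute a sequence while preserving its dinucleotide counts exactly is the Euler-path method of Altschul and Erickson: treat each dinucleotide as a directed edge, choose a random tree of "last exit" edges rooted at the final symbol, shuffle the remaining exits, and walk the resulting Eulerian path. The sketch below picks that tree by simple rejection sampling; it illustrates the technique and is not necessarily the procedure used in the paper. All names are ours.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides, e.g. 'ACG' -> {'AC': 1, 'CG': 1}."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def _reaches_root(v, last_edge, root) -> bool:
    """Follow chosen last-exit edges from v; True if they reach root without a cycle."""
    seen = set()
    while v != root:
        if v in seen:
            return False
        seen.add(v)
        v = last_edge[v]
    return True


def dinucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Random permutation of seq with exactly the same dinucleotide counts."""
    if len(seq) < 3:
        return seq
    # Multigraph: one edge a -> b for every dinucleotide 'ab' in seq.
    succ = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        succ[a].append(b)
    first, last = seq[0], seq[-1]
    others = [v for v in succ if v != last]
    # Pick one last-exit edge per vertex; retry until they form a tree rooted at `last`.
    while True:
        last_edge = {v: rng.choice(succ[v]) for v in others}
        if all(_reaches_root(v, last_edge, last) for v in others):
            break
    # Shuffle each vertex's exits, keeping its last-exit edge at the end.
    exits = {}
    for v, outs in succ.items():
        outs = list(outs)
        if v in last_edge:
            outs.remove(last_edge[v])
        rng.shuffle(outs)
        if v in last_edge:
            outs.append(last_edge[v])
        exits[v] = outs
    # Walking exits in order from the first symbol traces an Eulerian path.
    used = {v: 0 for v in exits}
    out, cur = [first], first
    for _ in range(len(seq) - 1):
        nxt = exits[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)


s = "ACGTTGCAAGCTTACGGA"
t = dinucleotide_shuffle(s, random.Random(0))
print(t, dinucleotide_counts(t) == dinucleotide_counts(s))
```

The first and last symbols are always preserved, which is inherent to exact dinucleotide-preserving shuffles.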
Comparison of RScan and RSEARCH on the randomized genome
# of True
52 m 58 s
4 m 16 s
4 m 9 s
2 h 29 m 11 s
13 m 52 s
13 m 44 s
As in SSAHA, the time used by RScan has two parts: the time spent building the hash table (Thash) and the time spent searching (Tsearch). Thash is of little consequence because the hash table is generated only once for a given database. The last two columns of Table 3 list Thash and Tsearch in this experiment. Note that RSEARCH searches the sequence database directly, whereas RScan searches a processed structural database, and this pre-processing step takes additional time: generating the 75 nt and 120 nt structural databases took 534 s and 510 s, respectively. Fortunately, this pre-processing needs to be done only once.
Another algorithm, FastR, is similar to RScan in that it comprises two steps. FastR performs the search on a filtered database that is only a small proportion of a given genome. However, its search time is genome-specific and depends on how efficient the filtering step is for each query. Because FastR must filter the genome again for each query, its search time should include the time spent on filtering, which is in the range of hundreds of seconds for the A. pernix genome. By comparison, the computation time of RScan consists of Thash and Tsearch. Thash depends only on the database and is incurred once; once the hash table is generated, it can be used to search for any query. Tsearch is linear in the size of the database and in the length of the query. RScan is therefore a more efficient search strategy.
On the other hand, the experiments in Table 3 also reveal a limitation of applying RScan to genomes. A window of pre-chosen width may substantially change the predicted secondary structures of tRNAs or rRNAs, and this is the main reason why RScan missed several of them. When applied to the native A. pernix genome, RScan performed poorly at finding tRNAs or rRNAs, because it is difficult to choose a proper window width for a genome, and it is impractical to build a huge structural database using windows of many different widths. A promising solution is to use RNALfold to find stable structures or significant local structural motifs in a genome and then use RScan to align the queries against them.
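A back-of-envelope sketch of why the one-off pre-processing matters little, using the tRNA figures quoted in the text: 534 s to build the 75 nt structural database, and 256 s per RScan search versus 3,178 s per RSEARCH search. We treat the 256 s as the full per-query cost (it may already include Thash, which in practice is paid only once), so the RScan totals below are, if anything, overestimates. The function names are ours.

```python
def rscan_total(queries: int, t_prep: float, t_query: float) -> float:
    """One-off structural-database preparation plus per-query search time."""
    return t_prep + queries * t_query


def rsearch_total(queries: int, t_query: float) -> float:
    """RSEARCH searches the raw sequence database, so there is no one-off cost."""
    return queries * t_query


T_PREP_75NT = 534    # s, building the 75 nt structural database
RSCAN_TRNA = 256     # s, reported RScan time for the tRNA query
RSEARCH_TRNA = 3178  # s, reported RSEARCH time for the tRNA query

for q in (1, 10):
    print(q, rscan_total(q, T_PREP_75NT, RSCAN_TRNA), rsearch_total(q, RSEARCH_TRNA))
# → 1 790 3178
# → 10 3094 31780
```

Even for a single query the pre-processing is recovered, and the gap grows linearly with the number of queries.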
Performance analysis of RScan with different parameters
Performances of RScan with different k and match-core
# of True
Although a greater k reduces the search time, k should not be made large on this criterion alone. The combination of k and the parameter "match-core" determines which alignments are found, and therefore directly affects sensitivity. For example, in Table 4, when k is 7 and "match-core" is 2 times k, a candidate alignment must contain at least 14 contiguous matching symbols; the candidate is then scored with the local alignment algorithm to obtain the final alignment. As k or match-core increases, fewer candidates satisfy the match-core threshold, and RScan found few alignments when match-core was 22. The parameters should therefore be chosen to balance speed against alignment sensitivity.
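The interplay of k and match-core can be sketched as follows: hash every k-tuple of the structural database, look up each overlapping k-tuple of the query, group the hits by diagonal, and keep runs of consecutive hits that span at least match-core contiguous matching symbols (a run of r consecutive k-tuple hits covers r + k - 1 symbols). This is a simplified sketch under stated assumptions: it hashes overlapping database k-tuples for clarity, whereas SSAHA hashes non-overlapping ones to save memory, and the function names are ours. Each returned candidate would then be passed to local alignment scoring.

```python
from collections import defaultdict


def build_hash_table(db: list[str], k: int) -> dict:
    """Map every k-tuple of the structural database to (sequence index, offset) hits."""
    table = defaultdict(list)
    for sid, s in enumerate(db):
        for i in range(len(s) - k + 1):
            table[s[i:i + k]].append((sid, i))
    return table


def find_match_cores(query: str, table: dict, k: int, match_core: int) -> list:
    """Return (sequence index, query start, database start, length) for every run
    of contiguous matching symbols at least `match_core` long."""
    # Group hits by diagonal: same sequence and same (database offset - query offset).
    diag_hits = defaultdict(list)
    for q in range(len(query) - k + 1):
        for sid, d in table.get(query[q:q + k], ()):
            diag_hits[(sid, d - q)].append(q)
    cores = []
    for (sid, diag), qs in diag_hits.items():
        # qs is already ascending because the query is scanned left to right.
        start = prev = qs[0]
        for q in qs[1:] + [None]:
            if q is not None and q == prev + 1:
                prev = q  # the run of consecutive k-tuple hits continues
                continue
            length = prev - start + k  # r hits cover r + k - 1 symbols
            if length >= match_core:
                cores.append((sid, start, start + diag, length))
            if q is not None:
                start = prev = q  # start a new run on this diagonal
    return cores


table = build_hash_table(["..(((...)))..", "(((..))).."], k=3)
print(find_match_cores("(((...)))", table, k=3, match_core=6))  # → [(0, 0, 2, 9)]
```

Lowering match_core to 5 in this toy example also admits two shorter runs against the second sequence, mirroring the trade-off described above: smaller thresholds yield more candidates to score.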
Time and Storage requirements for building hash table in advance
Hash Table Storage (Mb)
Note that the computation times in Table 5, where the hash table was generated in advance, and in Table 4, where it was not, differ slightly: Tsearch in Table 5 was slightly lower than in Table 4. The reason is that the pre-built hash table contains fewer keys, because a compression step removes keys that refer to null values. When the hash table is not pre-built, all keys loaded in memory are searched, which wastes time on key comparisons during hash lookup for empty entries. The additional time spent reading the pre-built hash table into memory is negligible.
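To illustrate the compression step, a minimal sketch assuming a dense table with one bucket per possible k-tuple (a direct-address layout) that is then compacted into a dictionary holding only non-empty keys. The three-symbol dot-bracket alphabet and all names are hypothetical; RScan's actual key encoding and file format are not described here.

```python
ALPHABET = "(.)"  # hypothetical structural alphabet (dot-bracket stand-in)


def encode(kmer: str) -> int:
    """Map a k-tuple to an integer key in base len(ALPHABET)."""
    code = 0
    for ch in kmer:
        code = code * len(ALPHABET) + ALPHABET.index(ch)
    return code


def build_dense_table(db: list[str], k: int) -> list[list[tuple[int, int]]]:
    """One bucket for every possible k-tuple; many buckets may stay empty."""
    table = [[] for _ in range(len(ALPHABET) ** k)]
    for sid, s in enumerate(db):
        for i in range(len(s) - k + 1):
            table[encode(s[i:i + k])].append((sid, i))
    return table


def compress(table: list[list[tuple[int, int]]]) -> dict:
    """Drop keys that refer to no position, keeping only non-empty buckets."""
    return {code: hits for code, hits in enumerate(table) if hits}


dense = build_dense_table(["((..))", "(((...)))"], k=3)
packed = compress(dense)
print(len(dense), len(packed))  # → 27 7
```

With a real structural alphabet and k = 9 or 11 the dense key space grows exponentially, so removing empty keys saves both memory and lookup work.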
Realization of a fast search on human chromosome
Search the pre-miRNAs in human chromosome 21
k = 9
k = 11
150 nt chr21 structural database
In addition, we used a member of the let-7 family, "hsa-let-7a-2", which is located on human chromosome 11 and has 88% sequence similarity with "hsa-let-7c" on chr21, as a query to search for structural similarities in chr21. With k set to 9 and the match-core set to 2 times k, RScan found seven similarities, including its homologous miRNA "hsa-let-7c", in about a thousand seconds.
In real-life applications, an additional step could be considered to reduce the storage needed for large genomes: RNALfold could be used to find stable structures or significant local structural motifs in the genome, which would then be used to build the structural database. RScan could find more significant structural alignments in this filtered structural database. In summary, this experiment shows that RScan is capable of searching for structural similarities to structured RNAs in large genomes in real-life applications.
Real-life application for validating the unknown RNA sequence
Search RNA sequences in Rfam seed structural database
Rfam ID of Query
Id ≥ %
Ar ≥ %
Search more RNA sequences in Rfam seed structural database
Rfam ID of Query
In this real-life application, RScan is convenient for researchers who produce or obtain numerous transcripts in biological experiments and want to know whether some of them have structures similar to known ncRNAs, so as to infer their possible functions. According to Tables 7 and 8, if an unknown sequence aligns perfectly with a known ncRNA, it can be assigned to the same type. We are now developing RScan into a web server that supports more ncRNA structural databases from experimental and computational sources, such as RNAdb, NONCODE and Fantom3.
RScan is a fast and sensitive algorithm for searching for RNA secondary structure similarities, and it is valuable for real-life applications. RScan first converts the sequence database into a structural database; it can then search for a structural query in the structural database to obtain the optimal structural alignments. To increase search efficiency, RScan stores the k-tuples of the structural database in a hash table. Subsequent searches identify all possible match-cores using the hash table and score the candidate alignments derived from them to obtain the optimal alignments. It should be emphasized that the structural database and the corresponding hash tables are generated only once, which drastically reduces the search time.
When the database is a single genome, RScan slides a window of predetermined length along the genome sequence to segment it, and the structural database is created from the predicted secondary structures of these segments. In traditional methods, a query is aligned only with sub-sequences of the genome that are generally no longer than a maximum length [13, 18], so using RScan to transform the genome into a structural database is reasonable. However, a PC cannot afford the huge storage required to build a structural database covering all lengths. In our example, we used a 150 nt window, stepped every 50 nt along both strands of chromosome 21, to build the structural database. Note that this sliding window multiplies the genome data six times (three overlapping windows per position on each of two strands), which helps to prevent loss of sensitivity. Given the substantial memory usage of the hash table, reducing the size of the structural database becomes particularly important. In future applications, an additional step may be considered for reducing storage and improving search performance: RNALfold could be used to find stable structures or significant local structural motifs in the genome, which would then be used to build the structural database. RScan could run more efficiently on this filtered structural database, and the structural alignments could be more significant.
As highlighted by Klein and Eddy, three areas demand additional analysis: the score matrix, the precise secondary structure of the query sequence, and speed. Because the score matrix is independent of the alignment algorithm, RScan does not attempt to refine it and uses only the simplest match/mismatch score function. For the query sequence, RScan uses known or predicted secondary structures, much like RSEARCH. In most cases, the correct secondary structure of a sequence is difficult to obtain. Because the minimum free energy structure is not necessarily the correct one, Sankoff's algorithm considers all possible folds of the sequences in order to acquire a good secondary structure; however, considering all possible folds makes it substantially slower than many other methods. Some methods use various constraints to reduce the number of folds considered when predicting secondary structures. RScan uses RNAfold to predict the secondary structures of queries and segmented sequences, which means that only one secondary structure per sequence is considered. Consequently, the effectiveness of RScan depends greatly on the accuracy of the secondary structure prediction algorithm. Improving the score matrix and secondary structure prediction will be considered in our future work, and improvements in these areas can be integrated into RScan modularly. In addition, RScan does not evaluate the statistical significance (P-value) of each alignment. The P-value is generally calculated from the size of the database and the composition of its sequences. Because the secondary structures in the structural database are predicted, the composition of the structural symbols is not reliable enough for a statistical evaluation. Finally, RScan focuses on improving search speed, especially for large-scale databases.
Essentially, RScan transforms the complicated problem of searching for similar RNA secondary structures into a sequence alignment problem between a structural query and a structural database, so its time complexity is O(NM), the same as that of sequence alignment algorithms. RScan achieves fast searches for secondary structures similar to those of structured RNAs in large databases. In the future, we aim to build an online RScan server for biology and bioinformatics researchers.
RScan can find structural similarities for structured query RNAs in large databases efficiently and quickly. It is a preferable choice for real-life application of structural alignment.
Non-coding RNA family datasets from Rfam
The basic information of filtered ncRNA family datasets
# of members
5S ribosomal RNA
Small subunit ribosomal RNA
Bacterial RNase P class A
Bacterial RNase P class B
For each filtered dataset, one member was used as the query and the other members were used as the database. For the RScan searches, the secondary structures of the query and of the database sequences were predicted with RNAfold to form the structural query and the structural database, respectively. The RScan parameters were set as follows: k = 7, Identity ≥ 90% and Alignment_ratio ≥ 95% for tRNA (lengths less than 100 nt); k = 7, Identity ≥ 85% and Alignment_ratio ≥ 90% for RF00001 (lengths from 100 nt to 200 nt); and k = 9, Identity ≥ 75% and Alignment_ratio ≥ 85% for the other families (lengths greater than 200 nt). For the RSEARCH searches, all database sequences were concatenated into a single sequence, and RSEARCH searched for the query in this sequence with the parameters "-n 1000 -E 10". All experiments were carried out on a 2.4 GHz Intel PC with 1 GB of RAM running Linux.
The A. pernix genome (NC_000854.1, 1.67 Mb) was downloaded from NCBI and shuffled with identical dinucleotide frequencies to create a shuffled genome. Then, 49 tRNAs and 20 rRNAs recognized by both RSEARCH and RScan in the ncRNA dataset experiments were inserted into the shuffled genome to make two randomized genomes. The queries tRNA "AB042240.3/15036-15107" and rRNA "M28193.1/1-119" from Table 1 were used again.
For the RScan search, the randomized genome with inserted tRNAs was broken into 75-nucleotide (75 nt) segments, because the query tRNA was 72 nucleotides long. Each segment was required to contain less than 5% of the unknown character "N". The secondary structures of all 75 nt segments were predicted using RNAfold, and the set of all secondary structure symbol sequences formed the structural database. The randomized genome with inserted rRNAs was processed in the same way, except that it was broken into 120-nucleotide segments because the query rRNA is 119 nucleotides long.
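The family-dependent settings above can be written as a small lookup. This only restates the reported settings: the paper assigns them per family (tRNA, RF00001, the rest), so keying on query length and the handling of exactly 100 nt and 200 nt are our assumptions, and `RScanParams` and `params_for_query` are hypothetical names, not RScan options.

```python
from typing import NamedTuple


class RScanParams(NamedTuple):
    k: int                      # k-tuple size used for hashing
    min_identity: float         # report hits with Identity >= this (%)
    min_alignment_ratio: float  # report hits with Alignment_ratio >= this (%)


def params_for_query(length_nt: int) -> RScanParams:
    """Settings reported for the Rfam family experiments, keyed on query length.

    Assumption: < 100 nt, 100-200 nt inclusive, and > 200 nt bands.
    """
    if length_nt < 100:
        return RScanParams(7, 90.0, 95.0)
    if length_nt <= 200:
        return RScanParams(7, 85.0, 90.0)
    return RScanParams(9, 75.0, 85.0)


print(params_for_query(72))  # tRNA query used in the experiments
```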
Rfam seed structural database
This database came from the seed alignments of 503 families in Rfam version 7.0, which contain 13,040 seed sequences [25, 29]. The secondary structures of all sequences were predicted using RNAfold. For the RScan search, the hash tables of the structural database were generated in advance, with the k-tuple size set to 7 or 9; the hash tables required 2.02 Mb and 1.58 Mb of storage, respectively.
Structural database of human chromosome 21
Human chromosome 21 (chr21), 45.6 Mb in total, was downloaded from NCBI. A 150 nt sliding window, stepped every 50 nt along both strands of chr21, produced 1,366,746 segments. The secondary structures of all segments were predicted using RNAfold to build the structural database. The hash tables of the structural database were also generated in advance, with the k-tuple size set to 9 or 11; they required 172 Mb and 145 Mb of storage, respectively. Moreover, the hash table was divided into small files that were read into memory one at a time, which kept the memory requirement limited (see supplementary materials). The query sequences were four pre-miRNAs located on human chromosome 21, taken from the microRNA registry.
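A minimal sketch of the segmentation described above, assuming plain upper-case DNA with "N" for unknown bases. The width (150 nt), step (50 nt) and scan of both strands come from the text; the "less than 5% N" filter is borrowed from the randomized-genome setup and may not match what was used for chr21. Names are hypothetical, and we have not checked that this reproduces the reported count of 1,366,746 segments. Each interior position falls in 150/50 = 3 windows per strand, which gives the six-fold data expansion mentioned earlier.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def sliding_windows(seq: str, width: int = 150, step: int = 50,
                    max_n_fraction: float = 0.05):
    """Yield (strand, start, window) for fixed-width windows on both strands.

    On the '-' strand, `start` is measured along the reverse complement.
    Windows whose N fraction reaches max_n_fraction are skipped.
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start in range(0, len(s) - width + 1, step):
            window = s[start:start + width]
            if window.count("N") / width < max_n_fraction:
                yield strand, start, window


windows = list(sliding_windows("ACGT" * 100))  # 400 nt toy sequence
print(len(windows))  # → 12 (6 windows per strand)
```

Each yielded window would then be folded (the paper uses RNAfold) and its structure string added to the structural database.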
Availability and requirements
The RScan program is freely accessible on our website http://bioinfo.au.tsinghua.edu.cn/member/cxue/rscan/RScan.htm. Supplementary materials, the detailed data of the experiments and the recommended parameter settings are also provided at the website.
We wish to thank Zemin Ning of the Wellcome Trust Sanger Institute for advice on the implementation and testing of RScan, and Sam Griffiths-Jones of the Wellcome Trust Sanger Institute for his careful reading of the manuscript and valuable suggestions on data analysis. Thanks are also due to Liang Ji and Fei Li of Tsinghua University and Michael Q. Zhang of Cold Spring Harbor Laboratory for useful comments. We thank Liang Ji, Alfred Yeung and Rick Bruderick for substantially improving the writing. This work was completed during CX's postdoctoral fellowship, and he thanks Prof. Yanda Li and Prof. Xuegong Zhang of Tsinghua University for their great support and supervision. This work is supported in part by the China Postdoctoral Science Foundation (20060400060).
- Klein RJ, Eddy SR: RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics. 2003, 4 (1): 44. 10.1186/1471-2105-4-44.
- Eddy SR: Non-coding RNA genes and the modern RNA world. Nat Rev Genet. 2001, 2 (12): 919-929. 10.1038/35103511.
- Eddy SR: Computational genomics of noncoding RNA genes. Cell. 2002, 109 (2): 137-140. 10.1016/S0092-8674(02)00727-4.
- Rivas E, Eddy SR: Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics. 2001, 2 (1): 8. 10.1186/1471-2105-2-8.
- di Bernardo D, Down T, Hubbard T: ddbRNA: detection of conserved secondary structures in multiple alignments. Bioinformatics. 2003, 19 (13): 1606-1611. 10.1093/bioinformatics/btg229.
- Coventry A, Kleitman DJ, Berger B: MSARI: multiple sequence alignments for statistical detection of RNA secondary structure. Proc Natl Acad Sci U S A. 2004, 101 (33): 12102-12107. 10.1073/pnas.0404193101.
- Washietl S, Hofacker IL, Stadler PF: Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci U S A. 2005, 102 (7): 2454-2459. 10.1073/pnas.0409169102.
- Washietl S, Hofacker IL, Lukasser M, Huttenhofer A, Stadler PF: Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat Biotechnol. 2005, 23 (11): 1383-1390. 10.1038/nbt1144.
- Macke TJ, Ecker DJ, Gutell RR, Gautheret D, Case DA, Sampath R: RNAMotif, an RNA secondary structure definition and search algorithm. Nucleic Acids Res. 2001, 29 (22): 4724-4735. 10.1093/nar/29.22.4724.
- Gautheret D, Lambert A: Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol. 2001, 313 (5): 1003-1011. 10.1006/jmbi.2001.5102.
- Hoechsmann M, Toeller T, Giegerich R, Kurtz S: Local Similarity in RNA Secondary Structures. Proc Computational Systems Bioinformatics Conf. 2003, 2: 159-168.
- Hofacker IL: Vienna RNA secondary structure server. Nucleic Acids Res. 2003, 31 (13): 3429-3431. 10.1093/nar/gkg599.
- Hofacker IL, Priwitzer B, Stadler PF: Prediction of locally stable RNA secondary structures for genome-wide surveys. Bioinformatics. 2004, 20 (2): 186-190. 10.1093/bioinformatics/btg388.
- Havgaard JH, Lyngso RB, Stormo GD, Gorodkin J: Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics. 2005, 21 (9): 1815-1824. 10.1093/bioinformatics/bti279.
- Pavesi G, Mauri G, Stefani M, Pesole G: RNAProfile: an algorithm for finding conserved secondary structure motifs in unaligned RNA sequences. Nucleic Acids Res. 2004, 32 (10): 3258-3269. 10.1093/nar/gkh650.
- Liu J, Wang JT, Hu J, Tian B: A method for aligning RNA secondary structures and its application to RNA motif detection. BMC Bioinformatics. 2005, 6 (1): 89. 10.1186/1471-2105-6-89.
- Freyhult EK, Bollback JP, Gardner PP: Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res. 2007, 17 (1): 117-125. 10.1101/gr.5890907.
- Bafna V, Zhang S: FastR: Fast database search tool for non-coding RNA. 2004, 52-61.
- Weinberg Z, Ruzzo WL: Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy. Bioinformatics. 2004, 20 Suppl 1: I334-I341. 10.1093/bioinformatics/bth925.
- Weinberg Z, Ruzzo WL: Faster genome annotation of non-coding RNA families without loss of accuracy. ACM Press, 243-251.
- Ning Z, Cox AJ, Mullikin JC: SSAHA: a fast search method for large DNA databases. Genome Res. 2001, 11 (10): 1725-1729. 10.1101/gr.194201.
- Smith TF, Waterman MS: Identification of common molecular subsequences. J Mol Biol. 1981, 147 (1): 195-197. 10.1016/0022-2836(81)90087-5.
- Hofacker IL, Fontana W, Stadler PF, Bonhoeffer S, Tacker M, Schuster P: Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie. 1994, 125: 167-188. 10.1007/BF00818163.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR: Rfam: an RNA family database. Nucleic Acids Res. 2003, 31 (1): 439-441. 10.1093/nar/gkg006.
- Bafna V, Zhang S: FastR: fast database search tool for non-coding RNA. Proc IEEE Comput Syst Bioinform Conf. 2004, 52-61.
- Sankoff D: Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J Appl Math. 1985, 45: 810-825. 10.1137/0145048.
- Bonnet E, Wuyts J, Rouze P, Van de Peer Y: Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences. Bioinformatics. 2004, 20 (17): 2911-2917. 10.1093/bioinformatics/bth374.
- Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005, 33 (Database issue): D121-4. 10.1093/nar/gki081.
- Griffiths-Jones S: The microRNA Registry. Nucleic Acids Res. 2004, 32 Database issue: D109-11. 10.1093/nar/gkh023.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.