Stem-loop structures in prokaryotic genomes
© Petrillo et al; licensee BioMed Central Ltd. 2006
Received: 15 February 2006
Accepted: 04 July 2006
Published: 04 July 2006
Prediction of secondary structures in the expressed sequences of bacterial genomes allows investigation of the spontaneous folding of the corresponding RNA. This is particularly relevant in untranslated mRNA regions, where base pairing is less affected by interactions with the translation machinery. Relatively large stem-loops contribute substantially to the formation of more complex secondary structures, which are often important for the activity of sequence elements controlling gene expression.
A systematic analysis of the distribution of stem-loop structures (SLSs) in 40 completely sequenced bacterial genomes is presented. SLSs were searched for as stems of at least 12 bp flanking loops 5 to 100 nt long; G-U pairing in the stems was allowed. SLSs found in natural genomes are consistently more numerous and more stable than those expected to form by chance in sequences of comparable size and composition. The large majority of SLSs fall within protein-coding regions, but enrichment of specific, non-random SLS subpopulations of higher stability was observed within the intergenic regions of the chromosomes of several species. In low-GC firmicutes, most higher-stability intergenic SLSs resemble canonical rho-independent transcriptional terminators, but very frequently feature at the 5' end an additional A-rich stretch complementary to the 3' uridines. In all species, the distribution of SLSs within the intergenic space was clearly biased, with most SLSs concentrated on the side adjacent to the 3' end of flanking CDSs. Some intergenic SLS regions are members of novel repeated sequence families.
In-depth analysis of SLS features and distribution in 40 different bacterial genomes showed the presence of non-random populations of such structures in all species. Many of these structures are plausibly transcribed and might be involved in the control of transcription termination, or might act as RNA elements that enhance either the stability or the turnover of cotranscribed mRNAs. Three previously undescribed families of repeated sequences were found in Yersiniae, Bordetellae and Enterococci.
The tremendous flow of information generated by large-scale genome sequencing has provided, for the prokaryotic world, the complete DNA sequence of over 200 bacterial strains, and more become available every month. Most annotation work has been directed at assessing the protein repertoire encoded by a given microbe, aiming at the genome-scale reconstruction of bacterial metabolism , the identification of gene sets unique to pathogenic microorganisms [2, 3] or the development of new vaccines . The availability of massive amounts of sequence data has also stimulated in-depth evaluation of the organization of the bacterial chromosome [5–9]. The basic organization of the genetic material (DNA curvature and stacking energy, base and oligonucleotide skews, etc.; see ref. ), and the presence of simple or more complex sequence repeats [11, 12] have also been analyzed for most sequenced bacterial genomes.
Information on the folding of specific single-stranded sequence regions into secondary structures is relatively ill-defined in prokaryotes. Predictions of RNA secondary structure may give different and even contrasting results, depending on the methodology and on the genomic regions evaluated [13–15].
In bacteria, protein-coding sequences are transcribed and may form predictable secondary structures, although in many instances the spontaneous folding of the corresponding mRNA may be affected by interactions with the translation machinery. Stem-loop structures (SLSs) in RNA may in turn control transcription, as in the attenuation mechanism , or influence translation, as SECIS elements do for the insertion of selenocysteine at UGA stop codons . Secondary structure prediction is very effective for relatively small RNAs with defined ends, especially when corroborated by phylogenetic data, but it is more ambiguous for larger RNAs, where SLSs, especially those with short stems, are easily formed, or lost, when a sliding window is used to tentatively delimit the boundaries of a folding domain.
Longer stems contribute substantially to the formation of complex secondary structures, where they affect RNA stability and function. Many non-coding RNA structures are known to fold around a stem that delimits either a small, simple single-stranded loop or a larger, highly structured sequence. Examples are found in self-splicing introns , riboswitches , and transcribed intergenic repeats such as E. coli BIME, Yersinia ERIC and Neisseria NEMIS sequences. In these cases the stem is often essential for attaining the correct secondary structure and may be directly recognized by ribonucleases [20–23]. Some predicted SLSs might also form in DNA and affect its conformation: base pairing within single-stranded DNA is known to play a role in recombination, replication and transcription [24–26].
Here we present a systematic analysis of SLS distribution in prokaryotic genomes. Sequences able to fold into stem-loop structures with relatively long stems (12 bp or more) have been searched for and analyzed in 40 completely sequenced bacterial chromosomes. SLSs found in the searched bacterial genomes are more numerous and more stable than those expected to form by chance in sequences of comparable size and composition. Enrichment of specific SLS subpopulations can be observed within selected intergenic regions (IGRs).
Identification of stem-loop structures (SLSs)
Bacterial species analyzed in this study are numbered 1 to 40. The strains used for in silico analyses, the size of their genomes in base pairs and their GC content are shown. Representative species chosen for comparative analyses are labeled a through v.
Escherichia coli K12
Salmonella typhimurium LT2
The large majority of SLSs fall within, or span the ends of, genic regions; only about 10% of SLSs were found in IGRs. This distribution is not surprising, as it reflects the high fraction (87–90%) of sequence annotated as coding in most tested genomes. In some species, however, the number of SLSs found in IGRs was noticeably higher: in B. anthracis, C. perfringens and N. meningitidis, the fraction of SLSs found in non-coding sequences exceeds 20%. A slightly lower number of intergenic SLSs was found in the P. putida genome.
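The genic/intergenic breakdown above amounts to an interval-overlap test between SLS coordinates and CDS annotations. A minimal Python sketch, assuming half-open 0-based coordinates and a flat list of CDS intervals; the function names and example coordinates are hypothetical and are not taken from the authors' SLS-DB:

```python
from typing import List, Tuple

# Hypothetical representation: half-open, 0-based [start, end) coordinates
Interval = Tuple[int, int]


def classify_sls(sls: Interval, cds_list: List[Interval]) -> str:
    """Label one SLS as 'genic', 'spanning' or 'intergenic'."""
    s_start, s_end = sls
    # Fully contained in some annotated CDS
    if any(c_start <= s_start and s_end <= c_end for c_start, c_end in cds_list):
        return "genic"
    # Overlaps a CDS without being contained in it, i.e. crosses a CDS end
    if any(s_start < c_end and c_start < s_end for c_start, c_end in cds_list):
        return "spanning"
    return "intergenic"


def intergenic_fraction(sls_list: List[Interval], cds_list: List[Interval]) -> float:
    """Fraction of SLSs that do not touch any annotated CDS."""
    if not sls_list:
        return 0.0
    labels = [classify_sls(s, cds_list) for s in sls_list]
    return labels.count("intergenic") / len(labels)


cds = [(0, 900), (1000, 2500)]
hits = [(100, 140), (920, 980), (880, 930)]
print([classify_sls(h, cds) for h in hits])  # ['genic', 'intergenic', 'spanning']
print(intergenic_fraction(hits, cds))
```

Treating "spanning" SLSs as a separate class mirrors the wording above ("within, or span the ends of, genic regions"); whether they should be pooled with genic SLSs is a choice the sketch leaves open.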
SLSs in naturally occurring and reshuffled genomes
The propensity of a sequence to give rise to stem-loop structures by chance is expected to depend on a number of features, such as base composition and word frequencies. Moving away from an equal 25% frequency of each base, i.e. from 50% GC content, reduces sequence complexity and facilitates the formation of complementary structures. This is easily seen in Fig. 2B, where SLSs found in naturally occurring genomes are plotted against GC content after normalization for sequence length. The dotted line represents SLSs found in random sequences of different GC content, all 1 Mb long, produced by ten runs of the reshuffle tool; variations always fall within a very small range (about 1%). As expected, random sequences stochastically give rise to SLSs, whose number rises steadily from a minimum at 50% GC content as GC content either decreases or increases (Fig. 2B).
Specific SLS subsets are selectively enriched in natural genomes. The largest differences are observed for higher-stability structures, where the random component is expected to be smaller. SLSs with the smallest loops (shorter than 20 nt) also appear to be more frequent in natural genomes, possibly reflecting specific classes of RNA structures (Fig. 3).
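Why the minimum sits at 50% GC can be seen from a simple model in which bases are drawn independently (an assumption: real genomes have word-frequency structure that this ignores). With GC fraction g, the probability q that two random bases can pair is (g^2 + (1 - g)^2)/2 for Watson-Crick pairs only, plus g(1 - g)/2 when G-U pairs are allowed. Both expressions are minimal at g = 0.5, and a perfect stem of k pairs occurs with probability q^k. The sketch below computes this rough estimate and, for comparison, counts perfect Watson-Crick stems in a random sequence. It is illustrative only and is not the reshuffle/rnamotif pipeline used in the study.

```python
import random

COMP = str.maketrans("ACGT", "TGCA")


def pair_prob(gc: float, allow_gu: bool = True) -> float:
    """Probability that two independent random bases can pair."""
    p_g = p_c = gc / 2
    p_a = p_t = (1 - gc) / 2
    q = 2 * p_g * p_c + 2 * p_a * p_t      # G-C, C-G, A-T(U), T(U)-A
    if allow_gu:
        q += 2 * p_g * p_t                 # G-U and U-G wobble pairs
    return q


def expected_stems(length: int, gc: float, stem: int = 12,
                   loop_min: int = 5, loop_max: int = 100,
                   allow_gu: bool = True) -> float:
    """Rough count of perfect stems of exactly `stem` pairs. Overlapping and
    nested hits are not merged, so this is an order-of-magnitude estimate."""
    n_loops = loop_max - loop_min + 1
    return length * n_loops * pair_prob(gc, allow_gu) ** stem


def random_sequence(length: int, gc: float, seed: int = 0) -> str:
    """I.i.d. random DNA with the requested expected GC fraction."""
    rng = random.Random(seed)
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]   # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def count_wc_stems(seq: str, stem: int = 12, loop_min: int = 5, loop_max: int = 100) -> int:
    """Count (i, j) with seq[j:j+stem] equal to the reverse complement of
    seq[i:i+stem] and a loop of loop_min..loop_max nt between them."""
    positions = {}
    for j in range(len(seq) - stem + 1):
        positions.setdefault(seq[j:j + stem], []).append(j)
    count = 0
    for i in range(len(seq) - stem + 1):
        rc = seq[i:i + stem].translate(COMP)[::-1]
        for j in positions.get(rc, []):
            if loop_min <= j - (i + stem) <= loop_max:
                count += 1
    return count


for gc in (0.3, 0.5, 0.7):
    print(gc, round(expected_stems(1_000_000, gc), 1))
```

The estimate is symmetric in g and 1 - g, which matches the U-shaped curve described for random sequences; it says nothing about the excess observed in natural genomes.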
Identification of specific SLS groups
AT-rich terminator-like sequences in low-GC firmicutes
Distribution of intergenic SLSs
SLSs spanning repetitive DNA elements
The ability of a DNA or RNA segment to fold into a stem-loop structure derives from the presence of complementary bases, and such segments occur stochastically in every large sequence, regardless of origin and even when randomly generated, provided that a reasonably balanced distribution of nucleotides within a single strand is guaranteed. This is certainly true of bacterial genome sequences, where oligonucleotide distribution reveals compositional symmetries in a variety of complete genomes [36, 37]. The problem of evaluating the relevance of a particular motif in terms of the likelihood of generating it by chance in a given sequence has been addressed extensively (see, for example, the work by Robin and coworkers  on the probability of finding a motif composed of two 'boxes' separated by a variable distance). Here we chose an 'experimental' approach, based on randomized genomes produced by reshuffling the natural one under two types of constraint: preservation of k-let frequencies of various lengths, and a more complex model in which genic and intergenic regions are shuffled separately, conserving amino acid sequence and codon usage. SLSs found in naturally occurring genomes clearly outnumber those found by similar analysis of their randomized counterparts (Figs. 2 and 3). Natural genomes thus appear to favour the formation of specific sets of stem-loop structures, typically the more stable ones. These sets contribute substantially to the higher SLS numbers observed in naturally occurring genomes compared with their random counterparts. The phenomenon has been observed in bacterial genomes that differ widely in size, GC content and evolutionary relatedness. These data agree with literature reports showing that, in large-scale analyses of prokaryotic mRNA populations, coding regions had a significant bias toward more local secondary structure potential than expected .
The evolutionary pressure promoting the potential formation of stem-loop structures at the genome-wide level may serve different functional purposes. At the DNA level, stem-loop structures may play a role in replication, transcription and recombination. However, as the vast majority of prokaryotic genomes consists of expressed, protein-coding regions, the contribution to mRNA secondary structure should be taken into account for most SLSs, especially those including G-U pairs. Most SLSs fall within coding regions (Fig. 1), in agreement with the total size of coding regions, which typically exceeds that of non-coding regions by a factor of about ten. Still, when evaluating their significance, ribosome coverage and the formation of secondary structures within protein-coding regions should be regarded as competing processes, since ribosomes are expected to prevent the formation of most low-stability mRNA structures. Higher-stability structures may, however, cause translational pausing, possibly exploited in regulatory mechanisms such as attenuation . In specific instances, coding SLSs correspond either to remnants of transposon-like sequences , or to regions encoding repetitive protein domains, such as those found in the mycobacterial PE genes or in anchored cell-wall proteins conserved in several microorganisms (not shown).
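Comparing a natural genome with its shuffled counterparts reduces to asking how far the observed SLS count lies from the distribution of counts in the shuffled replicates. One common way to summarize this, sketched below as an assumption rather than the statistic used in the paper, is a fold enrichment, a z-score and an empirical p-value over n shuffled replicates. The counts in the example are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def enrichment(natural: int, shuffled: Sequence[int]) -> Tuple[float, float, float]:
    """Compare an SLS count in a natural genome with counts from shuffled copies.

    Returns (fold enrichment, z-score, empirical p-value). The p-value uses the
    (r + 1) / (n + 1) correction, where r is the number of shuffled replicates
    with a count at least as high as the natural one.
    """
    mu = mean(shuffled)
    sd = stdev(shuffled)                 # needs at least two replicates
    fold = natural / mu
    z = (natural - mu) / sd if sd > 0 else float("inf")
    r = sum(1 for c in shuffled if c >= natural)
    p = (r + 1) / (len(shuffled) + 1)
    return fold, z, p


# Hypothetical counts, for illustration only (not data from this study)
fold, z, p = enrichment(1200, [1000, 1010, 990, 1005, 995])
print(fold, round(z, 1), p)
```

With only ten shuffled replicates, as in the reshuffle runs mentioned above, the smallest attainable empirical p-value is 1/11, so a z-score is the more informative summary when the natural count lies far outside the shuffled range.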
Although less numerous, SLSs tend to be more frequent within the much smaller IGRs, where a typical bias in energy levels and genomic location can be observed, strongly indicative of specific, non-random SLS subpopulations. All the analyzed low-GC firmicutes show a marked enrichment in higher-stability intergenic SLSs. Both structure and genomic location suggest that most of these sequences may function as rho-independent transcriptional terminators. The finding is not surprising per se, since the transcription termination factor Rho is not essential in Bacillus subtilis and Staphylococcus aureus, and other Gram-positive bacteria with a low GC content lack a rho homolog [39, 40]. However, SLSs found in low-GC firmicutes are atypical as transcriptional terminators, since most of them carry, in addition to the canonical 3' U-rich tract, a complementary A-rich tract at the 5' end (Fig. 5). This arrangement is known not to impair termination: in the E. coli thr operon attenuator, for example, the terminator features a GC-rich stem-loop flanked by 9 Us at the 3' end and 6 As at the 5' end, and site-directed mutagenesis has shown that the upstream adenines are neither essential for, nor detrimental to, transcription termination . The 4A/4T-containing SLSs found in low-GC firmicutes, when located a short distance from convergently transcribed genes, may function as bidirectional terminators . Alternatively, these AU-rich SLSs may serve additional functions, such as mRNA stabilization, since point mutations in transcription terminators are known to affect the stability of upstream RNA segments [43, 44].
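The point about 4A/4T-containing SLSs can be made concrete: a 5' A-tract becomes a 3' T-tract (a U-tract in the transcript) on the complementary strand, so the same hairpin looks terminator-like when read from either side. The sketch below is a hypothetical heuristic, with the 10-nt edge window and the run length of 4 chosen only for illustration; it is not the criterion used to build Fig. 5.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMP)[::-1]


def u_tract_at_3prime(region: str, edge: int = 10, run: int = 4) -> bool:
    """True if the last `edge` nt of the region contain a run of `run` T (U in RNA)."""
    return "T" * run in region[-edge:]


def terminator_like(region: str, edge: int = 10, run: int = 4) -> str:
    """Report which strand(s) would carry a 3' U-tract after the hairpin."""
    plus = u_tract_at_3prime(region, edge, run)
    minus = u_tract_at_3prime(revcomp(region), edge, run)
    if plus and minus:
        return "bidirectional"
    return "plus" if plus else ("minus" if minus else "none")


# A 5' A-tract paired with a 3' T-tract reads as a U-tract on both strands
print(terminator_like("AAAA" + "GCCCGC" + "TTCG" + "GCGGGC" + "TTTT"))  # bidirectional
print(terminator_like("GCCCGC" + "TTCG" + "GCGGGC" + "TTTT"))          # plus
```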
Bacteria other than low-GC firmicutes do not show similar AT-rich terminator-like structures; still, the distribution of SLSs within IGRs is clearly non-random. When SLS frequency is analyzed according to IGR type, all bacteria show a strong preference for convergent IGRs, i.e. those flanked by the 3' ends of two CDSs, over divergent IGRs (Fig. 6). Furthermore, within unidirectional IGRs, higher-stability intergenic SLSs are preferentially found within the 50 bp tract immediately following the stop codon of the upstream CDS (Fig. 7). This distribution strongly supports the notion that most higher-stability intergenic SLSs are transcribed and may therefore function at the RNA level. Although termination is the expected role for a large fraction of them, especially in bacteria where rho-dependent termination is not relevant, their number and the observed sequence features leave open the possibility of additional roles, such as RNA stabilization and translational regulation by riboswitches and attenuators [19, 16]. Alternatively, these SLSs may be targeted by specific nucleases and rapidly degraded, thus functioning as RNA instability determinants. Finally, it must be recalled that some intergenic SLSs may be transcribed independently of the flanking genes. In recent years several groups have provided support for the notion that prokaryotic intergenic sequences encode a variety of small non-coding (nc) RNAs fulfilling diverse functions [reviewed ]. It will be of interest to assess whether selected intergenic SLSs may lead to the identification of novel ncRNAs in RNA populations.
Some SLSs show strong similarity to each other and may be grouped into families of repetitive sequences. Here we describe Bor sequences (Fig. 8), a set of palindromic elements over-represented in all Bordetellae, which recall the E. coli REP sequences in length and sequence. Bor-containing RNA may fold into hairpins similar to REP RNA and possibly play an analogous role, i.e. stabilization of the cotranscribed mRNA . The larger Ype and Efa elements (Fig. 8) are members of less numerous DNA families spread in the genomes of Y. pestis and E. faecalis, respectively. These sequences are similar in size and abundance to other intergenic repeats, such as NEMIS in N. meningitidis  and ERIC in Yersiniae , which are cotranscribed with flanking genes and may fold into similarly organized RNA hairpins. Preliminary data indicate that both Ype and Efa RNA elements may indeed enhance the stability of cotranscribed mRNA sequences [De Gregorio E, Silvestro G and Di Nocera PP, unpublished results]. Quantitatively, members of these families account for only a small fraction of intergenic SLSs. As revealed by preliminary BLAST analyses (not shown), further substantial similarities may be detected within the identified SLSs. Each of these families may therefore be extended to include more elements that share sequence similarity but were not initially found because of defective, or less pronounced, secondary structures. Further work will be necessary to obtain a systematic classification of bacterial DNA families spanning, or coinciding with, SLSs.
An in-depth analysis of SLS features and distribution was carried out in 40 different bacterial species. The data suggest that evolutionary pressure has preserved specific, non-random populations of higher-stability SLSs in most of the analyzed genomes. Many of these sequences are plausibly transcribed and may be involved in transcriptional and/or post-transcriptional control. Specific SLS-containing sequences are members of three previously undescribed families of repeated sequences found in Yersiniae, Bordetellae and Enterococci.
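The IGR categories used above follow directly from the strands of the two flanking genes, and the 50 bp post-stop tract depends on which of the two genes is upstream. A minimal sketch, assuming a linear chromosome (circularity is ignored), half-open 0-based coordinates and a hypothetical `Gene` record; it is not the SLS-DB annotation code:

```python
from typing import NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    strand: str   # '+' or '-'


def igr_type(left: Gene, right: Gene) -> str:
    """Orientation of the IGR between two adjacent genes (left.end <= right.start)."""
    if left.strand == right.strand:
        return "unidirectional"          # 3' end of one gene faces the 5' end of the next
    if left.strand == "+":
        return "convergent"              # + ... -: both 3' ends face the IGR
    return "divergent"                   # - ... +: both 5' ends face the IGR


def post_stop_window(left: Gene, right: Gene, width: int = 50) -> Optional[Tuple[int, int]]:
    """Tract of up to `width` bp right after the stop codon of the upstream CDS
    in a unidirectional IGR, clipped to the IGR; None for other IGR types."""
    if igr_type(left, right) != "unidirectional":
        return None
    if left.strand == "+":               # upstream CDS is `left`; its stop is at left.end
        return (left.end, min(left.end + width, right.start))
    # On the minus strand the upstream CDS is `right`; its stop is at right.start
    return (max(right.start - width, left.end), right.start)


genes = [Gene(0, 900, "+"), Gene(1000, 2000, "+"), Gene(2100, 3000, "-"), Gene(3200, 4000, "+")]
for a, b in zip(genes, genes[1:]):
    print(igr_type(a, b), post_stop_window(a, b))
```

Clipping the window to the IGR keeps short unidirectional IGRs from reaching into the next CDS.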
Genomic sequence data
Complete genomic sequences and their CDS, rRNA and tRNA annotations were downloaded from the online repository of The Institute for Genomic Research (TIGR). Automatic annotations were stored in an SQL database (SLS-DB) for further analysis. PostgreSQL was used as the SQL database management system , according to previously described techniques [47, 48].
SLS identification was performed using the program rnamotif from the RNAMOTIF package, version 2.1.2 , according to the following rules:
- G-U pairing in the stem was allowed
- the minimum stem length was 12 bp
- loop length could vary from 5 to 100 nt
- one bulged or one mispaired base, at least two matches away from the ends of the stem, was allowed.
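The rules above can be restated as a predicate on a candidate hairpin. This is a simplified Python sketch, not rnamotif's descriptor language: it checks Watson-Crick and G-U pairing, the 12-bp minimum stem, the 5-100 nt loop and a single mispair away from the stem ends, but it does not model the bulge allowance. The function and constant names are hypothetical.

```python
# Allowed base pairs, written as DNA; T stands for U in the transcript
PAIRS = {("G", "C"), ("C", "G"), ("A", "T"), ("T", "A"), ("G", "T"), ("T", "G")}

MIN_STEM, LOOP_MIN, LOOP_MAX = 12, 5, 100
SMALLEST_SLS = 2 * MIN_STEM + LOOP_MIN      # 12 + 5 + 12 = 29


def is_stem_loop(seq: str, stem_len: int, max_mispairs: int = 1, end_guard: int = 2) -> bool:
    """Check a candidate hairpin laid out as stem | loop | stem.

    Simplified: only mispairs are handled (no bulges); a mispair must have at
    least `end_guard` pairs between it and either end of the stem.
    """
    loop_len = len(seq) - 2 * stem_len
    if stem_len < MIN_STEM or not LOOP_MIN <= loop_len <= LOOP_MAX:
        return False
    left, right = seq[:stem_len], seq[stem_len + loop_len:]
    mispairs = 0
    for k in range(stem_len):            # k = 0 is the outermost pair
        if (left[k], right[stem_len - 1 - k]) not in PAIRS:
            if k < end_guard or k >= stem_len - end_guard:
                return False             # mispair too close to a stem end
            mispairs += 1
    return mispairs <= max_mispairs


print(SMALLEST_SLS)  # 29
```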
Given these constraints, the smallest SLS that can be found is 29 bp (12 + 5 + 12). Because G-U pairing is allowed, rnamotif had to be run on both strands of the input sequence. Completely overlapping SLSs were discarded by six runs of the rmprune tool, also from the RNAMOTIF package.
The Gibbs free energy (dG) of each SLS-containing region was calculated by calling the built-in function efn2 of rnamotif. The minimum free energy with no constraint on SLS formation was obtained by running the program mfold, developed by Zuker and coworkers , on the SLS sequences.
When two or more SLSs were found overlapping, only the most stable one was counted.
Intergenic regions (IGRs) were derived from the ORF collection provided by TIGR, and were stored and annotated in the SLS-DB. For some tests, IGRs ranging in size from 29 to 500 bp were selected (see also the legend to Figure 6).
The program Shufflet  was used to generate random sequences and to shuffle bacterial genomes while preserving k-lets of different lengths. To shuffle sequences with k-lets longer than 6, Shufflet was compiled with the variable MAXORDER set to 15. An alternative shuffling method (referred to as DS in Fig. 3) was used to take into account information about protein-coding sequences: coding regions were shuffled using the program Dicodonshuffle , while Shufflet with k-let = 2 was used for non-coding regions.
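The need to search both strands follows from the asymmetry of the wobble pair: a G-U pair in the transcript of one strand corresponds, on the other strand, to a C opposite an A, which cannot pair. Watson-Crick-only hairpins are therefore found on both strands at the same position, whereas hairpins that rely on G-U pairs may be found on one strand only. The toy example below illustrates this with a hypothetical fixed-size finder standing in for rnamotif, and maps minus-strand hits back to plus-strand coordinates.

```python
COMP = str.maketrans("ACGT", "TGCA")
# Pairs written as DNA (T = U in the transcript); G-T stands for a G-U wobble pair
PAIRS = {("G", "C"), ("C", "G"), ("A", "T"), ("T", "A"), ("G", "T"), ("T", "G")}


def revcomp(s: str) -> str:
    return s.translate(COMP)[::-1]


def arms_pair(left: str, right: str) -> bool:
    """True if every base of `left` pairs with its partner in `right` (both 5'->3')."""
    return all((a, b) in PAIRS for a, b in zip(left, reversed(right)))


def toy_finder(s: str, stem: int = 3, loop: int = 5):
    """Hypothetical fixed-size hairpin finder, standing in for rnamotif."""
    w = 2 * stem + loop
    return [(i, i + w) for i in range(len(s) - w + 1)
            if arms_pair(s[i:i + stem], s[i + stem + loop:i + w])]


def scan_both_strands(seq: str, finder=toy_finder):
    """Run `finder` on both strands; report hits in plus-strand coordinates."""
    n = len(seq)
    hits = [(s, e, "+") for s, e in finder(seq)]
    hits += [(n - e, n - s, "-") for s, e in finder(revcomp(seq))]
    return sorted(hits)


print(scan_both_strands("GGC" + "AAAAA" + "GCC"))  # [(0, 11, '+'), (0, 11, '-')]
print(scan_both_strands("GGG" + "AAAAA" + "TCC"))  # [(0, 11, '+')]
```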
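Counting only the most stable member of a group of overlapping SLSs can be done greedily. The sketch below is one possible reading of that rule (an assumption: how chains of partially overlapping SLSs were resolved is not stated here), using a hypothetical `SLS` record with made-up coordinates and dG values:

```python
from typing import Iterable, List, NamedTuple


class SLS(NamedTuple):
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    dg: float     # free energy in kcal/mol; more negative means more stable


def keep_most_stable(hits: Iterable[SLS]) -> List[SLS]:
    """Greedy overlap resolution: visit SLSs from most to least stable and keep
    each one only if it does not overlap an SLS already kept."""
    kept: List[SLS] = []
    for h in sorted(hits, key=lambda x: x.dg):
        if all(h.end <= k.start or k.end <= h.start for k in kept):
            kept.append(h)
    return sorted(kept, key=lambda x: x.start)


hits = [SLS(0, 40, -12.0), SLS(30, 70, -20.5), SLS(100, 140, -8.0)]
print(keep_most_stable(hits))
# [SLS(start=30, end=70, dg=-20.5), SLS(start=100, end=140, dg=-8.0)]
```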
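Deriving IGRs from a gene list amounts to collecting the gaps between consecutive annotations and filtering them by length; a 29-bp lower bound matches the smallest SLS that the search can return. A minimal sketch, assuming a linear chromosome (no wrap-around), half-open 0-based coordinates and merging of overlapping genes; the function name and example coordinates are hypothetical:

```python
from typing import Iterable, List, Tuple


def intergenic_regions(genes: Iterable[Tuple[int, int]],
                       min_len: int = 29, max_len: int = 500) -> List[Tuple[int, int]]:
    """Gaps between annotated genes (half-open [start, end) coordinates),
    keeping only those whose length lies in [min_len, max_len]."""
    igrs = []
    covered_to = None                      # rightmost gene end seen so far
    for start, end in sorted(genes):
        if covered_to is not None and start > covered_to:
            if min_len <= start - covered_to <= max_len:
                igrs.append((covered_to, start))
        covered_to = end if covered_to is None else max(covered_to, end)
    return igrs


genes = [(0, 900), (950, 2000), (1990, 2500), (2600, 3000), (3800, 4000)]
print(intergenic_regions(genes))  # [(900, 950), (2500, 2600)]
```

Tracking the rightmost end seen so far keeps overlapping genes (here 950-2000 and 1990-2500) from producing a spurious negative-length IGR.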
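Shuffling while preserving k-let counts is usually done by treating the sequence as a walk on a graph of (k-1)-mers and drawing a random Eulerian path. A minimal sketch for the dinucleotide case (k = 2, the setting used for non-coding regions in the DS method) follows. It is an illustrative re-implementation under that assumption, not Shufflet's code, and it handles neither k > 2 nor the codon-aware Dicodonshuffle step.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def _reaches(v: str, target: str, last_edge: Dict[str, str]) -> bool:
    """Follow the chosen last-exit edges from v; True if they lead to target."""
    seen = set()
    while v != target:
        if v in seen:
            return False                  # cycle: the last edges do not form a tree
        seen.add(v)
        v = last_edge[v]
    return True


def dinucleotide_shuffle(seq: str, rng: Optional[random.Random] = None) -> str:
    """Random Eulerian-path shuffle preserving dinucleotide counts (k = 2)."""
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq
    edges: Dict[str, List[str]] = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)                # one edge per overlapping dinucleotide
    last = seq[-1]
    # Pick a last exit edge for every vertex except the final one, retrying
    # until these edges form a tree leading to the final vertex
    while True:
        last_edge = {v: rng.choice(out) for v, out in edges.items() if v != last}
        if all(_reaches(v, last, last_edge) for v in last_edge):
            break
    order = {}
    for v, out in edges.items():
        out = list(out)
        if v in last_edge:
            out.remove(last_edge[v])      # reserve the last exit edge...
            rng.shuffle(out)
            out.append(last_edge[v])      # ...and use it last
        else:
            rng.shuffle(out)
        order[v] = out
    used = Counter()
    result = [seq[0]]
    for _ in range(len(seq) - 1):         # walk the graph, consuming every edge once
        v = result[-1]
        result.append(order[v][used[v]])
        used[v] += 1
    return "".join(result)


def dinucleotide_counts(s: str) -> Counter:
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


s = "ATGGCGTTAGCCATTAGGCTAACGGATCCGTA"
t = dinucleotide_shuffle(s, random.Random(42))
print(t, dinucleotide_counts(t) == dinucleotide_counts(s))
```

Reserving one exit edge per vertex so that these edges form a tree rooted at the final base is what guarantees that the walk never gets stuck before all dinucleotides are used.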
We wish to thank Luca Cozzuto for suggestions and useful discussions, and Tommaso Russo, Carmelo Bruno Bruni, Concetta Pietropaolo and Maria Stella Carlomagno for critically reading the manuscript. Informatics support by Gianluca Busiello is also acknowledged.
This work was supported by a PRIN 2004 grant to PPDN, a PRIN 2005 grant to GP, a MIUR grant to CEINGE (12/2000) and a MIUR FIRB grant (LITBIO).
- Borodina I, Krabben P, Nielsen J: Genome-scale analysis of Streptomyces coelicolor A3(2) metabolism. Genome Res. 2005, 15: 820-829. 10.1101/gr.3364705.
- Hayashi T, Makino K, Ohnishi M, Kurokawa K, Ishii K, Yokoyama K, Han CG, Ohtsubo E, Nakayama K, Murata T, Tanaka M, Tobe T, Iida T, Takami H, Honda T, Sasakawa C, Ogasawara N, Yasunaga T, Kuhara S, Shiba T, Hattori M, Shinagawa H: Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res. 2001, 8: 11-22. 10.1093/dnares/8.1.11.
- Dobrindt U, Agerer F, Michaelis K, Janka A, Buchrieser C, Samuelson M, Svanborg C, Gottschalk G, Karch H, Hacker J: Analysis of genome plasticity in pathogenic and commensal Escherichia coli isolates by use of DNA arrays. J Bacteriol. 2003, 185: 1831-1840. 10.1128/JB.185.6.1831-1840.2003.
- Grifantini R, Bartolini E, Muzzi A, Draghi M, Frigimelica E, Berger J, Ratti G, Petracca R, Galli G, Agnusdei M, Giuliani MM, Santini L, Brunelli B, Tettelin H, Rappuoli R, Randazzo F, Grandi G: Previously unrecognized vaccine candidates against group B meningococcus identified by DNA microarrays. Nat Biotechnol. 2002, 20: 914-921. 10.1038/nbt728.
- Frank AC, Amiri H, Andersson SG: Genome deterioration, loss of repeated sequences and accumulation of junk DNA. Genetica. 2002, 115: 1-12. 10.1023/A:1016064511533.
- Achaz G, Coissac E, Netter P, Rocha EP: Associations between inverted repeats and the structural evolution of bacterial genomes. Genetics. 2003, 164: 1279-1289.
- Audit B, Ouzounis CA: From genes to genomes, universal scale-invariant properties of microbial chromosome organisation. J Mol Biol. 2003, 332: 617-633. 10.1016/S0022-2836(03)00811-8.
- Rocha EP, Danchin A: Gene essentiality determines chromosome organisation in bacteria. Nucleic Acids Res. 2003, 31: 6570-6577. 10.1093/nar/gkg859.
- Chain PS, Carniel E, Larimer FW, Lamerdin J, Stoutland PO, Regala WM, Georgescu AM, Vergez LM, Land ML, Motin VL, Brubaker RR, Fowler J, Hinnebusch J, Marceau M, Medigue C, Simonet M, Chenal-Francisque V, Souza B, Dacheux D, Elliott JM, Derbise A, Hauser LJ, Garcia E: Insights into the evolution of Yersinia pestis through whole-genome comparison with Yersinia pseudotuberculosis. Proc Natl Acad Sci U S A. 2004, 101: 13826-13831. 10.1073/pnas.0404012101.
- Hallin PF, Ussery DW: CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data. Bioinformatics. 2004, 20: 3682-3686. 10.1093/bioinformatics/bth423.
- van Belkum A, van Leeuwen W, Scherer S, Verbrugh H: Occurrence and structure-function relationship of pentameric short sequence repeats in microbial genomes. Res Microbiol. 1999, 150: 617-626. 10.1016/S0923-2508(99)00129-1.
- Salaun L, Linz B, Suerbaum S, Saunders NJ: The diversity within an expanded and redefined repertoire of phase-variable genes in Helicobacter pylori. Microbiology. 2004, 150: 817-830. 10.1099/mic.0.26993-0.
- Seffens W, Digby D: RNAs have greater negative folding free energies than shuffled or codon choice randomized sequences. Nucleic Acids Res. 1999, 27: 1578-1584. 10.1093/nar/27.7.1578.
- Workman C, Krogh A: No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution. Nucleic Acids Res. 1999, 27: 4816-4822. 10.1093/nar/27.24.4816.
- Katz L, Burge CB: Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Res. 2003, 13: 2042-2051. 10.1101/gr.1257503.
- Henkin TM, Yanofsky C: Regulation by transcription attenuation in bacteria, how RNA provides instructions for transcription termination/antitermination decisions. Bioessays. 2002, 8: 700-707. 10.1002/bies.10125.
- Berg BL, Baron C, Stewart V: Nitrate-inducible formate dehydrogenase in E. coli K12. II. Evidence that a mRNA stem-loop structure is essential for decoding opal (UGA) as selenocysteine. J Biol Chem. 1991, 33: 22386-22391.
- Martínez-Abarca F, Toro N: Group II introns in the bacterial world. Mol Microbiol. 2000, 38: 917-926. 10.1046/j.1365-2958.2000.02197.x.
- Nudler E, Mironov AS: The riboswitch control of bacterial metabolism. Trends Biochem Sci. 2004, 29: 11-17. 10.1016/j.tibs.2003.11.004.
- Coburn GA, Mackie GA: Degradation of mRNA in Escherichia coli: an old problem with some new twists. Prog Nucleic Acid Res Mol Biol. 1999, 62: 55-108.
- Gilson E, Saurin W, Perrin D, Bachellier S, Hofnung M: The BIME family of bacterial highly repetitive sequences. Res Microbiol. 1991, 142: 217-222. 10.1016/0923-2508(91)90033-7.
- De Gregorio E, Abrescia C, Carlomagno MS, Di Nocera PP: Ribonuclease III-mediated processing of specific Neisseria meningitidis mRNAs. Biochem J. 2003, 374: 799-805. 10.1042/BJ20030533.
- De Gregorio E, Silvestro G, Petrillo M, Carlomagno MS, Di Nocera PP: Genomic organization and functional properties of ERIC DNA repeats in Yersiniae. J Bact. 2005, 187: 7945-7954. 10.1128/JB.187.23.7945-7954.2005.
- Krasilnikov AS, Podtelezhnikov A, Vologodskii A, Mirkin SM: Large-scale effects of transcriptional DNA supercoiling in vivo. J Mol Biol. 1999, 292: 1149-1160. 10.1006/jmbi.1999.3117.
- Jin R, Novick RP: Role of the double-strand origin cruciform in pT181 replication. Plasmid. 2001, 46: 95-105. 10.1006/plas.2001.1535.
- Yamada K, Ariyoshi M, Morikawa K: Three-dimensional structural views of branch migration and resolution in DNA homologous recombination. Curr Opin Struct Biol. 2004, 14: 130-137. 10.1016/j.sbi.2004.03.005.
- Smith HO, Gwinn ML, Salzberg SL: DNA uptake signal sequences in naturally transformable bacteria. Res Microbiol. 1999, 150: 603-616. 10.1016/S0923-2508(99)00130-8.
- Davidsen T, Rodland EA, Lagesen K, Seeberg E, Rognes T, Tonjum T: Biased distribution of DNA uptake sequences towards genome maintenance genes. Nucleic Acids Res. 2004, 32: 1050-1058. 10.1093/nar/gkh255.
- Parkhill J, Achtman M, James KD, Bentley SD, Churcher C, Klee SR, Morelli G, Basham D, Brown D, Chillingworth T, Davies RM, Davis P, Devlin K, Feltwell T, Hamlin N, Holroyd S, Jagels K, Leather S, Moule S, Mungall K, Quail MA, Rajandream MA, Rutherford KM, Simmonds M, Skelton J, Whitehead S, Spratt BG, Barrell BG: Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature. 2000, 404: 502-506. 10.1038/35006655.
- Claverie JM, Ogata H: The insertion of palindromic repeats in the evolution of proteins. Trends Biochem Sci. 2003, 28: 75-80. 10.1016/S0968-0004(02)00036-1.
- Ermolaeva MD, Khalak HG, White O, Smith HO, Salzberg SL: Prediction of transcription terminators in bacterial genomes. J Mol Biol. 2000, 301: 27-33. 10.1006/jmbi.2000.3836.
- Lesnik EA, Sampath R, Levene HB, Henderson TJ, McNeil JA, Ecker DJ: Prediction of rho-independent transcriptional terminators in Escherichia coli. Nucleic Acids Res. 2001, 29: 3583-3594. 10.1093/nar/29.17.3583.
- Unniraman S, Prakash R, Nagaraja V: Conserved economics of transcription termination in eubacteria. Nucleic Acids Res. 2002, 30: 675-684. 10.1093/nar/30.3.675.
- Bachellier S, Clement JM, Hofnung M: Short palindromic repetitive DNA elements in enterobacteria: a survey. Res Microbiol. 1999, 150: 627-639. 10.1016/S0923-2508(99)00128-X.
- Aranda-Olmedo I, Tobes R, Manzanera M, Ramos JL, Marques S: Species-specific repetitive extragenic palindromic (REP) sequences in Pseudomonas putida. Nucleic Acids Res. 2002, 30: 1826-1833. 10.1093/nar/30.8.1826.
- Qi D, Cuticchia AJ: Compositional symmetries in complete genomes. Bioinformatics. 2001, 17: 557-559. 10.1093/bioinformatics/17.6.557.
- Baisnee PF, Hampson S, Baldi P: Why are complementary DNA strands symmetric?. Bioinformatics. 2002, 18: 1021-1033. 10.1093/bioinformatics/18.8.1021.
- Robin S, Daudin JJ, Richard H, Sagot MF, Schbath S: Occurrence probability of structured motifs in random sequences. J Comput Biol. 2002, 9: 761-773. 10.1089/10665270260518254.
- Ingham CJ, Tennis J, Furneaux PA: Autogenous regulation of transcription termination factor Rho and the requirement for Nus factors in Bacillus subtilis. Mol Microbiol. 1999, 31: 651-663. 10.1046/j.1365-2958.1999.01205.x.
- Washburn RS, Marra A, Bryant AP, Rosenberg M, Gentry DR: Rho is not essential for viability or virulence in Staphylococcus aureus. Antimicrob Agents Chemother. 2001, 45: 1099-1103. 10.1128/AAC.45.4.1099-1103.2001.
- Yang MT, Scott HB, Gardner JF: Transcription termination at the thr attenuator Evidence that the adenine residues upstream of the stem and loop structure are not required for termination. J Biol Chem. 1995, 270: 23330-23336. 10.1074/jbc.270.40.23330.
- Carlomagno MS, Riccio A, Bruni CB: Convergently functional, Rho-independent terminator in Salmonella typhimurium. J Bacteriol. 1985, 163: 362-368.
- Abe H, Aiba H: Differential contributions of two elements of rho-independent terminator to transcription termination and mRNA stabilization. Biochimie. 1996, 78: 1035-1042. 10.1016/S0300-9084(97)86727-2.
- Cisneros B, Court D, Sanchez A, Montanez C: Point mutations in a transcription terminator, lambda tI, that affect both transcription termination and RNA stability. Gene. 1996, 181: 127-133. 10.1016/S0378-1119(96)00492-1.
- Storz G, Altuvia S, Wassarman KM: An abundance of RNA regulators. Annu Rev Biochem. 2005, 74: 199-217. 10.1146/annurev.biochem.74.082803.133136.
- PostgreSQL Project. 2005, [http://www.postgresql.com/index.html]
- Boccia A, Petrillo M, di Bernardo D, Guffanti A, Mignone F, Confalonieri S, Luzi L, Pesole G, Paolella G, Ballabio A, Banfi S: DG-CST (Disease Gene Conserved Sequence Tags), a database of human-mouse conserved elements associated to disease genes. Nucleic Acids Res. 2005, 33 (Database issue): D505-10.
- Milanesi L, Petrillo M, Sepe L, Boccia A, D'Agostino N, Passamano M, Di Nardo S, Tasco G, Casadio R, Paolella G: Systematic analysis of human kinase genes: a large number of genes and alternative splicing events result in functional and structural diversity. BMC Bioinformatics. 2005, 6 (Suppl 4): S20. 10.1186/1471-2105-6-S4-S20.
- Macke TJ, Ecker DJ, Gutell RR, Gautheret D, Case DA, Sampath R: RNAMotif, an RNA secondary structure definition and search algorithm. Nucleic Acids Res. 2001, 29: 4724-4735. 10.1093/nar/29.22.4724.
- Mathews DH, Sabina J, Zuker M, Turner DH: Expanded sequence dependence of thermodynamic parameters provides robust prediction of rna secondary structure. J Mol Biol. 1999, 288: 910-940. 10.1006/jmbi.1999.2700.
- Coward E: Shufflet, shuffling sequences while conserving the k-let counts. Bioinformatics. 1999, 12: 1058-1059. 10.1093/bioinformatics/15.12.1058.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.