Overlapping genes: a window on gene evolvability
© Huvet and Stumpf; licensee BioMed Central Ltd. 2014
Received: 21 November 2013
Accepted: 18 August 2014
Published: 27 August 2014
The forces underlying genome architecture and organization are still only poorly understood in detail. Overlapping genes (genes partially or entirely overlapping) represent a genomic feature that is shared widely across biological organisms ranging from viruses to multi-cellular organisms. In bacteria, a third of the annotated genes are involved in an overlap. Despite the widespread nature of this arrangement, its evolutionary origins and biological ramifications have so far eluded explanation.
Here we present a comparative approach using information from 699 bacterial genomes that sheds light on the evolutionary dynamics of overlapping genes. We show that these structures exhibit high levels of plasticity.
We propose a simple model allowing us to explain the observed properties of overlapping genes based on the importance of initiation and termination of transcriptional and translational processes. We believe that taking into account the processes leading to the expression of protein-coding genes hold the key to the understanding of overlapping genes structures.
The extent to which the protein coding regions of different genes overlap is a striking feature of many genomes. When two genes overlap the same portion of the DNA codes for the constituent amino acids of the two, typically different, proteins involved in the overlap. These overlapping structures can be observed in viruses , prokaryotes , and also eukaryotes . Several previous studies have already made attempts at characterizing overlapping genes (OGs) in bacteria. These revealed a correlation between the number of genes in an organism and the number of OGs: approximately a third of genes are involved in an overlap arrangement . Co-oriented overlapping gene pairs (OGPs) represent the vast majority of these observed structures, and very few incidences of OGPs with divergent orientation have been found . In addition, the majority of OGPs overlap by 2 nucleotides, and more generally have a “+2” frame shift, and hardly any “+0” frame shift cases (overlap of three, or multiples of three, nucleotides) are observed [4, 6–8].
Different hypotheses have been put forward to explain the role and/or benefit of this configuration, including: (i) improved genome compaction [1, 9, 10], and (ii) implications for translation regulation through the mechanism of translational coupling [1, 10, 11]. The development of the genome compaction model is closely related to properties of those genomes in which the first overlaps were observed: viruses. In viruses a large proportion of genes overlap, which leads to a significant reduction in the number of nucleotides needed for encoding genes, and reduces the time and material needed to generate new genetic material for the creation of new viruses. The second hypothesis involving translational coupling is based on the molecular properties of the translation machinery. Usually protein-coding genes are transcribed into RNA, which is then (potentially after further editing) translated into a protein by the translation machinery. In bacteria some protein coding genes form sets (operons), which are transcribed together leading to polycistronic RNA. This is then translated into as many different proteins as there were genes in the operon. Diverse lines of evidence suggest that the production of these proteins may not be independent. After translating the sequence corresponding to one gene, the same translational machinery could continue and directly start the translation of the following gene [11, 12]. This process, referred to as translational coupling, is dependent on the distance between the successive protein coding sequences in the polycistronic mRNA. The presence of OGs could impact on the feasibility of this process. Despite the accumulation of information related to OGs, the possible biological implications are still far from being understood. However, this has not stopped researchers from developing mechanistic models that lead to the creation of these structures. The models of the underlying mechanism are based on modification of the genes’ boundaries. One model is in favor of STOP codon position change , while a more recent one makes the case for a START codon position change . In both cases the models rely on the possibility for a gene to lose its START or STOP codon due to a mutation leading to the use of an alternative START or STOP codon.
Previously reported work mainly falls into two categories: (i) statistical analyses of large numbers of OGs considered together and often compared with the observed frequencies of START and STOP codons along the different reading frames surrounding the genes [4, 5, 13, 14]; and (ii) “anecdotal” analyses of specific OGPs, usually related to a specific species or biological system [6, 15–17], together with an analysis of the extent of the overlap through evolution. Only a few have looked at the phylogenetic profiles of large number of OGPs [8, 18]. Developing this type of approach has the potential to characterize the salient features of OGs, such as the stability of the overlap, and identify the scenarios under which the creation of an overlap is not deleterious.
In this paper we describe an analysis of the genomic organization of OGs in bacteria. By taking a comparative perspective (using data from almost 700 bacterial genomes) we are able to shed light on the evolutionary dynamics of this important and widely shared genomic structure. In contrast to earlier analyses [4, 9] we argue that OG structures have high plasticity, and (within the scope of suitable arrangements, which we will also discuss at some length) can easily be gained or lost. Finally, we discuss how failure to properly account for genomic structures such as operons in the analysis of overlapping genes can (and indeed has) lead to misleading interpretations.
Properties of OGs in 699 species
The observed number of genes involved in more than one overlap might simply be a byproduct of the creation of OG. To test this hypothesis we computed the expected number of genes involved in overlaps according to two sampling approaches. In a first case, we have sampled 680 adjacent gene pairs based on the E. coli genomic organization and computed the number of genes sampled more than once. In the second case, we have restricted the sampling to the co-oriented adjacent gene pairs based on the fact that most of the genes involved in OGPs are co-oriented. The expected number of genes involved in more than one OGP is 111 (sd = 8.7) and 92 (sd = 8), respectively, both smaller than the 174 observed in E. coli (t-test p-value < 0.05). Independent creation of OGPs thus cannot explain the observed abundances. This seems to support a possible biological function of OGP structures [4, 5, 14]. However, we argue that the number of genes involved in more than one overlap could be explained by taking into account genomic structures (see Discussion).
Evolution of OGs
In comparison to the statistical characteristics of OGs, their evolutionary characteristics have received little attention: previous work suggests that genes involved in overlaps tend to be more conserved . However, a large proportion of OGPs are part of operons (in E. coli 889 genes have their overlapping partner in the same operon). Because of this we have considered the evolution of overlaps in more detail. Using E. coli as a reference species we have identified orthologous genes in 698 bacterial species using reciprocal best hit and merged these data with information on genomic organization.
The different models put forward to explain the creation of overlapping gene structures are based on modification of the genes’ boundaries, with models in favour of either STOP  or START  codon positions changing as the main, or indeed sole, mechanism. We would like to start this discussion by reinforcing the idea that the movement of both START and STOP codons are needed to explain all observable OGP configurations; e.g. in order to create convergent and divergent OGPs changes of STOP, respectively, START codons are needed. The main question is associated with the mechanism leading to the creation of overlapping genes for co-oriented genes, especially since co-oriented overlapping genes represent the largest population of OGPs. Importantly, the different studies of OGPs published focused on the mechanism leading to their creation, but not to their loss. Our results suggest a relatively high turnover of the properties of adjacent gene pairs, including the creation and loss of overlap between coding sequence of proteins, and relatively rapid variation of the distance separating the coding sequences, including the size of overlaps. Because of these observations we believe that more work needs to be done in order to understand OGP evolution. Before progressing further with this discussion we would like to acknowledge that the analysis performed in this paper relies on the correct annotation of the different genes (just as any other bioinformatics study of OGPs, or other gene analysis in general). By using RefSeq, a non-redundant and curated database, we tried to minimize the risk of misannotated gene extremities/boundaries (and minimize the risk of misannotated overlaps along the way).
Because both a movement of START or STOP codons can lead to the creation of OGP in a co-oriented context, disentangling the mechanisms is challenging. However, if we assume that the different OGPs are created by the same mechanisms, the properties of the OGPs that are not co-oriented should give us relevant insights into the OGP formation mechanisms. Remarkably, there are more convergent than divergent OGPs (p-value ≈ 2.2e-16 < 0.05 obtained using Fisher’s Exact Test, which is not the case when all adjacent genes pairs are analyzed together), suggesting that the creations of different types of OGPs are not equivalent. The excess of convergent overlapping genes suggests that the movement of STOP codons is more frequent in the creation of OGPs. If the process leading to the creation of OGPs is independent of the orientation of the genes, this observation suggests that movement of STOP should be a predominant mechanism in the creation of OGPs. However, Cock et al. put forward the importance of the movement of START codons in the creation of OGPs for co-oriented genes OGPs based on the position of alternative START codons . Therefore, if we assume that START codon movement is really a predominant mechanism leading to the creation of co-oriented OGPs, then negative selection probably acts against the movement of STOP codons. This aspect has, however, not been discussed previously. It is important to note that movement of STOP codons has been documented independently from overlapping genes . This reinforces the idea that, in addition to being part of the evolution of coding genes, the movement of STOP codon is also likely to be part of the evolution of OGPs.
In addition, intrinsic differences between START and STOP codons can lead to a bias in the creation of OGPs. The most obvious one is probably the importance of the START codon in the establishment of the reading frame of a gene. A change of position of a STOP codon should have little impact on the reading frame. Whereas changing the position of a START codon could lead to a reading frame change in two cases out of three. This could lead to a negative selection against OGP formation by START codon movement. Other signals (e.g. signals regulating initiation and termination of translation) could also impose further limitations. For example, an alternative START is unlikely to be used if it is upstream of the ribosome binding site. It is only once such additional information is taken into account that we will be able to fully grasp the properties of gene extremities movement.
This fact highlights that some properties, independent from the presence of alternative START and STOP codons, could lead to bias regarding the use of potential alternative START and STOP codons. The points made earlier are far from representing an exhaustive list of the possible sources of constrains regarding the creation and loss of OGPs. This just highlights the fact that studying OGPs by only looking at the properties of alternative START and STOP could lead to an incomplete understanding of their properties and evolution, which could potentially explain why we still lack of unifying model of OGP evolution.
The relative number of convergent and divergent OGPs gives us a potential insight into the mechanism leading to the creation of OGPs. We believe that the relative number of co-oriented OGPs compared to the other types of OGPs is also revealing. As stated before, in a system where all genes were under the same type of restriction on the creation of OGPs, the properties of co-oriented OGPs should be related to the properties of convergent and divergent OGPs. This is due to the fact that in co-oriented gene configurations, the first gene (according to the order of transcription) should behave like a gene in a convergent configuration, and the second like a gene found in a divergent configuration. In this case, the number of co-oriented OGPs should be equivalent to the number of convergent and divergent OGPs. However, co-oriented OGPs occur much more frequently than convergent and divergent OGPs (p-value ≈ 2.2e-16 < 0.05, obtained using Fisher’s Exact Test; this is also the case when all adjacent gene pairs are analyzed. However the ratio of co-oriented genes divided by the number of convergent plus divergent pairs is significantly larger for overlapping genes than for all adjacent pairs with p-value ≈ 2.2e-16 < 0.05, obtained using Kolmogorov-Smirnov test). No explanations have been proposed thus far for this large difference.
We have seen earlier how the relationship between transcription and translation can impact the creation of OGPs. The existence of operons could explain the large number of co-oriented OGPs . In operons each gene has its own translation regulation signal but there is just one transcription initiation and one termination event. When a gene goes through an event leading to the creation of an overlap with one of the surrounding genes, the new coding sequence remains within the transcript and should not impact the regulation of transcription (with the exception of the movement of the START of the first gene or the STOP of the last). Therefore creation of OGPs inside operons could be more easily achieved than for genes transcribed into different mRNAs. This would explain the large number of co-oriented OGPs. In this operon configuration it is then possible to hypothesize that movement of the position of gene extremities could be relatively easily achieved, in the same way that the protein-coding sequence itself changes over time in all genes, leading to the observed plasticity of OGPs. Unfortunately, the lack of experimental characterization of operons in most species, in addition to the properties of the protocols used to predict operons (often relying solely on distances between coding sequences), make the assessment of this hypothesis challenging. A large-scale analysis of transcription profiles from different species will be needed to learn more about the possible impact of polycistronic RNAs on the creation of OGs.
Finally, the different points presented until now are associated to the creation of an overlap, which has been the main focus of the published work on OGPs. Our results show that OGP structures show high plasticity, stressing the need for a model describing the loss of OGPs. Here, a difference between the properties of START and STOP codon can provide a simple mechanism model based on the loss of OGPs. Moving the START codon downward the sequence, or the STOP codon upward, could lead to a loss of the overlap. However, the appearance of a START codon downward will not be enough. The best illustration of this is the use of ATG (the most frequent START codon) as a codon to encode methionine. For a START codon present inside the coding region of the gene to be used (a process that could lead to lead to a loss of overlap between OGPs) the original START needs to be lost. So for this process to lead to a loss of overlap or change in the length of the overlap two events are needed: the loss of the initial START codon and the acquisition of an alternative START inside the protein coding gene sequence in the relevant reading frame. Whereas, a STOP codon appearing upstream from an existing STOP codon one will be sufficient to lead to a shorter coding region independently of the loss of the original STOP. Once again, a change of STOP codon will have no impact on the reading frame. This suggests that movement of STOP codon is a likely process leading to the loss or change of length of overlap between adjacent genes.
At this stage, it is still difficult to appreciate fully all aspects involved in the evolution of OGPs. However, because our results suggests that OGPs are created and lost relatively easily it is only by building a model explaining gain and loss of OGPs that we will be able to understand overlapping gene structures.
All genes probably see their START and STOP codons change over time (as a natural and probably common occurrence during evolutionary history) leading to frequent creation of alternative version of genes, in the same way that mutations lead to changes inside genes. This would represent a generalized version of the OGP creation model and is reinforced by a recent study showing shift of STOP codons even over short evolutionary distances . In this model the presence of overlapping genes can then be seen as a by-product of sequence evolution and is not directly associated to a biological function as had been hypothesized (e.g. improving compression of the genome). This simple “neutral” model is in agreement with the high turnover of creation and loss of OGPs. This does not exclude, of course, the possibility that some of the OGs to have a biological function.
Our understanding of genome evolution is far from complete. Characterizing the different mechanisms associated to genome evolution, such as the creation of overlapping gene structures, will give us a better understanding of the important properties of biological systems and has the potential to improve fields such as synthetic biology; in this latter context in particular coordinating the expression of genes through such coupling is an attractive option to maintain control over the expression of different genes in response to simple transcriptional programs. More generally, such understanding is also essential to link genotypic and phenotypic evolution.
Sequences and annotations
The sequences and annotations for the 699 bacterial species (the genomic information is encapsulated in 759 chromosomes and 529 plasmids) considered here were retrieved from the “Genome Assembly/Annotation Projects” data from NCBI (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/). For each species we used the refSeq annotation library. We decided not to apply any further filtering regarding genes or types of overlap in order to avoid the creation of bias in our dataset. Operon information was extracted from RegulonDB.
A Reciprocal Best Hit (RBH) approach was used to identify orthologues (using Blastall 2.2.19, ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/). Each protein of Escherichia coli K12 MG 1655 was aligned against the full protein set of each of the 697 remaining bacterial species. The first hit of each Blast run was then aligned against the full set of E. coli K12 MG1655 proteins. Two proteins were defined as potential orthologues if and only if the first hit of the second blast search corresponded to the protein used as a query in the first one .
To minimize the number of false positive these results were further filtered by demanding that the alignment had to involve more than 50% of the sequence of the smaller protein, and sequence identity had to be larger than 30%.
We have analyzed the difference in size for the genes identified as ortholog. Out of the 688570 ortholog pairs identified less than 1% have a substantial size difference (3× or more). In addition, 98% of the orthologs have a size difference smaller than 2 folds.
The phylogeny was inferred using the 23S rRNA ; one sequence was chosen randomly from each organism. This set of 23S rRNA sequences was aligned using MAFFT 6.611 (http://align.bmr.kyushu-u.ac.jp/mafft/software/) and the phylogeny was inferred using PhyML 2.4.4 (http://atgc.lirmm.fr/phyml/). Different alignments and phylogeny inference procedures resulted in very similar trees.
Computation of the transition rates associated with overlapping pairs
For each OGP in E. coli, we characterized (and numerically encoded) the organization of the orthologues in each species as follows: overlapping (“3”), non-overlapping but adjacent (“2”), non-adjacent (“1”) and, finally, lacking orthologous gene for one or both genes (“0”). We then estimated the transition rates leading from one state to any other for all E. coli overlapping gene pairs using the BayesTrait software package (http://www.evolution.reading.ac.uk/BayesTraits.html), leading to a total of 663 matrices of the 12 transition rates. In order to compare results obtained from different pairs, each transition matrix was also normalized and we analyzed the inferred rates in order to infer which processes predominate the evolutionary dynamics of overlapping genes.
Simulation of random gene pair sampling
We computed the expected distribution of OGPs under the assumption that the creation of overlapping gene pairs is random. This number is then compare to the observations in E. coli. Two particular sampling schemes were considered: (i) uniform random sampling over all E. coli gene pairs; and (ii) uniform random sampling over only the co-oriented genes pairs present in E. coli. In both cases 10,000 samples of 680 adjacent gene pairs were generated.
Availability of supporting data
The tree supporting the results of this article is available in the treeBase repository: http://purl.org/phylo/treebase/phylows/study/TB2:S16248.
The authors thank the members of the Theoretical Systems Biology group at Imperial College for helpful discussion.
MH is an Imperial College Junior Research Fellow. MPHS is a Royal Society Wolfson Research Merit Award Holder. The funders played no part in the design and conduct of the research presented here.
- Normark S, Bergstrom S, Edlund T: Overlapping genes. Annu Rev Genet. 1983, 17: 499-525. 10.1146/annurev.ge.17.120183.002435.PubMedView ArticleGoogle Scholar
- Rogozin IB, Spiridonov AN, Sorokin AV, Wolf YI, Jordan IK, Tatusov RL, Koonin EV: Purifying and directional selection in overlapping prokaryotic genes. Trends Genet. 2002, 18: 228-232. 10.1016/S0168-9525(02)02649-5.PubMedView ArticleGoogle Scholar
- Makalowska I, Lin C-F, Makalowski W: Overlapping genes in vertebrate genomes. Comput Biol Chem. 2005, 29: 1-12. 10.1016/j.compbiolchem.2004.12.006.PubMedView ArticleGoogle Scholar
- Johnson ZI, Chisholm SW: Properties of overlapping genes are conserved across microbial genomes. Genome Res. 2004, 14: 2268-2272. 10.1101/gr.2433104.PubMed CentralPubMedView ArticleGoogle Scholar
- Fukuda Y, Nakayama Y, Tomita M: On dynamics of overlapping genes in bacterial genomes. Gene. 2003, 323: 181-187.PubMedView ArticleGoogle Scholar
- Cock PJA, Whitworth DE: Evolution of gene overlaps: relative reading frame bias in prokaryotic two-component system genes. J Mol Evol. 2007, 64: 457-462. 10.1007/s00239-006-0180-1.PubMedView ArticleGoogle Scholar
- Eyre-Walker A: The close proximity ofEscherichia coli genes: consequences for stop codon and synonymous codon use. J Mol Evol. 1996, 42: 73-78. 10.1007/BF02198830.PubMedView ArticleGoogle Scholar
- Lillo F, Krakauer DC: A statistical analysis of the three-fold evolution of genomic compression through frame overlaps in prokaryotes. Biol Direct. 2007, 2: 22-10.1186/1745-6150-2-22.PubMed CentralPubMedView ArticleGoogle Scholar
- Clark MA, Baumann L, Thao ML, Moran NA, Baumann P: Degenerative Minimalism in the Genome of a Psyllid Endosymbiont. J Bacteriol. 2001, 183: 1853-1861. 10.1128/JB.183.6.1853-1861.2001.PubMed CentralPubMedView ArticleGoogle Scholar
- Krakauer DC: Stability and evolution of overlapping genes. Evolution. 2000, 54: 731-739. 10.1111/j.0014-3820.2000.tb00075.x.PubMedView ArticleGoogle Scholar
- Inokuchi Y, Hirashima A, Sekine Y, Janosi L, Kaji A: Role of ribosome recycling factor (RRF) in translational coupling. EMBO J. 2000, 19: 3788-3798. 10.1093/emboj/19.14.3788.PubMed CentralPubMedView ArticleGoogle Scholar
- Ben B, van Duin J: Mechanism of translational coupling between coat protein and replicase genes of RNA bacteriophage MS2. Nucleic Acids Res. 1985, 13: 6955-6967. 10.1093/nar/13.19.6955.View ArticleGoogle Scholar
- Kingsford C, Delcher AL, Salzberg SL: A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes. Mol Biol Evol. 2007, 24: 2091-2098. 10.1093/molbev/msm145.PubMed CentralPubMedView ArticleGoogle Scholar
- Cock PJA, Whitworth DE: Evolution of relative reading frame bias in unidirectional prokaryotic gene overlaps. Mol Biol Evol. 2010, 27: 753-756. 10.1093/molbev/msp302.PubMedView ArticleGoogle Scholar
- Sakharkar KR, Sakharkar MK, Verma C, Chow VTK: Comparative study of overlapping genes in bacteria, with special reference to Rickettsia prowazekii and Rickettsia conorii. Int J Syst Evol Microbiol. 2005, 55: 1205-1209. 10.1099/ijs.0.63446-0.PubMedView ArticleGoogle Scholar
- Pallejà A, Harrington ED, Bork P: Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions?. BMC Genomics. 2008, 9: 335-10.1186/1471-2164-9-335.PubMed CentralPubMedView ArticleGoogle Scholar
- Kaw MK, Blumenthal RM: Translational independence between overlapping genes for a restriction endonuclease and its transcriptional regulator. BMC Mol Biol. 2010, 11: 87-10.1186/1471-2199-11-87.PubMed CentralPubMedView ArticleGoogle Scholar
- Cheng C-H, Yang C-H, Chiu H-T, Lu CL: Reconstructing genome trees of prokaryotes using overlapping genes. BMC Bioinformatics. 2010, 11: 102-10.1186/1471-2105-11-102.PubMed CentralPubMedView ArticleGoogle Scholar
- Vakhrusheva AA, Kazanov MD, Mironov AA, Bazykin GA: Evolution of prokaryotic genes by shift of stop codons. J Mol Evol. 2011, 72: 138-146. 10.1007/s00239-010-9408-1.PubMedView ArticleGoogle Scholar
- Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y: The complete genome sequence of Escherichia coli K-12. Science. 1997, 277: 1453-1462. 10.1126/science.277.5331.1453.PubMedView ArticleGoogle Scholar
- Cho B-K, Zengler K, Qiu Y, Park YS, Knight EM, Barrett CL, Gao Y, Palsson BØ: The transcription unit architecture of the Escherichia coli genome. Nat Biotechnol. 2009, 27: 1043-1049. 10.1038/nbt.1582.PubMedView ArticleGoogle Scholar
- Browning DF, Busby SJ: The regulation of bacterial transcription initiation. Nat Rev Microbiol. 2004, 2: 57-65. 10.1038/nrmicro787.PubMedView ArticleGoogle Scholar
- Ciampi MS: Rho-dependent terminators and transcription termination. Microbiology. 2006, 152: 2515-2528. 10.1099/mic.0.28982-0.PubMedView ArticleGoogle Scholar
- Henkin TM: Transcription termination control in bacteria. Curr Opin Microbiol. 2000, 3: 149-153. 10.1016/S1369-5274(00)00067-9.PubMedView ArticleGoogle Scholar
- Laursen BS, Sørensen HP, Mortensen KK, Sperling-Petersen HU: Initiation of protein synthesis in bacteria. Microbiol Mol Biol Rev. 2005, 69: 101-123. 10.1128/MMBR.69.1.101-123.2005.PubMed CentralPubMedView ArticleGoogle Scholar
- Güell M, Yus E, Lluch-Senar M, Serrano L: Bacterial transcriptomics: what is beyond the RNA horiz-ome?. Nat Rev Microbiol. 2011, 9: 658-669. 10.1038/nrmicro2620.PubMedView ArticleGoogle Scholar
- Wang J, Ai X, Mei H, Fu Y, Chen B, Yu Z, He J: High-throughput identification of promoters and screening of highly active promoter-5′-UTR DNA region with different characteristics from Bacillus thuringiensis. PLoS One. 2013, 8: e62960-10.1371/journal.pone.0062960.PubMed CentralPubMedView ArticleGoogle Scholar
- Kim D, Hong JS-J, Qiu Y, Nagarajan H, Seo J-H, Cho B-K, Tsai S-F, Palsson BO: Comparative analysis of regulatory elements between Escherichia coli and Klebsiella pneumoniae by genome-wide transcription start site profiling. PLoS Genet. 2012, 8: e1002867-10.1371/journal.pgen.1002867.PubMed CentralPubMedView ArticleGoogle Scholar
- Unniraman S, Prakash R, Nagaraja V: Conserved economics of transcription termination in eubacteria. Nucleic Acids Res. 2002, 30: 675-684. 10.1093/nar/30.3.675.PubMed CentralPubMedView ArticleGoogle Scholar
- Osbourn AE, Field B: Operons. Cell Mol Life Sci. 2009, 66: 3755-3775. 10.1007/s00018-009-0114-3.PubMed CentralPubMedView ArticleGoogle Scholar
- Moreno-Hagelsieb G, Latimer K: Choosing BLAST options for better detection of orthologs as reciprocal best hits. Bioinformatics. 2008, 24: 319-324. 10.1093/bioinformatics/btm585.PubMedView ArticleGoogle Scholar
- Ludwig W, Schleifer K: Bacterial phylogeny based on 16S and 23S rRNA sequence analysis. FEMS Microbiol Rev. 1994, 15: 155-173. 10.1111/j.1574-6976.1994.tb00132.x.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.