Separating the effects of mutation and selection in producing DNA skew in bacterial chromosomes
© Morton and Morton; licensee BioMed Central Ltd. 2007
Received: 11 April 2007
Accepted: 12 October 2007
Published: 12 October 2007
Many bacterial chromosomes display nucleotide asymmetry, or skew, between the leading and lagging strands of replication. Mutational differences between these strands result in an overall pattern of skew that is centered about the origin of replication. Such a pattern could also arise from selection coupled with a bias for genes coded on the leading strand. The relative contributions of selection and mutation in producing compositional skew are largely unknown.
We describe a model to quantify the contribution of mutational differences between the leading and lagging strands in producing replication-induced skew. When the origin and terminus of replication are known, the model can be used to estimate the relative accumulation of G over C and of A over T on the leading strand due to replication effects in a chromosome with bidirectional replication arms. The model may also be implemented in a maximum likelihood framework to estimate the locations of origin and terminus. We find that our estimations for the origin and terminus agree very well with the location of genes that are thought to be associated with the replication origin. This indicates that our model provides an accurate, objective method of determining the replication arms and also provides support for the hypothesis that these genes represent an ancestral cluster of origin-associated genes.
The model has several advantages over other methods of analyzing genome skew. First, it quantifies the role of mutation in generating skew so that its effect on composition, for example codon bias, can be assessed. Second, it provides an objective method for locating origin and terminus, one that is based on chromosome-wide accumulation of leading vs lagging strand nucleotide differences. Finally, the model has the potential to be utilized in a maximum likelihood framework in order to analyze the effect of chromosome rearrangements on nucleotide composition.
With the recent accumulation of complete bacterial genome sequences there has been increased attention to prokaryote chromosome organization. One prominent aspect of most of these genomes is that several features, such as nucleotide composition and coding strand bias, display an organization that is centered on the origin of replication . In these chromosomes, as exemplified by Escherichia coli [2, 3], replication initiates at a single origin (Ori) and proceeds bi-directionally to a terminus (Ter) where the two forks meet . This divides the chromosome into a replichore , defined as a chromosome with two oppositely replicated halves (or replication arms), within each of which there is a leading and lagging strand such that one DNA strand is leading within one replication arm but lagging within the other. Many bacterial genomes display a compositional asymmetry between the two DNA strands within a replication arm meaning that Parity Rule 2, which stipulates that the frequencies of A and T are equal as are the frequencies of G and C, is violated [6–8]. Observed strand asymmetry, or skew, in base composition is either a purine (G and A) or a keto (G and T) excess on one strand and the leading strand in one replication arm shows the same skew as the leading strand in the other despite the fact that they are opposite genome strands.
One issue that has arisen from these observations is the cause of the compositional asymmetry between strands, with evidence having been presented for contributions from both mutation and selection [1, 9]. Many of the prokaryotes that have a replichore structure also have a bias towards coding genes, particularly 'essential' genes , on the leading strand [6, 8, 10–12] suggesting that composition asymmetry could result from selection and/or transcription-coupled mutation and repair processes [9, 10, 13]. There is also evidence, though, that leading and lagging strands differ in mutation bias [7, 8, 14], which has interesting and important implications for genome evolution [14, 15]. Although there have been estimates of the contribution of mutation to skew in several genomes that suggest a role for selection  the issue has not been studied within a statistical framework.
Replichores with strand asymmetry have also been exploited to make inferences about the location of an origin of replication when the origin has not been mapped experimentally, which is the case in the vast majority of sequenced genomes. The compositional difference between leading and lagging strands, and the replichore structure in general, means that the two DNA strands have complementary composition biases in the two replication arms of these genomes. Plotting composition skew along a sliding window leads to a characteristic pattern in which the origin lies at a point where a given measurement of strand asymmetry switches between positive and negative values. This type of graphical approach has been used frequently to infer the location of replication origins [4, 16–20] but these approaches have the disadvantage that determining the existence of skew and where it switches strand is subjective . A more objective linear discriminant analysis has also been developed , but this method does not account for gene density nor does it utilize intergenic regions .
Chromosomes analyzed from eubacterial phyla1
Results and discussion
We applied the bipartition model to 352 fully sequenced bacterial chromosomes. An assessment of the mutational (which we call R-dependent) component of compositional skew (see Methods) requires an identification of the origin and terminus of replication. Since these have not been empirically identified in most genomes we first use the model to generate a maximum likelihood estimation of the two loci and discuss the accuracy of this approach. Once the putative origin and terminus have been identified for each chromosome we use the model to quantify the degree to which the mutational difference between leading and lagging strands generates a skew.
Replication arms comprise half of most bacterial chromosomes
Chromosomes with unequal replication arms1
Acidobacteria bacterium Ellin345
Aquifex aeolicus VF5
Aster yellows witches'-broom phytoplasma AYWB
Bordetella pertussis Tohama I
Burkholderia cenocepacia AU1054
Burkholderia thailandensis E264
Candidatus Blochmannia pennsylvanicus str BPEN
Candidatus Pelagibacter ubique HTCC1062
Desulfitobacterium hafniense Y51
Francisella tularensi subsp. holarctica
Haemophilus ducreyi 35000HP
Heliobacter hepaticus ATCC 51499
Idiomarina loihiensis L2TR
Mycoplasma hyopneumoniae J
Mycoplasma mobile 163K
Mycoplasms penetrans HF2
Nitrosomona europaea ATCC 19718
Prochlorococcus marinus MIT9313
Pseudomonas aeruginosa PAO1
Pseudomonas putida KT2440
Shigella dysenteriae Sd197
Silicibacter pomeroyi Megaplasmid
Sinorhizobium meliloti plasmid pSymA
Sodalis glossinidius str. 'morsitans
Synechoccus elongatus PCC7942
Thermus thermophilus HB8
Wolbachia ehdosymbiont of D. melanogaster
Xylella fastidiosa 9a5c
Yersenia pestis Antiqua
Yersenia pestis MED
Given an expectation for physical balance of chromosomes, it is possible that the 31 chromosomes with an inequitable division have undergone recent rearrangements. Large indels in either replication arm or inversions that include either the origin or the terminus could lead to deviations from equitable distribution. (Inversions that do not include the real replication origin could influence the ability to detect significant compositional skew but will not affect Cd.) This cannot be investigated for all 31 cases since many chromosomes have not been studied in great detail or do not have close relatives for comparison, but evidence suggests that at least some of these chromosomes have undergone the predicted rearrangements. For example, Pseudomonas aeruginosa PA01 has a large inversion encompassing about 1/3 the genome , Yersinia pestis str. 91001 has significant inversions surrounding the putative origin  and Prochlorococcus marinus MIT9313 has several genome rearrangements relative to strain MED4 and is also much larger . Candidatus Blochmannia pennsylvanicus str. BPEN has undergone significant gene loss relative to B. floridanus , Halobacterium sp. NRC-1 contains 91 Insertion Sequences from 12 different families  and Candidatus Pelagibacter ubique HTCC1062 has a very small genome that appears to have undergone numerous recent deletions . These rearrangements suggest the possibility that the origin and terminus in these chromosomes do, in fact, yield a noticeably inequitable division. Although it seems likely that selection would favor an equitably divided genome, it is not known how rapidly a chromosome would regain equitable division following such a rearrangement and the genomes in Table 2 may be evolving towards a more equitable division. Alternatively, it could be that the genome assembly was performed incorrectly (see  for such an example) or that in these cases skew is not at equilibrium as a result of genome rearrangement and [S1, S2]ML does not represent the origin and terminus. If this is the case then any method that utilizes composition skews to estimate the origin of replication, whether our ML approach or a graphical approach, would be misled.
Locating the putative replication origins of bacterial chromosomes
We used [S1, S2]ML to assign a putative origin/terminus pair (which we then call [PO, PT]ML where POML is either S1ML or S2ML and PTML is the other location) for the 352 chromosomes in our dataset. As described in the Methods we used several different methods to accomplish this; each are described separately. Where possible we can compare our results to empirically identified replication origins to assess the accuracy of our approach.
Comparison of the origin of replication and the putative ML origin for each of the five linear chromosomes
A. tumefaciens 2
Ribosomal RNA genes in circular chromosomes
There is strong evidence that circular chromosomes are organized such that the ribosomal RNA (rRNA) genes tend to be located on the leading strand of replication regardless of where they are distributed along the length of the chromosome . We used this as a basis for assigning [PO, PT]ML in the 319 primary and 28 secondary circular chromosomes in our dataset. If S1ML is assigned as the origin then in the chromosome portion S1ML → S2ML the + strand is the leading strand while in the S2ML → S1ML portion the – strand is leading. Assigning S2ML as the origin reverses this leading strand assignment. We calculated the proportion of rRNA coded on the leading strand in the two possible organizations and assigned the origin based on which of the two resulted in a majority of rRNA genes on the leading strand. In 14 chromosomes there was no annotation of rRNA genes while in 8 others we could not assign an origin because both possibilities yielded 50% rRNA genes on each strand. For the remaining 325 chromosomes, 304 had 100% of the rRNA genes on the putative leading strand, 18 others had > 75% of the rRNA genes on the putative leading strand and the 3 others had 60% of the rRNA genes on the putative leading strand. An interesting point involves assignment of the origin to the middle of the linear Borrelia chromosomes. In Borrelia garinii only 20% of the rRNA genes are on the leading strand. The 23S rRNA genes in Borrelia garinii are inverted with respect to Borrelia burgdorferi where 97% of the rRNA sites are on the leading strand.
Evidence other than rRNA genes
For the 22 circular chromosomes noted above for which an origin could not be assigned based on the distribution of rRNA genes, we designated [PO, PT]ML based on assigning the leading strand in each replication arm as the strand for which the replication-induced effect was inferred to make G > C at four-fold degenerate (D4) sites. The rationale for this is discussed in the Methods and necessarily limits the conclusions we can draw about composition bias in these chromosomes.
Identification of genes consistently located near the origin of bacterial chromosomes
One interesting feature of the circular chromosomes in our dataset is the existence of genes that may function in segregating the two replication products during cell division  and which appear to be located near the origin of replication in many chromosomes. In this section we compare the locations of POML in each chromosome to these genes in order to determine the degree of concurrence. We will do this separately for the 28 secondary and 319 primary circular chromosomes; the 5 linear chromosomes will not be analyzed in this section. This comparison is not essential for our identification of [PO, PT]ML but it provides a way of both assessing our model as well as the proposal that these genes do tend to be located near the replication origin.
Comparison of the location of partition genes and the putative ML origin for each of the 28 secondary circular chromosomes
Brucella melitensis, 16M
Brucella melitensis, Abortus 2308
Burkholderia sp. 383, Chr 2
Burkholderia sp. 383, Chr 3
Burkholderia cenocepacia, Chr 2
Burkholderia cenocepacia, Chr 3
Burkholderia pseudomallei, 1710b
Burkholderia pseudomallei, K96243
Burkholderia xenovorans, Chr 2
Burkholderia xenovorans, Chr 3
Silicibacter sp. TM1040
S. meliloti, pSymA
S. meliloti, pSymB
Vibrio vulnificus CMCP6
Vibrio vulnificus, YJ016
For the analysis of the 319 primary, circular chromosomes we excluded 21 chromosomes for which the statistical significance of [S1, S2]ML is uncertain; 14 of them because there was no evidence for a replication-induced effect on composition, and the other 7 because the confidence limit of the genome division parameter Cd spanned more than 20% of the chromosome. For the remaining 298 chromosomes we used what we will call an "origin gene" approach to assess the location of POML, an approach that is similar to what was applied to the secondary chromosomes. Unlike the secondary chromosomes, there is no single cluster of genes that can be used to locate the origin in all of the primary genomes in the dataset. Despite this lack of a universal gene cluster, evidence indicates that the origin of replication in these primary chromosomes is frequently located nearby any or all of several genes which may actually have formed an ancestral origin gene "cluster" . These genes, which we will refer to as "origin genes", are parA, parB, gidA, gidB, yidC, yidD, rnpA, rpmH, dnaA and dnaN . These genes vary in location with respect to one another and thus cannot always be used to determine a single chromosome location, but despite this, evidence from chromosomes where these genes have been studied indicates that at least one of these genes is close to the origin in any given chromosome .
Number of chromosomes in which the putative origin is located near one or more origin gene pairs
Number of origin gene pairs near the putative origin2
R-dependent component of skew across bacterial chromosomes
Once we have identified [PO, PT]ML for each chromosome we can use our model to calculate the effect of leading vs lagging strand mutation differences, the R-dependent component of skew. As discussed in the Methods, the R-dependent component is best estimated by using sites least affected by selection. The intergenic (IG) regions and four fold degenerate (D4) sites of CDS regions are possible choices. We used relative σ values (RG and RT, see Equation 9, Methods) to compare the R-dependent component of skew across all 352 bacterial chromosomes. These measures represent the proportional increase of G and T, respectively, on the leading strand as a result of mutational differences between the leading and lagging strands.
There is a strong correlation between IG and D4 for both RG (Slope = 0.68; R2 = 0.76, P < < 10-6) and RT (Slope = 0.46; R2 = 0.73, P < < 10-6), indicating that they give similar estimates of the relative contribution of the R-dependent mutation effect to skew. It also suggests that transcription-coupled mutation effects are not strongly affecting the estimates of R-independent effects at D4 sites. However, the data also show that for most chromosomes the absolute value of the skew is stronger for D4 sites than IG sites, for both RG and RT. We propose, based on the compact nature of intergenic sites in bacterial genomes and the existence of regulatory sequences within these regions, that this is most likely due to selective constraints on many intergenic sites similar to recent findings from Drosophila , and that the D4 sites provide the more accurate estimation of the contribution of the R-dependent component. Therefore, we will use these sites to examine this component of skew across bacterial chromosomes.
Chromosomes with extreme skew
Chromosomes with strongest overall skew
Candidatus Blochmannia floridanus
Clostridium acetobutylicum ATCC 824
Ehrlichia canis str. Jake
Ehrlichia ruminantium str. Welgevonden
Ehrlichia chaffeensis str. Arkansas
Ehrlichia ruminantium str. Gardel
Ehrlichia ruminantium str. Welgevonden
Lactobacillus salivarius subsp. salivarius UCC118
Borrelia burgdorferi B31
Clostridium perfringens str. 13
Borrelia garinii PBi
Xylella fastidiosa Temecula1
Bartonella quintana str. Toulouse
Clostridium tetani E88
Bartonella henselae str. Houston-1
Xylella fastidiosa 9a5c
Lactobacillus acidophilus NCFM
Buchnera aphidicola str. Bp
Lactobacillus johnsonii NCC 533
Carboxydothermus hydrogenoformans Z-2901
Chromosomes with weakest overall skew
Corynebacterium efficiens YS-314
Nostoc sp. PCC 7120
Sinorhizobium meliloti 1021 plasmid pSymA
Mycobacterium avium subsp. paratuberculosis K-10
Anaeromyxobacter dehalogenans 2CP-C
Mycoplasma hyopneumoniae 232
Mycoplasma synoviae 53
Mycoplasma mycoides subsp. mycoides SC str. PG1
Mycoplasma gallisepticum R
Aquifex aeolicus VF5
Mycoplasma hyopneumoniae 7448
Thermosynechococcus elongatus BP-1
Synechocystis sp. PCC 6803
Synechococcus sp. JA-2-3B'a(2–13)
Baumannia cicadellinicola str. Hc
Anabaena variabilis ATCC 29413
Mycoplasma pneumoniae M129
Synechococcus sp. JA-3-3Ab
Gloeobacter violaceus PCC 7421
The G/C skew is almost exclusively biased towards G, which agrees with the observation of Lobry and Sueoka  who found that G > C on the leading strand in almost all of the bacterial genomes they surveyed, the one exception being the linear chromosome of Streptomyces coelicolor. Since we used the criterion of G > C on the leading strand to assign POML for 22 of the circular chromosomes, we excluded these from an assessment of the pattern of G/C bias. Of the remaining 330 chromosomes, only 9 show a significant leading strand skew towards C. These are Bifidobacterium longum, Thermobifida fusca YX, Deinococcus radiodurans R1, Tropheryma whipplei TW08/27, T. whipplei str. Twist, Mycoplasma mobile 163K and M. penetrans HF-2 as well as the two linear Streptomyces chromosomes, Streptomyces avermitilis MA-4680 and S. coelicolor. We also find that among those chromosomes with a significant T/A skew (above or below the RT = 0 line), 245 have a significant skew towards T and 65 a significant skew towards A. Overall, there were 237 chromosomes with a significant G&T bias, 60 with a G&A bias, 5 with a C&T bias and 4 with a C&A bias on the leading strand. This analysis supports the finding of Lobry and Sueoka  that the G&T (keto) skew is more common than G&A (purine) skew on the leading strand. Statistical analysis is again confounded by non-independence of points.
The difference in composition bias between the Firmicutes and the other bacterial lineages may be related to a general difference in the mode of replication. It has been noted previously that those species that have a leading strand bias towards A over T have a polC homolog in addition to a dnaE homolog . These authors point out that proofreading in species with only a dnaE homolog involves an interaction of the Θ and α subunits while in species that also have a polC homolog proofreading involves only the α subunit. This difference is probably sufficient to result in the general difference in mutation pattern but it is also the case that the Firmicutes show a stronger tendency to code genes on the leading strand than do other lineages . This could conceivably contribute to an R-dependent mutation effect if there is a transcription-coupled repair system or if the lagging strand is more frequently in the single stranded state as a result, given the much greater tendency of ssDNA than dsDNA to undergo cytosine deamination .
If we consider overall skew (Table 6) there are some general points that are apparent, although these might partly reflect the non-random sample of genomes that have been sequenced. The high degree of overall skew in Borrelia noted by Loby and Sueoka  is apparent in our data, and the other taxa they noted as having strong skew (Treponema pallidum, Chlamydia muridarum and C. trachomatis) rank in the 100 chromosomes with strongest skew (see Additional file 1). The Alphaproteobacteria (particularly the genera Ehrlichia, Gluconobacter, Bartonella and Brucella) and the Firmicutes are disproportionately represented in the high skew genomes (both make up 35% of the 20 chromosomes with highest skew while Alphaproteobacteria and Firmicutes are 15.6% and 22.4% of the sequenced chromosomes respectively) while the Cyanobacteria tend to have low skew (7 of the 17 sequenced Cyanobacteria are in the 20 chromosomes with lowest skew). There are exceptions in each case (e.g. the endosymbiont Alphaproteobacterium Wolbachia and the Mycoplasma species) suggesting that mutational biases can vary dramatically and, perhaps, rapidly across lineages. In a previous survey, Rocha  also reported a low general skew in Cyanobacteria which is consistent with our finding, but also reported a low skew in the Alphaproteobacteria. The difference in the later case could be due to our use of an estimate from putatively neutral sites instead of a general composition bias.
The bipartition model allows us to quantify the contribution of the mutational difference between leading and lagging strands to nucleotide skew and also allows us to estimate the locations of the origin and terminus in chromosomes with bi-directional replication when the accumulation of skew is sufficiently strong. The model has several advantages over other methods of analyzing genome skew. First, using the model we can quantify the role of mutation in generating skew so that the effect on composition, for example codon bias, can be assessed. Second, it provides an objective method for locating origin and terminus that exploits composition bias. Finally, the model has the potential to be utilized in a maximum likelihood framework in order to analyze various aspects of genome structure, such as the effect of chromosome rearrangements on nucleotide composition.
Primary chromosomes from 350 bacterial strains at the NCBI Genome Project website  as of June 20 2006 were downloaded as were 29 secondary chromosomes chosen from these species on the basis of annotation. Given the evidence for multiple origins of replications in Archaeabacetria chromosomes [39–41] the 27 chromosomes, which include one secondary chromosome, annotated as Archaea were removed to give a total of 352 chromosomes studied. Of this total, 5 chromosomes are linear and all others are circular. (Two chromosomes were incorrectly annotated by the NCBI file as linear; Baumannia cicadellinicola str. Hc NC_007984 , and Staphylococcus aureus susp. aureus NCTC8325 NC_007795 . Linear chromosomes were analyzed in the same manner since they can also be divided into two replication arms. The taxonomic distribution of these 352 chromosomes is given in Table 1 and summary data are provided (see Additional file 1).
The bipartition model
We first develop a bipartition model that is the basis for ML methods. After deriving the core equations of the model we will discuss two specific applications of the model; an inference of the replication origin and terminus and an analysis of mutational contribution to skew. The approach of  to estimate the contributions of selection and mutation to skew is a special case of our model (see below). After developing the model we will illustrate it by application to Escherichia coli strain K12.
R-dependent and R-independent components of skew
The nucleotide substitution process can in principle be divided into what we will call R(eplication arm)-dependent and R(eplication arm)-independent components. The R-dependent component is defined as whatever effect is generated due to mutational differences between leading and lagging strands while the R-independent component is comprised of those factors, mutational or selective, that affect substitutions in the same manner regardless of which strand is being considered. This latter definition includes any selective pressure at a given site as long as this selective pressure does not depend upon which of the two strands happens to be the coding strand (i.e., if switching the coding strand of a gene does not affect selective pressures for nucleotides on the coding strand). However, if we want to use the model to measure the contribution of the R-dependent component to compositional skew, it is important to note that when there is an inequitable distribution of coding genes and/or regulatory elements on the leading strand, there can be compositional skew even in the absence of an R-dependent component. This potential effect of selection and/or other transcription-coupled effects will be considered in specific applications of the model.
Given this separation of the substitution process, we can express the frequency of nucleotide i at any given site on a specified strand (e.g., the + strand of the NCBI annotation) of the genome by Equation 1 where πi represents the contribution of the R-independent and σi the R-dependent factors respectively.
fi = πi + σi (1)
Although it is not possible to calculate π and σ separately for each nucleotide, this division does allow us to estimate R-dependent and R-independent contributions to overall compositional skew (A-T and G-C differences). A circular chromosome that is replicated bi-directionally from a single origin is divided into two replication arms, each with its leading and lagging strands. We designate the annotated strand as + and define P as the arm in which this strand is the leading strand and ρ as the arm within which it is the lagging strand. The R-dependent factors within P affecting the + strand are given by σLi while within ρ they are given by σli where L and l refer to the leading and lagging strand respectively. Since σL and σl are complementary (for example, σLA = σlT), R-dependent components can be represented throughout the genome using only the σL parameters, which from hereon will be written without a superscript designation. We can then write the frequencies of nucleotide i on the + strand within a replication arm (i.e. Pi or ρi) as in Equations 2a and 2b where i refers to any of A, C, G or T and j to the complementary nucleotide. Note that the R-dependent parameter (σi) changes to the complementary nucleotide (σj) between leading and lagging strands.
Pi = πi + σi (2a)
ρi = πi + σj (2b)
Given our partition of substitution dynamics, the A+T composition (CA+T) of the + strand within either replication arm can be written as Equation 3, with a similar equation for CG+C (CG+C = 1 – CA+T).
CA+T = πA + πT + σA+ σT (3)
The skew parameters are given in Equations 4a and 4b where πij represents the R-independent and σij the R-dependent components of compositional skew.
πAT = πA - πT πGC = πG - πC (4a)
σAT = σA - σT σGC = σG - σC (4b)
In the absence of selection and transcription-coupled effects, πij = 0 and σij represents the skew generated by mutational differences between the leading and lagging strands.
We can now express the nucleotide frequencies for the + strand from Equation 1 in terms of CA+T and the skew parameters. For the P replication arm these are given in Equations 5a through 5d and similar equations hold for ρ.
PA = (1/2)(CA+T + πAT +σAT) (5a)
PT = (1/2)(CA+T - πAT - σAT) (5b)
PC = (1/2)(CG+C - πGC - σGC) (5c)
PG = (1/2)(CG+C + πGC + σGC) (5d)
The values CA+T, πAT and σAT can now be rewritten as in Equations 6a through 6c.
CA+T = (1/2) [(PA + PT) + (ρA + ρT)] (6a)
πAT = (1/2) [(PA - PT) + (ρA - ρT)] (6b)
σAT = (1/2) [(PA - PT) - (ρA - ρT)] (6c)
The parameters for GC content and skew (CG+C, πGC and σGC) can be calculated in a similar manner. These equations (6a, 6b and 6c) represent the core of our model for a chromosome with two replication arms since they allow us to estimate the R-independent (π) and R-dependent (σ) components of skew from nucleotide composition within each replication arm. Note that although we have been discussing circular chromosomes, the model also applies to linear chromosomes with bi-directional replication from a single origin. This is the general model that is implemented below in ML applications.
Classifying sites to minimize effects of selection
As already noted, selective pressures for amino acid composition can contribute to skew, particularly if there is a coding strand bias in the genome . Since these pressures are part of the R-independent component, if we want to measure the R-dependent component of skew then we will need to apply our model to neutral sites. This is possible since Equations 5 & 6 can be applied to any subset of sites within a chromosome. Therefore, if we use only the composition of relatively neutral sites, such as fourfold degenerate sites or intergenic sites, we can use the parameter σ determined from them as an estimate of the R-dependent component of skew. Nucleotide sites in each genome were classified based on the NCBI annotation according to codon position (C1, C2, C3 for CDS-coding genes) on both + and - strands, as RNA-coding or as intergenic (IG). Third codon position sites were further sub-classified as D4 if four-fold degenerate (this did not include those that are from a codon coding a six-fold degenerate amino acid).
Maximum likelihood Implementation
Equations 6a-c (and similar ones for GC) allow an estimation of model parameters directly from nucleotide composition parameters within replication arms. The model can also be implemented in a ML framework using a variety of specific methods, discussed separately below, that differ in the constraints they introduce. A comparison of different methods will allow us to assess the explanatory power of the constraints. To implement these ML methods, the genome is divided into two replication arms. (This can be achieved either by using an annotated [Ori, Ter] or the putative locations determined using the method we will describe below.) This genome division defines the number of sites of each type within each replication arm such that there are Mki sites of type k that are nucleotide i in region P.
Model parameters for each site type are calculated according to Equations 6a-c (and corresponding equations for GC) using the observed nucleotide frequencies within each replication arm. The likelihood is not maximized. No constraints are introduced but sites are classified into 7 types: intergenic (IG), first codon position (C1), second codon position (C2) and third codon position (C3). The latter three are further divided into + and - strand (+ strand being defined by the NCBI file) to yield C1+, C1-, C2+, C2-, C3+ and C3-. RNA coding sites are ignored, as are sites that are ambiguous in the NCBI annotation.
This method uses the same approach and site classification as Mobs. An initial guess for site parameters was obtained from Mobs and maximum likelihood parameters were obtained using a simplex algorithm . Each site type is optimized independently. There are 5 DF for each site type giving a total of 35 DF.
This method introduces the constraint that the π parameters calculated for coding sites on the two chromosome strands, such as for example C1+ and C1-, be complementary (π+ij = -π-ij). This constraint allows us to assess whether or not there is a significant difference between coding sites on the two strands using a likelihood ratio test. Thus there is only a single set of 5 parameters for each of the 3 CDS site types and 5 for IG sites, yielding 20 DF. An initial guess for these 20 parameters was obtained from Mobs and then the values that maximized the total likelihood were obtained using the simplex algorithm.
This method has the same constraints as the M1 method with the additional constraint that the σAT and σGC parameters are equal across all site types. Therefore, if mutational biases are consistent across sites, then comparing this method to an unconstrained method allows us to assess whether or not selection significantly affects our measurement of the R-dependent component of skew. There are 14 DF in this method (3 for IG and each of the 3 CDS types, plus a single σAT and σGC for all site types). Parameters were calculated as described for M1.
This method removes the site classification such that all nucleotide sites in the chromosome are assumed to be equivalent. It retains the constraint that parameters for the two strands be complementary. This allows us to assess the value of site classification. This method has 5 DF and values were obtained as described for M1.
Application to the E. coli K12 chromosome
Likelihood comparisons for the different methods when applied to the E. coli K12 chromosome CDS and intergenic sites
-2 × Diff1
Probability (DF1, DF2)2
5.5 × 10-5 (35, 20)
< < 10-6 (35, 14)
< < 10-6 (35, 5)
Model parameters for the E. coli K12 chromosome when the MO method is implemented
- 0.0053 [-0.007,-0.003]
- 0.0023 [-0.003,0.0002]
- 0.0954 [-0.097,-0.094]
- 0.0010 [-0.003,0.0004]
- 0.107 [-0.108,-0.105]
- 0.0168 [-0.018,-0.015]
- 0.0461 [-0.048,-0.045]
- 0.0019 [-0.004,0.001]
- 0.0782 [-0.080,-0.077]
- 0.0056 [-0.008,-0.005]
- 0.0064 [-0.008,-0.005]
- 0.0198 [-0.021,-0.018]
There is a significant R-dependent effect (σ) for IG sites. The σ values are very similar for + vs - strands within the C1 and C3 site types, but second position codon sites (type C2) have significantly different estimates of σAT. This difference indicates that selective constraints on the sites of the C2 class are not distributed equitably across the two leading strands of the genome. As with the different π values at C2 sites this suggests that there is a difference in protein composition between + and - strands, but it also suggests that there is a preference for coding certain types of genes (that differ in average composition) on the leading strand.
Overall, the results from this analysis of E. coli provide two specific points that are relevant to the analysis of other chromosomes. First, the method with the fewest constraints (M0) provides the best fit so we will use this method in our large-scale analyses. Second, given the influence of selection on estimates of σAT and σGC, we must limit the analysis to relatively neutral sites in order to separate the mutational contribution from selection. Two possible choices are intergenic (IG) sites and four-fold degenerate (D4) CDS sites and both of these will be utilized and compared in our applications of the bipartition model.
Estimating the R-dependent component of composition skew
Lobry and Sueoka  introduced a graphical method for estimating the contributions of selection and mutation to composition skew. This involves plotting the T/A skew (i.e., [fT - fA]/[fT + fA]) against the G/C skew for third codon (i.e., putatively neutral) sites of each gene in a genome and calculating mid-points for leading and lagging strand genes. The length of the line connecting these two points (their BI parameter) is an estimate of the role of mutational bias to skew while the distance from the (0, 0) point to the midpoint of this connecting line (their BII parameter) is an estimate of the contribution of selection .
The parameters derived from our model can be used to estimate these two parameters as summarized in Equations 8a and 8b. They only requires the determination of the unordered pair [S1, S2]ML since BI and BII are invariant 0
BI = SQRT ((σAT /CA+T)2 + (σGC /CG+C)2) (8a)
BII = SQRT ((πAT /2CA+T)2 + (πGC /2CG+C)2) (8b)
However, there are some disadvantages to this graphical approach that our model can improve on. One is that it weights genes equally regardless of length, another is that the statistical confidence of the parameters is unclear as is the biological meaning of the values obtained. Most importantly, it does not allow us to estimate the contribution of mutation to G/C and T/A skew separately. Using the bipartition model we can make a direct estimate of the relative contributions of R-dependent (σ) effects to the composition of G and T (since a keto skew is most commonly observed) for each type of site that can be classified. The measures we propose are RG and RT as given in Equations 9a and 9b.
RG = σGC /(CG+C - πGC) (9a)
RT = - σAT /(CA+T - πAT) (9b)
At selectively neutral sites, and assuming that there are no transcription-coupled effects, these two values represent the fractional increase (or decrease) in the G and T compositions on the leading strands as a result of a difference between leading and lagging strand mutation bias. Thus, the model provides a statistical framework to assess significance and to easily compare mutational effects across genomes. We calculated RG and RT for each bacterial chromosome from Equations 9a and 9b (on the leading strand) based on the assignation of [PO, PT]ML as described. To use sites that we thought would be relatively neutral we used model M0 parameters for intergenic (IG) and for C3 sites that coded for a four-fold degenerate amino acid (D4). Since parameters for D4 sites were determined separately for both plus and minus strands the values from the two strands were averaged for an overall estimate of the R-dependent effect.
Estimation of parameter uncertainty
The 95% uncertainty range in model parameters and in RT and RG was estimated using the Metropolis-Hastings rejection sampling Monte Carlo algorithm [45, 46] as follows. First, the ML vector of parameters, with Likelihood L0, was determined by the simplex algorithm. Next, one parameter was altered using a Gaussian random number with mean 0 and standard deviation σ which is input arbitrarily to start. The Likelihood for the new vector, L1, was calculated and the new vector "accepted" if L1 > L0 or with probability L1/L0 if L1 < L0. A burn-in phase involved repeating the acceptance process in sets of 100 with successively decreasing σ values until the average acceptance rate of new vectors is 50% over the 100 trials. After the burn-in was complete, the final burn-in vector of parameters and σ value were used to generate 10,000 parameter vectors by altering parameter values as above. The 10,000 were sorted for each parameter and the 95% range determined from the sorted set.
Application to the E. coli K12 chromosome
Estimates of R-dependent effects, RG and RT were calculated for D4 and IG sites from the data in Table 8. For D4 sites in E. coli K12 the M0 model gives RG = 0.0716 (+ strand = 0.0778; - strand = 0.0653) and RT = 0.0215 (+ strand = 0.0181; - strand = 0.0248), while for IG sites the values are RG = 0.0428 and RT = 0.009. These results indicate that R-dependent effects increase the content of G at D4 sites on the leading strand by about 7% relative to the composition in the absence of R-dependent effects (decreasing C by the same amount) and the content of T at these sites by about 2%.
ML determination of replication arms based on composition skew
The bipartition model can be used to infer the replication arms in genomes with a replichore structure by maximum likelihood since it provides a probability for each of the four nucleotides. This will be important for applications of our model to genomes for which an origin and terminus have not been annotated but it also introduces a more formal approach than skew plots to estimating these two loci.
Two chromosomal locations, S1 and S2 (which need not divide the chromosome into equal halves), define two replication arms, P and ρ, and thus determine the number and kind of sites within each. The likelihood of such a division is calculated according to model Mobs for C1 [+/-], C2 [+/-], C3 [+/-] and IG sites. The likelihood for all possible unordered [S1, S2] pairs determines the global ML (designated here as [S1, S2]ML). Ideally we would calculate the likelihood for every possible [S1, S2] division in order to determine [S1, S2]ML but given computational constraints we utilized the following strategy. A rough, initial likelihood map is made by computing log likelihood values over a 50 × 50 grid on the [S1, S2] plane. The maximum of this rough map was then used as the initial point to maximize the likelihood using the same simplex algorithm used for parameter optimization . From [S1, S2]ML we calculate the chromosome division statistic Cd by Equation 10, in which min(f1, f2) is the minimum of the two chromosome fractions generated by S1ML and S2ML. The Cd statistic can range from 0, in cases where there is equitable division of the genome, to 1, in cases where the origin and terminus are at the same chromosome location yielding essentially a single replication arm.
Cd = (0.5 - min(f1, f2))/0.5 (10)
A visualization of the ML surface and estimates of uncertainty in Cd, S1 and S2 (as well as model parameters, see below) were obtained by the Metroplois-Hastings rejection sampling Monte Carlo algorithm, using a random walk from [S1, S2]ML. Proposal parameter (S1 and S2) values were obtained by a normally distributed random step of zero mean and a standard deviation that was chosen after a burn-in series to give an approximately 50% acceptance rate. If the proposed parameters improved the likelihood they were accepted. Otherwise, they were accepted with a randomly generated probability equal to the likelihood ratio. Ten thousand accepted parameter sets were sorted to give a 5-to-95 percentile range.
Application to the E. coli K12 chromosome
Identification of the putative origin and terminus of replication
The likelihood surface has two peaks because exchanging S1ML and S2ML yields complementary σ parameters but the same likelihood. Thus, independent information must be used to determine a putative origin and terminus pair. We assigned the origin in one of several ways. For the 5 linear chromosomes we made use of the annotated origins in the NCBI files, which are all located near the center of the chromosome, by assigning whichever of S1ML or S2ML was closest to the annotated origin. For circular chromosomes we used annotated ribosomal RNA (rRNA) genes, where available, which are organized such that they are predominantly on the leading strand of replication across microbial genomes . A third method for assigning the origin was utilized in the 14 chromosomes for which the rRNA annotation was not available and the 8 chromosomes for which there was an equal division of rRNA genes between the leading and lagging strands. In these cases we assigned the leading strand in each replication arm as the strand for which σGC > 0 at fourfold degenerate sites (σ is defined for the leading strand). This use of σGC > 0 as the criterion is based on the identification by Lobry and Sueoka  that, in the vast majority of bacterial chromosomes they studied, the leading strand has a skew of G > C. This last approach means that we are limited to knowing whether or not we find a G&T or G&A bias on one strand (and thus C&A or C&T on the other) in these particular chromosomes without definitively assigning the skew to leading or lagging strand. Regardless of the method used, the result is that we define either S1ML or S2ML as the putative origin, now designated POML, and the other site as the putative terminus, now designated PTML, to yield the ordered pair of sites [PO, PT]ML. This provides a formal approach that is preferable to estimates based on graph-based approaches [16–19] particularly in cases where there is a small amount of skew that can be difficult to study by eye. Additionally, the model allows us to generate confidence intervals by a Monte Carlo method as outlined above for the ML surface.
All analyses were performed using C source code programs written and compiled for Mac OSX by RAM. A description of the three core programs used to determine model M0 parameters, the ML [Ori, Ter] location and plot a visualization of the ML surface is provided as additional material (see Additional file 3). Code is available upon request from firstname.lastname@example.org.
List of abbreviations
nucleotide sites in a protein-coding DNA sequence, divided into Ci for the ith codon position and further into D4 for four-fold degenerate C3 sites, R-(in)dependent – Replication arm-, DF – degrees of freedom.
DF – degrees of freedom.
We would like to thank three anonymous reviewers for helpful comments.
- Rocha EP: The replication-related organization of bacterial genomes. Microbiology. 2004, 150: 1609-1627. 10.1099/mic.0.26974-0.PubMedView ArticleGoogle Scholar
- Bird RE, Louarn J, Martuscelli J, Caro L: Origin and sequence of chromosome replication in Escherichia coli. J Mol Biol. 1972, 70: 549-566. 10.1016/0022-2836(72)90559-1.PubMedView ArticleGoogle Scholar
- Marians KJ: Prokaryotic DNA replication. Annu Rev Biochem. 1992, 61: 673-715. 10.1146/annurev.bi.61.070192.003325.PubMedView ArticleGoogle Scholar
- Song J, Ware A, Liu SL: Wavelet to predict bacterial ori and ter: a tendency towards a physical balance. BMC Genomics. 2003, 4: 17-30. 10.1186/1471-2164-4-17.PubMed CentralPubMedView ArticleGoogle Scholar
- Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y: The Complete Genome Sequence of Escherichia coli K-12. Science. 1997, 277: 1453-1462. 10.1126/science.277.5331.1453.PubMedView ArticleGoogle Scholar
- McLean MJ, Wolfe KH, Devine KM: Base composition skews, replication orientation and gene orientation in 12 prokaryote genomes. J Mol Evol. 1998, 47: 691-696. 10.1007/PL00006428.PubMedView ArticleGoogle Scholar
- Lobry JR, Sueoka N: Asymmetric directional mutation pressures in bacteria. Genome Biology. 2002, 3 (10): research0058-10.1186/gb-2002-3-10-research0058.PubMed CentralPubMedView ArticleGoogle Scholar
- Lobry JR, Louarn JM: Polarisation of prokaryotic chromosomes. Curr Opin Microbiol. 2003, 6: 101-108. 10.1016/S1369-5274(03)00024-9.PubMedView ArticleGoogle Scholar
- Karlin S: Bacterial DNA strand compositional asymmetry. Trends Microbiol. 1999, 7: 305-308. 10.1016/S0966-842X(99)01541-3.PubMedView ArticleGoogle Scholar
- Francino MP, Ochman H: Strand asymmetries in DNA evolution. Trends Genet. 1997, 13: 240-245. 10.1016/S0168-9525(97)01118-9.PubMedView ArticleGoogle Scholar
- McInerney JO: Replicational and transcriptional selection on codon usage in Borrelia burgdorferi. Proc Natl Acad Sci USA. 1998, 95: 10698-10703. 10.1073/pnas.95.18.10698.PubMed CentralPubMedView ArticleGoogle Scholar
- Tillier ER, Collins RA: The contributions of replication orientation, gene direction, and signal sequences to base-composition asymmetries in bacterial genomes. J Mol Evol. 2000, 50: 249-257.PubMedGoogle Scholar
- Frank AC, Lobry JR: Asymmetric substitution patterns: a review of possible underlying mutational or selective mechanisms. Gene. 1999, 238: 65-77. 10.1016/S0378-1119(99)00297-8.PubMedView ArticleGoogle Scholar
- Mitchell A, Graur D: Inferring the pattern of spontaneous mutation from the pattern of substitution in unitary pseudogenes of Mycobacterium leprae and a comparison of mutation patterns among distantly related organisms. J Mol Evol. 2005, 61: 795-803. 10.1007/s00239-004-0235-0.PubMedView ArticleGoogle Scholar
- Mackiewicz P, Gierlik A, Kowalczuk M, Dudek MR, Cebrat S: How does replication-associated mutational pressure influence amino acid composition of proteins?. Genome Res. 1999, 9: 409-416.PubMed CentralGoogle Scholar
- Lobry JR: Origin of replication of Mycoplasma genitalium. Science. 1996, 272: 745-746. 10.1126/science.272.5262.745.PubMedView ArticleGoogle Scholar
- Picardeau M, Lobry JR, Hinnebusch BJ: Physical mapping of an origin of bidirectional replication at the center of the Borrelia burgdorferi linear chromosome. Mol Microbiol. 2000, 32: 437-445. 10.1046/j.1365-2958.1999.01368.x.View ArticleGoogle Scholar
- Myllykallio H, Lopez P, Lopez-Garcia P, Hellig R, Saurin W, Zivanovic Y, Phillipe H, Forterre P: Bacterial mode of replication with eukaryotic-like machinery in a hyperthermophilic archaeon. Science. 2000, 288: 2212-2215. 10.1126/science.288.5474.2212.PubMedView ArticleGoogle Scholar
- Zawilak A, Cebrat S, Mackiewicz P, Krol-Hulewicz A, Jakimowicz D, Messer W, Gosciniak G, Zakrewska-Czerwinska J: Identification of a putative chromosomal replication origin from Heliobacter pylori and its interaction with the initiator protein DnaA. Nucleic Acids Res. 2001, 29: 2251-2259. 10.1093/nar/29.11.2251.PubMed CentralPubMedView ArticleGoogle Scholar
- Worning P, Jensen LJ, Hallin PF, Staerfeldt HH, Ussery DW: Origin of replication in circular prokaryotic chromosomes. Environ Microbiol. 2006, 8: 353-361. 10.1111/j.1462-2920.2005.00917.x.PubMedView ArticleGoogle Scholar
- Rocha EP, Danchin A, Viara A: Universal replication biases in bacteria. Mol Microbiol. 1999, 32: 11-16. 10.1046/j.1365-2958.1999.01334.x.PubMedView ArticleGoogle Scholar
- Guy L, Roten CA: Genometric analyses of the organization of circular chromosomes: a universal pressure determines the direction of ribosomal RNA genes transcription relative to chromosome replication. Gene. 2004, 304: 45-52. 10.1016/j.gene.2004.06.056.View ArticleGoogle Scholar
- Leonard TA, Møller-Jensen J, Löwe J: Towards understanding the molecular basis of bacterial DNA segregation. Philos Trans R Soc B Biol Sci. 2005, 360: 523-535. 10.1098/rstb.2004.1608.View ArticleGoogle Scholar
- Liu SL, Schryvers AB, Sanderson KE, Johnston RN: Bacterial phylogenetic clusters revealed by genome structure. J Bacteriol. 1999, 181: 6747-6755.PubMed CentralPubMedGoogle Scholar
- Hill CW, Gray JA: Effects of Chromosomal Inversion on Cell Fitness in Escherichia coli K-12. Genetics. 1987, 119: 771-778.Google Scholar
- Stover CK, Pham XQ, Erwin AL, Mizoguchi SD, Warrener P, Hickey MJ, Brinkman FS, Hufnagle WO, Kowalik DJ, Lagrou M, Garber RL, Goltry L, Tolentino E, Westbrock-Wadman S, Yuan Y, Brody LL, Coulter SN, Folger KR, Kas A, Larbig K, Lim R, Smith K, Spencer D, Wong GK, Wu Z, Paulsen IT, Reizer J, Saier MH, Hancock RE, Lory S, Olson MV: Complete genome sequence of Pseudomonas aeruginosa PA01, an opportunistic pathogen. Nature. 2000, 406: 959-964. 10.1038/35023079.PubMedView ArticleGoogle Scholar
- Song Y, Tong Z, Wang J, Wang L, Guo Z, Han Y, Zhang J, Pei D, Zhou D, Qin H, Pang X, Han Y, Zhai J, Li M, Cui B, Qi Z, Jin L, Dai R, Chen F, Li S, Ye C, Du Z, Lin W, Wang J, Yu J, Yang H, Wang J, Huang P, Yang R: Complete genome sequence of Yersinia pestis strain 9 an isolate avirulent to humans. DNA Res. 2004, 11: 179-197. 10.1093/dnares/11.3.179.PubMedView ArticleGoogle Scholar
- Rocap G, Larimer FW, Lamerdin J, Malfatti S, Chain P, Ahlgren NA, Arellano A, Coleman M, Hauser L, Hess WR, Johnson ZI, Land M, Lindell D, Post AF, Regala W, Shah M, Shaw SL, Steglich C, Sullivan MB, Ting CS, Tolonen A, Webb EA, Zinser ER, Chisholm SW: Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature. 2003, 424: 1042-1047. 10.1038/nature01947.PubMedView ArticleGoogle Scholar
- Degnan PH, Lazarus AB, Wernegreen JJ: Genome sequence of Blochmannia pennsylvanicus indicates parallel evolutionary trends among bacterial mutualists of insects. Genome Res. 2005, 15: 1023-1033. 10.1101/gr.3771305.PubMed CentralPubMedView ArticleGoogle Scholar
- Ng WV, Kennedy SP, Mahairas GG, Berquist B, Pan M, Shukla HD, Lasky SR, Baliga NS, Thorsson V, Sbrogna J, Swartzell S, Weir D, Hall J, Dahl TA, Welti R, Goo YA, Leithauser B, Keller K, Cruz R, Danson MJ, Hough DW, Maddocks DG, Jablonski PE, Krebs MP, Angevine CM, Dale H, Isenbarger TA, Peck RF, Pohlschroder M, Spudich JL, Jung KW, Alam M, Freitas T, Hou S, Daniels CJ, Dennis PP, Omer AD, Ebhardt H, Lowe TM, Liang P, Riley M, Hood L, DasSarma S: Genome sequence of Halobacterium species NRC-1. Proc Natl Acad Sci USA. 2000, 97: 12176-12181. 10.1073/pnas.190337797.PubMed CentralPubMedView ArticleGoogle Scholar
- Giovannoni SJ, Tripp HJ, Givan S, Podar M, Vergin KL, Baptista D, Bibbs L, Eads J, Richardson TH, Noordewier M, Rappe MS, Short JM, Carrington JC, Mathur EJ: Genome streamlining in a cosmopolitan oceanic bacterium. Science. 2005, 309: 1242-1245. 10.1126/science.1114057.PubMedView ArticleGoogle Scholar
- Guy L, Karamata D, Moreillon P, Roten CAH: Genometrics as an essential tool for the assembly of whole genome sequences: the example of the chromosome of Bifidobacterium longum NCC2705. BMC Microbiology. 2005, 5: 60-69. 10.1186/1471-2180-5-60.PubMed CentralPubMedView ArticleGoogle Scholar
- Zakrzewska-Czerwinska J, Schrempf H: Characterization of an autonomously replicating region from the Streptomyces lividans chromosome. J Bacteriol. 1992, 174: 2688-2693.PubMed CentralPubMedGoogle Scholar
- Goodner B, Hinkle G, Gattung S, Miller N, Blanchard M, Qurollo B, Goldman BS, Cao Y, Askenazi M, Halling C, Mullin L, Houmiei K, Gordon J, Vaudin M, Iartchouk O, Epp A, Liu F, Wollam C, Allinger M, Doughty D, Scott C, Lappas C, Markelz B, Flanagan C, Crowell C, Gurson J, Lomo C, Sear C, Strub G, Cielo C, Slater S: Genome sequence of the plant pathogen and biotechnology agent Agrobacterium tumefaciens C58. Science. 2001, 294: 2323-2328. 10.1126/science.1066803.PubMedView ArticleGoogle Scholar
- Bignell C, Thomas CM: The bacterial ParA-ParB partitioning proteins. J Biotechnol. 2001, 91: 1-34. 10.1016/S0168-1656(01)00293-0.PubMedView ArticleGoogle Scholar
- Ogasawara N, Yoshikawa H: Genes and their organization in the replication origin region of the bacterial chromosome. Mol Microbiol. 1992, 6: 629-634. 10.1111/j.1365-2958.1992.tb01510.x.PubMedView ArticleGoogle Scholar
- Andolfatto P: Adaptive evolution of non-coding DNA in Drosophila. Nature. 2005, 437: 1149-1152. 10.1038/nature04107.PubMedView ArticleGoogle Scholar
- NCBI Genome Project. [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi]
- Lundgren M, Andersson A, Chen L, Nilsson P, Bernander R: Three replication origins in Sulfolobus species: Synchronous initiation of chromosome replication and asynchronous termination. Proc Natl Acad Sci USA. 2004, 101: 7046-7051. 10.1073/pnas.0400656101.PubMed CentralPubMedView ArticleGoogle Scholar
- Lundgren M, Bernander R: Archaeal cell cycle progress. Curr Opin Microbiol. 2005, 8: 662-668.PubMedView ArticleGoogle Scholar
- Kelman Z, White MF: Archaeal DNA replication and repair. Curr Opin Microbiol. 2005, 8: 669-676.PubMedView ArticleGoogle Scholar
- Wu D, Daugherty SC, Van Aken SE, Pai GH, Watkins KL, Khouri H, Tallon LJ, Zaborsky JM, Dunbar HE, Tran PL, Moran NA, Eisen JA: Metabolic complementarity and genomics of the dual bacterial symbiosis of sharpshooters. PLoS Biology. 2006, 4: 1079-1092. 10.1371/journal.pbio.0040188.View ArticleGoogle Scholar
- Pattee PA: Chromosomal map location of the alpha-hemolysin structural gene in Staphylococcus aureus NCTC 8325. Infect Immun. 1986, 54: 593-596.PubMed CentralPubMedGoogle Scholar
- Press WH, Teukolsky SA, Vetterling WT, Flannery BP: Numerical Recipes in C. 1992, Cambridge University Press, 41-2Google Scholar
- Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E: Equations of state calculations by fast computing machines. J Chem Phys. 1953, 21: 1087-1092. 10.1063/1.1699114.View ArticleGoogle Scholar
- Hastings WK: Monte Carlo sampling methods using Markov Chains and their applications. Biometrika. 1970, 57: 97-109. 10.1093/biomet/57.1.97.View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.