Quantitative analysis of replication-related mutation and selection pressures in bacterial chromosomes and plasmids using generalised GC skew index
© Arakawa et al; licensee BioMed Central Ltd. 2009
Received: 19 June 2009
Accepted: 30 December 2009
Published: 30 December 2009
Due to their bi-directional replication machinery starting from a single finite origin, bacterial genomes show characteristic nucleotide compositional bias between the two replichores, which can be visualised through GC skew or (C-G)/(C+G). Although this polarisation is used for computational prediction of replication origins in many bacterial genomes, the degree of GC skew visibility varies widely among different species, necessitating a quantitative measurement of GC skew strength in order to provide confidence measures for GC skew-based predictions of replication origins.
Here we discuss a quantitative index for the measurement of GC skew strength, named the generalised GC skew index (gGCSI), which is applicable to genomes of any length, including bacterial chromosomes and plasmids. We demonstrate that gGCSI is independent of the window size and can thus be used to compare genomes with different sizes, such as bacterial chromosomes and plasmids. It can suggest the existence of different replication mechanisms in archaea and of rolling-circle replication in plasmids. Correlation of gGCSI values between plasmids and their corresponding host chromosomes suggests that within the same strain, these replicons have reproduced using the same replication machinery and thus exhibit similar strengths of replication strand skew.
gGCSI can be applied to genomes of any length and thus allows comparative study of replication-related mutation and selection pressures in genomes of different lengths such as bacterial chromosomes and plasmids. Using gGCSI, we showed that replication-related mutation or selection pressure is similar for replicons with similar machinery.
DNA replication makes up a significant proportion of the bacterial cell cycle, especially in fast-growing bacteria where chromosomes undergo multiple rounds of replication in order to compensate for a short generation time. Therefore, bacterial chromosomes are structured by the requirement to be an efficient medium for replication. Eubacterial species typically have circular chromosomes that are partitioned into two replichores by a single, symmetrically located pair of replication origin and terminus. Accordingly, many genomic features exhibit characteristic replication-related organisation, including the nucleotide compositional bias, distribution of signal oligonucleotides such as Chi sites [4, 5] and KOPS motifs [6, 7], as well as gene positioning and strand preference. Nucleotide compositional asymmetry in the leading and lagging strands has been extensively studied using GC skew analysis, which calculates the excess of C over G normalised to the GC content ([C-G]/[C+G]) along the chromosome [9, 10]. In many bacterial genomes, GC skew graphs "shift" their polarity between the two replichores, and thus the shift points of GC skew correspond to the replication origin and terminus. Analysis of the GC skew of a bacterial chromosome is therefore useful for the prediction of its replication origin and terminus and, subsequently, its leading and lagging strands. The putative position of the replication origin predicted by computational methods based on GC skew is frequently used to define the first base position of circular genome sequences in many genome projects as an accurate and effective alternative to experimental means. Moreover, the polarisation of nucleotide composition is suggested to affect the replication-directed architecture of genomes. This includes the aforementioned replication-oriented sequence elements and gene orientation; therefore, the degree of strand-specific mutational bias observed with GC skew analysis can be used as a reference for mutation or selection pressures that a genome receives due to the replication machinery [12–15].
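As a minimal sketch of the GC skew calculation described above, the following Python computes (C-G)/(C+G) in non-overlapping windows and locates the extremes of the cumulative skew, where the polarity shifts. The function names `gc_skew` and `shift_points` and the toy sequence are illustrative assumptions and are not part of G-language GAE. Which extreme corresponds to the origin depends on the sign convention and the organism, so the code reports them only as candidate shift points.

```python
from itertools import accumulate


def gc_skew(seq, n_windows):
    """Return (C-G)/(C+G) for each of n_windows equal-sized, non-overlapping windows."""
    seq = seq.upper()
    size = len(seq) // n_windows
    if size == 0:
        raise ValueError("more windows than bases")
    skews = []
    for i in range(n_windows):
        window = seq[i * size:(i + 1) * size]
        c, g = window.count("C"), window.count("G")
        # Windows without any C or G carry no skew information.
        skews.append((c - g) / (c + g) if c + g else 0.0)
    return skews


def shift_points(skews):
    """Return the window indices of the minimum and maximum cumulative skew."""
    cumulative = list(accumulate(skews))
    lowest = min(range(len(cumulative)), key=cumulative.__getitem__)
    highest = max(range(len(cumulative)), key=cumulative.__getitem__)
    return lowest, highest


# Toy sequence (assumption): a C-rich half followed by a G-rich half.
toy = "CCG" * 40 + "GGC" * 40
print(shift_points(gc_skew(toy, 8)))
```

On a real circular chromosome the two indices would be converted back to base positions by multiplying by the window size.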
Bacterial species exhibit highly diverse GC skew. Many fast-growing bacteria show extremely biased GC skew, whereas only weak skew can be discerned in the chromosomes of slow-growing bacteria [17–19]. Therefore, the prediction of the replication origin with GC skew could be erroneous in genomes with only weak skew, requiring a quantitative confidence measure of GC skew strength. In order to allow comparative study of the degree of GC skew in bacterial genomes, we have previously reported the GC skew index (GCSI), which quantifies the strength of GC skew in given bacterial chromosomes and can be used as a confidence measure for GC skew-based predictions or for the comparative study of replication-related mutation or selection pressures in bacterial chromosomes. The GCSI ranges from 0 to 1 and is calculated as the arithmetic mean of two indices: spectral ratio (SR) and dist. SR is the signal/noise (S/N) ratio of the 1-Hz signal in the Fourier power spectrum of a GC skew graph; it captures how well the GC skew graph can be partitioned into two segments of opposite polarity and equal length (a discrete sine curve), and dist measures the Euclidean distance between the two vertices in cumulative GC skew graphs. SR is essential for accurate quantification of a weak GC skew whose dist is affected by local regions of biased nucleotide content, such as large insertions. In order to eliminate the effects of biased nucleotide composition in coding regions, the GCSI is calculated with a fixed number of windows (4096, considering an average gene length of 1 kbp and a genome size of 2 to 4 Mbp). This use of a fixed number of windows limits the applicability of the GCSI to bacterial chromosomes and does not allow it to be used for shorter sequences, such as plasmids. Many plasmids are circular DNA molecules that exhibit nucleotide compositional asymmetry. GC skew is therefore frequently utilised for the prediction of replication origins in plasmids, which creates a need for extended applicability of the GCSI.
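To make the idea of a "1-Hz signal" concrete, the sketch below computes the power of the Fourier component that completes exactly one cycle over the whole GC skew series, and compares it with the mean power of the remaining components. This is only an illustration of a signal-to-noise ratio under stated assumptions: the function names are hypothetical, and averaging the other components as "noise" is an assumption, not the published definition of SR.

```python
import cmath
import math


def power_at_frequency(signal, k):
    """Power of the k-th discrete Fourier component (k cycles per series)."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2


def signal_to_noise(signal, k=1):
    """Power at frequency k divided by the mean power of the other components.

    Assumption: "noise" is taken as the mean power over frequencies
    1..n/2 excluding k; the published SR may be defined differently.
    """
    n = len(signal)
    powers = {j: power_at_frequency(signal, j) for j in range(1, n // 2 + 1)}
    noise = [p for j, p in powers.items() if j != k]
    return powers[k] / (sum(noise) / len(noise))


# A skew series with two segments of opposite polarity and equal length.
two_replichores = [1.0] * 8 + [-1.0] * 8
print(signal_to_noise(two_replichores))
```

A series that flips polarity once, halfway along, concentrates most of its power in the one-cycle component, which is why this component serves as the signal.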
Circular plasmids can be categorised into two groups according to their replication machineries: theta and rolling circle replication (RCR). Theta replication requires the Rep protein and characteristic origins as well as DNA polymerase I from the host bacterium. When there is only one origin of replication, theta replication results in two replichores of opposite polarity due to bi-directional replication forks, and hence these plasmids exhibit GC skew. Therefore, the shift points of the GC skew are indicative of the positions of the replication origin and terminus. The other type of replication, RCR, requires the RepABC family of proteins, and replication occurs through strand displacement [23–25]. In RCR, one of the two strands is always the template, and therefore plasmids that undergo RCR usually do not show significant GC skew. Instead, RCR plasmids show continuously biased nucleotide composition, resulting in linear cumulative GC skew, as opposed to the V-shaped graph observed in genomes with GC skew that indicates the existence of clear shift points.
It has been suggested that any genetic elements that reproduce inside the cell (chromosomes, plasmids, and phages) using the same replication machinery might have the same nucleotide composition and that recently acquired elements with unusual nucleotide compositions would drift towards the average nucleotide composition of the host genome by amelioration [26, 27]. To investigate the evolution of plasmids in their hosts, comparisons have been made at the levels of GC content [28, 29] and dinucleotide composition [30, 31], but not from the viewpoint of replication strand asymmetry.
To this end, here we report a novel quantitative measure of GC skew strength called the generalised GC skew index (gGCSI) that is independent of window size and is therefore applicable to comparative studies of genomes of any length. Using this new index, we show discriminant criteria for the replication machinery of plasmids and the correlation of the degree of replication-related mutation or selection pressures in the host chromosome and plasmids.
Results and Discussion
Principle and Design of gGCSI
The original GCSI required the use of 4096 windows for optimal computation in bacterial genomes, but this fixed number of windows made the GCSI applicable only to genomes larger than approximately 400 kbp, so that each window contains at least 100 bp. The use of sliding windows is a simple means of increasing the number of windows, but this is technically just a moving average, which diminishes the degree of GC skew and is therefore not a solution to the problem. The limitations of the original GCSI derive from the dependence of SR and dist on the number of windows; therefore, in order to generalise the GCSI to be applicable to smaller genomic elements, such as plasmids, we have made three modifications.
First, SR and dist were replaced with the normalised measure SA (spectral amplitude) and the normalised distance of the maximum and minimum vertices in the cumulative GC skew graph, dist(norm). Window-size dependence of SR was primarily due to the variation in basal noise levels depending on the number of windows, so the gGCSI is calculated simply using the amplitude of the 1-Hz Fourier power spectrum, without taking the S/N ratio. Because the distribution of spectral amplitude is non-linear, unlike SR, the exponentially regressed and thus linearised value for the 1-Hz spectrum is defined as SA. The other measure, dist, proportionally changes according to the number of windows, so it is linearly normalised as dist(norm).
Second, the gGCSI is defined as the geometric mean of SA and dist(norm), instead of the arithmetic mean utilised in the original GCSI. The arithmetic mean results in a relatively large value when only one of the two indices exhibits a large value; the use of a geometric mean instead ensures a balance between them.
Third, the statistical significance of the calculated gGCSI can be tested using the z-score and the p-value. Although the gGCSI is independent of the number of windows, the use of very few windows produces more uncertain results than when a sufficient number of windows is used for the calculation. In order to provide confidence measures in such cases, the p-value of the gGCSI is obtained by repeatedly calculating the gGCSI using randomly shuffled input GC skew data series. Because the values from the randomised iterations were statistically confirmed to be normally distributed, a z-score and a corresponding p-value are given to the gGCSI to indicate its significance.
Performance validation of the gGCSI
Table 1. Comparison of GCSI and gGCSI values with different numbers of windows in the Escherichia coli K12 genome. [Table body not recovered; surviving mean ± SD entries, in source order: 0.0978 ± 0.156, 319 ± 271, 85 ± 162; 0.0964 ± 0.003, 485 ± 5, 67 ± 3.]
Table 2. gGCSI values of genomes with different sizes calculated with varying numbers of windows. [Table body not recovered; surviving mean ± SD entries, in source order: 0.2135 ± 0.002, 0.1309 ± 0.005, 0.4403 ± 0.016, 0.2887 ± 0.035, 0.4225 ± 0.020.]
Although the gGCSI is independent of the window size, in practice a sufficiently large window size should be chosen such that it is not affected by the local nucleotide compositional bias. In most genomes, a window size of 1000 bp, which corresponds to the average length of coding genes, is sufficient. This leads to the use of 512 to 4096 windows in bacteria for optimal performance, considering the distribution of genome size in the range of 0.5 to 5 Mbp. However, for small plasmids that are only several kilobases in size, the use of 1000 bp windows results in only 4 or 8 windows, which is not sufficient for the calculation of SA. Because there is a trade-off between window number and size, the use of 16 to 32 windows of more than 100 bp is desirable for these small genomes.
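The rule of thumb above can be sketched as a small helper. This is one possible reading of the guidance, not a procedure given in the text: the function name `suggest_window_count`, the restriction to powers of two, and the default thresholds (1000 bp target, at least 16 windows, at least 100 bp per window, at most 4096 windows) are assumptions chosen to match the figures quoted in the paragraph.

```python
def suggest_window_count(length_bp, target_bp=1000, min_windows=16,
                         max_windows=4096, min_bp=100):
    """Suggest a power-of-two window count for a replicon of length_bp bases."""
    windows = 1
    # Aim for windows of roughly target_bp, capped at max_windows.
    while windows * target_bp < length_bp and windows < max_windows:
        windows *= 2
    # For small replicons, trade window size for window count,
    # but never let a window drop below min_bp.
    while windows < min_windows and length_bp // (windows * 2) >= min_bp:
        windows *= 2
    return windows


for length in (5_000, 500_000, 5_000_000):
    print(length, suggest_window_count(length))
```

With these defaults a 5 kb plasmid is split into 16 windows of about 300 bp, while chromosomes of 0.5 to 5 Mbp fall in the range of 512 to 4096 windows.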
In order to identify the optimal number of windows, we further calculated the gGCSI using window numbers from 8 to 32768 in all bacterial genomes used in this work, and identified the window number at which the change in gGCSI value is smallest compared with the adjacent window numbers. For example, in Table 1, the window number of 4096 shows the smallest difference from its neighbouring window numbers (a difference of 0.0001 from 2048 windows and of 0.0003 from 8192 windows). As shown in Supplemental Figure S1 [see Additional File 1], the median of the optimal window number in all bacteria is 1024, which corresponds to a median window size of 2511 bp. Therefore, if a genome is sufficiently large, the use of 1024 windows (2511 bp/window) produces the most accurate gGCSI value.
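The selection step described above can be sketched as follows. The function name is hypothetical, and scoring each window number by the sum of its absolute differences from both neighbours is an assumption about how "minimum change" is measured. The index values in the example are invented for illustration and are not the values of Table 1; they are merely chosen so that 4096 differs by 0.0001 and 0.0003 from its neighbours.

```python
def most_stable_window_count(values_by_windows):
    """Return the window count whose index value changes least versus its neighbours."""
    counts = sorted(values_by_windows)
    best, best_change = None, float("inf")
    # Only interior counts have two neighbours to compare against.
    for prev, cur, nxt in zip(counts, counts[1:], counts[2:]):
        change = (abs(values_by_windows[cur] - values_by_windows[prev])
                  + abs(values_by_windows[cur] - values_by_windows[nxt]))
        if change < best_change:
            best, best_change = cur, change
    return best


# Hypothetical index values per window count (not data from the paper).
hypothetical = {1024: 0.0970, 2048: 0.0964, 4096: 0.0965,
                8192: 0.0968, 16384: 0.0975}
print(most_stable_window_count(hypothetical))
```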
SA and dist are generally correlated, and the majority of the genomes exhibit a dist/SA ratio of around 0.184 (Supplemental Figure S2 [see Additional File 1]). However, this ratio varies by about 10-fold among the genomes, so the geometric mean captures the balance between the two indices better than the arithmetic mean: for two values x and 10x, the arithmetic mean is (10x + x)/2 = 5.5x, whereas the geometric mean is $\sqrt{10x \cdot x} = \sqrt{10}\,x \approx 3.16x$. When GC skew continuously exists along one strand of the genome and does not shift its polarity, dist becomes extremely high while SA remains low, deviating from the above dist/SA ratio. The genomes of Pseudoalteromonas haloplanktis TAC125 and Halorhodospira halophila SL1 are good examples of such continuously biased genomes: they show gGCSI < 0.1 with the geometric mean but exceed this threshold when calculated with the arithmetic mean. This deviation is more pronounced in RCR plasmids, which have the same non-shifting GC skew. Sixteen RCR plasmids used in this work showed gGCSI > 1.0 (with a maximum of 1.544) when calculated with the arithmetic mean, whereas with the geometric mean only one genome exceeded 1.0, with a value of 1.069.
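The difference between the two means can be checked numerically. In the short sketch below, the function names and the example SA and dist values are hypothetical; they only illustrate how an unbalanced pair (low SA, high dist, as in a replicon with non-shifting skew) is pulled down by the geometric mean.

```python
import math


def arithmetic_mean(a, b):
    return (a + b) / 2


def geometric_mean(a, b):
    return math.sqrt(a * b)


# The 10-fold case from the text, with x = 1.
print(arithmetic_mean(10.0, 1.0), geometric_mean(10.0, 1.0))

# Hypothetical unbalanced pair: low SA, very high dist.
sa, dist = 0.05, 1.2
print(arithmetic_mean(sa, dist), geometric_mean(sa, dist))
```

For the unbalanced pair, the arithmetic mean stays above 0.6 while the geometric mean falls below 0.25, so a single inflated component no longer dominates the index.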
Difference in GC skew strength between eubacteria and archaea with different types of replication machinery
Note that the gGCSI is a measure of the clarity of the V-shape of the cumulative GC skew. A high gGCSI score suggests strong mutation or selection pressures induced by bi-directional replication machinery starting from a single origin, whereas a low gGCSI score does not necessarily imply the existence of alternative replication machinery such as multiple replication origins. Weak GC skew can also result from long doubling times, as exemplified by low gGCSI scores in Mycoplasma and Cyanobacteria species. It is also worth noting that the gGCSI and z-score are weakly correlated (r = 0.578 and ρ = 0.678). Since the z-score is calculated from the distribution of gGCSI values calculated for randomly shuffled genome sequences for 100 iterations, this value indicates the non-randomness of the observed gGCSI. Therefore, the correlation between the gGCSI score and its z-score indicates that a high degree of skewness is not a random property that can arise by chance or from a certain bias in the genome such as extremely high GC content, and that a certain mutation or selection pressure was required to shape the pronounced GC skew. Prediction of replication origins can be erroneous in species where GC skew is not clear or where multiple origins exist. gGCSI can thus be used as a confidence measure for GC skew-based predictions; according to the above results, chromosomes with gGCSI > 0.1 and z-score > 3 can be considered to have sufficient GC skew strength for accurate prediction with this number of windows.
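The confidence criterion stated above reduces to a simple check. The function name and the example values are hypothetical; the two thresholds are the ones quoted in the text for chromosomes.

```python
def skew_supports_origin_prediction(ggcsi, z_score, min_ggcsi=0.1, min_z=3.0):
    """True if both thresholds from the text (gGCSI > 0.1, z-score > 3) are met."""
    return ggcsi > min_ggcsi and z_score > min_z


# Hypothetical replicons: (gGCSI, z-score).
for name, ggcsi, z in [("strong skew", 0.25, 12.0),
                       ("weak skew", 0.05, 12.0),
                       ("too few windows", 0.25, 2.0)]:
    print(name, skew_supports_origin_prediction(ggcsi, z))
```

Requiring both conditions means that a high index computed from too few windows, and hence with a low z-score, is not taken as evidence of a reliable shift point.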
Difference in GC skew strength between plasmids with different types of replication machinery
Correlation of GC skew strength between plasmids and their hosts
Previous work has shown similarity in dinucleotide composition between plasmids and host chromosomes [30, 31]. This similarity is assumed to be caused by host-specific mutation biases of replication machineries, but the exact mechanisms remain unknown. Our finding that plasmids tend to be similar in GC skew strength to their host chromosomes strongly supports the assumption that host-specific properties of replication machineries homogenise the nucleotide composition of replicons in the cell.
Application of gGCSI to other genomic compositional skews
This manuscript has thus far only considered the GC skew; however, other genomic compositional skews can alternatively be calculated using A+T, keto (G+T), or purine (A+G) bases, namely AT skew (T-A)/(T+A), keto skew (A+C-G-T)/(A+T+G+C), and purine skew (C+T-A-G)/(A+T+G+C), respectively. By using these skew values as input instead of GC skew, we can likewise obtain gATSI, gKetoSI, and gPurineSI. In order to assess the applicability of these indices in comparison to the gGCSI, we have reproduced Figures 2 to 4 using these indices (Supplemental Figures S4a-c, S5a-c, and S6a-c [see Additional File 1]). In all analyses, the skew indices based on non-GC skews were distributed over a much narrower range, and the separation of different replication machineries was best demonstrated with the gGCSI. Correlation between the skew indices of the plasmids and their host chromosomes was also highest with the gGCSI (r = 0.791), compared with gATSI (r = 0.491), gKetoSI (r = 0.569), and gPurineSI (r = 0.528).
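The four skew definitions given above can be written as one helper that returns the skew of a single window. This is a minimal sketch; the function name `compositional_skew` and the `kind` labels are illustrative assumptions, while the formulas follow the text.

```python
def compositional_skew(window, kind="gc"):
    """Skew of one sequence window, using the formulas given in the text."""
    window = window.upper()
    a, c, g, t = (window.count(base) for base in "ACGT")
    total = a + c + g + t
    if kind == "gc":
        num, den = c - g, c + g
    elif kind == "at":
        num, den = t - a, t + a
    elif kind == "keto":
        num, den = a + c - g - t, total
    elif kind == "purine":
        num, den = c + t - a - g, total
    else:
        raise ValueError(f"unknown skew kind: {kind}")
    # A window lacking the relevant bases carries no skew information.
    return num / den if den else 0.0


for kind in ("gc", "at", "keto", "purine"):
    print(kind, compositional_skew("AACCGT", kind))
```

Feeding a windowed series of any of these skews into the index calculation in place of GC skew corresponds to the gATSI, gKetoSI, and gPurineSI variants mentioned above.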
Implementation and availability
The algorithm described in this work is implemented as the gcsi function in version 1.8.6 and above of the G-language Genome Analysis Environment (G-language GAE) package [45–47], which can also calculate gATSI, gKetoSI, and gPurineSI along with gGCSI. G-language GAE is freely available as open source code licensed under the GNU General Public License. Therefore, researchers can readily use gGCSI in their analyses through the Perl Application Programming Interface or through web services provided by the G-language Project.
Generalised GC skew index (gGCSI) is a quantitative measure of GC skew strength in genomes of any length that enables comparative study of replication-related mutation or selection pressures in bacterial chromosomes and plasmids. The gGCSI can be used to suggest the type of replication machinery used, i.e., bi-directional replication from a single origin and replication from multiple origins in eubacteria and archaea, as well as RCR in plasmids. The correlation of the degree of GC skew between bacterial plasmids and their host chromosomes suggests that these replicons within the same cells have replicated using the same replication machinery. gGCSI can be a useful measure for the study of replication-related features in bacterial genomes, and the index also provides confidence measures for GC skew-based predictions of replication origins.
Software and genome sequences
Genome analyses were conducted using the G-language Genome Analysis Environment version 1.8.6 [45–47], and gGCSI is implemented and released with this software package. The 846 complete chromosome sequences of eubacteria (710 strains; note that several strains contain multiple chromosomes) and archaea (53 strains) and 713 plasmid genomes were obtained from the NCBI FTP repository. The 713 plasmids were further filtered to remove RCR replicons by excluding plasmids containing the RCR initiator protein Rep (COG5655: plasmid rolling circle replication initiator protein and truncated derivatives), leaving 697 genomes. A similarity search using BLASTP with the 34 Rep sequences included in these genomes resulted in the same number of filtered genomes. The 211 RCR plasmid genomes were downloaded through the links provided in the Database of Plasmid Replicons (DPR). For comparison of the strength of replication-related mutation or selection pressures between host chromosomes and plasmids, 302 chromosomes of host bacteria that harbour 606 plasmids were used.
Calculation of the GCSI
dist is calculated as the absolute difference between the maximum and minimum values of the cumulative GC skew graph.
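The definition of dist above can be sketched in a few lines. The function name `skew_dist` is a hypothetical label, and the example series are invented; the normalisation constants used in the GCSI and gGCSI formulas are not reproduced here.

```python
from itertools import accumulate


def skew_dist(skews):
    """Absolute difference between the maximum and minimum of the cumulative skew."""
    cumulative = list(accumulate(skews))
    return abs(max(cumulative) - min(cumulative))


# V-shaped case: polarity shifts halfway along the replicon.
print(skew_dist([0.5, 0.5, -0.5, -0.5]))
# Non-shifting case: the cumulative skew rises linearly.
print(skew_dist([0.25, 0.25, 0.25, 0.25]))
```

The second example shows why dist alone can be large for a replicon whose skew never changes polarity, which is the situation the spectral component of the index is meant to counterbalance.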
Calculation of the gGCSI
where k3 = 600000, k4 = 40, and α = 0.4, as calculated by regression analysis.
where W is the number of windows used in the analysis.
Calculation of z-score and p-value
Because the gGCSI is independent of the window size and number of windows, the significance of the gGCSI value should be noted to determine whether the number of windows used in the analysis is statistically sufficient to give the resulting value. Therefore, the significance measure is calculated from the distribution of gGCSI values for a shuffled input signal. For a given discrete GC skew signal f(n), 100 randomly shuffled series f'(n) are generated, and the gGCSI is calculated for each. An iteration count of 100 is chosen by default for computational efficiency, and this number can be configured when necessary. Then, the significance of the gGCSI based on the original GC skew signal f(n) is statistically assessed using the z-score based on the shuffled iterations, from which the p-value is obtained. Normal distribution of the shuffled iterations was confirmed with the Kolmogorov-Smirnov-Lilliefors test with p < 0.001, for all genomes used in this work. Because re-sampling methods change the necessary window numbers/sizes and the coordination of genomic loci and because purely random values ignore the effects of diverse GC content, we have chosen this parametric statistic.
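The shuffling procedure above can be sketched as follows. Several parts are assumptions made for illustration: the function names, the fixed random seed, and the one-sided upper-tail p-value. The index passed in the example, `range_index`, is a simple stand-in (range of the cumulative skew per window) and is not the gGCSI, whose formula is not reproduced here.

```python
import math
import random
from itertools import accumulate
from statistics import mean, stdev


def range_index(skews):
    """Stand-in index (NOT the gGCSI): range of the cumulative skew per window."""
    cumulative = list(accumulate(skews))
    return (max(cumulative) - min(cumulative)) / len(skews)


def shuffle_significance(skews, index_fn, iterations=100, seed=0):
    """z-score and p-value of index_fn(skews) against shuffled copies of skews."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = index_fn(skews)
    shuffled_values = []
    for _ in range(iterations):
        shuffled = list(skews)
        rng.shuffle(shuffled)
        shuffled_values.append(index_fn(shuffled))
    z = (observed - mean(shuffled_values)) / stdev(shuffled_values)
    # Assumption: one-sided upper tail of the standard normal distribution.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


v_shaped = [0.1] * 32 + [-0.1] * 32
z, p = shuffle_significance(v_shaped, range_index)
print(z, p)
```

Shuffling the windowed skew values keeps the overall composition of the series while destroying its positional order, so the z-score reflects how much of the index is due to the arrangement of skew along the replicon.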
List of abbreviations
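- dist : Euclidean distance between the two vertices in the cumulative GC skew graph
- FFT : fast Fourier transform
- GCSI : GC skew index
- gATSI : generalised AT skew index
- gGCSI : generalised GC skew index
- gKetoSI : generalised Keto skew index
- gPurineSI : generalised Purine skew index
- RCR : rolling circle replication
- SA : spectral amplitude
- SR : spectral ratio
- S/N : signal to noise ratio
- r : Pearson product moment correlation coefficient
- ρ : Spearman rho rank correlation coefficient.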
This research is supported by the Grant-in-Aid for Young Scientists No.20710158 from the Japan Society for the Promotion of Science (JSPS), as well as funds from the Yamagata Prefectural Government and Tsuruoka City.
- Couturier E, Rocha EP: Replication-associated gene dosage effects shape the genomes of fast-growing bacteria but only for transcription and translation genes. Mol Microbiol. 2006, 59 (5): 1506-1518. 10.1111/j.1365-2958.2006.05046.x.
- Rocha EP: The replication-related organization of bacterial genomes. Microbiology. 2004, 150 (Pt 6): 1609-1627. 10.1099/mic.0.26974-0.
- Lobry JR, Louarn JM: Polarisation of prokaryotic chromosomes. Curr Opin Microbiol. 2003, 6 (2): 101-108. 10.1016/S1369-5274(03)00024-9.
- Arakawa K, Uno R, Nakayama Y, Tomita M: Validating the significance of genomic properties of Chi sites from the distribution of all octamers in Escherichia coli. Gene. 2007, 392 (1-2): 239-246. 10.1016/j.gene.2006.12.022.
- Kowalczykowski SC, Dixon DA, Eggleston AK, Lauder SD, Rehrauer WM: Biochemistry of homologous recombination in Escherichia coli. Microbiol Rev. 1994, 58 (3): 401-465.
- Bigot S, Saleh OA, Lesterlin C, Pages C, El Karoui M, Dennis C, Grigoriev M, Allemand JF, Barre FX, Cornet F: KOPS: DNA motifs that control E. coli chromosome segregation by orienting the FtsK translocase. Embo J. 2005, 24 (21): 3770-3780. 10.1038/sj.emboj.7600835.
- Hendrickson H, Lawrence JG: Selection for chromosome architecture in bacteria. J Mol Evol. 2006, 62 (5): 615-629. 10.1007/s00239-005-0192-2.
- Rocha EP: The organization of the bacterial genome. Annu Rev Genet. 2008, 42: 211-233. 10.1146/annurev.genet.42.110807.091653.
- Lobry JR: Asymmetric substitution patterns in the two DNA strands of bacteria. Mol Biol Evol. 1996, 13 (5): 660-665.
- Lobry JR, Sueoka N: Asymmetric directional mutation pressures in bacteria. Genome Biol. 2002, 3 (10): RESEARCH0058. 10.1186/gb-2002-3-10-research0058.
- Frank AC, Lobry JR: Oriloc: prediction of replication boundaries in unannotated bacterial chromosomes. Bioinformatics. 2000, 16 (6): 560-561. 10.1093/bioinformatics/16.6.560.
- Arakawa K, Tomita M: Selection effects on the positioning of genes and gene structures from the interplay of replication and transcription in bacterial genomes. Evol Bioinform Online. 2007, 3: 279-286.
- Chen C, Chen CW: Quantitative analysis of mutation and selection pressures on base composition skews in bacterial chromosomes. BMC Genomics. 2007, 8: 286. 10.1186/1471-2164-8-286.
- Tillier ER, Collins RA: The contributions of replication orientation, gene direction, and signal sequences to base-composition asymmetries in bacterial genomes. J Mol Evol. 2000, 50 (3): 249-257.
- Touchon M, Rocha EP: From GC skews to wavelets: a gentle guide to the analysis of compositional asymmetries in genomic data. Biochimie. 2008, 90 (4): 648-659. 10.1016/j.biochi.2007.09.015.
- Zhang CT, Zhang R, Ou HY: The Z curve database: a graphic representation of genome sequences. Bioinformatics. 2003, 19 (5): 593-599. 10.1093/bioinformatics/btg041.
- Kowalczuk M, Mackiewicz P, Mackiewicz D, Nowicka A, Dudkiewicz M, Dudek MR, Cebrat S: DNA asymmetry and the replicational mutational pressure. J Appl Genet. 2001, 42 (4): 553-577.
- Salzberg SL, Salzberg AJ, Kerlavage AR, Tomb JF: Skewed oligomers and origins of replication. Gene. 1998, 217 (1-2): 57-67. 10.1016/S0378-1119(98)00374-6.
- Worning P, Jensen LJ, Hallin PF, Staerfeldt HH, Ussery DW: Origin of replication in circular prokaryotic chromosomes. Environ Microbiol. 2006, 8 (2): 353-361. 10.1111/j.1462-2920.2005.00917.x.
- Arakawa K, Tomita M: The GC Skew Index: A Measure of Genomic Compositional Asymmetry and the Degree of Replicational Selection. Evol Bioinform Online. 2007, 3: 159-168.
- Arakawa K, Saito R, Tomita M: Noise-reduction filtering for accurate detection of replication termini in bacterial genomes. FEBS Lett. 2007, 581 (2): 253-258. 10.1016/j.febslet.2006.12.021.
- del Solar G, Giraldo R, Ruiz-Echevarria MJ, Espinosa M, Diaz-Orejas R: Replication and control of circular bacterial plasmids. Microbiol Mol Biol Rev. 1998, 62 (2): 434-464.
- Cevallos MA, Cervantes-Rivera R, Gutierrez-Rios RM: The repABC plasmid family. Plasmid. 2008, 60 (1): 19-37. 10.1016/j.plasmid.2008.03.001.
- Khan SA: Plasmid rolling-circle replication: recent developments. Mol Microbiol. 2000, 37 (3): 477-484. 10.1046/j.1365-2958.2000.02001.x.
- Khan SA: Plasmid rolling-circle replication: highlights of two decades of research. Plasmid. 2005, 53 (2): 126-136. 10.1016/j.plasmid.2004.12.008.
- Lawrence JG, Ochman H: Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol. 1997, 44 (4): 383-397. 10.1007/PL00006158.
- Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of bacterial innovation. Nature. 2000, 405 (6784): 299-304. 10.1038/35012500.
- Rocha EP, Danchin A: Base composition bias might result from competition for metabolic resources. Trends Genet. 2002, 18 (6): 291-294. 10.1016/S0168-9525(02)02690-2.
- van Passel MW, Bart A, Luyf AC, van Kampen AH, van der Ende A: Compositional discordance between prokaryotic plasmids and host chromosomes. BMC Genomics. 2006, 7: 26. 10.1186/1471-2164-7-26.
- Campbell A, Mrazek J, Karlin S: Genome signature comparisons among prokaryote, plasmid, and mitochondrial DNA. Proc Natl Acad Sci USA. 1999, 96 (16): 9184-9189. 10.1073/pnas.96.16.9184.
- Suzuki H, Sota M, Brown CJ, Top EM: Using Mahalanobis distance to compare genomic signatures between bacterial plasmids and chromosomes. Nucleic Acids Res. 2008, 36 (22): e147. 10.1093/nar/gkn753.
- Grigoriev A: Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res. 1998, 26 (10): 2286-2290. 10.1093/nar/26.10.2286.
- Fricke WF, Seedorf H, Henne A, Kruer M, Liesegang H, Hedderich R, Gottschalk G, Thauer RK: The genome sequence of Methanosphaera stadtmanae reveals why this human intestinal archaeon is restricted to methanol and H2 for methane formation and ATP synthesis. J Bacteriol. 2006, 188 (2): 642-658. 10.1128/JB.188.2.642-658.2006.
- Kennedy SP, Ng WV, Salzberg SL, Hood L, DasSarma S: Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence. Genome Res. 2001, 11 (10): 1641-1650. 10.1101/gr.190201.
- Zhang R, Zhang CT: Multiple replication origins of the archaeon Halobacterium species NRC-1. Biochem Biophys Res Commun. 2003, 302 (4): 728-734. 10.1016/S0006-291X(03)00252-3.
- Berquist BR, DasSarma S: An archaeal chromosomal autonomously replicating sequence element from an extreme halophile, Halobacterium sp. strain NRC-1. J Bacteriol. 2003, 185 (20): 5959-5966. 10.1128/JB.185.20.5959-5966.2003.
- Bernander R, Skarstad K: Mapping of a chromosome replication origin in an archaeon. Trends Microbiol. 2000, 8 (12): 535-537. 10.1016/S0966-842X(00)01878-3.
- Matsunaga F, Forterre P, Ishino Y, Myllykallio H: In vivo interactions of archaeal Cdc6/Orc1 and minichromosome maintenance proteins with the replication origin. Proc Natl Acad Sci USA. 2001, 98 (20): 11152-11157. 10.1073/pnas.191387498.
- Matsunaga F, Glatigny A, Mucchielli-Giorgi MH, Agier N, Delacroix H, Marisa L, Durosay P, Ishino Y, Aggerbeck L, Forterre P: Genomewide and biochemical analyses of DNA-binding activity of Cdc6/Orc1 and Mcm proteins in Pyrococcus sp. Nucleic Acids Res. 2007, 35 (10): 3214-3222. 10.1093/nar/gkm212.
- Myllykallio H, Forterre P: Mapping of a chromosome replication origin in an archaeon: response. Trends Microbiol. 2000, 8 (12): 537-539. 10.1016/S0966-842X(00)01881-3.
- Myllykallio H, Lopez P, Lopez-Garcia P, Heilig R, Saurin W, Zivanovic Y, Philippe H, Forterre P: Bacterial mode of replication with eukaryotic-like machinery in a hyperthermophilic archaeon. Science. 2000, 288 (5474): 2212-2215. 10.1126/science.288.5474.2212.
- Database of Plasmid Replicon: [http://www.essex.ac.uk/bs/staff/osborn/DPR/DPR_database.htm]
- del Solar G, Espinosa M: Plasmid copy number control: an ever-growing story. Mol Microbiol. 2000, 37 (3): 492-500. 10.1046/j.1365-2958.2000.02005.x.
- Freeman JM, Plasterer TN, Smith TF, Mohr SC: Patterns of Genome Organization in Bacteria. Science. 1998, 279 (5358): 1827a. 10.1126/science.279.5358.1827a.
- Arakawa K, Mori K, Ikeda K, Matsuzaki T, Kobayashi Y, Tomita M: G-language Genome Analysis Environment: a workbench for nucleotide sequence data mining. Bioinformatics. 2003, 19 (2): 305-306. 10.1093/bioinformatics/19.2.305.
- Arakawa K, Suzuki H, Tomita M: Computational Genome Analysis Using The G-language System. Genes, Genomes and Genomics. 2008, 2 (1): 1-13.
- Arakawa K, Tomita M: G-language System as a platform for large-scale analysis of high-throughput omics data. Journal of Pesticide Science. 2006, 31 (3): 282-288. 10.1584/jpestics.31.282.
- G-language REST Web Service: [http://rest.g-language.org/]
- NCBI RefSeq FTP Repository: [http://www.ncbi.nlm.nih.gov/Ftp/]
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.