An evolutionary analysis of genome expansion and pathogenicity in Escherichia coli
© Bohlin et al.; licensee BioMed Central Ltd. 2014
Received: 24 March 2014
Accepted: 29 September 2014
Published: 9 October 2014
There are several studies describing loss of genes through reductive evolution in microbes, but how selective forces are associated with genome expansion due to horizontal gene transfer (HGT) has not received similar attention. The aim of this study was therefore to examine how selective pressures influence genome expansion in 53 fully sequenced and assembled Escherichia coli strains. We also explored potential connections between genome expansion and the attainment of virulence factors. This was performed using estimations of several genomic parameters such as AT content, genomic drift (measured using relative entropy), genome size and estimated HGT size, which were subsequently compared to analogous parameters computed from the core genome consisting of 1729 genes common to the 53 E. coli strains. Moreover, we analyzed how selective pressures (quantified using relative entropy and dN/dS), acting on the E. coli core genome, influenced lineage and phylogroup formation.
Hierarchical clustering of dS and dN estimations from the E. coli core genome resulted in phylogenetic trees with topologies in agreement with known E. coli taxonomy and phylogroups. High values of dS, compared to dN, indicate that the E. coli core genome has been subjected to substantial purifying selection over time; significantly more than the non-core part of the genome (p<0.001). This is further supported by a linear association between strain-wise dS and dN values (β = 26.94 ± 0.44, R² ~ 0.98, p<0.001). The non-core part of the genome was also significantly more AT-rich (p<0.001) than the core genome and E. coli genome size correlated with estimated HGT size (p<0.001). In addition, genome size (p<0.001), AT content (p<0.001) as well as estimated HGT size (p<0.005) were all associated with the presence of virulence factors, suggesting that pathogenicity traits in E. coli are largely attained through HGT. No associations were found between selective pressures operating on the E. coli core genome, as estimated using relative entropy, and genome size (p~0.98).
On a larger time frame, genome expansion in E. coli, which is significantly associated with the acquisition of virulence factors, appears to be independent of selective forces operating on the core genome.
It has been widely documented that horizontal gene transfer (HGT) can make potentially harmless, even probiotic, bacterial species lethal [1, 2]. Considerable research has focused on how bacteria can evolve from being nonthreatening, host-independent and free-living organisms to become obligatory intracellular parasites with reduced genomes [3–9]. However, the evolutionary mechanisms explaining genome expansion due to HGT are much less documented. One reason for this is the need for a large number of fully sequenced and assembled genomes from strains of species that are particularly well suited for such analyses. The recent development of high-throughput sequencing technology has reduced sequencing costs and for many microbial species there are now multiple strains, completely sequenced and assembled, available for analyses in public databases. This allowed us to explore strain-level relationships between base composition, genome size and predicted HGT in several microbial species in a recent study. We found that genome size, compared at strain level, was predominantly correlated with genomic AT content, contrary to what has been found for prokaryotes in general. Additionally, AT content correlated with predicted HGT size, which again correlated with chromosome size. In this study we also analyzed the influence of selective pressures on microbial genome size using the concept of relative entropy [13, 14].
Relative entropy can be used to measure genomic distance and is computed with the Kullback–Leibler measure between observed and expected codon frequencies (see  for more details). The expected codon frequencies are calculated from genomic nucleotide frequencies so that decreasing distances between observed and expected codon frequencies imply increased independence between the neighboring nucleotides constituting the codons. This implies more random distributions of codon frequencies presumably due to mutations/genetic drift [13, 15]. A negative correlation between relative entropy and AT content has previously been detected in microbial genomes, implying that AT-rich genomes tend to have, on average, a more random base composition than GC-rich genomes [11, 13, 16]. The greater similarity between AT-rich genomes and random DNA sequences, with similar base compositions, is a consequence of the fact that genomic mutations are in general biased towards AT-richness [17, 18].
Horizontally transferred DNA tends to have lower relative entropy than DNA of the host chromosome. Thus, it is likely that the genomes of strains with high levels of horizontally transferred DNA will, on average, have lower relative entropy than the genomes of strains having received less HGT. However, it may also suggest differences in how selective forces operate at the strain level, analogous to the general negative correlation between AT content and genome size, which appears to be largely reversed at the strain level of bacterial species.
The dN/dS ratio, where dN describes the difference in non-synonymous substitutions between taxa and dS designates the difference in synonymous substitutions, has also been associated with selective pressures. Indeed, a large dS relative to dN is linked to purifying selection; dN = dS is assumed to indicate neutrality of selection, while a dN greater than dS is associated with positive selection. Not only does dS > dN provide an approximate quantitative measure of the selective pressures involved in purging non-synonymous substitutions resulting in reduced fitness, but the relation may also give clues about the species’ population structure. Additionally, time is a central factor. A recent divergence between two or more strains is often indicated by dN > dS, since such mutations are more likely to take place within a short time span.
It has previously been shown that purifying selection correlates with genome size for microbes above strain level. In the present study we wanted to examine whether selective forces would leave a base compositional pattern in the core genomes of bacterial strains undergoing genome expansion, mediated through HGT, since such a pattern has been observed for microbial species undergoing genome reduction [3, 13]. We focused our analysis on E. coli since this particular species is renowned for extensive HGT and has many strains sequenced and fully assembled [21, 22]. Since pathogenicity has been linked with HGT [1, 2] we also wanted to test whether the pathogenic potential of the E. coli strains correlated with genomic properties such as AT content, genome size, genomic drift, and selective pressures, as estimated using relative entropy and dN/dS. To reach our aim we extracted the E. coli core genome, consisting of 1729 genes, from 53 E. coli strains and estimated dS and dN, as well as the other genomic properties mentioned above. We also generated a maximum likelihood tree based on mutations in the mutT gene, which has been associated with hyper-mutable strains, and compared the congruency of that tree to the trees resulting from the dS- and dN-based hierarchical cluster analyses.
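The interpretation of dN relative to dS outlined above can be written down as a small decision rule. The following is only a minimal sketch: the function name `classify_selection` and the tolerance `tol` around dN = dS are illustrative assumptions and are not taken from the study.

```python
def classify_selection(dn: float, ds: float, tol: float = 0.05) -> str:
    """Label the selection regime suggested by a pair of dN and dS estimates.

    `tol` is a hypothetical tolerance around dN/dS = 1 within which the
    pair is treated as neutral; it is not a value used by the authors.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the dN/dS ratio")
    ratio = dn / ds
    if abs(ratio - 1.0) <= tol:
        return "neutral"  # dN approximately equal to dS
    # dS larger than dN suggests purifying selection, dN larger than dS
    # suggests positive selection.
    return "purifying" if ratio < 1.0 else "positive"


# Illustrative toy values only
for dn, ds in [(0.01, 0.27), (0.30, 0.30), (0.50, 0.20)]:
    print(dn, ds, classify_selection(dn, ds))
```

As the text notes, such a label is only approximate, since divergence time and population structure also influence the ratio.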
Results and discussion
Estimation of dN and dS from the E. coli core genome
We wanted to explore whether there was a relationship between the selective pressures that the E. coli core genome has been subjected to and genome expansion due to HGT since an association between purifying selection and genome size has previously been identified for microbial species in general.
Information about the different E. coli strains used in the study

| Strain | Description (where listed) |
| --- | --- |
| Escherichia coli O7:K1 CE10 | ExPEC (neonatal meningitis) |
| Escherichia coli IAI39 | ExPEC (Uropathogenic E. coli (UPEC)) |
| Escherichia coli SMS 3-5 | |
| Escherichia coli UMN026 | |
| Escherichia coli 536 | |
| Escherichia coli ED1A | |
| Escherichia coli O83:H1 NRG 857C | AIEC (adherent-invasive E. coli) |
| Escherichia coli LF82 | |
| Escherichia coli clone D i2 | |
| Escherichia coli clone D i14 | |
| Escherichia coli CFT073 | |
| Escherichia coli ABU 83972 | |
| Escherichia coli UTI89 | |
| Escherichia coli UM146 | |
| Escherichia coli IHE3034 | ExPEC (neonatal meningitis) |
| Escherichia coli APEC O1 | Avian pathogenic E. coli (APEC) |
| Escherichia coli S88 | ExPEC (neonatal meningitis) |
| Escherichia coli SE15 | |
| Escherichia coli NA114 | ExPEC (multidrug-resistant UPEC) |
| Escherichia coli E2348_69 O127:H6 | Enteropathogenic E. coli (EPEC) |
| Escherichia coli O157:H7 TW14359 | Shiga toxin-producing E. coli (STEC/EHEC) |
| Escherichia coli O157:H7 EC4115 | |
| Escherichia coli O157:H7 Sakai | |
| Escherichia coli Xuzhou21 | |
| Escherichia coli O55:H7 RM12579 | Atypical EPEC (aEPEC) |
| Escherichia coli O55:H7 CB9615 | |
| Escherichia coli UMNK88 | Enterotoxigenic E. coli (ETEC) |
| Escherichia coli P12b | |
| Escherichia coli HS | |
| Escherichia coli BL21 DE3 | Lab strain – Non-pathogenic |
| Escherichia coli B REL606 | Lab strain – Non-pathogenic |
| Escherichia coli BL21 Gold DE3 pLysS AG | Lab strain – Non-pathogenic |
| Escherichia coli BW2952 | Lab strain – Non-pathogenic |
| Escherichia coli K12 substr DH10B | Lab strain – Non-pathogenic |
| Escherichia coli K12 substr MDS42 | Lab strain – Non-pathogenic |
| Escherichia coli K12 substr W3110 | Lab strain – Non-pathogenic |
| Escherichia coli DH1 (AP012030.1) | Lab strain – Non-pathogenic |
| Escherichia coli DH1 (CP001637.1) | Lab strain – Non-pathogenic |
| Escherichia coli O104:H4 str. 2009EL-2050 | Enteroaggregative – EHEC (EAggEC-EHEC) |
| Escherichia coli O104:H4 str. 2009EL-2071 | |
| Escherichia coli O104:H4 str. 2011C-3493 | |
| Escherichia coli 55989 | |
| Escherichia coli W (CP002185.1) | |
| Escherichia coli W (CP002967.1) | |
| Escherichia coli KO11FL_162099 (CP002516.1) | |
| Escherichia coli KO11FL_162099 (CP002970.1) | |
| Escherichia coli SE11 | |
| Escherichia coli IAI1 | |
| Escherichia coli E24377A | |
| Escherichia coli O103:H2 str. 12009 | |
| Escherichia coli O111:H- str. 11128 | |
| Escherichia coli O26:H11 str. 11368 | |
| Escherichia coli APEC O78 | |
From the heatmaps in Figures 1 and 2 it can be seen that dS is considerably higher than dN, implying that the E. coli core genome has been subjected to strong purifying selection. Since the core genome consists of all genes common to all the strains discussed here, these genes are presumably important for the species' survival and the removal of fitness-reducing mutations appears to have been of considerable importance for the evolution of the different lineages.
Phylogenetic inferences from the mutT gene
Examination of the base composition in the E. coli core genome
As previously mentioned, the selective pressures that the E. coli core genome has been exposed to can be analyzed using relative entropy. The genomic frequencies of codons subjected to strong selective pressures are assumed to be substantially different from the corresponding products of nucleotide frequencies. Conversely, codons exposed to weak selective pressures will presumably have frequencies more similar to the corresponding product of nucleotide frequencies due to mutational bias [13, 29]. Unlike dN/dS-based methods, the relative entropy measure cannot distinguish between positive and negative selective pressures. Therefore, with regard to relative entropy, the term selective pressures will denote both positive and negative selective pressures.
We also examined whether there was any association between core- and whole genome levels of both relative entropy and AT content, which could point towards similar selective pressures operating on the core- and whole genome. Our findings indicate no correlation between core- and whole genome relative entropy (p ~ 0.26), suggesting that selective pressures operating on the core genome are most likely unrelated to selective forces effective on the rest of the genome. Core- and whole genome AT content may be negatively correlated (p ~ 0.058), albeit weakly. Since this negative correlation was produced with robust regression, the result was somewhat surprising. An extra generalized additive model (GAM) was therefore fitted, since such models are more capable of modeling non-linear relations, but the association between core- and whole genome AT content was no longer statistically significant (p ~ 0.23). Hence, these results seem to suggest that different selective pressures form the E. coli core and non-core genomes.
The effect of selective pressures on E. coli genome size
Genome expansion and genome reduction in E. coli
To our knowledge, there are no previous studies of evolutionary forces responsible for genome expansion due to HGT. A recent study discusses evolutionary aspects of recombination in recently emerged clonal Staphylococcus aureus and Clostridium botulinum isolates by examining dN and dS of SNPs in core-, non-core- and recombined DNA, but does not deal with genome expansion as such. Our findings suggest that the E. coli core genome has been subjected to substantial selective pressures over time compared to the genome as a whole. The linear association between median dS and dN estimations for all strains indicates that purifying selection has been directing E. coli lineage evolution and the comparably low rates of non-synonymous substitutions (dN) may indicate that the core genome has remained intact for a longer time span. It should also be noted that all E. coli strains examined in this study are publicly available whole genome sequences, and the fact that they have been selected for sequencing may be due to some special traits not commonly observed in wild-type E. coli.
Our results support previous studies arguing that acquisition of traits through HGT may be a consequence of “spandrel”-like evolutionary processes where the functions of acquired genes are formed through positive selection over time or eventually lost [34, 35]. Hence, an increase in selective pressures does not appear to be the driving force behind chromosome expansion and acquisition of new traits in E. coli, which is consistent with related findings, including those pointing to an analogous evolutionary trail for gene duplications [36, 37]. Pathogenic E. coli may thus have evolved as a consequence of a hostile environment, where virulence-associated genes are abundant. We anticipate that future studies will shed more light on this.
A total of 59 E. coli genomes, with their annotated coding genes and corresponding proteins, were downloaded from NCBI/GenBank. In six of the genomes we discovered a lack of correspondence between the coding genes and their listed proteins, and these six genomes were discarded from the downstream analysis. See Additional file 1 for more information on the different E. coli strains used in the study. Genomic properties such as genome size, AT content and relative entropy were estimated using in-house scripts that are available upon request. All statistical analyses were performed with R.
Extraction of the core genome
All proteins from every genome were BLASTed (blastp) against all proteins of all other genomes, and a distance was computed between all protein pairs as described previously. Based on these distances, proteins were clustered using hierarchical clustering with complete linkage, and divided into clusters by cutting the dendrogram at distance 0.1. Loosely speaking, this means any two proteins in the same cluster share 90% similarity. Using this rather strict cutoff resulted in a set of 1729 core clusters, i.e. clusters with at least one protein for each of the 53 genomes in the study. Next, paralogs were eliminated from each cluster using the same procedure as for the distance computation. The 53 orthologs in each cluster were aligned using the MCoffee software and the protein alignments were back-translated to DNA alignments using the TranslatorX software.
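The clustering and core-cluster selection described above can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the function names, the toy protein identifiers and the toy distances are hypothetical, the BLAST-derived distance itself is not reproduced, and paralog removal is left out. Only the complete-linkage rule, the cut at a fixed distance and the requirement of at least one protein per genome follow the text.

```python
from itertools import combinations


def complete_linkage_clusters(items, dist, cutoff):
    """Agglomerate items with complete linkage and stop at height `cutoff`.

    `dist` maps frozenset({a, b}) to the distance between items a and b.
    Stopping once the closest pair of clusters is farther apart than
    `cutoff` corresponds to cutting the dendrogram at that height.
    """
    clusters = [[item] for item in items]

    def linkage(a, b):
        # Complete linkage: cluster distance is the largest pairwise distance.
        return max(dist[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        best_pair, best_dist = None, None
        for i, j in combinations(range(len(clusters)), 2):
            d = linkage(clusters[i], clusters[j])
            if best_dist is None or d < best_dist:
                best_pair, best_dist = (i, j), d
        if best_dist > cutoff:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def core_clusters(clusters, genome_of, genomes):
    """Keep clusters containing at least one protein from every genome."""
    wanted = set(genomes)
    return [c for c in clusters if {genome_of[p] for p in c} >= wanted]


# Hypothetical toy input: protein id -> genome id
genome_of = {"g1_a": "g1", "g2_a": "g2", "g3_a": "g3", "g1_b": "g1", "g2_b": "g2"}
# Toy distances: 0.05 within a protein family (same suffix), 0.9 otherwise
dist = {
    frozenset(pair): 0.05 if pair[0][-1] == pair[1][-1] else 0.9
    for pair in combinations(genome_of, 2)
}
clusters = complete_linkage_clusters(list(genome_of), dist, cutoff=0.1)
core = core_clusters(clusters, genome_of, {"g1", "g2", "g3"})
print(core)
```

This naive implementation recomputes all cluster distances at every step and is only meant to make the procedure concrete; for proteins from 53 genomes a dedicated clustering library would be needed.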
Estimation of core genome dN and dS
To calculate dN and dS we followed the method first described by Li et al. Briefly, we sequentially performed gene-wise multiple alignments as described above on all 1729 core genes from the 53 strains used in the study. The alignment ends were trimmed manually so that the sequences within the alignments were all of the same length. We then used the seqinr package in R to read the alignments, and subsequently calculated gene-by-strain dN and dS values using the kaks() command. For strain-wide assessments, the dN and dS estimates for individual genes were added up and weighted according to gene length. dN and dS for each strain were based on the median from all versus all comparisons.
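One possible reading of the aggregation step is sketched below: gene-wise estimates for a strain pair are combined as a length-weighted mean, and each strain is then summarised by the median over its comparisons with all other strains. Both function names and all numbers are hypothetical, and the exact weighting used in the study may differ from this interpretation.

```python
from statistics import median


def length_weighted_mean(values, lengths):
    """Combine gene-wise estimates, weighting each gene by its length."""
    total = sum(lengths)
    if total <= 0:
        raise ValueError("gene lengths must sum to a positive number")
    return sum(v * n for v, n in zip(values, lengths)) / total


def strain_medians(pairwise):
    """Summarise each strain by the median over all its pairwise comparisons.

    `pairwise` maps a (strain_a, strain_b) tuple to the aggregated estimate
    (for example dS) for that pair of strains.
    """
    per_strain = {}
    for pair, value in pairwise.items():
        for strain in pair:
            per_strain.setdefault(strain, []).append(value)
    return {strain: median(vals) for strain, vals in per_strain.items()}


# Illustrative toy values only: two genes of 300 and 900 nucleotides
pair_ds = length_weighted_mean([0.1, 0.3], [300, 900])
medians = strain_medians({("A", "B"): 0.2, ("A", "C"): 0.4, ("B", "C"): 0.6})
print(pair_ds, medians)
```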
Due to the bimodal distribution of the core genome-based dN and dS values, heatmaps based on hierarchical clustering with Euclidean distance were created for each of the dS, dN and dS/dN estimated distance matrices so that potential differences between the strains could be examined. These matrices are included in an R-file (see Additional file 2).
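Relative entropy was computed for each genome i as the Kullback–Leibler measure between the observed codon frequencies and the codon frequencies expected from the nucleotide frequencies:

$$D_{KL} = \sum_{XYZ} F_i(XYZ) \log \frac{F_i(XYZ)}{F_i(X)\,F_i(Y)\,F_i(Z)}$$

where the sum is taken over all 64 possible codons XYZ consisting of nucleotides X, Y and Z, respectively. F_i is a function returning the frequency of codon XYZ, or nucleotides X, Y and Z, from genome i. A low D_KL indicates that the observed codon frequencies are, on average, similar to the products of the individual nucleotide frequencies, signifying that the codon frequencies are more random, presumably due to relaxation of the selective forces operating on the genome.
The mutT-based phylogenetic tree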
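The relative entropy measure described here can be sketched in a few lines. This is an illustration under stated assumptions rather than the in-house script used in the study: the function name is hypothetical, nucleotide frequencies are taken from the same coding sequences that supply the codons, the natural logarithm is used, codons with zero observed frequency contribute nothing, and codons containing ambiguous bases are skipped.

```python
from collections import Counter
from itertools import product
from math import log


def relative_entropy(cds_sequences):
    """Kullback-Leibler measure between observed codon frequencies and the
    product of the frequencies of the three nucleotides in each codon."""
    codon_counts = Counter()
    base_counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        # Read in-frame codons, ignoring any incomplete trailing codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                codon_counts[codon] += 1
                base_counts.update(codon)
    n_codons = sum(codon_counts.values())
    n_bases = sum(base_counts.values())
    if n_codons == 0:
        raise ValueError("no valid codons found")
    d_kl = 0.0
    for codon in map("".join, product("ACGT", repeat=3)):
        observed = codon_counts[codon] / n_codons
        if observed == 0:
            continue  # a zero-frequency codon adds nothing to the sum
        expected = 1.0
        for base in codon:
            expected *= base_counts[base] / n_bases
        d_kl += observed * log(observed / expected)
    return d_kl


# Toy input: a single codon repeated, i.e. maximally non-random codon usage
print(relative_entropy(["ATGATGATG"]))
```

A sequence in which every codon occurs equally often gives a value of zero, matching the interpretation that low values indicate more random codon frequencies.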
The phylogenetic tree based on the mutT gene was created, after sequence alignment, using maximum likelihood estimation and 500 bootstraps using the package MEGA6. Based on statistical analyses carried out with the “Ape” package in R, we found that a nucleotide substitution model based on the Tamura-Nei 93 model, which assumes equal transversion rates and unequal transition rates, with Gamma-distributed among-site rate variation was the model with the lowest AIC and was therefore chosen. The Gamma distribution was discretized into 6 categories, which is the default number of categories; changes to this number did not notably affect the tree topology. The DNA sequences on which the mutT-based phylogenetic tree is based are included in FASTA format (Additional file 3).
The statistical analyses were carried out using an iterative robust MM-type regression (M-type estimator with Tukey’s biweight and initial coefficient estimates provided from an S-type estimator) with significance estimates (p-values) obtained from t-statistics. All similar strains were discarded before these statistical analyses so that the sample size was reduced to 36 strains (see Additional file 4 and Additional file 5). Robust regression was used where there were several outlying residuals resulting in moderately skewed distributions; otherwise standard ordinary least squares regression was used, which additionally includes a goodness-of-fit estimate (R²). The association between core- and whole genome AT content was also tested using a generalized additive model (GAM), where the predictor (whole genome AT content) was modeled using a spline function. Additional file 5 contains all estimates resulting from the statistical analyses.
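Model choice by lowest AIC, as used for the substitution model above, follows the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and L the maximised likelihood. The sketch below only illustrates that comparison: the model labels, log-likelihoods and parameter counts are invented toy values, not results from the study, and the likelihoods themselves would come from a phylogenetics package.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the name of the model with the lowest AIC and all AIC scores.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits: model_B has more parameters but a higher likelihood
toy_fits = {"model_A": (-1000.0, 5), "model_B": (-990.0, 7)}
chosen, scores = best_model(toy_fits)
print(chosen, scores)
```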
JB was supported by Norwegian Institute of Public Health, CS by The Norwegian Veterinary Institute. OB and LS were both funded by The Norwegian University of Life Sciences. We thank Dr. Christine L. Parr for help with language and grammar.
References
1. Fournier PE, Drancourt M, Raoult D: Bacterial genome sequencing and its use in infectious diseases. Lancet Infect Dis. 2007, 7 (11): 711-723. 10.1016/S1473-3099(07)70260-8.
2. Pallen MJ, Wren BW: Bacterial pathogenomics. Nature. 2007, 449 (7164): 835-842. 10.1038/nature06248.
3. McCutcheon JP, Moran NA: Extreme genome reduction in symbiotic bacteria. Nat Rev Microbiol. 2012, 10 (1): 13-26.
4. Moran NA: Microbial minimalism: genome reduction in bacterial pathogens. Cell. 2002, 108 (5): 583-586. 10.1016/S0092-8674(02)00665-7.
5. Moran NA, McLaughlin HJ, Sorek R: The dynamics and time scale of ongoing genomic erosion in symbiotic bacteria. Science (New York, NY). 2009, 323 (5912): 379-382. 10.1126/science.1167140.
6. Toh H, Weiss BL, Perkin SA, Yamashita A, Oshima K, Hattori M, Aksoy S: Massive genome erosion and functional adaptations provide insights into the symbiotic lifestyle of Sodalis glossinidius in the tsetse host. Genome Res. 2006, 16 (2): 149-156.
7. Wernegreen JJ: Reduced selective constraint in endosymbionts: elevation in radical amino acid replacements occurs genome-wide. PLoS One. 2011, 6 (12): e28905. 10.1371/journal.pone.0028905.
8. Yus E, Maier T, Michalodimitrakis K, Van Noort V, Yamada T, Chen WH, Wodke JA, Guell M, Martinez S, Bourgeois R, Kühner S, Raineri E, Letunic I, Kalinina OV, Rode M, Herrmann R, Gutiérrez-Gallego R, Russell RB, Gavin AC, Bork P, Serrano L: Impact of genome reduction on bacterial metabolism and its regulation. Science (New York, NY). 2009, 326 (5957): 1263-1268. 10.1126/science.1177263.
9. Hershberg R, Tang H, Petrov DA: Reduced selection leads to accelerated gene loss in Shigella. Genome Biol. 2007, 8 (8): R164. 10.1186/gb-2007-8-8-r164.
10. Benson DA, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucleic Acids Res. 2014, 42 (1): D32-D37.
11. Bohlin J, Sekse C, Skjerve E, Brynildsrud O: Positive correlations between genomic %AT and genome size within strains of bacterial species. Environ Microbiol Rep. 2014, 6 (3): 278-286. 10.1111/1758-2229.12145.
12. Mitchell D: GC content and genome length in Chargaff compliant genomes. Biochem Biophys Res Commun. 2007, 353 (1): 207-210.
13. Bohlin J, Van Passel MW, Snipen L, Kristoffersen AB, Ussery D, Hardy SP: Relative entropy differences in bacterial chromosomes, plasmids, phages and genomic islands. BMC Genomics. 2012, 13: 66.
14. Cover TM, Thomas JA: Elements of Information Theory. 1991, New York: John Wiley & Sons, Inc.
15. Bohlin J, Brynildsrud O, Vesth T, Skjerve E, Ussery DW: Amino acid usage is asymmetrically biased in AT- and GC-rich microbial genomes. PLoS One. 2013, 8 (7): e69878. 10.1371/journal.pone.0069878.
16. Bohlin J, Skjerve E, Ussery DW: Investigations of oligonucleotide usage variance within and between prokaryotes. PLoS Comput Biol. 2008, 4 (4): e1000057. 10.1371/journal.pcbi.1000057.
17. Hershberg R, Petrov DA: Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet. 2010, 6 (9): e1001115. 10.1371/journal.pgen.1001115.
18. Rocha EP, Feil EJ: Mutational patterns cannot explain genome composition: Are there any neutral sites in the genomes of bacteria?. PLoS Genet. 2010, 6 (9): e1001104. 10.1371/journal.pgen.1001104.
19. Rocha EP, Smith JM, Hurst LD, Holden MT, Cooper JE, Smith NH, Feil EJ: Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol. 2006, 239 (2): 226-235. 10.1016/j.jtbi.2005.08.037.
20. Novichkov PS, Wolf YI, Dubchak I, Koonin EV: Trends in prokaryotic evolution revealed by comparison of closely related bacterial and archaeal genomes. J Bacteriol. 2009, 191 (1): 65-73. 10.1128/JB.01237-08.
21. Touchon M, Hoede C, Tenaillon O, Barbe V, Baeriswyl S, Bidet P, Bingen E, Bonacorsi S, Bouchier C, Bouvet O, Calteau A, Chiapello H, Clermont O, Cruveiller S, Danchin A, Diard M, Dossat C, Karoui ME, Frapy E, Garry L, Ghigo JM, Gilles AM, Johnson J, Le Bouguénec C, Lescat M, Mangenot S, Martinez-Jéhanne V, Matic I, Nassif X, Oztas S, et al: Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 2009, 5 (1): e1000344. 10.1371/journal.pgen.1000344.
22. Van Passel MW, Marri PR, Ochman H: The emergence and fate of horizontally acquired genes in Escherichia coli. PLoS Comput Biol. 2008, 4 (4): e1000059. 10.1371/journal.pcbi.1000059.
23. Wielgoss S, Barrick JE, Tenaillon O, Wiser MJ, Dittmar WJ, Cruveiller S, Chane-Woon-Ming B, Medigue C, Lenski RE, Schneider D: Mutation rate dynamics in a bacterial population reflect tension between adaptation and genetic load. Proc Natl Acad Sci U S A. 2013, 110 (1): 222-227. 10.1073/pnas.1219574110.
24. Didelot X, Meric G, Falush D, Darling AE: Impact of homologous and non-homologous recombination in the genomic evolution of Escherichia coli. BMC Genomics. 2012, 13: 256. 10.1186/1471-2164-13-256.
25. Monk JM, Charusanti P, Aziz RK, Lerman JA, Premyodhin N, Orth JD, Feist AM, Palsson BO: Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments. Proc Natl Acad Sci U S A. 2013, 110 (50): 20338-20343. 10.1073/pnas.1307797110.
26. Reeves PR, Liu B, Zhou ZM, Li D, Guo D, Ren Y, Clabots C, Lan RT, Johnson JR, Wang L: Rates of Mutation and Host Transmission for an Escherichia coli Clone over 3 Years. PLoS One. 2011, 6 (10): e26907. 10.1371/journal.pone.0026907.
27. Balbi KJ, Feil EJ: The rise and fall of deleterious mutation. Res Microbiol. 2007, 158 (10): 779-786. 10.1016/j.resmic.2007.09.005.
28. Garcia-Gonzalez A, Rivera-Rivera RJ, Massey SE: The Presence of the DNA Repair Genes mutM, mutY, mutL, and mutS is Related to Proteome Size in Bacterial Genomes. Front Genet. 2012, 3: 3.
29. Chen SL, Lee W, Hottes AK, Shapiro L, McAdams HH: Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci U S A. 2004, 101 (10): 3480-3485. 10.1073/pnas.0307827100.
30. Hildebrand F, Meyer A, Eyre-Walker A: Evidence of selection upon genomic GC-content in bacteria. PLoS Genet. 2010, 6 (9): e1001107. 10.1371/journal.pgen.1001107.
31. Hastie TJ, Tibshirani RJ: Generalized additive models. 1990, New York: Chapman and Hall/CRC Press.
32. Castillo-Ramirez S, Harris SR, Holden MT, He M, Parkhill J, Bentley SD, Feil EJ: The impact of recombination on dN/dS within recently emerged bacterial clones. PLoS Pathog. 2011, 7 (7): e1002129. 10.1371/journal.ppat.1002129.
33. Fall S, Mercier A, Bertolla F, Calteau A, Gueguen L, Perriere G, Vogel TM, Simonet P: Horizontal gene transfer regulation in bacteria as a "spandrel" of DNA repair mechanisms. PLoS One. 2007, 2 (10): e1055. 10.1371/journal.pone.0001055.
34. Chattopadhyay S, Weissman SJ, Minin VN, Russo TA, Dykhuizen DE, Sokurenko EV: High frequency of hotspot mutations in core genes of Escherichia coli due to short-term positive selection. Proc Natl Acad Sci U S A. 2009, 106 (30): 12412-12417. 10.1073/pnas.0906217106.
35. Treangen TJ, Rocha EP: Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genet. 2011, 7 (1): e1001284. 10.1371/journal.pgen.1001284.
36. Kondrashov FA: Gene duplication as a mechanism of genomic adaptation to a changing environment. Proc Biol Sci. 2012, 279 (1749): 5048-5057. 10.1098/rspb.2012.1108.
37. Paauw A, Leverstein-van Hall MA, Verhoef J, Fluit AC: Evolution in quantum leaps: multiple combinatorial transfers of HPI and other genetic modules in Enterobacteriaceae. PLoS One. 2010, 5 (1): e8662. 10.1371/journal.pone.0008662.
38. R Core Team: R: A language and environment for statistical computing. 2012.
39. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410. 10.1016/S0022-2836(05)80360-2.
40. Snipen L, Wassenaar TM, Altermann E, Olson J, Kathariou S, Lagesen K, Takamiya M, Knochel S, Ussery DW, Meinersmann RJ: Analysis of evolutionary patterns of genes in Campylobacter jejuni and C. coli. Microb Inform Exp. 2012, 2 (1): 8. 10.1186/2042-5783-2-8.
41. Wallace IM, O'Sullivan O, Higgins DG, Notredame C: M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 2006, 34 (6): 1692-1699. 10.1093/nar/gkl091.
42. Abascal F, Zardoya R, Telford MJ: TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 2010, 38 (Web Server issue): W7-W13.
43. Li WH, Wu CI, Luo CC: A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol Biol Evol. 1985, 2 (2): 150-174.
44. Charif D, Thioulouse J, Lobry JR, Perriere G: Online synonymous codon usage analyses with the ade4 and seqinR packages. Bioinformatics. 2005, 21 (4): 545-547. 10.1093/bioinformatics/bti037.
45. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S: MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0. Mol Biol Evol. 2013, 30 (12): 2725-2729. 10.1093/molbev/mst197.
46. Paradis E, Claude J, Strimmer K: APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004, 20 (2): 289-290. 10.1093/bioinformatics/btg412.
47. Tamura K, Nei M: Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol. 1993, 10 (3): 512-526.
48. Akaike H: A New Look at the Statistical Model Identification. IEEE Trans Auto Contrl. 1974, AC-19 (6): 716-723.
49. Langille MG, Brinkman FS: IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics (Oxford, England). 2009, 25 (5): 664-665. 10.1093/bioinformatics/btp030.
50. Langille MG, Hsiao WW, Brinkman FS: Detecting genomic islands using bioinformatics approaches. Nat Rev Microbiol. 2010, 8 (5): 373-382. 10.1038/nrmicro2350.
51. Waack S, Keller O, Asper R, Brodag T, Damm C, Fricke WF, Surovcik K, Meinicke P, Merkl R: Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models. BMC Bioinformatics. 2006, 7: 142. 10.1186/1471-2105-7-142.
52. Yohai VJ, Stahel WA, Zamar RH: A procedure for robust estimation and inference in linear regression. Directions in robust statistics and diagnostics. 1991, New York: Springer, 365-374.
53. Wood SN: Generalized Additive Models: An Introduction with R. 2006, Boca Raton: Chapman & Hall/CRC Press.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.