Comparisons of infant Escherichia coli isolates link genomic profiles with adaptation to the ecological niche
© De Muinck et al.; licensee BioMed Central Ltd. 2013
Received: 2 November 2012
Accepted: 30 January 2013
Published: 5 February 2013
Although Escherichia coli is one of the most intensely studied model organisms, many questions remain about its evolutionary biology and ecology. An important step toward a more complete understanding of E.coli biology is to elucidate the relationships between gene content and adaptation to the ecological niche.
Here, we present genome comparisons of 16 E.coli strains that represent commensals and pathogens isolated from infants during a specific time period in Trondheim, Norway. Using differential gene content, we characterized enrichment profiles of the collection of strains relating to phylogeny, early vs. late colonization, pathogenicity and growth rate. We found clear gene content distinctions relating to the various grouping criteria. We also found that different categories of strains use different genetic elements for similar biological processes. The sequenced genomes included two pairs of strains where each pair was isolated from the same infant at different time points. One pair, in which the strains were isolated four months apart, showed maintenance of an early colonizer genome profile but also gene content and codon usage changes toward the late colonizer profile. Lastly, we placed our sequenced isolates into a broader genomic context by comparing them with 25 published E.coli genomes that represent a variety of pathotypes and commensal strains. This analysis demonstrated the importance of geography in shaping strain level gene content profiles.
Our results indicate a general pattern where alternative genetic pathways lead toward a consistent ecological role for E.coli as a species. Within this framework, however, we saw selection shaping the coding repertoire of E.coli strains toward distinct ecotypes with different phenotypic properties.
Keywords: Escherichia coli; Comparative genomics; Infant gut; Commensal; Pathogen; Generation time; Codon usage bias
Awareness of the importance of the gut microbial colonization for human health is growing as numerous links with a multitude of diseases are being discovered . Recent advances in sequencing technology have generated massive amounts of data, but much remains to be understood about the processes important for maintaining a healthy community structure. E.coli, as well as being a much studied model organism, is an important and ubiquitous member of the human gut microbial community. Although E.coli constitutes only a small fraction of the total gastrointestinal microbiota, it has a wide spectrum of potential interactions with the human host, ranging from probiotic to commensal and on to pathogenic .
As one of the most intensely studied organisms, E.coli has already yielded a great deal of genomic information. GenBank has cataloged 60 complete chromosomal genomes and 346 draft genomes (at the time of writing). However, less sequencing effort has been directed toward truly commensal E.coli strains, relative to pathogenic isolates and derivatives of popular laboratory strains. Previous comparative analysis of the genome sequences of 61 isolates has helped develop a new view of the E.coli genetic landscape which highlights diversity at the genome level. A typical E.coli strain carries between 4,000 and 5,500 genes. On average, an E.coli strain will share about 40% of these with all other members of the species, while the remainder forms part of the pan-genome [4, 5]. Following these approaches, differential gene content between strains is thought to subdivide E.coli into ecological classes that may be more biologically informative than traditional phylogenetic categorization based, for example, on Multi-Locus Sequence Typing (MLST). Use of full genomes and subsequent gene-content profiling has thus become important for understanding the role of genome contents in defining a realized ecological niche.
This work is a continuation of a deep characterization of E.coli strains isolated from a cohort of infants and their mothers in Trondheim, Norway. The original study design was a nested case–control format created to examine the impact of whole gut microbial colonization on the development of atopic disease [7, 8]. In this original characterization of the cohort, qPCR was used to identify and quantify the microbial fecal composition of several classes of bacteria and these data were matched with cytokine profile development. From this, it was observed that early E.coli colonization was linked to protection from atopy and the mother was found to be a likely source of infant colonization . We have previously characterized the E.coli colonization pattern of a sub-cohort of this larger study, 85 infants and their mothers, and found limits on the diversity of strains and further evidence of transmission from the mothers to the infants . This same study then placed these strains into a phylogenetic context of the wider E.coli diversity.
Here, we built upon these earlier observations using whole genome sequencing. We compare the genomic content of strains with different phylogenetic, pathogenic vs. commensal, growth rate and early vs. late colonization characteristics in order to determine enrichment profiles that may explain these ecological traits. The signatures that were observed can be used for further investigations into genotype-phenotype mapping within the context of ecological adaptation, and for investigating the role of the many hypothetical proteins that we found differentiating the groups. The collection of strains used for this analysis offers insight into a temporally and geographically coherent population of gut colonizing E.coli, with additional context afforded by our previous characterizations of these strains. Methodological challenges that were addressed included developing a strategy to compensate for incomplete assembly, small sequencing errors, and potential loss of genetic information derived from genomes sequenced by 454 single-end shotgun sequencing. Dealing with incompletely assembled draft genomes, as we have done, may become less problematic for single isolate analysis as assembly algorithms and sequencing technologies progress. However, costs may hinder coverage for large collections of isolates and also for complex samples such as the soil or mammalian gut, which at minimum contain several hundred genomes [11, 12].
Table 1. List of strains used in this study with corresponding genome information
Columns: ID (alt. ID); Age at sampling; Accession number
Phylogenetic and gene content comparisons
Table 2. Criteria used for gene enrichment analyses
Columns: Focal group (nr. strains); Gene presence in focal group; Gene absence in non-focal group
Rows:
Criteria I, cladistic comparison
Criteria II, pathogen/commensal comparison
Criteria III, growth rate comparison
Criteria IV, colonization time comparison
Criteria V, pathogen/commensal comparison
Criteria VI, pathogen/commensal comparison
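The presence/absence criteria in Table 2 can be read as a filter over a gene-family presence/absence table. The following is a minimal sketch of that idea, not the authors' pipeline: the function name, the toy gene families and strain labels, and the two threshold values are all hypothetical, and the actual cutoffs are those given in Table 2.

```python
def enriched_families(presence, focal, non_focal,
                      min_focal_presence, min_nonfocal_absence):
    """Return gene families enriched in the focal group.

    presence: dict mapping gene family -> set of strain IDs carrying it
    focal, non_focal: disjoint sets of strain IDs
    thresholds: fractions in [0, 1] (hypothetical values, not the paper's)
    """
    hits = []
    for family, carriers in presence.items():
        # Fraction of focal strains that carry the family
        frac_present = len(carriers & focal) / len(focal)
        # Fraction of non-focal strains that lack the family
        frac_absent = len(non_focal - carriers) / len(non_focal)
        if (frac_present >= min_focal_presence
                and frac_absent >= min_nonfocal_absence):
            hits.append(family)
    return sorted(hits)


# Toy data: hypothetical gene families and strains
presence = {
    "famA": {"s1", "s2", "s3"},
    "famB": {"s1", "s2", "s4"},
    "famC": {"s1", "s2", "s3", "s4", "s5"},
    "famD": {"s1"},
}
focal = {"s1", "s2", "s3"}
non_focal = {"s4", "s5"}

# Strict criterion: present in every focal strain, absent from every other
strict = enriched_families(presence, focal, non_focal, 1.0, 1.0)
# Relaxed criterion: illustrative thresholds only
relaxed = enriched_families(presence, focal, non_focal, 0.6, 0.5)
print(strict, relaxed)
```

In this toy example the strict filter keeps only the family that perfectly separates the groups, while relaxing both thresholds admits a family that is common in the focal group but also appears in one non-focal strain.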
Pathogen and commensal comparisons
A noticeable contributor to the pathogen enrichment was the pathogenicity island carrying the type III secretion system (T3SS) and several effector molecules associated with it. A BLAST search of the large contig sequences generated from the Newbler assemblies against a complete locus of enterocyte effacement (LEE) pathogenicity island (35,624 bp) revealed significant identity for many of the strains (Additional file 1: Table S1). All four pathogenic strains as well as three commensal isolates contained the pathogenicity island. Categories that were enriched in the pathogenic grouping relative to the commensal grouping included both nitrogen and primary metabolic processing (Figure 5). Not surprisingly, since the pathogenic strains were initially selected based on the presence of the intimin-encoding eae gene, and therefore belonged to pathotype enteropathogenic E.coli (EPEC) (Table 1), intimin enrichment was also observed in the pathogenic grouping.
Growth rate comparisons
Early vs. late colonizer comparisons
Codon usage bias and generation times
Codon usage bias in highly expressed genes has been found to be a strong predictor of maximal growth rate in prokaryotes. In order to investigate this relationship in our data, we looked at correlations between our effective number of codons (ENC) estimates and growth rates under aerobic and anaerobic conditions. Mean genome wide ENC for the 10 EDM strains was 49.044 ± 0.182, while mean ENC for ribosomal protein genes was 35.790 ± 0.052. Mean generation times were 40.3 ± 1.2 min. and 52.7 ± 3.0 min. under aerobic and anaerobic conditions, respectively. We first looked at the relationship between whole genome ENC and growth rate. We found a positive correlation (two-sided Spearman correlation ρ = 0.71, p-value = 0.03) with anaerobic growth rate (Additional file 1: Figure S6A), but no significant relationship with aerobic growth rate (ρ = 0.05, p-value = 0.89). This result indicated that faster growing isolates tend to have more pronounced overall codon bias. As expected, the within species variation in ENC for ribosomal protein genes was minimal, and no relationship could be found between this index and growth rates. Also, due to a lack of variation in ribosomal protein ENC, the relationships between ΔENC and growth rates were essentially the same as the genome wide correlations (ρ = 0.72, p-value = 0.02 and ρ = 0.26, p-value = 0.47 for anaerobic and aerobic generation times, respectively) (Additional file 1: Figure S6B).
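For readers unfamiliar with the ENC statistic, the sketch below computes it for a single coding sequence using Wright's original formulation, in which ENC ranges from 20 (one codon used per amino acid, maximal bias) to 61 (all synonymous codons used equally, no bias). It is an illustration only: the text does not state which implementation was used, and the function name, the handling of missing degeneracy classes and the toy sequences are assumptions.

```python
from collections import Counter, defaultdict

# Standard genetic code in TCAG order; '*' marks stop codons
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Synonymous families: amino acid -> list of its codons (stops excluded)
FAMILIES = defaultdict(list)
for codon, aa in zip(CODONS, AMINO):
    if aa != "*":
        FAMILIES[aa].append(codon)


def effective_number_of_codons(seq):
    """ENC of an in-frame coding sequence (Wright's formulation)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    f_by_class = defaultdict(list)  # degeneracy -> homozygosity values
    for aa, codons in FAMILIES.items():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; skip rare amino acids
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # If isoleucine (the only 3-fold family) is unusable, average F2 and F4
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        raise ValueError("sequence too short to estimate ENC")

    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)  # values above 61 are capped by convention


# Toy sequences: one codon per amino acid vs. all codons used equally
biased_seq = "".join(codons[0] * 5 for codons in FAMILIES.values())
uniform_seq = "".join(c * 10 for cs in FAMILIES.values() for c in cs)
enc_biased = effective_number_of_codons(biased_seq)
enc_uniform = effective_number_of_codons(uniform_seq)
print(enc_biased, enc_uniform)
```

A lower ENC therefore means stronger codon usage bias, which is the direction to keep in mind when reading the correlations between ENC and generation time.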
Strain evolution in the infant gut
Two pairs of strains from two infants (child 1891 and 1360, Table 1) were isolated at two different time points and had matching MLST profiles. The strains were isolated at four and eleven days of age (EDM49c and EDM101c) and at ten days and four months (EDM1c and EDM123c) of age respectively. The isolates from the same child had almost identical gene contents (Figure 2) and were subjected to closer scrutiny in order to shed light upon the selective pressures in a novel infant gut environment. The earlier isolate in each pair was thus defined as “the parent strain” and the later as “the evolved strain”.
Both gene content and codon usage indicated strain evolution in the infant gut from an early colonizer to a late colonizer phenotype. For the EDM49c and EDM101c pair, only three genes, possibly phage related, were found in the parent strain but not in the evolved strain, and no genes were unique to the evolved strain relative to the parent strain. For the EDM1c and EDM123c pair, 16 genes were found in the parent strain but not in the evolved strain, and 13 genes were found in the evolved strain but not in the parent strain (Additional file 11 and Additional file 12). Interestingly, three of the genes unique to the parent strain were also present in the early colonization enrichment list, whereas none of the genes unique to the parent strain were found in the late colonization enrichment list. The three genes that matched the early enrichment list were GO categorized as a type-f conjugative transfer system pilin chaperone, hypothetical protein c4302 [uropathogenic, E.coli CFT073, NC_004431.1] and a tellurite resistance protein with transposon elements encoded nearby. Genes unique to the evolved strain relative to the parent strain included a mercury resistance operon that has evidence of being carried on the transposon Tn21. Genome wide ENC and ΔENC comparison of EDM1c and EDM123c found reduced codon bias in the evolved strain (Additional file 1: Figure S6A and B).
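The parent versus evolved comparison amounts to set differences between two gene inventories, followed by an intersection with the early and late enrichment lists. A minimal sketch, assuming each strain is represented as a set of gene family identifiers; every identifier below is a hypothetical placeholder rather than an entry from the additional files:

```python
def compare_pair(parent, evolved, early_list, late_list):
    """Summarize gene content changes between a parent and an evolved strain."""
    lost = parent - evolved      # in the parent strain only
    gained = evolved - parent    # in the evolved strain only
    return {
        "lost": lost,
        "gained": gained,
        "lost_in_early": lost & early_list,
        "lost_in_late": lost & late_list,
    }


# Hypothetical gene inventories and enrichment lists
parent_genes = {"pilin_chaperone", "hyp_c4302", "tellurite_res",
                "phage_a", "core_1", "core_2"}
evolved_genes = {"mercury_res", "tn21_tnpA", "core_1", "core_2"}
early_enriched = {"pilin_chaperone", "hyp_c4302", "tellurite_res", "fimbrial_x"}
late_enriched = {"arsenate_res", "cyanate_res"}

result = compare_pair(parent_genes, evolved_genes, early_enriched, late_enriched)
print(sorted(result["lost_in_early"]))
```

Genes lost from the parent strain that also appear in the early colonizer list, with none in the late colonizer list, is the pattern interpreted here as a shift toward the late colonizer profile.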
Relationship with other E.coli strains
Table 3. List of publicly available genomes used in gene content comparisons
We then conducted a gene content enrichment analysis in order to ascertain whether there are general profiles that can distinguish pathogenic from commensal E.coli. Using the enrichment criteria listed in Table 2 (criteria V), we identified only 17 genes that were enriched in the commensals and none in the pathogens. Sixteen of the 17 commensal enriched gene families were short (<250 bp) sequences encoding hypothetical proteins, and all were preferentially found in the strains sequenced in this study.
Genome analysis methods
The methodological challenges we addressed in order to generate the genotype-phenotype profiles presented in this work require some discussion. The 454 pyrosequencing single-end shotgun data presented difficulties that would, in several cases, not have been ameliorated by increasing the sequencing coverage (Additional file 1: Figure S1D). This is partly due to the intrinsic variability of E.coli genomic content, which made it impossible to rely on reference-based assembly and necessitated the use of de novo assembly methods, and also because of the relatively error prone nature of the technology. Alternative sequencing technologies or laborious and costly paired-end/mate-pair DNA sample preparation would have been required to reduce the number of contigs. However, the single-end shotgun approach offers a number of advantages due to its simplicity and lower cost compared with paired-end library preparation . Furthermore, even though improvements in sequencing technologies will help genome assembly of bacterial isolates due to increased read length, sequencing of complex mixtures of bacteria such as gut or soil communities will continue to face some of the same challenges that we have addressed. The additional post-annotation search step employed in this study appears to have alleviated some of the biases introduced by an imperfect assembly (Additional file 1: Figure S1).
Pathogens vs. commensals
The factors that distinguish a pathogenic from a commensal E.coli remain contentious. Previous studies have failed to come up with pathotype specific genomic cores for strains classified as enteropathogenic or enterotoxigenic E.coli (EPEC and ETEC, respectively) [22, 23], but there have been studies reporting specific gene content profiles in extraintestinal pathogenic E.coli (ExPEC) [24, 35]. However, recent work indicates that many of these genes are primarily associated with gut colonization and that virulence is an incidental by-product of commensalism [36, 37]. In our case, using strict 100% presence/absence as an enrichment criterion failed to detect genes that separated commensals and pathogens (all four pathogenic strains were EPEC).
Relaxing the criteria resulted in a significant set of 164 genes that were preferentially found in the pathogenic group, but there was substantial gene overlap with commensal strains (Figure 7). The 33 genes enriched in the commensal group may represent a small part of the wide variety of genes necessary to be a successful colonizer. However, the weak commensal signature, compared with the pathogenic one, suggests that the term commensal may not be a meaningful descriptor in a phenotypic or evolutionary context. Our analyses identified ‘pathogen-like’ commensals; for example, the functional genetic profile of commensal isolate EDM16c is closer to the pathogenic isolates than to the other commensals (Figures 3 & 7), which may suggest a virulence potential of certain commensal strains. This is especially highlighted by the large pathogenicity island carrying the T3SS, which was shared by all the pathogenic strains and a subset of the commensals (Additional file 1: Table S1). Recent work has shown that this system is important for bacterial competition in the gut in addition to its role in host interactions. If virulence is indeed an accidental by-product of adaptation to the gut environment, it would explain why it is hard to find a non-clinical distinction between pathogenic and commensal strains, as virulence may rather be a matter of context and opportunism. Genomic signatures may nevertheless identify strains that have greater capacity to make the transition from commensalism to virulence, and could thus aid in designing preventive strategies.
Minimal generation time
Growth rate is a phenotype with quintessentially complex genetic underpinnings, and can hardly be ascribed to specific genes or alleles. Insight into the mechanisms underlying growth rate differences is highly desirable as it is related to other phenotypes of fundamental importance, such as virulence. Minimal generation time in a study comparing 214 bacterial and archaeal species was found to correlate with genomic features such as rRNA and tRNA copy number and codon usage bias. However, minimal generation times were found to vary considerably within the E.coli isolates in our collection, even though these particular features were similar among our isolates.
We could not find any significant correlation between generation time and rRNA and tRNA copy number (results not shown), and codon usage bias was also found to be a poor predictor of aerobic generation time. Surprisingly, it correlated strongly with anaerobic generation time.
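The rank correlations between ENC and generation time can be illustrated with a small pure-Python Spearman calculation. This is a sketch under stated assumptions: the helper names are hypothetical, the ENC and generation time values are invented for illustration and are not the measured data, and no p-value is computed.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Invented values for five hypothetical isolates
enc_values = [48.8, 48.9, 49.0, 49.1, 49.3]
anaerobic_generation_min = [50.1, 51.0, 52.6, 52.0, 55.8]
rho = spearman_rho(enc_values, anaerobic_generation_min)
print(round(rho, 2))
```

A positive rho between ENC and generation time means that isolates with higher ENC (weaker codon bias) tend to have longer generation times, which is the same statement as faster growers having stronger bias.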
In contrast to the study by Vieira-Silva, we found a positive correlation between generation time and codon usage bias in highly expressed genes (ΔENC). This result is not necessarily in conflict with previous findings, as it may be explained by the fact that we were looking at strain level rather than species level relationships. Specifically, in contrast to the previous work covering many diverse species, the ribosomal protein genes were extremely conserved and the spread of ENC values for this set of sequences was less than a third of what was observed for genome wide ENC. Whole genome bias dominated our analysis and gave rise to the interpretation that a narrower general codon usage profile is associated with shorter anaerobic generation times. It is noteworthy that this relationship did not hold for aerobic growth. At face value it may seem paradoxical that codon usage specialization should be more important under anaerobic conditions, when translation efficiency is presumably less of a limiting factor than under intrinsically faster aerobic growth. One explanation for this could be that gut adapted E.coli are primarily selected for anaerobic growth properties as the gut community matures and that aerobic growth leaves comparatively little systemic imprint on their genomes. Even though we found a significant correlation between aerobic and anaerobic generation time (R² = 0.41, p < 0.001), we found an even stronger correlation between anaerobic generation time and the anaerobic to aerobic generation time ratio (R² = 0.51, p < 0.0001), suggesting that slow anaerobic growth entails disproportionately fast aerobic growth, and that the genomic bases for these two modes of growth might, at least in part, be uncoupled. This interpretation is supported by the fact that codon usage bias correlated with anaerobic but not aerobic growth rates.
It would be interesting to compare these results with environmentally adapted E.coli isolates [6, 41] and discern if the genomic imprint of aerobic growth might be more visible.
Gene content analysis represents an entirely different approach to investigating the genomic basis of differential generation time, and one that would not be appropriate for inter-species comparisons. The fact that our E.coli isolates are closely related, as witnessed by the shared genomic core, yet display relatively high variation in generation time begs the question of whether there are signatures of coding potential that relate to this phenotypic diversity. To our knowledge, the results presented here are the first attempt at correlating growth rate phenotype with differential gene content. Even though the small sample sizes warrant some caution in interpreting the results, the gene profiles of the fast and slow growing groups are quite unlikely to have arisen by chance. It is also noteworthy, albeit perhaps not surprising, that the intermediate group failed to produce a significant enrichment profile and that differences are only visible when comparing the extremes.
Comparing relative enrichment in the slow group (324 genes) with that in the fast growing group (227 genes) showed that many of the same GO categories were enriched, but the slow growing group had a greater enrichment in several metabolic processes, including nitrogen and macromolecular metabolism, and in several genes important for iron uptake and utilization (Figure 5). In contrast, the fast growers had a larger relative enrichment for genes involved in response to chemical stimuli and cell wall organization. Perhaps this represents an ability to quickly adapt to changes in the environment. The fact that we observed relatively clear gene content signatures in both the fast and slow groups may reflect an evolutionary trade-off between short minimal generation time and scavenging potential. Copiotrophic, fast growing bacteria tend to have low affinity transporters typically representing an adaptation towards “feast” conditions, resulting in reduced competitiveness during nutrient starvation. Slow growers, on the other hand, tend to have high affinity transporters, making them competitive in low nutrient environments, while at the same time making them susceptible to saturation or toxic effects when resources are plentiful. This interpretation is further supported by the enhanced presence of scavenging-associated genes in our slow growing isolates.
Early and late colonization
The infant gut environment is temporally dynamic in terms of reduction potential, nutrient availability, immune function and the structure of the resident microbial community [44, 45]. The infant gut microbiome has been found to undergo a smooth increase in phylogenetic diversity over the first few years, while broad scale taxonomic patterns are characterized by abrupt events, eventually conforming to a mature profile . The same study found concomitant changes in metagenomic content indicating that the community as a whole is responding to a changing environment. Selection pressures faced by members of the gut microbiota may therefore differ widely between the earlier and later stages of infancy. This pressure is reflected in the reduced relative abundance of E.coli in the mature microbiota relative to the infant gut community [45, 47] and suggests that strains present at different stages of development could differ widely in their characteristics. Dramatic changes in the gut microbiota of pregnant women have also been shown from the first to third trimester, resulting in increased abundances of Proteobacteria and Actinobacteria and reduced taxonomic richness , a community state more reminiscent of the infant gut structure. The mother may somehow prime the gut microbiota with a qualitatively different environment in preparation for transfer to the infant.
Both early and late colonizers had significant differential gene content profiles (178 and 238 gene families, respectively). We found that early colonizers were enriched for type IV secretion system and fimbrial genes that are important for attachment and interaction with the host. This group also had an increased presence of colicin resistance genes, which may reflect the importance of competition with bacteria of the same or closely related species in the low diversity conditions of the early gut environment. Furthermore, we found an increased number of genes involved in biosynthetic processes in the early colonizer group. This could also be an adaptation to low diversity conditions where production of secondary metabolites and secreted growth factors is potentially limited. The late colonizers were enriched for resistance to toxins such as arsenate and cyanate. This could indicate the importance of these pathways for survival in the complex ecosystem of the mature gut.
Evolution towards a late colonizer genomic profile
There is ample evidence that, given some selective regime, microbial evolution in the laboratory can be exceedingly rapid. A few studies have documented the evolution of pathogenic bacteria in infected individuals [50, 51], but reports of real-time evolution in natural environments remain scarce, and to our knowledge there are no such studies focusing on bacteria of the human gut. Isolate EDM123c was categorized as a late colonizer because it was isolated from an infant at four months of age. EDM123c is in all probability clonally descended from EDM1c, which had colonized that same infant already at 10 days after birth. Since this strain had spent nearly four months in the infant gut during an environmental transition period, we hypothesized that selection would push it toward a late colonizer genomic profile. There are two lines of evidence to suggest that this is the case. First, three of the genes that were present in the ancestral strain but lost from the evolved version matched genes in the early enrichment list. This list included a tellurite resistance protein which has been linked to resisting host defense [52, 53]. Further experimentation is necessary to fully characterize the effect of these particular genes on early colonizing ability and possible reasons for negative selection in a more mature microbiota. Secondly, we observed an increased anaerobic generation time from isolate EDM1c (52.6 ± 0.4 min.) to EDM123c (55.8 ± 1.1 min.). Interestingly, EDM123c also had an elevated genome wide ENC (and thus also ΔENC) (Additional file 1: Figure S6) relative to the parent strain. This indicates that from the parent to the evolved strain there has been selection for synonymous mutations pushing the strain toward reduced codon usage bias. Reduced codon bias and growth rate have previously been associated with late gut colonization, indicating that isolate EDM123c has in fact evolved toward a late colonizer profile.
Given the close relatedness between EDM1c and EDM123c, as witnessed by both sequence similarity and gene content (Figure 3), there can be little doubt that these isolates are clonally related, and genomic differences are probably due to evolution taking place in the gut. Indeed, the other pair of parental (EDM49c) and evolved (EDM101c) strains displayed practically no divergence in gene content or codon usage bias, probably because they were isolated only 7 days apart. We cannot discount the possibility that clonally related strains were introduced, outcompeted and then re-introduced at a later time. In this case, at least part of any evolution that has taken place would have done so in a different environment. In the case of EDM123c, however, we feel that this is an unlikely scenario since adaptation took the direction predicted if the isolate had evolved in a maturing infant gut.
Cross category enrichment comparisons
Even though the different enrichment comparisons were fruitful for understanding functional categories, using this information across the different comparisons gave a better and more nuanced view. The main clade comparisons are very informative as they link a strain's evolutionary history to a measure of functional differentiation which can help define its ecological niche. For example, all early colonizers except EDM16c (which had an atypical gene content profile for an early colonizer) belong to clade2. The late colonizers all belong to clade1 except EDM123c, which is the evolved EDM1c and thus an atypical late colonizer. Thus there appears to be a phylogenetic split defining these ecological categories, and this split is reinforced by disparate gene content. Also, three of four pathogens group to clade1. Furthermore, and in contrast with the core genome phylogeny, the pan-genome phylogeny places the commensal strain EDM116c within the same subclade as these three pathogens (Figure 3). One could speculate that although EDM116c is an ostensibly asymptomatic isolate, its genetic makeup is such that given the right circumstances it may cause symptoms similar to known EPEC strains. The pathogenic isolate JEA242p, on the other hand, is placed within the otherwise exclusively commensal clade 2, demonstrating that virulence can emerge from quite different genomic backgrounds.
One of the two isolates classified as fast growing in this sample set of genome sequenced strains was a late colonizer (thus belonging to clade1), while early colonizers in this set tended to be slow growing (within clade2), but with disproportionately short aerobic relative to anaerobic generation times (Figure 8). This trend is not consistent with a previous study, but the disagreement is most likely attributable to sampling bias. Nevertheless, some interesting associations emerged when making cross-grouping comparisons. Comparing the similarities in the gene content enrichments between all groupings found that the combined clade1-late-fast and clade2-early-slow designations shared the most (57 and 49 respectively; Additional file 14 and Additional file 15) (Additional file 1: Figure S7). Each cross category grouping was enriched for unique phosphotransferase systems (PTS), which are thought to enhance sugar utilization in general and possibly bacterial uptake of sugars from breast milk. A similar general differential gene content profile was seen between the same combined groups in glycosyl transferase and glycosyl hydrolase genes, which are important for obtaining nutrients from the host and correct “assembly of a microbiota”. The combined clade2-early-slow group further encoded arylsulfate sulfotransferase, which has been claimed to play a role in the detoxification of phenolic compounds. On the other hand, a gamma aminobutyrate utilization gene was enriched in the combined clade1-late-fast group. This polyamine utilization gene has roles in proliferation under stressful conditions and utilization of alternative sources of carbon and nitrogen, which could be an adaptation to the difficult conditions of a mature gut microbiota [57, 58]. Lastly, the clade1-late-fast group showed enrichment for the hydrogenase-4 operon, which is important in anaerobic growth.
These cross-category comparisons provide a tentative link between the evolutionary history and functional phenotypes of our isolates where the two main branches of the core and pan-genome phylogenies may represent adaptive paths leading toward distinctive ecological properties.
Relationship with other E.coli strains
Expansion of the analysis to include published genome sequences reduced the core genome to 40.4%. This is rather high relative to previously reported values [3–5], but considering the relatively close relatedness of the IMPACT strains (Figure 11) and the high degree of shared gene content (52.4%, Figure 1) among them, our result may not be so surprising. Methodological differences, e.g. cutoff values used for BLAST matching, will also have an impact on estimated core genome sizes.
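Why the core genome shrinks as genomes are added can be seen from a toy presence/absence example. The sketch below is illustrative only: the strain and gene family labels are hypothetical, and because the text does not specify the denominator behind the reported percentages, it reports the core both as a share of the pan-genome and as a mean share of each genome's gene families.

```python
def core_and_pan(genomes):
    """genomes: dict mapping strain ID -> set of gene family IDs."""
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)  # families present in every genome
    pan = set.union(*gene_sets)          # families present in any genome
    return core, pan


def core_shares(genomes):
    """Core size relative to the pan-genome and to the average genome."""
    core, pan = core_and_pan(genomes)
    share_of_pan = len(core) / len(pan)
    mean_share_of_genome = sum(
        len(core) / len(genes) for genes in genomes.values()
    ) / len(genomes)
    return share_of_pan, mean_share_of_genome


# Hypothetical local collection of three closely related strains
local = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3", "g5"},
    "C": {"g1", "g2", "g6", "g7"},
}
# Adding a more distant, hypothetical published genome
expanded = dict(local, D={"g1", "g8", "g9"})

local_shares = core_shares(local)
expanded_shares = core_shares(expanded)
print(local_shares, expanded_shares)
```

Each added genome can only remove families from the intersection and add families to the union, so both shares are expected to fall as more diverse strains are included; the matching cutoff used to define a gene family changes the absolute values.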
The clustering of the IMPACT strains in the pan-genome tree constructed from all 41 genome sequences (Tables 1 and 3) illustrates the significance of geographical proximity in structuring gene content profiles within the E.coli species. It is also evident from the pan-genome tree that pathogens and commensals are interspersed, and our general enrichment analysis confirmed that there were no significant gene sets discriminating pathogenic from commensal strains. This result highlights the genomic diversity within pathogenic E.coli, and that there are many different evolutionary pathways to pathogenicity. Certain clades did seem to be dominated by either pathogens or commensals, but it is difficult to determine if this is due to sampling bias, since most isolates in the commensal group are from the IMPACT study. Of note, the only IMPACT strain to group outside the two main IMPACT clades was JEA297p, a strain that stood out as having a low gene density relative to the other IMPACT isolates (Additional file 1: Figure S1F). This strain was located on a deep branch in a separate clade shared only with the E.coli O157:H7 Sakai strain.
The lack of gene family enrichment observed in the general comparison between all 41 genomes is contrasted by the retention of an enrichment profile when the analysis is restricted to a single pathotype (Figure 12). The significant gene set that was retained in the enrichment analysis emphasized the importance of the LEE pathogenicity island components in defining EPEC strains (Additional file 13).
This study addresses the role of gene repertoire in bacterial niche ecology, including the genomic bases of phenotypes that are not directly linked with pathogenicity. This aspect of E.coli ecology has not been thoroughly explored, but may shed light on the evolutionary history of the species. The relatively small sample size and the need for further molecular work preclude definitive conclusions regarding relationships between the array of genetic pathways and specific phenotypes. However, our results indicate a general pattern where alternative genetic pathways lead toward a consistent ecological role for E.coli as a species (Figure 5). Within this framework, however, we saw selection shaping the coding repertoire of E.coli strains toward distinct ecotypes with different phenotypic properties. Additionally, the profiles we present should prompt further investigation and may lend insight into the biological roles of genes whose previously assigned function is incomplete, as well as of the large number of hypothetical proteins that were highlighted using this method.
In contrast to previous studies of E.coli eco-genomics [3, 5, 22, 60], our isolates come from a population that is narrowly localized both temporally and geographically. This could entail reduced genetic diversity due to shared ancestry and increased exchange of genes through horizontal gene transfer (HGT) between strains. Although the present study was not particularly concerned with HGT, we did see a substantially higher percentage of shared gene content (52.4%) than what has previously been reported, as well as a smaller pan-genome, indicating that homogenizing forces are increasingly affecting genomic diversity on a local scale. This interpretation is also supported by the comparison of the isolates sequenced in this study with publicly available E.coli genomes from a variety of sources (Figure 11), which showed that our strains were clearly distinguished from the others in terms of gene content. Nevertheless, there were several instances where relatively clear gene enrichment profiles could be linked to specific phenotypes and ecological characters. Due to the disparate nature of E.coli genomes, identification of such gene suites might be impeded if similar phenotypes can arise through different mechanisms and evolutionary histories, as is the case with clinical phenotypes of many pathogenic E.coli. A more homogeneous genomic background, as seen in this work, could make it easier to tease out gene content signatures that are ecologically relevant.
Materials and methods
Strains and culture conditions
The bacterial strains used in this study have been previously described in [10, 14] and (de Muinck et al., manuscript submitted) (Table 1). Ten of these strains originated from samples collected as part of the IMPACT study. Six strains were selected for genome sequencing from  because they were eae-positive and represented the previously reported diversity of phylogenetic groups. Two of these strains were from healthy children, while four were isolated from children with diarrhea; these four isolates were further classified as enteropathogenic E.coli (EPEC). A further ten strains were selected from , all of which were isolated from healthy children. All strains were grown to saturation in LB media and DNA extraction was performed using the DNeasy kit from Qiagen.
Genome sequencing and annotation
DNA was single-end shotgun sequenced using Roche 454 GS (FLX Titanium) pyrosequencing. Sequences have been deposited in the EMBL-EBI Sequence Read Archive. Accession numbers are listed in Table 1. De novo assembly was performed using Roche's program Newbler v2.3 (performed at the freely available Bioportal computing service, http://www.bioportal.uio.no). Annotation of all genomes, including those downloaded from Genbank (Table 3), was done using RAST version 4.0. The RAST-annotated genes of each of the genomes were BLASTed against all the other annotated genomes using criteria of 85% identity and an e-value of less than 1 × 10^-25 to signify a gene match. Due to the large number of contigs, determination of gene presence included additional processing steps to recover genes split across separate contigs or genes that were not included in the annotation. Briefly, we used the complete set of annotated genes from all of the genomes as a reference pool. If a gene in the reference pool was not found in all of the analyzed genomes, the longest copy of the gene was re-BLASTed against the Newbler assemblies of each of the genomes in which the gene was initially not found. This gene was then added to the annotation of a genome if a partial hit was found that was at least 90% identical and had an e-value of less than 1 × 10^-25. Genes were grouped as a family if they matched with the BLAST criteria just mentioned, or if they received identical functional annotations from RAST.
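The initial gene match criterion described above can be sketched as a simple filter over BLAST hits. This is a minimal illustration only, assuming hits are available in the standard 12-column tab-separated BLAST output (which may not be the format the authors used) and assuming the identity cutoff is inclusive; the function name and the example hits are hypothetical.

```python
# Thresholds taken from the text: 85% identity, e-value below 1e-25.
MIN_IDENTITY = 85.0
MAX_EVALUE = 1e-25


def is_gene_match(blast_line, min_identity=MIN_IDENTITY, max_evalue=MAX_EVALUE):
    """Return True if one tab-separated BLAST hit meets the match criteria.

    Assumed column layout (12-column tabular output):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    fields = blast_line.rstrip("\n").split("\t")
    percent_identity = float(fields[2])
    evalue = float(fields[10])
    # Assumption: identity cutoff is inclusive, e-value cutoff is strict.
    return percent_identity >= min_identity and evalue < max_evalue


# Hypothetical hits, for illustration only.
hits = [
    "geneA\tgeneB\t97.50\t900\t22\t0\t1\t900\t1\t900\t0.0\t1500",
    "geneA\tgeneC\t80.10\t850\t150\t4\t1\t850\t10\t860\t1e-120\t450",
    "geneA\tgeneD\t91.00\t60\t5\t0\t1\t60\t1\t60\t3e-12\t80",
]

# Keep the subject identifiers of hits that count as gene matches.
matches = [hit.split("\t")[1] for hit in hits if is_gene_match(hit)]
print(matches)
```

The recovery step against the Newbler assemblies would use the same kind of filter with the 90% identity cutoff passed as `min_identity`.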
Core genome phylogeny and pan-genome tree
A multiple alignment of the de novo genome assemblies was performed using progressiveMauve version 2.3.0. The regions shared by all genomes were then extracted and used to generate a phylogenetic tree using ClonalFrame version 1.2. In addition to this phylogeny based on the core genome, we constructed a tree based on the pan-genome as follows: a gene content matrix consisting of 1s and 0s was constructed where the columns correspond to the different strains and the rows to different gene families. An entry of 1 means presence of a gene family in a given strain, whereas a 0 means absence. This matrix was used for calculation of Manhattan distances between strains, which were then used for hierarchical clustering in order to construct the pan-genome tree. These computations were done using R.
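The pan-genome tree construction described above can be illustrated with a small sketch. The authors performed these computations in R; the Python version below is only an illustration under stated assumptions: the presence/absence matrix is invented, and complete linkage is assumed because the text does not name the linkage method used for hierarchical clustering.

```python
from itertools import combinations

# Hypothetical presence/absence matrix: rows are gene families, columns are strains.
strains = ["s1", "s2", "s3", "s4"]
gene_content = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
]


def manhattan(matrix, col_a, col_b):
    """Manhattan distance between two strain columns of the matrix."""
    return sum(abs(row[col_a] - row[col_b]) for row in matrix)


def complete_linkage_tree(names, matrix):
    """Agglomerative clustering; returns the tree as nested tuples of names."""
    # Each cluster is a pair: (subtree, list of member column indices).
    clusters = [(name, [index]) for index, name in enumerate(names)]
    while len(clusters) > 1:

        def linkage(pair):
            (_, members_a), (_, members_b) = pair
            # Complete linkage (assumed): distance of the two farthest members.
            return max(manhattan(matrix, i, j) for i in members_a for j in members_b)

        left, right = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not left and c is not right]
        clusters.append(((left[0], right[0]), left[1] + right[1]))
    return clusters[0][0]


tree = complete_linkage_tree(strains, gene_content)
print(tree)
```

With this toy matrix, strains that share most gene families are joined first, so the two pairs of similar strains end up on separate branches.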
Gene family enrichment analysis
Enrichment for gene families was found using the gene content matrix described above, combined with previous knowledge of the isolates. Isolates were grouped according to phenotypic or phylogenetic criteria and then gene families overrepresented in one group relative to others were counted in the matrix. Group sizes and cutoff values used to define overrepresentation are shown in Table 2. Results were plotted as heat maps in R. To assess the statistical significance of these results, we designed a permutation test in which we used the same group sizes as above but assigned group membership randomly according to a combinatorial scheme. This procedure produces the numerical distribution of gene family enrichments for all possible combinations of group members given some fixed set of group sizes and enrichment criteria, with which our results could be compared. This procedure provides an indication of whether our results could arise from random associations, although the limited strain sample means that subtle associations may go undetected. P-values for our focal enrichments were derived from the computed distributions as the empirical probability of observing an enrichment of equivalent or higher rank. Genes enriched in each of the groups and cross-category comparisons are listed in Additional file 2, Additional file 3, Additional file 4, Additional file 5, Additional file 6, Additional file 7, Additional file 8, Additional file 9, Additional file 10, Additional file 14, Additional file 15 and Additional file 13.
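The permutation scheme described above can be sketched for the simplest case of one focal group compared with all remaining strains. This is a hedged illustration, not the authors' implementation: the matrix is invented, the cutoffs `min_in` and `max_out` are hypothetical stand-ins for the values in Table 2, and the function names are assumptions.

```python
from itertools import combinations


def enriched_count(matrix, group, min_in, max_out):
    """Count gene families present in >= min_in group strains and <= max_out others."""
    n_strains = len(matrix[0])
    others = [j for j in range(n_strains) if j not in group]
    count = 0
    for row in matrix:
        inside = sum(row[j] for j in group)
        outside = sum(row[j] for j in others)
        if inside >= min_in and outside <= max_out:
            count += 1
    return count


def permutation_p_value(matrix, group, min_in, max_out):
    """Compare the observed count with every grouping of the same size."""
    n_strains = len(matrix[0])
    observed = enriched_count(matrix, group, min_in, max_out)
    # Exhaustive null distribution: all possible groups with the focal group size.
    null = [
        enriched_count(matrix, candidate, min_in, max_out)
        for candidate in combinations(range(n_strains), len(group))
    ]
    # Empirical probability of an enrichment of equal or higher rank.
    as_extreme = sum(1 for value in null if value >= observed)
    return observed, as_extreme / len(null)


# Hypothetical data: 5 gene families (rows) x 6 strains (columns).
matrix = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0],
]
observed, p_value = permutation_p_value(matrix, group=(0, 1, 2), min_in=3, max_out=0)
print(observed, p_value)
```

Because every grouping is enumerated, the smallest attainable p-value is one divided by the number of possible groupings, which is one reason a small strain sample limits the detection of subtle associations.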
Multiple correspondence analysis
Multiple correspondence analysis was carried out as described by Nenadic and Greenacre using singular value decomposition of the scaled gene content indicator matrix.
Enrichment for biological processes and re-annotation of enriched genes
The lists of genes generated by the gene family enrichment analysis and found to be overrepresented within each of the categories were used to generate the biological process scores using Blast2GO (http://www.Blast2GO.com). This software annotates coding sequences and assigns them to gene ontology (GO) categories. Blast2GO gene annotations of enriched and unique gene sets can be found in Additional file 11, Additional file 12, Additional file 16, Additional file 17, Additional file 18, Additional file 19, Additional file 20, Additional file 21, Additional file 22, Additional file 23 and Additional file 24.
Codon usage bias analysis
Genome-wide codon usage tables were computed from the annotated coding sequences for each strain. Codon usage for highly expressed genes was computed from the 54 ribosomal protein gene sequences extracted from the annotation of each EDM strain. Effective number of codons (ENC) was computed according to the method of Wright. This provides a metric for the evenness of codon usage, with smaller values indicating a bias toward more specialized codon usage and higher values signifying more uniform usage. The index of bias in highly expressed genes, ΔENC, was computed as the scaled difference between genome-wide ENC and highly expressed gene (ribosomal protein gene) ENC. We did not apply a correction for differential G+C content in our ENC calculations, as this did not vary significantly across genomes.
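The ENC calculation can be sketched as follows, using Wright's formula for the standard genetic code without G+C correction. This is an illustrative sketch rather than the authors' code. In particular, scaling ΔENC by the genome-wide ENC is an assumption, since the text says only "scaled difference", and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codons(coding_sequences):
    """Tally in-frame codons over a collection of coding sequences."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1
    return counts


def effective_number_of_codons(codon_counts):
    """ENC after Wright (1990); assumes every degeneracy class is observed."""
    by_amino_acid = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            by_amino_acid[aa].append(codon_counts.get(codon, 0))
    homozygosity = defaultdict(list)
    for counts in by_amino_acid.values():
        degeneracy, n = len(counts), sum(counts)
        if degeneracy == 1 or n < 2:
            continue  # Met and Trp enter through the constant 2 below
        f = (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)
        if f > 0:
            homozygosity[degeneracy].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in homozygosity.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        # Wright's fallback when isoleucine is missing.
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    class_sizes = {2: 9, 3: 1, 4: 5, 6: 3}  # amino acids per degeneracy class
    enc = 2 + sum(size / mean_f[k] for k, size in class_sizes.items())
    return min(enc, 61.0)


def delta_enc(enc_genome, enc_highly_expressed):
    """Assumed scaling: difference relative to the genome-wide ENC."""
    return (enc_genome - enc_highly_expressed) / enc_genome


# Synthetic extreme case: exactly one codon used per amino acid.
first_codon = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        first_codon.setdefault(aa, codon)
fully_biased = {codon: 10 for codon in first_codon.values()}
print(effective_number_of_codons(fully_biased))
```

In practice, `count_codons` would be applied once to all annotated coding sequences of a strain and once to its ribosomal protein genes, and the two resulting ENC values passed to `delta_enc`.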
BLAST: Basic local alignment search tool
ENC: Effective number of codons
HGT: Horizontal gene transfer
MLST: Multi-Locus Sequence Typing
RAST: Rapid annotations using subsystems technology.
We thank the Norwegian Foundation for Health and Rehabilitation, the Norwegian Diabetes Association, the Centre for Ecological and Evolutionary Synthesis (CEES), and the Liaison Committee between the Central Norway Regional Health Authority (RHA) and the Norwegian University of Science and Technology (NTNU) (Project 46023900) for funding this research. We acknowledge Liselotte Buarø for technical assistance and Lex Nederbragt for useful discussions. The sequencing service was provided by the Norwegian Sequencing Centre (http://www.sequencing.uio.no), a national technology platform hosted by the University of Oslo and supported by the "Functional Genomics" and "Infrastructure" programs of the Research Council of Norway and the Southeastern Regional Health Authorities.
- Cho I, Blaser MJ: The human microbiome: at the interface of health and disease. Nat Rev Genet. 2012, 13: 260-270.
- Tenaillon O, Skurnik D, Picard B, Denamur E: The population genetics of commensal Escherichia coli. Nat Rev Microbiol. 2010, 8: 207-217. 10.1038/nrmicro2298.
- Lukjancenko O, Wassenaar TM, Ussery DW: Comparison of 61 sequenced Escherichia coli genomes. Microb Ecol. 2010, 60: 708-720. 10.1007/s00248-010-9717-3.
- Willenbrock H, Hallin PF, Wassenaar TM, Ussery DW: Characterization of probiotic Escherichia coli isolates with a novel pan-genome microarray. Genome Biol. 2007, 8: R267. 10.1186/gb-2007-8-12-r267.
- Touchon M, Hoede C, Tenaillon O, Barbe V, Baeriswyl S, Bidet P, Bingen E, Bonacorsi S, Bouchier C, Bouvet O: Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 2009, 5: e1000344. 10.1371/journal.pgen.1000344.
- Luo C, Walk ST, Gordon DM, Feldgarden M, Tiedje JM, Konstantinidis KT: Genome sequencing of environmental Escherichia coli expands understanding of the ecology and speciation of the model bacterial species. Proc Natl Acad Sci U S A. 2011, 108: 7200-7205. 10.1073/pnas.1015622108.
- Storro O, Oien T, Dotterud CK, Jenssen JA, Johnsen R: A primary health-care intervention on pre- and postnatal risk factor behavior to prevent childhood allergy. The Prevention of Allergy among Children in Trondheim (PACT) study. BMC Public Health. 2010, 10: 443. 10.1186/1471-2458-10-443.
- Storro O, Oien T, Langsrud O, Rudi K, Dotterud C, Johnsen R: Temporal variations in early gut microbial colonization are associated with allergen-specific immunoglobulin E but not atopic eczema at 2 years of age. Clin Exp Allergy. 2011, 41: 1545-1554. 10.1111/j.1365-2222.2011.03817.x.
- Rudi K, Storro O, Oien T, Johnsen R: Modelling bacterial transmission in human allergen-specific IgE sensitization. Lett Appl Microbiol. 2012, 54: 447-454. 10.1111/j.1472-765X.2012.03229.x.
- de Muinck EJ, Oien T, Storro O, Johnsen R, Stenseth NC, Ronningen KS, Rudi K: Diversity, transmission and persistence of Escherichia coli in a cohort of mothers and their infants. Environmental Microbiology Reports. 2011, 3: 352-359. 10.1111/j.1758-2229.2010.00231.x.
- Lozupone CA, Stombaugh JI, Gordon JI, Jansson JK, Knight R: Diversity, stability and resilience of the human gut microbiota. Nature. 2012, 489: 220-230. 10.1038/nature11550.
- Delmont TO, Robe P, Cecillon S, Clark IM, Constancias F, Simonet P, Hirsch PR, Vogel TM: Accessing the soil metagenome for studies of microbial diversity. Appl Environ Microbiol. 2011, 77: 1315-1324. 10.1128/AEM.01526-10.
- Oien T, Storro O, Johnsen R: Intestinal microbiota and its effect on the immune system–a nested case-cohort study on prevention of atopy among small children in Trondheim: the IMPACT study. Contemp Clin Trials. 2006, 27: 389-395. 10.1016/j.cct.2006.02.009.
- Afset JE, Anderssen E, Bruant G, Harel J, Wieler L, Bergh K: Phylogenetic backgrounds and virulence profiles of atypical enteropathogenic Escherichia coli strains from a case–control study using multilocus sequence typing and DNA microarray analysis. J Clin Microbiol. 2008, 46: 2280-2290. 10.1128/JCM.01752-07.
- Elliott SJ, Wainwright LA, McDaniel TK, Jarvis KG, Deng YK, Lai LC, McNamara BP, Donnenberg MS, Kaper JB: The complete sequence of the locus of enterocyte effacement (LEE) from enteropathogenic Escherichia coli E2348/69. Mol Microbiol. 1998, 28: 1-4.
- Ramer SW, Schoolnik GK, Wu CY, Hwang J, Schmidt SA, Bieber D: The type IV pilus assembly complex: biogenic interactions among the bundle-forming pilus proteins of enteropathogenic Escherichia coli. J Bacteriol. 2002, 184: 3457-3465. 10.1128/JB.184.13.3457-3465.2002.
- Vieira-Silva S, Rocha EP: The systemic imprint of growth and its uses in ecological (meta)genomics. PLoS Genet. 2010, 6: e1000808. 10.1371/journal.pgen.1000808.
- Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF: The complete genome sequence of Escherichia coli K-12. Science. 1997, 277: 1453-1462. 10.1126/science.277.5331.1453.
- Chaudhuri RR, Sebaihia M, Hobman JL, Webber MA, Leyton DL, Goldberg MD, Cunningham AF, Scott-Tucker A, Ferguson PR, Thomas CM: Complete genome sequence and comparative metabolic profiling of the prototypical enteroaggregative Escherichia coli strain 042. PLoS One. 2010, 5: e8801. 10.1371/journal.pone.0008801.
- Hochhut B, Wilde C, Balling G, Middendorf B, Dobrindt U, Brzuszkiewicz E, Gottschalk G, Carniel E, Hacker J: Role of pathogenicity island-associated integrases in the genome plasticity of uropathogenic Escherichia coli strain 536. Mol Microbiol. 2006, 61: 584-595. 10.1111/j.1365-2958.2006.05255.x.
- Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, Buckles EL, Liou SR, Boutin A, Hackett J: Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A. 2002, 99: 17020-17024. 10.1073/pnas.252529799.
- Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF, Fricke WF, Gajer P, Crabtree J, Sebaihia M, Thomson NR, Chaudhuri R: The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J Bacteriol. 2008, 190: 6881-6893. 10.1128/JB.00619-08.
- Crossman LC, Chaudhuri RR, Beatson SA, Wells TJ, Desvaux M, Cunningham AF, Petty NK, Mahon V, Brinkley C, Hobman JL: A commensal gone bad: complete genome sequence of the prototypical enterotoxigenic Escherichia coli strain H10407. J Bacteriol. 2010, 192: 5822-5831. 10.1128/JB.00710-10.
- Chen SL, Hung CS, Xu J, Reigstad CS, Magrini V, Sabo A, Blasiar D, Bieri T, Meyer RR, Ozersky P: Identification of genes subject to positive selection in uropathogenic strains of Escherichia coli: a comparative genomics approach. Proc Natl Acad Sci U S A. 2006, 103: 5977-5982. 10.1073/pnas.0600938103.
- Krause DO, Little AC, Dowd SE, Bernstein CN: Complete genome sequence of adherent invasive Escherichia coli UM146 isolated from Ileal Crohn's disease biopsy tissue. J Bacteriol. 2011, 193: 583. 10.1128/JB.01290-10.
- Toh H, Oshima K, Toyoda A, Ogura Y, Ooka T, Sasamoto H, Park SH, Iyoda S, Kurokawa K, Morita H: Complete genome sequence of the wild-type commensal Escherichia coli strain SE15, belonging to phylogenetic group B2. J Bacteriol. 2010, 192: 1165-1166. 10.1128/JB.01543-09.
- Oshima K, Toh H, Ogura Y, Sasamoto H, Morita H, Park SH, Ooka T, Iyoda S, Taylor TD, Hayashi T: Complete genome sequence and comparative analysis of the wild-type commensal Escherichia coli strain SE11 isolated from a healthy adult. DNA Res. 2008, 15: 375-386. 10.1093/dnares/dsn026.
- Hayashi T, Makino K, Ohnishi M, Kurokawa K, Ishii K, Yokoyama K, Han CG, Ohtsubo E, Nakayama K, Murata T: Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res. 2001, 8: 11-22. 10.1093/dnares/8.1.11.
- Iguchi A, Thomson NR, Ogura Y, Saunders D, Ooka T, Henderson IR, Harris D, Asadulghani M, Kurokawa K, Dean P: Complete genome sequence and comparative genome analysis of enteropathogenic Escherichia coli O127:H6 strain E2348/69. J Bacteriol. 2009, 191: 347-354. 10.1128/JB.01238-08.
- Ahmed SA, Awosika J, Baldwin C, Bishop-Lilly KA, Biswas B, Broomall S, Chain PS, Chertkov O, Chokoshvili O, Coyne S: Genomic comparison of Escherichia coli O104:H4 isolates from 2009 and 2011 reveals plasmid, and prophage heterogeneity, including shiga toxin encoding phage stx2. PLoS One. 2012, 7: e48228. 10.1371/journal.pone.0048228.
- Nash JH, Villegas A, Kropinski AM, Aguilar-Valenzuela R, Konczy P, Mascarenhas M, Ziebell K, Torres AG, Karmali MA, Coombes BK: Genome sequence of adherent-invasive Escherichia coli and comparative genomic analysis with other E. coli pathotypes. BMC Genomics. 2010, 11: 667. 10.1186/1471-2164-11-667.
- Ogura Y, Ooka T, Iguchi A, Toh H, Asadulghani M, Oshima K, Kodama T, Abe H, Nakayama K, Kurokawa K: Comparative genomics reveal the mechanism of the parallel evolution of O157 and non-O157 enterohemorrhagic Escherichia coli. Proc Natl Acad Sci U S A. 2009, 106: 17939-17944. 10.1073/pnas.0903585106.
- Lu S, Zhang X, Zhu Y, Kim KS, Yang J, Jin Q: Complete genome sequence of the neonatal-meningitis-associated Escherichia coli strain CE10. J Bacteriol. 2011, 193: 7005. 10.1128/JB.06284-11.
- Wetzel J, Kingsford C, Pop M: Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies. BMC Bioinformatics. 2011, 12: 95. 10.1186/1471-2105-12-95.
- Brzuszkiewicz E, Bruggemann H, Liesegang H, Emmerth M, Olschlager T, Nagy G, Albermann K, Wagner C, Buchrieser C, Emody L: How to become a uropathogen: comparative genomic analysis of extraintestinal pathogenic Escherichia coli strains. Proc Natl Acad Sci U S A. 2006, 103: 12879-12884. 10.1073/pnas.0603038103.
- Diard M, Garry L, Selva M, Mosser T, Denamur E, Matic I: Pathogenicity-associated islands in extraintestinal pathogenic Escherichia coli are fitness elements involved in intestinal colonization. J Bacteriol. 2010, 192: 4885-4893. 10.1128/JB.00804-10.
- Le GT, Clermont O, Gouriou S, Picard B, Nassif X, Denamur E, Tenaillon O: Extraintestinal virulence is a coincidental by-product of commensalism in B2 phylogenetic group Escherichia coli strains. Mol Biol Evol. 2007, 24: 2373-2384. 10.1093/molbev/msm172.
- Kamada N, Kim YG, Sham HP, Vallance BA, Puente JL, Martens EC, Nunez G: Regulated virulence controls the ability of a pathogen to compete with the gut microbiota. Science. 2012, 336: 1325-1329. 10.1126/science.1222195.
- Alteri CJ, Mobley HL: Escherichia coli physiology and metabolism dictates adaptation to diverse host microenvironments. Curr Opin Microbiol. 2012, 15: 3-9. 10.1016/j.mib.2011.12.004.
- Read AF: The evolution of virulence. Trends Microbiol. 1994, 2: 73-76. 10.1016/0966-842X(94)90537-1.
- Meric G, Kemsley EK, Falush D, Saggers EJ, Lucchini S: Phylogenetic distribution of traits associated with plant colonization in Escherichia coli. Environ Microbiol. 2012, 15: 487-501.
- Koch AL: The adaptive responses of Escherichia coli to a feast and famine existence. Adv Microb Physiol. 1971, 6: 147-217.
- Button DK: Biochemical basis for whole-cell uptake kinetics: specific affinity, oligotrophic capacity, and the meaning of the michaelis constant. Appl Environ Microbiol. 1991, 57: 2033-2038.
- Adlerberth I: Factors influencing the establishment of the intestinal microbiota in infancy. Nestle Nutr Workshop Ser Pediatr Program. 2008, 62: 13-29.
- Palmer C, Bik EM, DiGiulio DB, Relman DA, Brown PO: Development of the human infant intestinal microbiota. PLoS Biol. 2007, 5: e177. 10.1371/journal.pbio.0050177.
- Koenig JE, Spor A, Scalfone N, Fricker AD, Stombaugh J, Knight R, Angenent LT, Ley RE: Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S A. 2011, 108 (Suppl 1): 4578-4585.
- Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE: Metagenomic analysis of the human distal gut microbiome. Science. 2006, 312: 1355-1359. 10.1126/science.1124234.
- Koren O, Goodrich JK, Cullender TC, Spor A, Laitinen K, Kling BH, Gonzalez A, Werner JJ, Angenent LT, Knight R: Host remodeling of the gut microbiome and metabolic changes during pregnancy. Cell. 2012, 150: 470-480. 10.1016/j.cell.2012.07.008.
- Buckling A, Craig MR, Brockhurst MA, Colegrave N: The Beagle in a bottle. Nature. 2009, 457: 824-829. 10.1038/nature07892.
- Ford CB, Lin PL, Chase MR, Shah RR, Iartchouk O, Galagan J, Mohaideen N, Ioerger TR, Sacchettini JC, Lipsitch M: Use of whole genome sequencing to estimate the mutation rate of Mycobacterium tuberculosis during latent infection. Nat Genet. 2011, 43: 482-486. 10.1038/ng.811.
- Okoro CK, Kingsley RA, Quail MA, Kankwatira AM, Feasey NA, Parkhill J, Dougan G, Gordon MA: High-resolution single nucleotide polymorphism analysis distinguishes recrudescence and reinfection in recurrent invasive nontyphoidal Salmonella typhimurium disease. Clin Infect Dis. 2012, 54: 955-963. 10.1093/cid/cir1032.
- Morowitz MJ, Denef VJ, Costello EK, Thomas BC, Poroyko V, Relman DA, Banfield JF: Strain-resolved community genomic analysis of gut microbial colonization in a premature infant. Proc Natl Acad Sci U S A. 2011, 108: 1128-1133. 10.1073/pnas.1010992108.
- Taylor DE: Bacterial tellurite resistance. Trends Microbiol. 1999, 7: 111-115. 10.1016/S0966-842X(99)01454-7.
- Kurokawa K, Itoh T, Kuwahara T, Oshima K, Toh H, Toyoda A, Takami H, Morita H, Sharma VK, Srivastava TP: Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes. DNA Res. 2007, 14: 169-181. 10.1093/dnares/dsm018.
- Martens EC, Chiang HC, Gordon JI: Mucosal glycan foraging enhances fitness and transmission of a saccharolytic human gut bacterial symbiont. Cell Host Microbe. 2008, 4: 447-457. 10.1016/j.chom.2008.09.007.
- Kim DH, Konishi L, Kobashi K: Purification, characterization and reaction mechanism of novel arylsulfotransferase obtained from an anaerobic bacterium of human intestine. Biochim Biophys Acta. 1986, 872: 33-41. 10.1016/0167-4838(86)90144-5.
- Kurihara S, Oda S, Kumagai H, Suzuki H: Gamma-glutamyl-gamma-aminobutyrate hydrolase in the putrescine utilization pathway of Escherichia coli K-12. FEMS Microbiol Lett. 2006, 256: 318-323. 10.1111/j.1574-6968.2006.00137.x.
- Kurihara S, Oda S, Tsuboi Y, Kim HG, Oshida M, Kumagai H, Suzuki H: gamma-Glutamylputrescine synthetase in the putrescine utilization pathway of Escherichia coli K-12. J Biol Chem. 2008, 283: 19981-19990. 10.1074/jbc.M800133200.
- Skibinski DA, Golby P, Chang YS, Sargent F, Hoffman R, Harper R, Guest JR, Attwood MM, Berks BC, Andrews SC: Regulation of the hydrogenase-4 operon of Escherichia coli by the sigma(54)-dependent transcriptional activators FhlA and HyfR. J Bacteriol. 2002, 184: 6642-6653. 10.1128/JB.184.23.6642-6653.2002.
- Didelot X, Meric G, Falush D, Darling AE: Impact of homologous and non-homologous recombination in the genomic evolution of Escherichia coli. BMC Genomics. 2012, 13: 256. 10.1186/1471-2164-13-256.
- Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M: The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008, 9: 75. 10.1186/1471-2164-9-75.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
- Darling AE, Mau B, Perna NT: progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One. 2010, 5: e11147. 10.1371/journal.pone.0011147.
- Didelot X, Falush D: Inference of bacterial microevolution using multilocus sequence data. Genetics. 2007, 175: 1251-1266.
- R Foundation for Statistical Computing: R: A language and environment for statistical computing. Vienna, Austria, URL http://www.R-project.org. ISBN 3-900051-07-01.
- Nenadic O, Greenacre M: Correspondence analysis in R, with two- and three-dimensional graphics: The ca package. J Stat Software. 2007, 20: 1-13.
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21: 3674-3676. 10.1093/bioinformatics/bti610.
- Wright F: The ‘effective number of codons’ used in a gene. Gene. 1990, 87: 23-29. 10.1016/0378-1119(90)90491-9.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.