We analyzed the phylogenetic diversity of an unselected set of 161 consecutive bacteremic E. coli isolates, and compared the genomic content of a representative subset of genotypes in order to investigate the link between clonal diversity, gene content and clinical correlates of bacteremia.
Bacteremic isolates were distributed across the entire span of E. coli phylogenetic diversity. The amount of polymorphism found herein is consistent with values reported in E. coli based on other housekeeping genes [18, 19, 35, 36]. Polymorphism among bacteremia isolates was only slightly lower than in a recent study of 185 isolates from freshwater beaches. For example, the authors of that study found 49 uidA alleles with 12% polymorphic sites, whereas we found 27 uidA alleles and 9.3% polymorphic sites (note that our collection of bacteremic isolates included only 7% of isolates of group B1, whereas this group represented 56% of isolates from the environment). Consequently, our results clearly indicate that human E. coli bacteremia strains are genetically highly diverse. Of note, avian pathogenic E. coli (APEC) strains overlap only partly with the human ExPEC population, with some potentially zoonotic clonal groups being found both in humans and birds, while other APEC clonal groups are rare or absent among human ExPEC [37–39]. Thus, total ExPEC diversity may exceed the large diversity of human bacteremic isolates alone.
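As an illustration of the two summary statistics compared above (number of alleles and percentage of polymorphic sites at a locus), the following minimal Python sketch computes both from a set of aligned sequences. The function name and the toy sequences are hypothetical and are not taken from this study; the sketch also assumes gap-free alignments of equal length, which is a simplification.

```python
def allele_and_site_summary(aligned_seqs):
    """Return (number of distinct alleles, number of polymorphic sites,
    percentage of polymorphic sites) for equal-length aligned sequences."""
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    # Each distinct sequence counts as one allele.
    n_alleles = len(set(aligned_seqs))
    # A site is polymorphic when more than one nucleotide occurs in its column.
    n_polymorphic = sum(1 for column in zip(*aligned_seqs) if len(set(column)) > 1)
    return n_alleles, n_polymorphic, 100.0 * n_polymorphic / length


# Toy alignment (invented data): four isolates, ten sites.
toy_alignment = [
    "ATGCATGCAT",
    "ATGCATGCAT",
    "ATGTATGCAT",
    "ATGCATGAAT",
]
summary = allele_and_site_summary(toy_alignment)
print(summary)
```

Both statistics depend on sample size and on the phylogenetic composition of the sample, which is why the difference in the proportion of B1 isolates between the two collections matters when the values are compared.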
Although it was not the main purpose of this study, our combined analysis of bacteremic isolates and ECOR reference strains using ClonalFrame revealed sharp demarcations among deep branches, suggesting a much stronger structure within the E. coli species than previously disclosed by recombination-sensitive phylogenetic analyses such as the neighbor-joining method. So far, methods that detect homologous recombination events and account for them in phylogenetic reconstruction have not been used widely in E. coli. Demarcation of phylogenetic groups has therefore likely been problematic until now, because homologous recombination tends to homogenize E. coli diversity and blur the borders that may delineate independently evolving phylogenetic lineages [19, 26]. In addition, classification into only four major groups appears to be an oversimplification of a more complex reality. The five major branches disclosed herein do not fully correspond to previous group definitions, as groups A and B1 were not well separated and branch F could not be equated to any previously described group. These results challenge the classical view of the internal phylogenetic structure of E. coli, in agreement with recent studies [18, 19].
Our initial hypothesis was that clones, rather than entire phylogenetic groups, may be more relevant natural entities for establishing an association of genotype with phenotype, including clinical correlates of bacteremic isolates. To define groups of closely related genotypes, we used allelic profile-based comparisons rather than nucleotide sequence-based analysis [16, 21, 23]. Because the latter approach is sensitive to homologous recombination, analysis of allelic profiles is preferred for clone definition in many bacterial species [12, 40]. Among E. coli bacteremic isolates, a number of CCs, which can be equated to clones, could thus be defined. Interestingly, the central genotype of CCs, inferred to represent the founder of the clone, generally had a higher frequency than its variants, consistent with ongoing or recent expansion of the clones. For the purposes of classification, it is noteworthy that most CCs were separated by large allelic profile distances (four or five mismatches), highlighting the neat demarcation among them. Phylogenetic group B2 appeared to be the most strongly structured group, as most isolates were grouped into nine CCs (Figure 2).
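The notion of allelic profile distance used above can be sketched in a few lines of Python. The example below counts mismatching loci between two profiles and groups genotypes by single linkage when they differ at no more than a chosen number of loci. The linkage threshold, the number of loci, the genotype labels and the allele numbers are all hypothetical and serve only as an illustration; the sketch is not the procedure used to define the CCs in this study.

```python
from itertools import combinations


def allelic_mismatches(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry different alleles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def group_by_single_linkage(profiles, max_mismatches=1):
    """Group genotypes that are chained by links of <= max_mismatches loci.
    The default threshold of 1 is an assumption made for this illustration."""
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for name_a, name_b in combinations(profiles, 2):
        if allelic_mismatches(profiles[name_a], profiles[name_b]) <= max_mismatches:
            parent[find(name_a)] = find(name_b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Invented profiles: one allele number per locus.
toy_profiles = {
    "ST_a": (1, 1, 1, 1, 1, 1, 1, 1),
    "ST_b": (1, 1, 2, 1, 1, 1, 1, 1),
    "ST_c": (1, 1, 2, 1, 1, 3, 1, 1),
    "ST_x": (4, 5, 6, 7, 8, 1, 1, 1),
    "ST_y": (4, 5, 6, 7, 8, 1, 1, 2),
}
toy_groups = group_by_single_linkage(toy_profiles)
between_groups = allelic_mismatches(toy_profiles["ST_a"], toy_profiles["ST_x"])
print(toy_groups, between_groups)
```

In this toy data set the two groups are internally linked by single-locus differences but separated from each other by five mismatches, which mirrors the kind of demarcation described above. Because only allele identity is compared, a recombination event that changes many nucleotides at one locus counts as a single mismatch.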
We explored the hypothesis that clinical correlates and the distribution of particular genes would be associated with specific clonal backgrounds, rather than being distributed widely across E. coli diversity. Because horizontal gene transfer occurs among E. coli strains, virulence genes that confer specific advantages (e.g. for urinary tract infection) can be distributed in various genomic backgrounds. However, the association of specific genes with particular genomic backgrounds (clonal groups) can be retained, at least in the short term, by predominantly vertical transmission, e.g. for genes that are not harbored on mobile elements. In this case, knowledge of the clonal background would have predictive value regarding gene content and corresponding phenotypes.
Consistent with our initial hypothesis that particular clones may exhibit clinically relevant features, several correlations were established between some CCs and clinical data. Most notably, CC1 and CC4 were clearly associated with isolates responsible for urosepsis; these CCs correspond to the previously described subgroups II (strain CFT073) and IX (strain RS218), which correspond to strains responsible for urosepsis and neonatal meningitis, respectively. Hence, the previously held association of B2 as a whole with urosepsis may be an oversimplification due to the strong contribution of CC1 and CC4 to isolates of this group, and may thus not be valid for all B2 strains. Interestingly, DNA array analysis detected two specific gene clusters associated with CC1 and CC4. These sequences, which correspond to cryptic prophages, were previously described as being located in two consecutive genomic islands. Consistent with our results, one of these clusters, PAI-CFT073-icdA, was described as being more frequent among pyelonephritis isolates. Likewise, a specific DNA region encoding a putative RTX protein was significantly associated with CC4 isolates. A previously described E. coli RTX toxin, hemolysin A, has been clearly associated with pyelonephritis and implicated in inflammation during urinary tract infection [42, 43]. Some UPEC strains, including CFT073, harbor two operons of this toxin, and the specific role of each copy remains unclear [33, 44]. Whether the new putative RTX toxin constitutes an advantage to CC4 isolates and could explain their specific urovirulence remains to be determined.
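An association between gene presence and membership of a clonal complex, such as those reported above, can be assessed on a 2 x 2 contingency table. The Python sketch below implements a two-sided Fisher exact test as one possible way to do this; the choice of test is an assumption made for illustration (this section does not state which test was applied), and the counts are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Hypothetical layout: rows = isolates inside / outside a clonal complex,
    columns = gene present / absent."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x gene-positive isolates in row 1,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    # (small relative tolerance to absorb floating-point error).
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )


# Invented counts: 8 of 10 isolates in the complex carry the gene,
# versus 1 of 6 isolates outside it.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))
```

When many genes are screened against many clonal complexes, as with DNA array data, some correction for multiple testing would normally be needed before individual associations are interpreted.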
As assessed by PCR and DNA array hybridization, clonal complexes were characterized by specific gene content. For example, isolates belonging to CC4 exhibited significantly more VFs (sfa, cnf1, hlyC) than CC1 isolates, and ORF svg was associated with CC1. In addition, the pattern of gene content variation (Figure 3) was highly concordant with clonal complexes, clearly illustrating that genomic background, as assessed by MLST, and gene content are strongly correlated. This pattern is consistent with the well-established mechanisms of deletion or acquisition of entire PAIs [45–50]. To gain insight into the possible biological and clinical significance of gene content differences among clones, it will be necessary to combine functional studies with the analysis of E. coli isolates from other sources. For example, comparing the gene content of bacteremic and commensal isolates within a single clone should provide insights into microevolutionary events leading to increased or decreased pathogenic potential.
Our results confirmed that most of the usually recognized extra-intestinal VFs (e.g. pap, sfa, hly) were concentrated in the virulent CCs, particularly those belonging to phylogenetic group B2. In contrast, other VFs, for example those related to the plasmid-encoded iron uptake system (e.g. iucC, iroN), were more broadly distributed [45, 51]. Therefore, for highly mobile genetic elements (e.g. plasmids), the association with clonal background may not be retained.