The E. coli genome changes continuously through both small-scale variations and horizontal transfer of mobile genetic elements (MGEs). Among MGEs, bacteriophages play a pivotal role in the evolution of pathogenic E. coli clones by providing a means for genomic remodelling and by conveying important virulence genes, such as those encoding the Verocytotoxins (vtx1 and vtx2) in Verocytotoxin-producing E. coli (VTEC).
In 2011 a large outbreak caused by an Enteroaggregative Haemorrhagic E. coli (EAHEC) O104:H4 struck Germany, with more than 3,000 cases of infection, 800 cases of HUS, and 50 deaths. The causative agent was a mosaic strain deriving from the lysogenization of an Enteroaggregative E. coli strain with a vtx2-phage. This combination of virulence features was associated with elevated pathogenicity, as shown by the high rate of human infections evolving to HUS, even among adults (88% of HUS cases; median age 42 years), and by the heavy toll of 50 deaths.
This arrangement of virulence factors in E. coli strains from human disease had occasionally been reported before, during the investigation of a small outbreak of HUS that occurred in France in 1992 and of a case of infection in Japan in 1999 [10, 11].
The German outbreak prompted the scientific community to re-examine the records of VTEC infections and the culture collections, and it turned out that other sporadic cases of infection with Vtx2-producing Enteroaggregative E. coli O104:H4 had already occurred in Europe and Asia in the time span 2001–2011 [12, 31]. Finally, a HUS case that occurred in Northern Ireland in 2012 was shown to be associated with an EAHEC O111:H21.
The observation of sporadic cases and outbreaks occurring throughout a 20-year time span, all caused by Vtx2-producing EAggEC belonging to different serotypes, strengthens the hypothesis that these pathogenic E. coli represent a new pathogroup, as has recently been proposed.
To better understand the events underlying the emergence of EAHEC, we determined the whole genome sequence of Phi-191, the vtx2-phage present in the EAHEC O111:H2 isolated during the French outbreak of 1992, and compared it with the sequences of the vtx2-phages present in the EAHEC strains described in the following years and available in GenBank.
Interestingly, the genomic sequence of Phi-191 was almost identical to that of the vtx2-phages from the EAHEC O104:H4 strains isolated during the 2011 German outbreak about 20 years later. This is noteworthy since vtx-phages are characterized by a high degree of variability [50, 51]. It is conceivable that the same vtx2-phage has been acquired in two different events and that the selective pressure impeded the accumulation of changes in the phage sequence before the phage infection events occurred.
However, the EAHEC O111:H21 isolated in Northern Ireland in 2012 seems to host a different type of vtx2-phage, suggesting that at least two different vtx2-phage types have been successfully transferred to EAggEC. Unfortunately, the sequence of the phage from the EAHEC O86:HNM isolated in Japan in 1999 was not available for comparison.
It has been hypothesized that infection with a lambdoid phage can be mediated by cross-talk between the bacterium and the phage, resulting in host specificity. An extended comparison of the EAHEC vtx2-phages with the whole genome sequences of vtx-phages from VTEC strains available at NCBI returned a wide range of sequence similarities, from 87% down to 60% and lower. This picture is in line with what has been reported for the general variability of vtx-phage sequences. Interestingly, one region of 900 bp, identified in Phi-191 and encoding a tail fiber, was present in all the vtx2-phages from EAHEC and also in the short-read dataset from the EAHEC O111:H21. At the same time, this DNA sequence was absent from all the vtx-phage sequences identified in VTEC strains and stored at NCBI.
This is in agreement with previously reported data, which pointed at a larger fragment including this region as one of those differentiating the P13374 genome from that of the E. coli phage TL-2011c (O103:H25) and not exhibiting significant homology to known vtx-encoding phages.
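The presence/absence screening described above can be illustrated with a minimal sketch. This is not the procedure used in this study: it assumes a simple k-mer containment measure in place of an alignment-based search, and the function names, the k-mer size (21) and the presence threshold (0.8) are hypothetical choices made only for illustration. The sequences are synthetic random DNA, not real phage genomes.

```python
import random


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers in a sequence (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(marker: str, genome: str, k: int = 21) -> float:
    """Fraction of the marker's k-mers found in the genome on either strand."""
    marker_kmers = kmers(marker, k)
    if not marker_kmers:
        return 0.0
    genome_kmers = kmers(genome, k) | kmers(reverse_complement(genome), k)
    return len(marker_kmers & genome_kmers) / len(marker_kmers)


def presence_table(marker: str, genomes: dict, k: int = 21,
                   threshold: float = 0.8) -> dict:
    """Map each genome name to True if the marker is considered present."""
    return {name: containment(marker, seq, k) >= threshold
            for name, seq in genomes.items()}


# Synthetic demonstration data (assumption: random DNA stands in for real sequences).
rng = random.Random(0)


def random_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


marker = random_dna(900)  # stands in for the 900 bp tail fiber region
genomes = {
    "with_marker": random_dna(500) + marker + random_dna(500),
    "with_marker_revcomp": random_dna(500) + reverse_complement(marker) + random_dna(500),
    "without_marker": random_dna(1900),
}
result = presence_table(marker, genomes)
print(result)
```

With real data, the marker would be the 900 bp region from Phi-191 and the dictionary would hold the phage sequences retrieved from GenBank. A k-mer approach tolerates few mismatches, so an alignment tool would be a more appropriate choice for divergent sequences.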
The annotation of the Phi-191 genome showed that this sequence, peculiar to EAHEC vtx2-phages, contains part of a gene encoding a type of phage tail fiber displaying some conserved amino acid motifs, such as a Collagen triple helix repeat (20 copies) [NCBI CDD:189968] and the Peptidase_S74 domain [NCBI CDD:258151]. The latter is the C-terminal domain of the bacteriophage protein endosialidase, which forms homotrimeric molecules and releases itself from the end-tail-spike of the bacteriophages.
The 900 bp-long sequence could potentially encode part of the mechanism defining the specificity of the vtx2-phages for EAggEC strains, being directly involved in the phage-bacterium interaction. Indeed, several authors have reported that the interactions between phage tail fibers and host proteins, such as LamB and OmpC [52, 53], contribute to the success of the infection, as demonstrated by the finding that lamB gene mutations block phage adsorption. It is therefore conceivable that differences in phage tail fibers contribute to defining the tropism of vtx2-phages for E. coli recipients, although this hypothesis, and the mechanisms underlying the process, still need to be verified.
For a successful infection to occur, suitable vtx-phages and E. coli acceptors need to meet in the same environment. In the case of the emergence of typical VTEC, the events of vtx-phage acquisition probably occurred in the gastrointestinal tract of ruminants, where both vtx-phages and aEPEC are abundant [55, 56].
Conversely, the emergence of EAHEC is probably not directly connected to an animal reservoir, since EAggEC are human pathogens with inter-human transmission of the infection. The environment, in turn, plays a role in the pathogen’s amplification cycles, particularly in geographic areas characterised by poor hygienic conditions, where the lack of effective human sewage treatment makes infections with enteric pathogens, including EAggEC, endemic. In such a scenario, an environment contaminated with ruminant excreta might have been the source of the vtx2-phages found in EAHEC, as has recently been proposed. Such a picture may account for the existence of a favourable setting for EAggEC and vtx-phages to come into contact, and for the subsequent selection process resulting in the occasional emergence of an E. coli strain matching the EAHEC definition.