Genetic changes during a laboratory adaptive evolution process that allowed fast growth in glucose to an Escherichia coli strain lacking the major glucose transport system

Background Escherichia coli strains lacking the phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS), the major bacterial system for glucose transport and phosphorylation, accumulate high amounts of phosphoenolpyruvate that can be diverted to the synthesis of commercially relevant products. However, these strains grow slowly on glucose as the sole carbon source because of its inefficient transport and metabolism. Strain PB12, with an approximately fourfold higher growth rate, was isolated after a 120-hour adaptive laboratory evolution process that selected for faster-growing derivatives on glucose. Analysis of the genetic changes that occurred in this PTS-deficient strain should provide a better understanding of the basis of its growth adaptation and, therefore, aid the design of improved metabolic engineering strategies for enhancing carbon diversion into the aromatic pathways. Results Whole-genome analyses using two different sequencing methodologies, the Roche NimbleGen Inc. comparative genome sequencing technique and high-throughput sequencing with the Illumina Inc. GAIIx, allowed the identification of the genetic changes that occurred in the PB12 strain. Together, the two methods detected 23 non-synonymous and 22 synonymous point mutations. Several non-synonymous mutations mapped to regulatory genes (arcB, barA, rpoD, rna) and to other putative regulatory loci (yjjU, rssA and ypdA). In addition, a chromosomal deletion of 10,328 bp was detected that removed all or part of 12 genes, among them rppH, mutH and galR. The known and putative functions of some of these mutated and deleted genes are characterized. Conclusions The simultaneous deletion of the contiguous rppH, mutH and galR genes is apparently the main reason for the faster growth of the evolved PB12 strain.
In support of this interpretation, inactivation of the rppH gene in the parental PB11 strain substantially increased its growth rate, very likely by increasing the stability of glycolytic gene mRNAs. Furthermore, galR inactivation allowed glucose transport into the cell by GalP. The deletion of mutH in an already stressed strain that lacks PTS is apparently responsible for the very high mutation rate observed.


Background
Genome changes, including point mutations, duplications, and recombination with homologous and heterologous DNA, are the driving force of evolution. Bacteria, the most diverse and adapted types of cells in the biosphere, have been evolving continuously for millions of years to survive environmental changes. Comparative genomics is a very powerful tool for analyzing bacterial evolution occurring over short periods of time [1-3]. Moreover, whole-genome resequencing of evolved Escherichia coli strains using two different methodologies simultaneously has recently been reported. This strategy is very useful for understanding bacterial evolution, for example pathogen emergence, adaptation to environmental perturbations, or the adaptation that occurs during fermentations used to generate derivative strains with enhanced industrial capacities [3-5].
We have constructed and characterized Escherichia coli strains that lack the phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS), the major bacterial system for glucose transport and phosphorylation, by deleting the ptsH, ptsI, and crr genes. One of these strains, PB11, despite growing very slowly on glucose (specific growth rate, μ, of 0.1 h⁻¹ vs. 0.7 h⁻¹ for its parental strain JM101), accumulates high amounts of phosphoenolpyruvate, which can be diverted to the synthesis of aromatic compounds. When PB11 is grown on glucose as the sole carbon source, the PTS deletion elicits a carbon stress response that induces carbon scavenging. Strains lacking PTS can co-utilize several carbon sources because they lack PTS-mediated catabolite repression, and their glycolytic flux is reduced as part of a carbon limitation response [6-13]. As a metabolic engineering strategy, an adaptive laboratory evolution process to select faster-growing derivatives of PB11 was carried out in a fermentor with minimal medium and glucose as the sole carbon source. In this process, after the culture entered stationary phase, glucose was fed at progressively increasing dilution rates. The resulting strain, PB12, which reached a much higher growth rate (μ = 0.44 h⁻¹), was selected in a process that lasted 120 hours (hr) (Figure 1) [9,10,12,13]. The evolved PB12 strain, which, like its parental strain PB11, uses the galactose permease (GalP) for glucose transport in the absence of PTS, has been utilized for the overproduction of aromatic compounds [7,9,12,14-17].
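As a worked illustration of these numbers, the specific growth rates quoted above can be converted into doubling times with t_d = ln 2 / μ, which assumes balanced exponential growth. The sketch below uses only the rounded μ values given in the text (0.7, 0.1 and 0.44 h⁻¹); the dictionary and function names are illustrative, not part of the original study.

```python
import math

# Specific growth rates (h^-1) on glucose, as quoted in the text.
MU = {"JM101": 0.70, "PB11": 0.10, "PB12": 0.44}


def doubling_time(mu: float) -> float:
    """Doubling time in hours for exponential growth: t_d = ln(2) / mu."""
    return math.log(2) / mu


def fold_change(new: float, old: float) -> float:
    """Ratio of two specific growth rates."""
    return new / old


if __name__ == "__main__":
    for strain, mu in MU.items():
        print(f"{strain}: mu = {mu:.2f} h^-1, t_d = {doubling_time(mu):.2f} h")
    print(f"PB12/PB11 fold change: {fold_change(MU['PB12'], MU['PB11']):.1f}")
```

Run as a script, it prints doubling times of roughly 0.99 h (JM101), 6.93 h (PB11) and 1.58 h (PB12), and a PB12/PB11 ratio of 4.4, in line with the "4x" faster growth stated in the Conclusions.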
It is well known that E. coli cells can adapt their metabolism to achieve higher growth rates as a result of specific mutations [2,5,18]. To gain insight into the faster growth of the PB12 strain, we have compared the transcript levels of genes in critical metabolic pathways in PB12 with those in the parental PB11 strain by reverse transcriptase quantitative real-time PCR (RT-qPCR). Interestingly, we found that all glycolytic genes and several other central carbon metabolism genes, including those that code for the tricarboxylic acid (TCA) cycle enzymes, are overexpressed, suggesting very efficient carbon utilization by the evolved strain [7-13,19]. We have previously shown that a mutation in the arcB gene could be responsible for the overexpression of the TCA genes [9,20-22]. In addition, a second mutation, which generates an amber stop codon at position 98 of the rpoS gene encoding the sigma factor RpoS, was detected in PB12 when it was compared against strain MG1655 [9,11]. Nevertheless, a detailed molecular-level knowledge of all the genetic changes that occurred in the PB12 strain requires a complete genomic analysis. This information will allow a better understanding of the basis of growth adaptation, plasticity, and the physiology of this evolved E. coli strain, and will also be useful in the design of improved laboratory adaptive evolution and metabolic engineering strategies for enhancing carbon diversion into the aromatic pathway in strains lacking PTS.
Figure 1 Isolation of the evolved PB12 strain. The isolation of PB12 has previously been reported and is included here to orient the reader and for discussion purposes [10]. The evolutionary process that generated the PB12 strain started with the parental PB11 strain, which lacks the PTS system. Deletion of this system generates a carbon stress response when PB11 is grown on glucose as the sole carbon source [9,13]. This strain, which grows very slowly on glucose and forms white colonies (WC) on glucose-MacConkey agar plates, was grown in a batch culture fermentor containing minimal medium with 2 g/l glucose as the sole carbon source and 30 μg/ml kanamycin. Under these conditions, a selection pressure is generated that favors faster-growing mutants. The culture was maintained until stationary phase, and then a continuous culture was initiated by feeding a glucose solution in the same medium at progressively higher dilution rates. The dotted line indicates the end of the batch culture and the start of the continuous culture. This procedure allowed the isolation of mutants according to their growth rates. Samples were monitored on glucose-MacConkey agar plates to identify red colonies (RC), indicative of the glucose-utilizing [Glc+] phenotype. Red colonies were detected after 70 hr. The arrows indicate the isolation times of several Glc+ variants, including PB12. Numbers indicate the different dilution rates (D, h⁻¹). All the colonies isolated from this culture carry the same large deletion present in strain PB12 (data not shown). This figure was adapted from Figure 1 of Flores et al. 2007 [10].
In this work, using the Roche NimbleGen Inc. comparative genome sequencing technique (CGS) and high-throughput sequencing with the Illumina Inc. GAIIx, we identified all the genetic changes that occurred in the evolved PB12 strain during the selection process and analyzed and characterized the most relevant ones. Results of whole-genome sequencing, supported by transcript quantification by RT-qPCR and by knockout inactivation of selected genes in the parental PB11 strain, indicate that a simultaneous deletion of several contiguous genes, including rppH, mutH and galR, is the main reason for the fast growth on glucose. galR codes for the repressor of the gal regulon, which includes galP, the gene for the GalP permease [23]; rppH codes for the RNA pyrophosphohydrolase (RppH), which initiates mRNA degradation [24]; and mutH codes for the endonuclease of the MutHLS DNA mismatch repair system [25]. In addition, several non-synonymous point mutations were detected: one in rna, which codes for RNase I, involved in RNA degradation [26,27], and others in known and putative regulatory genes, such as arcB [21,22], barA [28,29], rpoD [30], rssA [31] and yjjU [32,33]. Finally, another mutation was mapped to ypdA, which codes for a putative histidine kinase [34].

Results and discussion
Detection and characterization of non-synonymous point mutations in the evolved PB12 strain
Two comparative whole-genome nucleotide sequence analyses of the evolved PB12 strain, its parental JM101 strain, and the wild-type K-12 MG1655 strain were performed. The first was carried out by Roche NimbleGen Inc., Madison, WI (RN), using their CGS method; the second was carried out by Winter Genomics Inc., Mexico City (WG), using Illumina's massively parallel sequencing technology (see Materials and Methods). In the RN analysis, 26 non-synonymous point mutations were detected in structural genes; 21 of them were also mapped at the same positions by WG. In addition, 6 non-synonymous mutations were detected only by WG (Table 1A and Tables S1 and S2 in Additional file 1 and Additional file 2). Since there were some discrepancies between the two technologies, which both used DNA obtained from the original frozen stock of the PB12 strain (see Materials and Methods), we PCR-amplified each mutated gene reported by only one company and resequenced it using the Sanger method. Only two of the mutations detected solely by WG (in the csgF and ytfR genes) could be confirmed by Sanger resequencing. The mutations in the other four genes reported by WG (ftsK, stfE, C0362 and rsxC) and in the five genes reported only by RN (mdlB, yagN, ydfN, ykfA and yagG) (Tables S1 and S2 in Additional file 1 and Additional file 2) could not be confirmed by the Sanger method (data not shown). Therefore, the total number of non-synonymous point mutations detected comprises the 21 reported by both companies plus 2 additional mutations reported only by WG. Fourteen of the 21 common mutations, including those located in regulatory genes (see below), were confirmed by Sanger resequencing (Table 1A). Recently, it has been shown that when both types of methodology were used simultaneously for whole-genome resequencing of E. coli strains that had undergone growth adaptation by evolution, both techniques reported false-positive mutations [4,5]. Therefore, it is likely that the mutations in the PB12 strain not confirmed by the Sanger method are false positives. However, since deletion of mutH increases the mutation rate in E. coli [35] (see below), the two non-synonymous mutations detected only by WG and confirmed by Sanger could be de novo changes that occurred in the overnight culture used to obtain DNA for the WG analysis, rather than in the fermentation process started with the PB11 strain (Figure 1). Alternatively, these two mutations could be real but were missed by the RN analysis. Importantly, the nucleotide sequence of the parental PB11 strain was also determined by WG (data not shown). None of the point mutations that occurred in PB12 were detected in PB11, indicating that they appeared during the laboratory adaptive evolution process.
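The way these totals were reconciled can be summarized with simple set operations: calls shared by both platforms are accepted, and a call made by only one platform is accepted only if Sanger resequencing confirmed it. In this sketch the shared calls appear only as a count, because the text does not name all 21 genes; the constant and function names are illustrative.

```python
# Calls made by both platforms at the same positions. Only their number is
# needed here, since the text does not list all 21 genes.
N_SHARED = 21

# Calls reported by a single platform, as listed in the text.
RN_ONLY = {"mdlB", "yagN", "ydfN", "ykfA", "yagG"}
WG_ONLY = {"csgF", "ytfR", "ftsK", "stfE", "C0362", "rsxC"}

# Single-platform calls confirmed by Sanger resequencing.
SANGER_CONFIRMED = {"csgF", "ytfR"}


def reconcile(n_shared, rn_only, wg_only, confirmed):
    """Accept every shared call plus single-platform calls confirmed by Sanger.

    Returns (total accepted, accepted single-platform genes, rejected genes).
    """
    single = rn_only | wg_only
    accepted = single & confirmed
    rejected = single - confirmed
    return n_shared + len(accepted), sorted(accepted), sorted(rejected)


if __name__ == "__main__":
    print("RN total:", N_SHARED + len(RN_ONLY))  # 26
    print("WG total:", N_SHARED + len(WG_ONLY))  # 27
    total, accepted, rejected = reconcile(N_SHARED, RN_ONLY, WG_ONLY, SANGER_CONFIRMED)
    print("Accepted:", total, accepted)
    print("Not confirmed:", len(rejected), rejected)
```

Running it gives 26 RN calls, 27 WG calls, 23 accepted non-synonymous mutations (including csgF and ytfR) and 9 unconfirmed single-platform calls, matching the totals reported above.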
Among the non-synonymous substitutions detected, some were located in genes with regulatory functions (arcB, barA, rpoD, rna) and three in putative regulatory genes (yjjU, rssA and ypdA) (Table 1A). The mutation in the arcB gene, a tyrosine to cysteine substitution at position 71 that apparently modifies the ability of ArcA/B to function as a repressor, has previously been reported by our group. We have proposed that this modification could be responsible for the overexpression of the TCA genes in the PB12 derivative as compared to the parental PB11 strain [9,20,21] (Table 1A). A new role for this mutation is proposed in the following sections.
The barA mutation resulted in a phenylalanine to leucine substitution at position 366 (Table 1A). This residue is located between two functional subregions of the histidine kinase (HK) domain of this protein: the "H box", which contains the conserved histidine residue involved in BarA autophosphorylation, and the "N box", involved in ATP binding [28,29]. The change in the rpoD gene resulted in a valine to isoleucine substitution at position 582, located in a helix-turn-helix (HTH) motif of RpoD (Table 1A). This motif is involved in binding to the −35 promoter region, but given the conservative nature of the substitution and the fact that this residue makes no direct contact with the DNA [36], it is unlikely to have any significant effect on promoter recognition.
The mutation in the rna gene, an alanine to threonine substitution at position 90 of RNase I, is also unlikely to have any consequence, since it is located in a nonconserved structural region outside the catalytic region [37].
The yjjU mutation resulted in a threonine to alanine residue substitution at position 179 in the coded YjjU protein (Table 1A). It has been proposed that this protein could have a regulatory role [32,33].
The mutation in the rssA gene resulted in an arginine to histidine substitution at position 258 of its product. The importance of this mutation, as well as the function of the RssA protein, is unknown; however, RssA might be functionally related to RssB, which is involved in RpoS degradation, since both conserved proteins are encoded by genes located in the same operon [31]. An esterase function has also been predicted for RssA [Uniprot.org].
The point mutation in the ypdA gene caused an alanine to serine substitution at position 200. This gene codes for a predicted sensory histidine kinase of the YpdA-YpdB two-component system [34].
In addition, 22 synonymous mutations were reported; 16 of them detected by both companies. These mutations were not analyzed since it is unlikely that they could have any significant effect on the phenotype (Tables S1 and S2 in Additional file 1 and Additional file 2). Also, several point mutations were detected in non-coding regions by WG. It should be noted that these unconfirmed mutations are unlikely to be located in regulatory regions (data not shown).
The presence of a mutation in the rpoS gene of the PB12 strain has been reported previously. This change replaces the glutamine codon at position 98 with an amber stop codon. As a derivative of JM101, this strain carries a supE mutation, which suppresses amber stop codons [11,38,39]. This mutation was originally considered to have arisen during the adaptive evolution process, since the change was detected in comparison with the sequence of strain MG1655 [9,11,40]. However, both comparative genome sequencing strategies showed that this mutation was already present in the parental PB11 and JM101 strains, and Sanger resequencing confirmed its presence in both parental strains (data not shown). Therefore, this mutation was incorporated at some point during the development of the JM101 strain from its parental strains JC3130, CSH51 and 71.18 [38,39].
Detection and characterization of a chromosomal deletion in the evolved PB12 strain, and analysis of the effects of the deleted genes
A deletion of 10,328 bp located at minute 64 of the chromosome of the PB12 strain, which simultaneously removed the rppH, ygdT, mutH, ygdQ, ygdR, tas, lplT, aas, omrA and omrB genes and part of the ptsP and galR genes, was detected by both the RN and WG analyses (Table 1B and Figure 2). This deletion was confirmed by PCR (Figure 3), and its limits were mapped within the ptsP and galR genes, resulting in a fusion of the remaining segments of these two genes (Figure 4). The nucleotide sequence of the fused fragments was confirmed by Sanger sequencing and is presented in Additional file 3 (Figure S1). Neither repeated sequences nor insertion sequences were detected in the chromosomal DNA regions flanking the deleted genes; therefore, the molecular basis of this deletion is unknown. The analyses of three of these genes, galR, mutH and rppH, whose deletion could be the main cause of the faster growth of the evolved PB12 strain on glucose, are presented and discussed in the following paragraphs.
Table 1 Mutations that occurred in the evolved PB12 strain during the adaptive process (continued)
omrB: small RNA involved in regulating the protein composition of the outer membrane. Present (parental strains); absent (PB12).
galR: DNA-binding transcription factor; represses transcription of the operons involved in the transport and catabolism of D-galactose. Present (parental strains); absent (PB12).
The regulatory and possible regulatory genes analyzed in this study are in bold letters. The mutations in these seven regulatory and possible regulatory genes and in seven additional genes were confirmed by the Sanger method and are labeled with a +. RN detected 26 (21+5) point mutations in 26 genes; the 5 not shared with WG could not be confirmed and are considered false positives. WG detected 27 (21+6) point mutations in 27 genes; in addition to the 21 mutations shared with RN, only two of these point mutations (noted as WG) were confirmed by the Sanger method. Additional file 1 and Additional file 2 (Tables S1 and S2) include the data from RN and WG.
The galR gene codes for the repressor of the gal regulon that includes the galP gene [23]. Therefore, the inactivation of this gene in the PB12 strain is apparently responsible for the high transcription levels of most of the gal genes. In agreement with this proposition, we have previously demonstrated that the PB12 strain that lacks PTS is dependent on GalP for glucose transport [9,12].
The rppH gene codes for an RNA pyrophosphohydrolase that initiates mRNA degradation by hydrolyzing the 5' triphosphate end; after this modification, RNase E can initiate further mRNA degradation. It has been reported that the levels of 382 transcripts increased significantly in E. coli cells lacking RppH [24]. Accordingly, higher transcript levels of many genes, among them glycolytic and TCA genes, were detected by RT-qPCR in the PB12 strain as compared to the parental JM101 and PB11 strains. We previously could not explain the basis of the simultaneous "overexpression" of the genes involved in these metabolic pathways (with the exception of the TCA cycle genes, attributed to the arcB mutation, as mentioned above) [9,11]. In light of the phenotype of strains lacking rppH, it is likely that the higher transcript levels observed in the PB12 strain are a result of impaired mRNA degradation rather than of overexpression (see below). In fact, rppH inactivation in the PB11ΔrppH strain increased its μ by 261% (Table 2). Consistent with this increase in μ, and as presented in detail in the next sections, the RT-qPCR values of most central metabolic genes increased 1.5- to 18-fold when rppH was inactivated in the PB11ΔrppH derivative as compared to the parental PB11 strain, allowing improved glucose metabolism, probably because of higher levels of central metabolic gene transcripts. Interestingly, in contrast to PB11, inactivation of rppH in the parental JM101 strain (JM101ΔrppH) decreased its μ by 27% (data not shown), indicating that this mutation impaired glucose metabolism in a strain whose glucose transport is not impaired. From these results, it is tempting to speculate that inactivation of rppH could be used as a tool for increasing the half-life of certain mRNA families, and therefore growth rates, in strains with certain growth limitations.
The absence of the mutH gene, which codes for the MutH endonuclease of the MutHLS complex involved in the DNA mismatch repair pathway, is probably responsible for the large number of mutations that appeared in the PB12 strain during the short adaptive laboratory evolution process, which lasted only 120 hr [5,7,9,10,12,18,25]. The absence of mutH is known to increase the mutation frequency in E. coli at least 200-fold [35]. In addition, starvation-induced mutagenesis among hundreds of E. coli natural isolates is increased on average 7-fold, and in certain strains up to 1000-fold [41-43]. The information presented and analyzed here indicates that the 21 shared non-synonymous point mutations, as well as the deletion, are real mutations that appeared within 120 hr of this adaptive laboratory evolution process. A possible explanation for this high number of mutations is that the deletion of the mutH, rppH and galR genes occurred early in the very short laboratory evolutionary process, allowing faster growth by itself. In the absence of mutH, some or several of the point mutations could then have occurred within a few replication cycles, favoring the selection of even faster-growing variants on glucose. In addition, further mutagenesis is expected in this genetic background, in which the PTS deletion generates a carbon stress response [9,10,12,41,42,44,45].
Figure 2 Comparative genomic maps of the JM101 and PB12 strains. Deletions detected in the PB12 strain. The small deletion results from the elimination of the PTS genes (ptsH, ptsI and crr), which was previously generated in the parental PB11 strain [7,9]. The larger deletion appeared during the laboratory adaptive evolution process (Figures 3 and 4).
RT-qPCR mRNA expression values of relevant mutated genes and of central metabolic genes in the evolved PB12 strain
Table 3 shows the mRNA expression values, determined by RT-qPCR, of relevant mutated genes (mainly regulators or putative regulators) and of some other genes used as controls (see Materials and Methods). RT-qPCR values of more than 100 genes from the evolved PB12 strain and the parental PB11 and JM101 strains, including the arcB gene in which a non-synonymous point mutation appeared, have previously been reported by our group and are included in Table 3 for comparison [9-11]. With the exception of rpoS and barA, the RT-qPCR values of the regulatory and possible regulatory genes were not substantially modified with respect to the JM101 parental strain. As anticipated, except for galR, no transcripts of the 12 deleted contiguous genes were detected.
Inactivation of mutated regulatory and putative regulatory genes in the PB11 and PB12 strains; RT-qPCR values of carbon central metabolism genes of the PB11ΔrppH strain
To understand the possible roles of some of the regulatory and possible regulatory genes mutated in PB12 (Table 1A), these genes were individually inactivated by cassette insertion using the Datsenko-Wanner method [46] (see Materials and Methods) in PB12 as well as in its parental PB11 and JM101 strains. The exceptions were the rpoD gene, since its inactivation is lethal [47], and the galR gene, since its regulated target, galP, is already overexpressed in the PB11 and PB12 strains [9,23].
Figure 3 Chromosomal deletion markers in the PB12 strain. The absence of a chromosomal fragment in the PB12 strain was confirmed by PCR and by Sanger resequencing. Ten genes were deleted, and the galR and ptsP genes were fused. Section A shows the ptsP and galR genes amplified from the JM101, PB11 and PB12 strains: lane 1, (M) molecular weight markers; lanes 2, 3 and 4, ptsP amplification from the JM101, PB11 and PB12 strains, respectively; lanes 5, 6 and 7, galR amplification from the JM101, PB11 and PB12 strains, respectively; lane 8, amplification of the chromosomal region from the PB12 strain; lane 9, (M) molecular weight markers; lanes 10, 11 and 12, amplification of the chromosomal region using DNA from strains JM101, PB11 and PB12, respectively. Section B presents the oligonucleotides used for DNA amplification. The left section (L) includes the oligonucleotides used to amplify the ptsP and galR genes of the three strains (lanes 2-7), and the right section (R) presents the entire chromosomal regions of the same three strains, amplified using the ptsP-fwd and galR-rv oligonucleotides (lanes 8 and 10-12). The nucleotide sequences of the oligonucleotides used are included in Table S3 in Additional file 4.
Interestingly, as shown in Table 2, individual cassette inactivation of the arcA, rppH, yjjU, and rssA genes in the PB11 strain increased its specific growth rate on glucose as the sole carbon source, supporting our hypothesis that the inactivation of some of these genes, especially rppH, resulted from direct selection for faster growth during the evolution process.
Some of these regulatory genes were also inactivated by the Datsenko-Wanner method in the PB12 strain; the effects on the specific growth rate are shown in Table 4. Knockout inactivation of the barA, arcA, and yjjU genes decreased μ by about 5%, 10%, and 23%, respectively, while no substantial growth rate differences were observed when the rna, rssA, or ypdA genes were inactivated.
Because inactivation of the rppH gene markedly increased μ (by 261%) in the PB11ΔrppH strain, the expression of several genes, including carbon central metabolism, transport and regulatory genes, was determined in this strain as well as in the JM101ΔrppH strain (Table 3). In PB11ΔrppH, most of the RT-qPCR values of the central metabolism genes, including the TCA cycle genes, increased 2- to 18-fold as compared to PB11 (exceptions: pgk, fbaA, talB and pckA). RT-qPCR values of all genes involved in growth under carbon-limited stress conditions, most of which are overexpressed in the PB11 strain due to the lack of PTS (such as the gal operon, poxB, acs, and the glyoxylate shunt genes), were also high in the PB11ΔrppH strain, and for some genes, such as maeB, the increase was up to 10-fold (Table 3) [8-12]. The mRNA levels of all these genes, except for aceEF, fbaA, eno, pckA, pfkA, pykA and pgk, were higher in the PB11ΔrppH strain than in the parental JM101 strain. Importantly, RT-qPCR values of several of these genes were also increased in JM101ΔrppH as compared to JM101, but in general to a lower extent, with the exception of the gapA, pgi, glk, sdhA, sdhB, and sucAB genes, which increased more than 3-fold (Table 3). The RT-qPCR values of some of the ArcA/B-regulated genes were also highly increased in PB11ΔrppH (Table 3). When the RT-qPCR values of central metabolic genes in PB11ΔrppH were compared with those of PB12, the TCA cycle genes were lower in PB12 (Table 3). The analysis and possible explanations of these differences are presented in the next section.
Table 3 (excerpt), pts operon
crr: 0.00 ± 0.00 | 0.00 ± 0.00 | 2.46 ± 0.37 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00
ptsH: 0.00 ± 0.00 | 0.00 ± 0.00 | 1.07 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00
ptsI: 0.00 ± 0.00 | 0.00 ± 0.00 | 1.34 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00
Remarkably, RT-qPCR values of most of the regulatory genes analyzed were also increased, in some cases highly, in the PB11ΔrppH derivative as compared to the parental PB11 strain, whereas RT-qPCR values of most regulatory genes in the JM101ΔrppH derivative were not substantially modified as compared to its parental JM101 strain. As mentioned, the higher RT-qPCR values of most genes in the PB11ΔrppH derivative are mainly the result of the increased mRNA half-life due to the absence of RppH. However, since higher levels of most of the transcriptional regulators were detected in the PB11ΔrppH strain, the enhanced expression of some genes could also result, in addition to increased mRNA half-life, from higher or lower expression of their regulatory genes.
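The distinction drawn here between a longer mRNA half-life and increased transcription can be made explicit with the standard first-order model of mRNA turnover. This is a textbook simplification, not a model fitted in this study: it assumes a constant synthesis rate k_s and first-order decay with rate constant k_d, and it neglects dilution by growth, which is reasonable because bacterial mRNA half-lives (typically minutes) are much shorter than the doubling times considered here.

$$\frac{dm}{dt} = k_s - k_d\,m, \qquad t_{1/2} = \frac{\ln 2}{k_d} \quad\Longrightarrow\quad m_{ss} = \frac{k_s}{k_d} = \frac{k_s\,t_{1/2}}{\ln 2}$$

Under these assumptions, doubling t_{1/2} doubles the steady-state transcript level m_ss with no change in k_s, so a higher RT-qPCR value alone cannot distinguish slower mRNA decay from increased transcription.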

Analysis and possible effects of some of the mutated regulatory and putative regulatory genes in the evolved PB12 strain
It has previously been proposed that the point mutation in the arcB gene detected in the PB12 strain is apparently responsible for diminishing the function of ArcA/B as a repressor, since inactivation of ArcB, or enhancement of its ArcA-P dephosphorylating activity, is known to contribute to the overexpression of ArcA/B-regulated genes [9,20-22]. However, to explain the particularly high mRNA levels of most of the TCA cycle and respiratory genes mainly controlled by ArcA/B in the PB11ΔrppH strain as compared to the PB12 strain (except for mdh and nuoN; Table 3), we now propose a different role for this arcB mutation. This mutation apparently modifies the ArcA/B repressor function in the PB12 strain by reducing, not enhancing, its ArcA-P dephosphorylating capacity, which in turn could contribute to stronger repression of ArcA/B-regulated genes. This would explain the lower RT-qPCR values of most of the ArcA-P-regulated genes in PB12 as compared to the PB11ΔrppH strain. Therefore, this change in the arcB gene apparently reduced both the transcription of ArcA/B-dependent genes [9,20,21] and the metabolic burden, allowing better growth of the PB12 strain as compared to the PB11ΔrppH derivative (Table 3). In agreement with this proposal, knockout inactivation of the arcA gene in the PB12ΔarcA strain reduced μ by 10% (Table 4), because higher transcription levels of the ArcA/B-controlled genes resulted in this derivative (data not shown), which was probably sensed as a metabolic burden. The same growth-diminishing effect occurred in the JM101ΔrppH strain, probably because higher transcription levels of many central metabolism genes, including some of the TCA cycle, reduced its μ by 27% (data not shown) as compared to the parental JM101 strain.
In agreement with the important role of the ArcA/B regulator, inactivation of arcA in the PB11ΔarcA strain substantially increased μ and the transcription levels of most of the ArcA/B-regulated genes as compared to the PB11 strain [10] (Table 3). From these results, it is tempting to propose that inactivation of the arcA gene in E. coli could be used as a tool to improve the growth of cells growing aerobically under certain stress conditions, in which the lack of regulation of the TCA cycle and respiratory genes would be an advantage [9,10]. It has been proposed that YjjU could be involved in regulatory processes [32,33]. The inactivation of yjjU in the PB11ΔyjjU strain increased its μ from 0.13 to 0.16 h⁻¹. This 23% increase is much smaller than those obtained with the inactivation of arcA (243%) and rppH (261%) (Table 2). However, yjjU inactivation in the PB12ΔyjjU strain reduced its μ by 23% (from 0.44 to 0.34 h⁻¹) as compared to the parental PB12 strain (Table 4). These results suggest that, if this protein really functions as a regulatory factor, as has been proposed, the point mutation could increase the capacity of the cell for faster growth on glucose. Cassette inactivation of yjjU is the only case in which a gene knockout increased μ in the PB11 derivative (PB11ΔyjjU) and reduced μ by the same percentage in the PB12 derivative (PB12ΔyjjU). This mutation must be investigated further, initially by analyzing the transcription pattern of critical genes in the PB12ΔyjjU strain as compared to the parental PB12 strain.
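Because the two yjjU percentages are computed against different baselines (PB11 and PB12), it may help to check them explicitly. The helper below is an illustrative sketch that uses only the μ values quoted above; it shows that the increase in PB11 (about +23.1%) and the decrease in PB12 (about −22.7%) both round to the 23% given in the text.

```python
def percent_change(mu_ref: float, mu_new: float) -> float:
    """Relative change in specific growth rate, as a percentage of the reference."""
    return 100.0 * (mu_new - mu_ref) / mu_ref


# yjjU knockouts: (reference mu, knockout mu) in h^-1, as given in the text.
CASES = {
    "PB11 -> PB11 yjjU knockout": (0.13, 0.16),
    "PB12 -> PB12 yjjU knockout": (0.44, 0.34),
}

if __name__ == "__main__":
    for label, (ref, new) in CASES.items():
        print(f"{label}: {percent_change(ref, new):+.1f}%")
```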
The mutation in the rpoD gene causes a substitution of the conserved valine 482 by isoleucine in the helix-turn-helix (HTH) motif of region 4.2 of RpoD, which is involved in recognition of the −35 promoter region. In the cocrystal structure of Thermus aquaticus region 4.2 with promoter DNA, which is almost identical to the E. coli region, this position is located at the turn of the HTH motif and makes no direct contact with the DNA [36]. Thus, this substitution is unlikely to affect the affinity of this sigma subunit for promoter DNA sequences.
Since knockout inactivation of the barA, rssA, rna and ypdA genes did not substantially modify μ in the PB11 and PB12 derivatives, these genes appear to have played a minor role or none at all in the growth recovery observed in the evolved strain.

Conclusions
We propose that the deletion event that simultaneously removed the mutH and rppH genes and part of the galR gene, which is mainly responsible for the faster growth (4x) in glucose, occurred as one of the initial events in the adaptive laboratory evolution process that resulted in the evolved PB12 strain. This deletion simultaneously caused: a) a very high mutation rate due to the removal of mutH, in a strain lacking PTS that already exhibits a carbon stress response; b) higher glucose transport, through increased levels of GalP resulting from the inactivation of galR in this strain lacking PTS [9,12]; and c) higher mRNA levels due to the absence of RppH, resulting in enhanced glycolytic and TCA cycle fluxes and better respiratory capacity in the precursor of the PB12 strain.
In addition, lower mRNA levels of most of the ArcA/B-regulated genes were detected in the PB12 strain compared to the PB11ΔrppH derivative. This can be explained by an enhanced ArcA-P repressor capacity due to the arcB mutation, which apparently appeared after the deletion of the rppH gene in the evolved strain and resulted in lower transcription levels of ArcA/B-regulated genes.
Knockout inactivation of the barA, rssA, rna and ypdA genes in the PB11 and PB12 strains did not substantially modify the μ of the derivatives, suggesting that each of these mutations alone played a minor role or none at all in the growth recovery of the evolved strain. Some of these changes could in fact be neutral mutations [48].
From these considerations, the evidence indicates that the main reasons for fast growth on glucose are apparently the deletion of the rppH, galR and mutH genes and, perhaps, the point mutation in the arcB gene. These two changes could have been fixed within a short period during the fermentation process. Nevertheless, it cannot be ruled out that other point mutations not completely characterized in this study, such as those in the yjjU or barA genes, could also play a minor role in the growth recovery in glucose.
In this study, as in others [4,5], we used two different whole genome sequencing strategies, which produced slightly different results. True changes had to be distinguished from false positives by conventional Sanger sequencing. Therefore, we emphasize the importance of using more than one genome resequencing method in studies of this type to obtain high confidence in the results.
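One way to organize the comparison of two resequencing call sets is to split them into concordant and method-specific calls. The Python sketch below assumes each call has been reduced to a simple string key (gene, position and base change); the identifiers are hypothetical, and the triage rule (method-specific calls are the ones a second method cannot corroborate, so they are prioritized for Sanger confirmation) is an illustrative assumption rather than the exact procedure followed here, where Sanger sequencing was also applied to concordant calls in regulatory genes. Applied to the non-synonymous counts in Additional files 1 and 2, this partition would give 21 shared calls, 5 RN-only calls and 6 WG-only calls.

```python
from typing import Dict, Set


def partition_calls(calls_a: Set[str], calls_b: Set[str]) -> Dict[str, Set[str]]:
    """Split two variant call sets into concordant and method-specific calls."""
    return {
        "shared": calls_a & calls_b,
        "only_a": calls_a - calls_b,
        "only_b": calls_b - calls_a,
    }


def needs_sanger(parts: Dict[str, Set[str]]) -> Set[str]:
    """Calls reported by only one method (hypothetical triage rule)."""
    return parts["only_a"] | parts["only_b"]


# Hypothetical call keys (gene:position:change), for illustration only
rn = {"geneA:120:C>T", "geneB:45:G>A", "geneC:300:A>G"}
wg = {"geneA:120:C>T", "geneB:45:G>A", "geneD:12:T>C"}
parts = partition_calls(rn, wg)
print(sorted(parts["shared"]))   # ['geneA:120:C>T', 'geneB:45:G>A']
print(sorted(needs_sanger(parts)))  # ['geneC:300:A>G', 'geneD:12:T>C']
```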
Finally, the results presented here show the physiological plasticity of E. coli and could be useful in the design of more robust adaptive laboratory evolution strategies.
The PB12 culture that was also used to prepare DNA for genome sequencing was obtained from the original culture kept frozen in glycerol (Figure 1) [10]. For μ determinations, cells were grown in LB and then inoculated into M9 minimal medium with 2 g/l glucose as the sole carbon source; when the cultures were growing exponentially, they were inoculated into 50 ml of the same prewarmed medium at 37°C, stirred at 300 rpm, with a starting optical density at 600 nm (OD600) of 0.1. OD600 values were measured with a Klett/Summerson photocolorimeter, model 800-3. All specific growth rate values presented in Tables 2 and 4 are the averages of at least two independent cultures, each in duplicate. For RNA isolation and RT-qPCR analyses, duplicate cultures were grown in 1 L fermentors in M9 medium with 2 g/l glucose as the sole carbon source at 37°C, stirred at 600 rpm with an air flow rate of 1 vvm and a starting OD600 of 0.1. For RT-qPCR determinations, cells from the different fermentations were collected in the log phase at OD600 = 1 [9].
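The specific growth rate μ is the slope of ln(OD600) against time during exponential growth. Below is a minimal Python sketch of that least-squares fit, assuming Klett readings have already been converted to OD600 values (the calibration is not described here) and that only exponential-phase points are passed in; the data are synthetic, not measurements from this study.

```python
import math
from typing import Sequence


def specific_growth_rate(times_h: Sequence[float], od600: Sequence[float]) -> float:
    """Least-squares slope of ln(OD600) versus time, i.e. mu in h^-1.

    Assumes every point lies in the exponential phase and OD600 > 0.
    """
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need at least two paired time/OD600 points")
    y = [math.log(od) for od in od600]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times_h, y))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return sxy / sxx


# Synthetic culture starting at OD600 = 0.1 and growing with mu = 0.44 h^-1
times = [0.0, 1.0, 2.0, 3.0, 4.0]
ods = [0.1 * math.exp(0.44 * t) for t in times]
mu = specific_growth_rate(times, ods)
print(round(mu, 3))                    # 0.44
print(round(math.log(2) / mu, 2))      # doubling time in h: 1.58
```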
DNA extraction from parental and evolved PB12 strains for genomic analysis
Two overnight cultures of each of the E. coli strains JM101, PB11 and PB12 were grown from their original frozen stocks in liquid LB medium. DNA purified from one set of these cultures (excluding PB11) was submitted to Roche NimbleGen Inc. (RN), and DNA from the other set was submitted to the UNAM Massive DNA Sequencing Unit for genome resequencing (see below). DNA was extracted by a maxiprep phenol extraction and ethanol precipitation method [49] and purified with the PureLink PCR purification kit (Invitrogen, USA). The quality and quantity of the extracted DNA were verified as recommended by RN and by the UNAM Massive DNA Sequencing Unit.
Roche NimbleGen Inc. sequencing
DNA samples from the JM101 and PB12 strains were submitted to RN for comparative genome sequencing (CGS) analysis using E. coli K-12 MG1655 (ATCC 47076) as the reference strain [40]. The results provided by RN are included in Table 1, Figure 2 and Table S1 in Additional file 1.

Paired-end (PE) library construction and GAIIx sequencing
DNA samples from the JM101, PB11 and PB12 strains were submitted to the Massive DNA Sequencing Unit of UNAM for paired-end (PE) library construction and genome sequencing. The PE libraries were constructed following Illumina Inc. recommendations. Briefly, 5 μg of chromosomal DNA from each strain was fragmented by nitrogen nebulization for 6 min at a pressure of 32 psi. Fragmented DNA was purified using the QIAquick PCR purification kit and resuspended in 30 μl of elution buffer (EB: 10 mM Tris·HCl, pH 8.5). DNA ends were repaired using a mixture of T4 and Klenow DNA polymerases and T4 polynucleotide kinase for the 5' ends. To facilitate ligation of the double-stranded adapters, an adenine residue was added to each 3' end of the fragmented DNA before ligation, using Klenow exo minus (exo−) enzyme and dATP. Illumina Inc. adapters with overhanging thymine residues at their 3' ends were ligated to each end of the fragmented DNA using 2x rapid ligation buffer (Illumina Inc.) and T4 DNA ligase for 15 min at room temperature. Ligated DNA was purified using a Qiagen MinElute purification kit (Qiagen, USA) and resuspended in 15 μl of EB. The modified DNA pool was loaded on a 2% Ultra Low Range Agarose gel (Bio-Rad Laboratories, USA), and ~500 bp DNA fragments were purified using a QIAquick gel extraction kit. To enrich the adapter-modified DNA fragments, the purified DNA was used as template for a 12-cycle PCR (98, 65 and 72°C) with PCR primers PE 1.0 and 2.0 and Phusion DNA polymerase (included in the Illumina Inc. PE sample prep kit). PCR products were purified using a QIAquick PCR purification kit (Qiagen, USA) and eluted in 50 μl of EB. Libraries were validated and quantified using an Agilent Bioanalyzer 2100 (DNA 1000 chip).
Finally, 18 pM of DNA library was used for PE sequencing of 2x36 cycles on a GAIIx instrument, which performs sequencing by synthesis based on reversible fluorescent terminators, according to Illumina Inc. protocols.
Genome "de novo" assembly and variant identification by Winter Genomics Inc.
Low-quality reads produced by the Illumina GAIIx method were filtered using the ShortRead 1.8.0 package [50]. Each strain was assembled with PE-Assembler 1.1 [51]. IMAGE 2.1 [52] was used to close gaps, since it locally assembles reads that align to contig ends. The Bowtie 0.12.5 short read aligner [53] was used to align reads to the resulting contigs, and unsupported bases were removed with the Biostrings 2.18.0 package. Contigs were reordered along the E. coli K-12 MG1655 genome [40] using Mauve 2.3.1 [54,55]. The contigs were then compared against the reference genome using both Mauve and Murasaki 1.68.6 [56]. Using the PTS operon deletion as a marker, each strain could be correctly identified. BLAT v34 [57] was used to align the PB12 genome against JM101, and VarScan 1.2 [58] was used to identify variants using the BLAT alignments as input. Ambiguous variants were filtered out using a custom Perl script. Local cluster resources of the Instituto de Biotecnología-UNAM were used for most of the analyses. The results provided by Winter Genomics Inc. (WG) for the JM101 and PB12 strains are included in Table 1 and in Table S2 in Additional file 2. None of the point mutations detected in PB12 appeared in the genome sequence of the PB11 strain (data not shown).
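The custom Perl script used to remove ambiguous variants is not described. As a purely illustrative Python sketch, the filter below keeps a call only if it has enough read depth and an almost fixed alternative-allele frequency, which is what one would expect in a clonal, haploid bacterial population. The `Variant` record layout and both thresholds (`min_depth=10`, `min_freq=0.9`) are hypothetical; they are not VarScan output fields or the criteria actually used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    # Hypothetical record layout, not the VarScan output format
    position: int   # coordinate on the reference
    ref: str        # reference base
    alt: str        # alternative base
    depth: int      # reads covering the position
    alt_reads: int  # reads supporting the alternative base


def is_unambiguous(v: Variant, min_depth: int = 10, min_freq: float = 0.9) -> bool:
    """Keep calls with adequate coverage and a near-fixed allele frequency.

    Thresholds are illustrative assumptions, not the original criteria.
    """
    if v.depth < min_depth:
        return False
    return v.alt_reads / v.depth >= min_freq


def filter_variants(variants: List[Variant]) -> List[Variant]:
    return [v for v in variants if is_unambiguous(v)]


calls = [
    Variant(1000, "C", "T", depth=40, alt_reads=39),  # kept: 97.5% support
    Variant(2000, "G", "A", depth=5, alt_reads=5),    # dropped: low coverage
    Variant(3000, "A", "G", depth=50, alt_reads=20),  # dropped: mixed signal
]
print([v.position for v in filter_variants(calls)])  # [1000]
```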

DNA sequencing of putative mutations by Sanger methodology
DNA regions containing putative mutations in regulatory genes detected by RN and WG were PCR amplified using the oligonucleotide primers listed in Table S3 in Additional file 4, purified with the PureLink PCR purification kit and sequenced by the Sanger method using Taq FS Dye Terminator Cycle Sequencing (fluorescence-based) on a Perkin Elmer/Applied Biosystems Model 3730 sequencer. The sequence differences of 14 of the mutations presented in Table 1A were confirmed by examination of the trace data (data not shown).

RNA Extraction, DNAse treatment of RNA and cDNA synthesis for RT-qPCR analysis
Total RNA from the strains used was isolated and purified using the hot-phenol method with some modifications. Samples of 50 ml from the different strains growing logarithmically in the fermentor were collected at OD600 = 1. One ml of RNAlater buffer (Ambion Inc., USA) was added to each sample, mixed and centrifuged for 10 min at 4°C and 5000 rpm. Cells were resuspended in 1 ml of buffer I (0.3 M sucrose, 0.1 M sodium acetate) and treated with 20 μl of lysozyme (10 mg/ml in TE buffer) for 10 min at room temperature. Two ml of buffer II (0.01 M sodium acetate, 2% SDS) were added, and the mixtures were incubated for 3 min at 65°C. The lysates were extracted with 2 ml of hot phenol and heated for 3 min at 65°C. A second extraction with hot phenol was performed without heating the mixtures. Samples were then extracted with 2 ml of a phenol:chloroform mixture (1:1), precipitated with 0.1 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of ethanol, and centrifuged for 15 min at 4°C and 10000 rpm. Samples were then suspended in 300 μl of DNase- and RNase-free water (Ambion Inc., USA) containing RNase inhibitor (Fermentas Life Sciences, USA) and extracted twice with 1 volume of chloroform. Finally, samples were precipitated as before and suspended in 300 μl of TE buffer (Ambion Inc., USA). RNA integrity was analyzed on formaldehyde agarose gels. RNA concentrations were quantified using a Nanodrop 2000c (Thermo Scientific), and the A260/A280 and A260/A230 ratios were examined for protein and solvent contamination. For all samples, the A260/A280 ratios were between 1.9 and 2.0, and the A260/A230 ratios were between 2.0 and 2.3. RNA samples were stored at −70°C. Three RNA extractions and purifications were carried out from three independent fermentations for each strain.
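The spectrophotometric checks above can be expressed as two small functions. This Python sketch assumes the common conversion of 40 ng/μl of RNA per A260 unit (1 cm path length) and, for illustration only, treats the ratio windows observed for these samples (A260/A280 of 1.9 to 2.0, A260/A230 of 2.0 to 2.3) as acceptance criteria; the absorbance values in the example are hypothetical.

```python
def rna_concentration_ng_per_ul(a260: float, dilution: float = 1.0) -> float:
    """Estimate RNA concentration: 1 A260 unit ~ 40 ng/ul (1 cm path)."""
    return a260 * 40.0 * dilution


def passes_purity(a260: float, a280: float, a230: float) -> bool:
    """Check the A260/A280 and A260/A230 windows reported for these samples.

    Using these windows as pass/fail criteria is an illustrative assumption.
    """
    if a280 <= 0 or a230 <= 0:
        raise ValueError("absorbances must be positive")
    r280 = a260 / a280  # low values suggest protein contamination
    r230 = a260 / a230  # low values suggest phenol/salt carry-over
    return 1.9 <= r280 <= 2.0 and 2.0 <= r230 <= 2.3


# Hypothetical readings
print(rna_concentration_ng_per_ul(0.5))   # 20.0
print(passes_purity(0.5, 0.258, 0.23))    # True  (ratios ~1.94 and ~2.17)
print(passes_purity(0.5, 0.300, 0.23))    # False (A260/A280 ~1.67)
```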
For DNase treatment, total RNA samples were treated with the TURBO DNA-free kit (Ambion Inc., USA) at 37°C for 30 min, following the manufacturer's instructions. To determine whether RNA samples were significantly contaminated with genomic DNA, samples were subjected to conventional PCR with primers for the arcA gene (Table S3 in Additional file 4). Since these primers recognize genomic DNA, a detectable PCR product on an ethidium bromide-stained agarose gel indicated that the RNA sample was contaminated with genomic DNA; contaminated samples were discarded. PCR reactions were performed with Taq polymerase (Fermentas Life Sciences, USA). The cycling parameters were: 95°C for 5 min; 30 cycles of 95°C for 1 min, 55°C for 1 min and 72°C for 1 min; and a final extension at 72°C for 5 min. Additionally, the DNase-treated RNA samples were used for RT-qPCR analysis of the same arcA gene with the appropriate oligonucleotides [arcAa (forward) and arcAb (reverse)] (Table S3 in Additional file 4). As with conventional PCR, none of the samples used produced the 101 bp amplicon, indicating that small fragments of genomic DNA were not present. cDNA was synthesized using the RevertAid™ H Minus First Strand cDNA Synthesis kit (Fermentas Life Sciences, USA) following the manufacturer's instructions. For each reaction, approximately 5 μg of RNA and a mixture of specific reverse DNA primers (b) for the genes analyzed (10 pmol/μl) were used. The sequences of these primers have been previously published [9-11] or are listed in Table S3 in Additional file 4. The cDNAs were used as templates for the RT-qPCR assays. cDNAs were synthesized using gene-specific oligonucleotides, since this condition ensures the synthesis of only one cDNA copy per RNA molecule [9,59].

RT-qPCR
RT-qPCR was performed with the ABI Prism 7000 Sequence Detection System and 7300 Real Time PCR System (Perkin Elmer/Applied Biosystems, USA) using the Maxima® SYBR Green/ROX qPCR Master Mix (2X) kit (Fermentas Life Sciences, USA). MicroAmp Optical 96-well reaction plates (Applied Biosystems, USA) and Plate Max ultraclear sealing films (Axygen Inc., USA) were used in these experiments. Amplification conditions were 10 min at 95°C, followed by 40 two-step cycles of 95°C for 15 s and 60°C for 60 s, and a final dissociation protocol (95°C for 15 s, 60°C for 1 min, 95°C for 15 s and 60°C for 15 s). Primers for specific amplification were designed using Primer Express software (Perkin Elmer/Applied Biosystems, USA). Some of these sequences have been previously published [9-11], and the rest are included in Table S3 in Additional file 4. All RT-qPCR experiments complied with the MIQE guidelines (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) [60,61]. All oligonucleotides (forward and reverse) were 18 to 21 nucleotides long, with a GC content of 45 to 60% and a Tm of 58 to 60°C. All amplicons were 101 bp long. The final primer concentration was 0.2 μM in a total volume of 12 μl. Five ng of target cDNA for each gene were added to the reaction mixture, since higher cDNA amounts (>10 ng) are outside the dynamic range of the reference gene ihfB (see below), and values obtained at these higher amounts could not be correctly normalized. All experiments were performed at least in triplicate from three different fermentations for each gene of each strain, yielding very similar values (differences <0.3 SD). A no-template control reaction was included for each gene; in these controls, signals appeared only after cycle 31 for all genes.
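The primer length and GC-content criteria stated above can be checked programmatically. This Python sketch deliberately does not recompute Tm, because Primer Express uses its own thermodynamic model and salt assumptions, which are not reproduced here; the primer sequence is hypothetical and is not one of the oligonucleotides in Table S3.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in s) / len(s)


def meets_design_rules(seq: str) -> bool:
    """Length 18-21 nt and GC 45-60 %, as reported for the RT-qPCR primers."""
    return 18 <= len(seq) <= 21 and 45.0 <= gc_percent(seq) <= 60.0


primer = "ATGCGTACGTTAGCCATGCA"  # hypothetical 20-mer, not from this study
print(len(primer), round(gc_percent(primer), 1), meets_design_rules(primer))
# prints "20 50.0 True"
```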
Standard curves were constructed to evaluate PCR efficiency; all genes had R² values above 0.9976, with slopes between −3.4 and −3.7. Data were analyzed by the 2^−ΔΔCq method described by Livak and Schmittgen [62] and normalized using the ihfB gene as an internal control (reference gene). The expression level of this gene was the same and reproducible in all strains under the conditions in which the bacteria were grown and analyzed; according to the MIQE guidelines, this is the most important characteristic of a reference gene. Additional file 5 (Figure S2) presents the ihfB values detected for the strains used. These results demonstrate the stable expression of this reference gene in all the derivatives analyzed under the conditions used in this report, as well as in previous reports using these strains and other derivatives [9-11,60].
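The slope of a standard curve (Cq plotted against log10 of template amount) translates into amplification efficiency as E = 10^(−1/slope) − 1, where a slope of about −3.32 corresponds to 100% efficiency. The Python sketch below fits the slope by ordinary least squares and converts the reported slope range of −3.4 to −3.7 into efficiencies of roughly 97% and 86%. The 2^−ΔΔCq method assumes efficiencies close to 100% and similar between target and reference genes, so the standard-curve slopes matter for interpreting the data. The dilution series in the example is synthetic.

```python
from typing import Sequence, Tuple


def fit_standard_curve(log10_input: Sequence[float], cq: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares of Cq on log10(template amount); returns (slope, intercept)."""
    n = len(cq)
    if n != len(log10_input) or n < 2:
        raise ValueError("need at least two paired points")
    mx = sum(log10_input) / n
    my = sum(cq) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_input, cq))
    sxx = sum((x - mx) ** 2 for x in log10_input)
    slope = sxy / sxx
    return slope, my - slope * mx


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; E = 1.0 (100 %) means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Synthetic 10-fold dilution series with a slope of -3.5
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
cqs = [30.0 - 3.5 * x for x in xs]
slope, intercept = fit_standard_curve(xs, cqs)

for s in (-3.4, -3.7):
    print(s, f"{100 * amplification_efficiency(s):.1f} %")
# prints "-3.4 96.8 %" and "-3.7 86.3 %"
```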
For each gene analyzed, the transcription level of the corresponding gene in JM101 was set to one and used as the calibrator to normalize the data. Therefore, data are reported as expression levels relative to the same gene in the JM101 strain. Results presented in Table 3 are the averages of at least three independent RT-qPCR measurements for each gene. Values were obtained from different cDNAs generated from two independent bioreactor samples [9].
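The normalization described above (ihfB as reference gene, JM101 as calibrator) corresponds to the Livak and Schmittgen 2^−ΔΔCq calculation. A minimal Python sketch, assuming replicate Cq values are averaged before subtraction; the function name and Cq values are hypothetical and not taken from Table 3.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    cq_target_strain: Sequence[float],
    cq_ref_strain: Sequence[float],
    cq_target_calib: Sequence[float],
    cq_ref_calib: Sequence[float],
) -> float:
    """Fold change by the 2^-ddCq method.

    dCq  = Cq(target gene) - Cq(reference gene, here ihfB)
    ddCq = dCq(strain of interest) - dCq(calibrator, here JM101)
    """
    d_strain = mean(cq_target_strain) - mean(cq_ref_strain)
    d_calib = mean(cq_target_calib) - mean(cq_ref_calib)
    return 2.0 ** -(d_strain - d_calib)


# Hypothetical triplicate Cq values: the target crosses threshold two cycles
# earlier (relative to ihfB) in the strain than in JM101
fold = relative_expression([22.0, 22.1, 21.9], [18.0, 18.1, 17.9],
                           [24.0, 24.1, 23.9], [18.0, 18.0, 18.0])
print(round(fold, 2))  # 4.0
```

With JM101 used as both strain and calibrator, the function returns 1.0, matching the convention that each JM101 gene is set to one.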

Additional files
Additional file 1: Table S1. Mutations in coding regions detected by Roche NimbleGen Inc. This table includes the data provided by Roche NimbleGen Inc. (RN) for the whole genome sequence analysis of the evolved PB12 strain. Section A lists 26 (21+5) non-synonymous point mutations that, according to RN, changed the coding regions of structural genes. In fact, only the 21 mutations also detected by Winter Genomics Inc. (WG) actually occurred (Table 1A). Section B lists the 12 genes included in a large deletion in this strain. This list also includes the genes deleted in the parental PB11 strain to construct this derivative lacking PTS (Figure 2). Section C lists the 20 genes in which synonymous point mutations occurred according to RN; the 16 in common with WG are in bold.
Additional file 2: Table S2. Mutations in coding regions detected by Winter Genomics Inc. This table lists the data provided by Winter Genomics Inc. (WG) for the whole genome sequence of the PB12 strain in comparison to the parental strains JM101 and PB11. Importantly, the PB11 strain was also sequenced by WG and showed the same nucleotide sequences as the parental strain JM101 (data not shown). Therefore, all the point mutations detected in PB12 by RN and WG appeared during the laboratory evolution process. Section A lists 27 genes (21+6) in which, according to WG, non-synonymous point mutations changed the coding regions of structural genes; 21 of these genes were also detected by RN (Table 1A). Section B lists the 18 genes in which synonymous mutations also occurred according to WG; the 16 in common with RN are in bold.
Additional file 3: Figure S1. Nucleotide sequence of the chromosomal gene fusion that occurred in the evolved PB12 strain. This figure shows the nucleotide sequence of the genomic region between the ptsP and galR genes where the deletion occurred in the PB12 strain (Figures 3 and 4).
Additional file 4: Table S3. Oligonucleotides employed in this study. This table lists the oligonucleotides used in this work. Section A shows the oligonucleotides used for DNA sequencing with the Sanger method, including those for confirmation of the deletion that occurred in the PB12 strain (Figures 2, 3 and 4), of the mutations reported by RN and WG (Table 1 and Tables S1 and S2 in Additional file 1 and Additional file 2), and of gene disruptions. Section B lists the oligonucleotides used for gene disruption with the Datsenko-Wanner methodology [46]. Section C lists the oligonucleotides used for RT-qPCR analysis that were not previously reported. The sequences of the oligonucleotides used for the remaining genes listed in Table 3 have been previously published [9-11] (see Materials and Methods).