Genome rearrangements induce biofilm formation in Escherichia coli C – an old model organism with a new application in biofilm research

Background: Escherichia coli C forms more robust biofilms than other laboratory strains. Biofilm formation and cell aggregation under a high shear force depend on temperature and salt concentrations. It is the last of five E. coli strains (C, K12, B, W, Crooks) designated as safe for laboratory purposes whose genome has not been sequenced.
Results: Here we present the complete genomic sequence of this strain, for which we used both long-read PacBio-based sequencing and high-resolution optical mapping to confirm a large inversion in comparison to the other laboratory strains. Notably, DNA sequence comparison revealed the absence of several genes thought to be involved in biofilm formation, including antigen 43, waaSBOJYZUL for lipopolysaccharide (LPS) synthesis, and csgB for curli synthesis. The first main difference we identified that likely affects biofilm formation is the presence of an IS3-like insertion sequence in front of the carbon storage regulator csrA gene. This insertion is located 86 bp upstream of the csrA start codon inside the −35 region of the P4 promoter and blocks transcription from the sigma32 and sigma70 promoters P1-P3 located further upstream. The second is the presence of an IS5/IS1182 in front of the csgD gene. Finally, E. coli C encodes an additional sigma70 subunit driven by the same IS3-like insertion sequence. Promoter analyses using GFP gene fusions provided insights into this regulatory pathway in E. coli.
Conclusions: Biofilms are crucial for bacterial survival, adaptation, and dissemination in natural, industrial, and medical environments. Most laboratory strains of E. coli grown for decades in vitro have evolved and lost their ability to form biofilm, while environmental isolates that can cause infections and diseases are not safe to work with. Here, we show that the historic laboratory strain of E. coli C produces a robust biofilm and can be used as a model organism for multicellular bacterial research. Furthermore, we ascertained the full genomic sequence of this classic strain, which provides a base level of characterization and makes it useful for many biofilm-based applications.


Background
Escherichia coli is a model bacterium and a key organism for laboratory and industrial applications. E. coli strain C was isolated at the Lister Institute and deposited into the National Collection of Type Cultures, London, in 1920 (Strain No. 122). It was characterized as more spherical than other E. coli strains and its nuclear matter was shown to be peripherally distributed in the cell [1]. E. coli C, called a restrictionless strain, is permissive for most coliphages and has been used for such studies since the early 1950s [2]. Genetic tests showed that E. coli C forms an O rough R1-type lipopolysaccharide (LPS), which serves as a receptor for bacteriophages [3]. Its genetic map, which shows similarities to E. coli K12, was constructed in 1970 [4]. It is the only E. coli strain that can utilize the pentitol sugars ribitol and D-arabitol, and the genes responsible for those processes were acquired by horizontal gene transfer [5]. Some research on genes involved in biofilm formation in this strain has been attempted but has not been continued (Federica Briani, Università degli Studi di Milano, personal communication) [6].
Biofilm is the most prevalent form of bacterial life in the natural environment [7][8][9][10][11]. However, in laboratory settings, bacteria have for decades been grown in liquid media under shaking, highly aerated conditions, which select for the planktonic lifestyle. While the commonly used laboratory strains of E. coli, such as K12, B, W, and Crooks, are poor biofilm formers, environmental isolates usually form robust biofilms. Some of these environmental E. coli strains can cause diarrhea and kidney failure, while others cause urinary tract infections, chronic sinusitis, respiratory illness and pneumonia, and other illnesses [12][13][14][15]. Many of these symptoms are correlated with biofilms. A few E. coli K12 mutant strains have been described as good biofilm formers, such as the csrA mutant or AJW678 [16,17], but one could argue that such mutants would not arise under natural conditions. Therefore, it is important to find a safe laboratory strain that can serve as a model for biofilm studies.
We found that the E. coli C strain forms a robust biofilm under laboratory conditions. The complete genome sequence of this strain was determined, and bioinformatics analyses revealed the molecular foundations underlying this phenotype. A combination of experimental and in silico analysis methods allowed us to unravel the two major mechanisms that drive biofilm formation in this strain.

Biofilm formation
In our search for a good model biofilm strain, we screened our laboratory collection of E. coli strains using the standard 96-well plate assay [18] and the glass slide assay [19] (Fig. 1a). We found that the E. coli C strain formed robust biofilms both on microscope slides and in 96-well plates. In minimal M9 medium with glycerol, strain C produced 1.5- to 3-fold more biofilm than the other laboratory strains, and in Luria-Bertani (LB) rich medium, strain C biofilm formation was as much as 7.4-fold higher (Fig. 1b).
During overnight growth in LB medium at 30°C shaken at 250 rpm, we noticed an increased aggregation of bacterial cells in the E. coli C culture (Fig. 2a). The ratio of planktonic cells to total cells in the culture was 0.35 compared to 0.83 and 0.85 for Crooks and B and 0.98 or almost 1 for K12 and W, respectively (Fig. 2b).
Aggregation at low temperature depends on salt concentration
Previously, we have described a regulatory loop affecting biofilm formation in a high salt/high pH environment. This loop involved the nhaR, sdiA, uvrY, and hns genes, as well as the csrABCD system [20]. We were interested in whether the aggregation of E. coli C depends on NaCl concentration. We grew the bacteria in three LB broth media containing different amounts of salt: Miller broth (1% NaCl), Lennox broth (0.5% NaCl), and a modified Lennox broth with 0.75% NaCl. After overnight growth at 30°C in culture tubes shaken at 250 rpm, we observed a lack of aggregation in standard Lennox medium, while in the modified Lennox medium and Miller broth the ratio of planktonic to total cells was similar (Fig. 3). The ratio of planktonic/total cells in the Lennox medium was statistically different (p < 0.00001, one-way ANOVA) from that in media with a higher NaCl concentration and similar to that of other strains grown in LB Miller broth (Fig. 2b).

E. coli C genomic sequence
The genomes of E. coli K12, E. coli B, E. coli W, and E. coli Crooks (GenBank:CP000946) have already been sequenced [21][22][23]. To compare the genomic sequences of all five laboratory strains, we sequenced the E. coli C genome. The chromosome consisted of 4,617,024 bp and encoded 4581 CDSs (Fig. 4). No extrachromosomal DNA was detected. The mean G + C content was 51%. We identified 7 rRNA operons, 89 tRNA genes, and 12 ncRNAs (total 121 RNA genes CP020543.1). The only methylation signal in that genome was Dam methylation. We found that 38,387 out of 38,406 (99.95%) of the GATC motifs had evidence of m6A.
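The methylation figure above is a simple proportion over GATC motifs. As a minimal sketch (not the authors' pipeline), the snippet below counts GATC sites on one strand of a toy sequence and recomputes the reported percentage from the two counts given in the text. The function names and the toy sequence are illustrative assumptions; how the 38,406 motifs were enumerated (per strand or per site) is not stated in the source.

```python
def count_motif(seq: str, motif: str = "GATC") -> int:
    """Count occurrences of `motif` on the given strand (overlaps allowed)."""
    seq = seq.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Toy sequence (hypothetical), only to show the counting step.
toy = "ttGATCaaGATCGATCcc"
print(count_motif(toy))                    # number of GATC sites in the toy string

# Counts reported in the text: 38,387 methylated out of 38,406 GATC motifs.
print(round(percent(38387, 38406), 2))     # 99.95
```

Because GATC is its own reverse complement, a site found on one strand is also a site on the other, which is why a single-strand scan is enough to locate them.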
Comparison with other laboratory E. coli strains showed a high degree of synteny except for an inverted 300 kb region between 107 and 407 kb (Fig. 5). That inverted region also showed an inverted GC skew in comparison to the flanking regions, indicating either a recent inversion event or an assembly error (Fig. 4). To prove that the inversion represented an actual event, we used an optical mapping method [24]. The order of the fluorescently labelled fragments obtained was identical to the in silico constructed map of the E. coli C chromosome (Fig. 6a), indicating the authenticity of the inversion.
A similar comparison of E. coli K12 maps confirmed the stringency and precision of the optical mapping results (Fig. 6b). Comparison between the two optical maps confirmed that the PacBio-predicted inversion of the 300-kb DNA fragment was indeed a real event (Fig. 6b).
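The GC skew argument used above can be illustrated with a short calculation. The sketch below computes windowed GC skew, (G − C)/(G + C), and shows on a toy sequence that reverse-complementing a segment flips the sign of its skew, which is the signature described for the 300 kb region. The window size, function names, and sequence are hypothetical and are not taken from the authors' analysis.

```python
def gc_skew(seq: str, window: int) -> list[float]:
    """GC skew (G - C) / (G + C) in consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy "chromosome" made of three G-rich blocks (hypothetical data).
block = "GGGCAT" * 2
original = block + block + block
# Same chromosome with the middle block inverted.
inverted = block + reverse_complement(block) + block

print(gc_skew(original, window=len(block)))   # [0.5, 0.5, 0.5]
print(gc_skew(inverted, window=len(block)))   # [0.5, -0.5, 0.5]
```

A sign flip confined to one region is consistent with an inversion, but it is equally what a misassembly would produce, which is why the text relies on optical mapping for confirmation.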

Genetic content
A maximum likelihood tree showed that E. coli C was most similar to the K12 strain (Additional file 1: Figure S1). A comparison of chromosomal protein-coding orthologs among the laboratory strains showed that, out of the 5686 predicted CDSs, 3603 were shared among all five strains.
Only 37 genes were present in all four of the other lab strains that were absent in E. coli C (Additional file 11: Table S1) (Fig. 7). Out of 177 genes that were unique to E. coli C, 108 encoded transposases or unknown proteins and 69 CDSs showed homology to known proteins (Table 1).
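The three categories reported above (shared by all five strains, present in the four other strains but absent from C, and unique to C) correspond to plain set operations on per-strain ortholog lists. The following is a minimal sketch with invented gene identifiers; the real ortholog assignment method is not described in this section, so the input sets are purely hypothetical.

```python
# Hypothetical ortholog-group identifiers per strain (toy data).
strains = {
    "C":      {"g1", "g2", "gC"},
    "K12":    {"g1", "g2", "g3", "g4"},
    "B":      {"g1", "g2", "g3"},
    "W":      {"g1", "g2", "g3", "g5"},
    "Crooks": {"g1", "g2", "g3"},
}

others = [genes for name, genes in strains.items() if name != "C"]

# Shared among all five strains (core set).
core = set.intersection(*strains.values())
# Present in all four other strains but absent from C.
missing_in_c = set.intersection(*others) - strains["C"]
# Found only in C.
unique_to_c = strains["C"] - set.union(*others)

print(sorted(core), sorted(missing_in_c), sorted(unique_to_c))
```

With real data the three set sizes would be compared against the counts quoted in the text (3603, 37, and 177).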

Genes involved in biofilm formation
Several genes have been ascribed active roles in biofilm formation in E. coli [26][27][28]. One of the most important is the flu gene encoding the antigen 43 protein (Ag43) [29]. In liquid culture, Ag43 leads to autoaggregation and clump formation rapidly followed by bacterial sedimentation. Surprisingly, the flu gene was not present in the E. coli C genome. We identified a few autotransporter-encoding genes that showed partial homology to Ag43, such as B6N50_05815 (50% similarity over 381 aa); however, the homology was too weak to suggest that these genes could play a similar role.
Surface polysaccharides often play an important role in biofilm formation [27,30]. E. coli C forms an O rough R1-type lipopolysaccharide, which serves as a receptor for bacteriophages [3]. Out of the 14 waa genes present in E. coli K12, we were able to find only 6 in E. coli C. Out of the 5 genes waaA, waaC, waaQ, waaP, and waaY, which are highly conserved and responsible for assembly and phosphorylation of the inner-core region [31], only the first 4 were present in E. coli C (Additional file 2: Figure S2). The two remaining genes in E. coli C were waaG, whose product is an α-glucosyltransferase that adds the first residue (HexI) of the outer core, and waaF, which encodes a HepII transferase [31]. Biofilm formation by a deep rough LPS hldE mutant of the E. coli BW25113 strain was strongly enhanced in comparison with the parental strain and other LPS-deficient mutants. The hldE strain also showed a phenotype of increased autoaggregation and stronger cell surface hydrophobicity compared to the wild type [32]. The gene hldE, which encodes a HepI transferase, was found in the E. coli C strain. Other mutations in LPS core biosynthesis that result in a deep rough LPS have been described to decrease adhesion to abiotic surfaces [33]; therefore, we assumed that other genes in this family would not be responsible for the increased biofilm formation by E. coli C.
We also noticed that wzzB, a regulator of the length of the O-antigen component of LPS chains, was mutated by an IS3 insertion. Another IS insertion was located in the UDP-glucose 6-dehydrogenase gene (B6N50_08940). Both of these genes were located at the end of a long stretch of 35 genes with an operon-like organization in E. coli C, which includes the wca operon [34], consisting of 19 genes involved in colanic acid synthesis.
We found that the region involved in biosynthesis of poly-β-1,6-N-acetyl-glucosamine (PGA) was almost 100% identical in both K12 and C strains.
Other types of structures involved in biofilm formation are fimbriae, curli, and conjugative pili [26,27,35]. Type 1 pili can adhere to a variety of receptors on eukaryotic cell surfaces. They are well-documented virulence factors in pathogenic E. coli and are critical for biofilm formation on abiotic surfaces [36][37][38][39][40]. Type 1 pili are encoded by a contiguous DNA segment, labeled the fim operon, which contains 9 genes necessary for their synthesis, assembly, and regulation [41,42]. In E. coli C, almost the entire fim operon except fimH, which codes for the mannose-specific adhesin located at the tip of the pilus, was absent and replaced by a type II group integron (Additional file 3: Figure S3). The entire fim operon is driven by a single promoter located upstream of the fimA gene; therefore, it is possible that the fimH gene is not expressed in E. coli C. Although we cannot exclude the role of FimH in autoaggregation of E. coli C, reports that the function of FimH was inhibited by growth at temperatures at or below 30°C [43] make it highly unlikely.
[Figure caption fragment: ... coli K12, and E. coli W. b Ratio of planktonic cells to total cells measured as OD600. c Microscopic picture of the E. coli C precipitate]
Chaperone-usher (CU) fimbriae are adhesive surface organelles typical of many gram-negative bacteria. E. coli genomes contain a large array of characterized and putative CU fimbrial operons [44]. Korea et al. characterized the ycb, ybg, yfc, yad, yra, sfm, and yeh operons of E. coli K-12, which display sequence and organizational similarities to type 1 fimbriae exported by the CU pathway [45]. They showed that, although these CU operons were not well expressed under laboratory conditions, 6 of them were nevertheless functional when expressed and promoted attachment to abiotic and/or epithelial cell surfaces [45]. A total of 10 CU operons have been identified in E. coli K12 MG1655 [44]. We identified all 10 CU operons in the E. coli C genome. Furthermore, we found that the IS5 insertion in the K12 yhcE gene was not present in E. coli C (Additional file 4: Figure S4A). We also noticed that two insertion sequences were inserted in the yad region (Additional file 4: Figure S4B).
Curli are another proteinaceous extracellular fiber involved in surface and cell-cell contacts that promote community behavior and host cell colonization [46]. Curli synthesis and transport are controlled by two operons, csgBAC and csgDEFG. The csgBA operon encodes the major structural subunit CsgA and the nucleator protein CsgB [47]. CsgC plays a role in the extracellular assembly of CsgA. In the absence of CsgB, curli are not assembled and CsgA, the main subunit protein, remains unpolymerized when secreted from the cell [46]. The csgDEFG operon encodes 4 accessory proteins involved in assembly of curli. The csgBA operon is positively regulated by the transcriptional regulator CsgD [47]. We found that the intergenic region between csgBA and csgDEFG has been modified in E. coli C. An IS5/IS1182 family transposase was inserted between 106 bp upstream of the csgD gene and 96 bp inside the csgA gene (Additional file 5: Figure S5). The entire csgB gene as well as the first 32 aa of CsgA have been deleted. The full CsgA protein in E. coli K12 contains 151 aa, while the truncated version in strain C consisted of only 107 aa and might not be expressed. Furthermore, csgD expression is driven by a promoter located ~130 bp upstream [48,49]. The IS5/IS1182 family transposase inserted between that promoter and the csgD gene was transcribed in the same direction, so it might not cause a polar mutation, but it would very likely interfere with the sophisticated regulation of csgD expression by multiple transcription factors [48,49].
As E. coli C did not carry any extrachromosomal DNA, conjugative pili, which usually play an important role in biofilm formation [50], were not analyzed.
Biofilm formation is a bacterial response to stressful environmental conditions [9]. This response requires an orchestra of sensors and regulators during each step of the biofilm formation process. We analyzed a few of the most important mechanisms, such as CpxAR, RcsCD, and EnvZ/OmpR [27]. In all three cases, we observed the same gene structure and a high degree of DNA sequence identity between the E. coli C and K12 strains.
Another regulatory loop includes the carbon storage regulator csrA and its small RNAs [51]. Mutations within the csrA gene induced biofilm formation in many bacteria [17,51]. Recently, CsrA regulation has been connected with multiple other transcription factors, including NhaR, UvrY, SdiA, RecA, LexA, Hns, and many more [20,52]. The regulatory loop with the NhaR protein drew our attention as it is responsible for integrating the stress associated with high salt/high pH and low temperature [20]. We found that the nhaAR and sdiA/uvrY regions of E. coli C were almost identical with the corresponding regions in the K12 strain. We amplified and sequenced the csrA gene from the E. coli C strain to verify its presence and integrity (Additional file 6: Figure S6). Detailed analysis of the csrA region revealed the presence of an IS3-like insertion sequence 86 bp upstream of the ATG codon (Fig. 8). The csrA gene is driven by 5 different promoters [53]. The distal (−227 bp) promoter P1 is recognized by sigma 70 and sigma 32 factors and enhanced by DksA. The P2 (−224 bp) promoter depends on sigma 70. Both the P1 and P2 promoters are relatively weak [53]. The P3 promoter is located 127 bp upstream of csrA and is recognized by the stationary-phase RpoS (sigma 38) polymerase. This promoter is the strongest promoter of the csrA gene. Promoters P4 and P5 are located 52 bp and 43 bp, respectively, upstream of the csrA gene. These promoters are driven by the sigma 70 polymerase and are active mainly during exponential growth [53]. The IS3 insertion was located within the −35 region of the P4 promoter. That location should almost completely abolish expression of the csrA gene in the stationary phase of bacterial growth and was probably the main reason for increased biofilm production by the E. coli C strain. Both small RNAs, csrB and csrC, which regulate CsrA activity, were found unchanged in the E. coli C genome.
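The positional reasoning above can be checked with simple arithmetic: a promoter whose start point lies 52 bp upstream of csrA has its −35 element roughly 52 + 35 = 87 bp upstream, which is essentially where the IS3-like element (86 bp upstream) sits. The sketch below classifies each promoter relative to the insertion. It assumes, as a simplification not stated in the source, that the quoted distances are transcription start positions measured on the same scale as the insertion coordinate, and it uses an arbitrary ±5 bp tolerance for "within the −35 region".

```python
# Distances upstream of csrA, as quoted in the text (bp).
PROMOTERS = {"P1": 227, "P2": 224, "P3": 127, "P4": 52, "P5": 43}
INSERTION = 86          # IS3-like element, bp upstream of the ATG
MINUS35_OFFSET = 35     # nominal position of the -35 element
TOLERANCE = 5           # hypothetical half-width of the -35 region


def classify(start: int) -> str:
    """Describe how the insertion relates to a promoter starting at `start`."""
    minus35 = start + MINUS35_OFFSET
    if abs(minus35 - INSERTION) <= TOLERANCE:
        return "-35 region hit by insertion"
    if start > INSERTION:
        return "separated from csrA by insertion"
    return "downstream of insertion"


for name, start in PROMOTERS.items():
    print(name, start + MINUS35_OFFSET, classify(start))
```

Under these assumptions P1 to P3 end up on the far side of the insertion, P4 has its −35 element at the insertion point, and only P5 lies entirely between the insertion and the gene.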
Confirmation of IS3 insertion and its complementation by overexpression of csrA gene
First we compared the biofilm formation ability of E. coli C and the K12 csrA mutant. The 72-h-old biofilms of both strains formed on microscope slides were similar (Additional file 7: Figure S7A). The 24-h 96-well plate biofilm assay showed that at 37°C the K12 csrA mutant formed 30% more biofilm than E. coli C (p = 0.001, Student t-test). At 30°C strain C produced more biofilm, but the difference was not statistically significant (Additional file 7: Figure S7B), although the csrA mutant aggregated ~56% more efficiently than the E. coli C strain under the same conditions. To confirm the presence of the IS3 insertion in the csrA promoter region, we designed PCR primers specific for the alaS-csrA intergenic region. Amplification results confirmed the presence of IS3 in the E. coli C promoter region (Additional file 8: Figure S8). To see if extrachromosomal expression of the CsrA protein affects the aggregation phenotype, we cloned the csrA gene downstream of a plac promoter in pBBR1MCS-5 [54], resulting in plasmid pJEK718, or downstream of the constitutive pcat (chloramphenicol) promoter in pJEK786. Plasmids were transformed into the E. coli C strain and the resulting clones were grown in LB Miller broth (30°C, 250 rpm). The results showed that the ratio of planktonic to total cells in E. coli C carrying either construct overexpressing the csrA gene was ~1.8 times higher (f-ratio = 78.12363, p < 0.00001) than in the control carrying the non-recombined vector (Fig. 9). We also noticed that the control strain showed a slightly higher proportion of planktonic cells than the plasmidless control (shown in Fig. 3) (0.46 vs. 0.36), although the difference was not statistically significant (p = 0.09, Student t-test).
Expression of csrA promoter in E. coli K12 and E. coli C
To analyze activities of the csrA promoter from E. coli C, we cloned PCR products containing sequences upstream of the csrA gene (Additional file 8: Figure S8) into a pAG136 plasmid vector carrying promoterless EGFP-YFAST reporters (pJEKd1750) [55]. The E. coli C csrA promoter was active in both strains; however, the promoter activity was much stronger in the native strain than in K12 (Additional file 9: Figure S9). We noticed that the highest differences (3.2 and 2.4, at 37°C and 30°C, respectively) occurred at the late exponential phase (~4.5 h and ~10 h) (Additional file 9: Figure S9). We also noticed that the presence of an additional copy of pcsrA in a high copy number plasmid induced aggregation of E. coli C at 37°C. The ratio of planktonic/total cells was similar (Additional file 10: Figure S10) to that obtained for [...] The aggregation phenotype was correlated with the highest pcsrA activity at the entrance to the stationary phase (data not shown). As the aggregation might affect the measurements, we decided to use a colony assay to measure the promoter activity over a longer time. The LB agar plates with spots of E. coli C and K12 carrying pJEKd1751 reporter plasmids with a short half-life form of GFP [ASV] were incubated at 30°C and 37°C and the fluorescence activity was measured with a Typhoon 9400 Variable Mode Imager (Fig. 10). The data showed an increased pcsrA activity over the 72 h time period in both strains, with much higher activity in the native E. coli C strain (Fig. 10). The highest differences between the two strains, 8.15 and 4.71, were observed at 72 h at 30°C and 37°C, respectively (Fig. 10).
Fig. 7 Comparison of orthologous CDSs among C, W, K-12, B, and Crooks strains. The number of shared genes, the number (log10) of unique genes, and the genes shared between one, two, three, and four strains are shown. Graph was generated with the UpSet software [25]
As the half-life of the GFP [ASV] is only 110 min [56], we concluded that in the K12 strain pcsrA promoter was active mostly at the stationary phase while in the E. coli C its activity was quasi constitutive, but also enhanced at the stationary phase (Fig. 10). To test that hypothesis we analyzed the spatial expression of the pcsrA promoter in 72 h old bacterial colonies using a fluorescence microscope (Fig. 11). The pictures fully supported our premises.
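The inference above rests on how quickly a 110-min half-life reporter disappears. As a rough illustration, assuming simple first-order decay (an assumption of this sketch, not a statement from the source), the fraction of GFP[ASV] made at time zero that is still present after a given interval can be computed as follows; the function name is hypothetical.

```python
def fraction_remaining(elapsed_min: float, half_life_min: float = 110.0) -> float:
    """Fraction of reporter protein left after `elapsed_min`, first-order decay."""
    return 0.5 ** (elapsed_min / half_life_min)


for hours in (2, 6, 12, 24, 72):
    print(hours, "h:", fraction_remaining(hours * 60))
```

After one day less than 0.1% of the protein made at the start would remain, so fluorescence measured at 24 to 72 h has to come from recent promoter activity rather than from protein accumulated early in growth.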
In E. coli C the entire colony showed an intensive fluorescence with the highest level in the center (Fig. 11a). In the K12 strain we noticed 5 discrete zones with different fluorescence activities (Fig. 11b). The edge of the colony, which should consist of the youngest, still dividing and metabolically active cells, showed the lowest, while the center of the colony with the oldest cells showed the highest fluorescence (Fig. 11b).
Fig. 9 Complementation of E. coli C aggregation phenotype by introduction of pJEK718 and pJEK786 plasmids overexpressing the CsrA protein
Location of IS3-like insertions in E. coli C genome and role of ISs in biofilm gene expression
Based on the E. coli C pcsrA promoter structure in comparison to the pcsrA-K12 [53] and its transcriptional activities, we concluded that the small 80-bp region containing the P4 and P5 promoters could not be solely responsible for the csrA transcription. Insertion sequences play a huge role in bacterial genome evolution [57]. They can also insert upstream of a gene and activate its expression [58]. Out of 177 genes that were unique to E. coli C, 55 encoded transposases (Additional file 11: Table S1).
Using BLAST, we found that the IS3-like sequence present in front of the csrA gene was present in 19 other locations throughout the genome (data not shown). Analyzing these locations, we found that in 12 cases the IS3 might drive the expression of downstream genes (Table 2). One of the most striking observations was that the IS3-like sequence was located in front of an alternative sigma 70 factor, which was not present in the K12 strain (Table 2). Based on the pcsrA expression, we concluded that a promoter located inside the IS3 drives permanent expression of the downstream genes. The presence of the constitutively expressed alternative sigma 70 factor in E. coli C can drive expression of the sigma 70 promoters in a growth phase-independent manner. As the remaining E. coli C csrA promoters P4 and P5 are sigma 70-dependent promoters [53], this might explain their strong activity throughout all growth phases. Further studies will be conducted to test that hypothesis.

Discussion
E. coli is the most common bacterial research model organism. Of the five laboratory strains in use, only the E. coli C genome had not been sequenced. Here, we sequenced and analyzed the E. coli C genome and revealed its specific features that lead to enhanced biofilm formation. Recently, a new E. coli strain C genome has been submitted to the GenBank database (CP029371.1). A homology search revealed that this strain was not closely related to our strain. However, a sequence homology search of the E. coli genomes available in GenBank revealed that two isolates, WG5 (CP024090.1) and NCTC122 (LT906474.1), showed identical csrA promoter regions. Strain WG5 is in fact an E. coli C derivative resistant to nalidixic acid [59,60]. This E. coli C, also known as strain CN, is publicly available from the ATCC (ATCC number 700078). We found that our sequence is very similar to the WG5 sequence, although the inverted 300 kb region between 107 and 407 kb was not present in WG5. Some of the insertion sequences were also absent from the WG5 genome. These findings again reveal a role of different mobile elements in genome rearrangements and evolution. As the bacterial genome undergoes constant evolution and adaptation [61] and bacterial mobile elements are the most common mechanism of those processes [62,63], one may ask why in this particular strain, unlike the other laboratory strains, selection toward planktonic cells did not take place. There is no simple answer; however, we can speculate that, as this strain is used for propagation of bacteriophages, the fact that phages kill planktonic cells might reduce the selection toward free-floating cells. The second hypothesis is that, for bacteriophage research using E. coli C, the ATCC recommends low-salt (0.5% NaCl) or no-salt Nutrient (#139) broth medium. As we showed, the low-salt medium reduced bacterial stress and most likely reduced the level of genome rearrangements, preserving in this laboratory E. coli C strain the natural biofilm-forming properties characteristic of wild-type strains.

Conclusions
Biofilms are the most prevalent form of bacterial life [9,30] and as such have drawn significant attention from the scientific community over the past quarter century. However, only in 2018 did the number of biofilm related articles reach 24,000, based on a Google Scholar search. As in all other fields, biofilm research needs to develop and follow standard protocols and methods that can be used in different laboratories and give comparable results. Unfortunately, a standardized methodological approach to biofilm models has not been adopted, leading to a large disparity among testing conditions. This has made it almost impossible to compare data across multiple laboratories, leaving large gaps in the evidence [64]. In our work, we described and characterized biofilm formation in the classic laboratory strain, E. coli C [2,65].
We have used that strain in our biofilm-related research for almost a decade and we would like to share it with the biofilm community and propose to use it as a model organism in E. coli-based biofilm-related research.

Biofilm assays
Biofilms on microscope slides were grown as described previously [19]. For biofilm formation on a polystyrene surface, flat-bottom 96-well microtiter plates (Corning Inc.) were used [18]. E. coli overnight cultures were diluted 1:40 in fresh medium, and 150-μL aliquots were dispensed into wells. After 24 h of incubation (37°C), cell density was measured (OD600) using a plate reader, and 30 μL of Gram Crystal Violet (Remel) was applied for staining for 1 h. Plates were washed with water and air dried, and crystal violet was solubilized with an ethanol-acetone (4:1) solution. The OD570 was determined from this solution, and the biofilm amount was calculated as the ratio of OD570 to OD600 [19].
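The normalization described above, biofilm amount as OD570 divided by OD600, can be written out as a short calculation. The sketch below also derives a fold difference between two strains from replicate wells. All readings are invented for illustration, and the helper names are hypothetical; blank subtraction and replicate handling are not specified in the source and are left out.

```python
from statistics import mean


def biofilm_index(od570: float, od600: float) -> float:
    """Crystal violet signal (OD570) normalized to cell density (OD600)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return od570 / od600


def fold_difference(test_wells, reference_wells) -> float:
    """Mean biofilm index of test wells divided by that of reference wells.

    Each argument is a list of (od570, od600) pairs, one per well.
    """
    test = mean(biofilm_index(a, b) for a, b in test_wells)
    ref = mean(biofilm_index(a, b) for a, b in reference_wells)
    return test / ref


# Hypothetical plate readings: (OD570, OD600) per replicate well.
strain_c = [(1.5, 0.5), (1.0, 0.5)]
strain_k12 = [(0.5, 1.0), (0.5, 1.0)]
print(fold_difference(strain_c, strain_k12))   # 5.0 with these toy values
```

Dividing by OD600 keeps a strain that simply grows to a higher density from appearing to be a better biofilm former.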

Construction of CsrA overexpressing strain
A 277-bp DNA fragment containing the csrA gene was amplified using the primers csrAF (aaaGAATTCGTAATACGACTCACTATAGGGTTTC) and csrAR (aaaGAATTCTTTGAGGGTGCGTCTCACCGATAAAG). This fragment was cloned directly into the EcoRI site of the pBBR1MCS-5 vector [54]. Sequence orientation was verified by DNA sequencing, and the correct clone with the csrA gene downstream of the plac promoter was named pJEK718. To express the csrA gene from a constitutive pcat (chloramphenicol) promoter, a PCR-amplified cat gene (870 bp; catF, aaaGATCCTGGTGTCCCTGTTGATACCGGGAA; cat-R, aaaGGATCCCCCAGGCGTTTAAGGGCACCAATAAC) was cloned into the BamHI site of one of the clones that carried the csrA gene in the orientation opposite to the plac promoter in the pBBR1MCS-5 vector. Selection for Cm-resistant clones ensured promoter activity, and the correct orientation was verified by PCR with catF/csrAR primers and DNA sequencing. The correct plasmid was named pJEK786. Plasmids were introduced into the E. coli C strain by TSS transformation [66].

Confirmation of IS3 insertion and construction of GFP reporter fusions
PCR fragments containing the csrA promoter were amplified from both E. coli K12 and C strains using the primer pair pcsrA (aaaagatctCTGATTGCAGGCGTATCTAAGG) and pcsrAR (aaatctagaAAAGATTAAAAGAGTCGGGTCTCTCTGTATCC) and cloned into the BglII/XbaI site of the pAG136 plasmid [55] or the SmaI site of the pPROBE-GFP [LVA] promoter probe vector [56]. All constructs were verified by DNA sequencing. Plasmids were introduced into both the E. coli K12 and C strains by TSS transformation [66]. GFP activity (OD480-520) was measured using BioTek Synergy HT (BioTek) or Tecan InfiniteM200 Pro (Tecan) plate readers and normalized to the optical density of the culture (OD600), yielding relative fluorescence units (RFU; FL480-520/OD600).
[...] Nicked DNA was labeled with a fluorescent-dUTP nucleotide analog using Taq polymerase (NEB) for 1 h at 72°C. Nicks were repaired with Taq ligase (NEB) in the presence of dNTPs. The backbone of fluorescently labeled DNA was stained with YOYO-1 (Invitrogen). Labeled DNA molecules entered nanochannel arrays of an IrysChip (Bionano Genomics) via automated electrophoresis. Molecules were linearized in the nanochannel arrays and imaged. In-house image detection software detected the stained DNA backbone and the locations of fluorescent labels across each molecule. The set of label locations within each molecule defined the single-molecule maps. The E. coli strain C reference sequence was nicked in silico with Nt.BspQI. Raw single-molecule maps were filtered by a minimum length of 150 kbp. Molecule maps were aligned to the E. coli reference map with OMBlast, an optical mapping alignment tool that uses a seed-and-extend approach and allows split-mapping [68]. Alignments were performed with the OMBlastMapper module (version 1.4a) using the following parameters: --writeunmap false --optresoutformat 2 --falselimit 8 --maxalignitem 2 --minconf 0.
Molecule maps with partial alignments to regions flanking the putative inversion breakpoint coordinates were extracted from the alignment output file. Molecule maps were manually inspected for label matches in segments 5′ and 3′ to the putative inverted region and into the inversion. The non-aligned segments of these maps, which extended into the inverted region with label matches to the opposing side in a reverse fashion, were retained.
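Two steps of the optical mapping workflow described above, building an in silico label map of the reference and discarding short molecules, are easy to sketch. The code below is an illustration only and not the software used in the study. It assumes the Nt.BspQI recognition sequence is GCTCTTC (taken here from general knowledge of the enzyme, not from the source) and treats a site on either strand as one label; the function names and the toy sequence are hypothetical.

```python
NT_BSPQI_SITE = "GCTCTTC"   # assumed recognition sequence


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def label_positions(seq: str, site: str = NT_BSPQI_SITE) -> list[int]:
    """0-based start positions of the site on either strand of `seq`."""
    seq = seq.upper()
    targets = {site, reverse_complement(site)}
    k = len(site)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in targets]


def filter_by_length(lengths_bp: list[int], min_len_bp: int = 150_000) -> list[int]:
    """Keep molecules at least `min_len_bp` long (150 kbp in the text)."""
    return [length for length in lengths_bp if length >= min_len_bp]


toy_reference = "AAGCTCTTCAAAGAAGAGCTT"          # one site on each strand
print(label_positions(toy_reference))             # [2, 12]
print(filter_by_length([120_000, 150_000, 310_000]))
```

The distances between consecutive label positions form the reference map against which single-molecule maps are aligned. An inversion reverses the order of those distances within the affected region, which is what the manual inspection of label matches looks for.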

Statistical analysis
Statistical analysis was carried out in the R computing environment and in GraphPad. One-way ANOVA was calculated using an online tool (https://www.socscistatistics.com/tests/anova/default2.aspx) or an R package. Relevant statistical information is included in the methods for each experiment. Error bars show standard deviation from the mean. Asterisks represent statistical significance at p < 0.05.
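For readers who want to see what the one-way ANOVA f-ratio represents, the sketch below computes it from first principles using only the standard library. It is not the online tool or R package used in the study, the group values are invented, and it returns the F statistic and degrees of freedom only; converting F to a p-value requires the F distribution, which is left to a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical measurements for three conditions, three replicates each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(groups))   # F = 13.0 with 2 and 6 degrees of freedom
```

A large F means the spread between condition means is large relative to the replicate-to-replicate spread within conditions.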