Comparison of strand-specific transcriptomes of enterohemorrhagic Escherichia coli O157:H7 EDL933 (EHEC) under eleven different environmental conditions including radish sprouts and cattle feces
BMC Genomics volume 15, Article number: 353 (2014)
Multiple infection sources for enterohemorrhagic Escherichia coli O157:H7 (EHEC) are known, including animal products, fruit and vegetables. The ecology of this pathogen outside its human host is largely unknown and one third of its annotated genes are still hypothetical. To identify genetic determinants expressed under a variety of environmental factors, we applied strand-specific RNA-sequencing, comparing the SOLiD and Illumina systems.
Transcriptomes of EHEC were sequenced under 11 different biotic and abiotic conditions: LB medium at pH4, pH7, pH9, or at 15°C; LB with nitrite or trimethoprim-sulfamethoxazole; LB-agar surface, M9 minimal medium, spinach leaf juice, surface of living radish sprouts, and cattle feces. Of 5379 annotated genes in strain EDL933 (genome and plasmid), a surprising minority of only 144 had null sequencing reads under all conditions. We therefore developed a statistical method to distinguish weakly transcribed genes from background transcription. We find that 96% of all genes and 91.5% of the hypothetical genes exhibit a significant transcriptional signal under at least one condition. Comparing SOLiD and Illumina systems, we find a high correlation between both approaches for fold-changes of the induced or repressed genes. The pathogenicity island LEE showed highest transcriptional activity in LB medium, minimal medium, and after treatment with antibiotics. Unique sets of genes, including many hypothetical genes, are highly up-regulated on radish sprouts, cattle feces, or in the presence of antibiotics. Furthermore, we observed induction of the shiga-toxin carrying phages by antibiotics and confirmed active biofilm related genes on radish sprouts, in cattle feces, and on agar plates.
Since only a minority of genes (2.7%) were not active under any condition tested (null reads), we suggest that the assumption of significant genome over-annotations is wrong. Environmental transcriptomics uncovered hitherto unknown gene functions and unique regulatory patterns in EHEC. For instance, the environmental function of azoR had been elusive, but this gene is highly active on radish sprouts. Thus, NGS-transcriptomics is an appropriate technique to propose new roles of hypothetical genes and to guide future research.
Humans infected by enterohemorrhagic Escherichia coli O157:H7 (EHEC) suffer from gastroenteritis. Sometimes they develop hemorrhagic colitis or hemolytic uremic syndrome, which can cause kidney failure [1, 2]. Treatment of an EHEC infection with antibiotics is under debate since this can increase the risk of hemolytic uremic syndrome. Therefore, much effort should be put into prevention of transmission. However, this is complicated by the low infectious dose of less than 50 bacterial cells. Infection sources are multiple [5, 6]: bacteria can persist and reproduce in soil, dung, water or other environmental niches, eventually causing fresh produce to be contaminated. Typical vectors for EHEC outbreaks include spinach, apple juice, unpasteurized milk and lettuce, but also meat products such as sausage. A large outbreak in Japan in 1996 caused more than 6000 infections and was due to contaminated radish sprouts. Fenugreek sprouts (Trigonella foenum-graecum) caused a severe outbreak with more than 3800 infected and 53 dead in Germany in 2011. The sprouts were contaminated with a related bacterium, Escherichia coli O104:H4 [9, 10]. Thus, the spectrum of environmental niches of pathogenic E. coli is quite large, ranging from water and single-celled organisms to plants, lower animals and vertebrates [7, 11, 12].
Gene regulation of EHEC has been studied under individual conditions using microarrays or related techniques [13–15]. However, microarrays are limited, especially when examining rare or highly abundant transcripts or unknown genes. New methods in transcriptome analysis such as strand-specific RNA-seq using Next Generation Sequencing (NGS) technologies have a much higher resolution. To date, only a few studies have examined bacterial pathogens (e.g. [17–19]). In this work, we applied strand-specific RNA-seq to EHEC to identify genes involved in environmental and plant persistence with a special focus on hypothetical genes.
About one third of the genes of EHEC are still annotated as hypothetical. Hypothetical genes are defined as genes whose predicted proteins have no homology to any other predicted protein in any species, and the function of these genes is largely unknown. After sequencing a new genome their existence is predicted by annotation tools, e.g., GLIMMER [20, 21] or GeneMarkS. At this stage, there is no experimental evidence for the expression of these genes. A characterization of all hypothetical genes at the current rate would take decades [23, 24]. However, transcription studies can confirm the activity of hypothetical genes, pre-characterize them and remove them from the hypothetical category [24, 25]. The expression of some hypothetical proteins of EHEC has already been reported in single environmental studies, e.g., during heat shock or in adhesion to bovine epithelial cells. However, global approaches that cover a large environmental spectrum to identify functional hypothetical genes are still missing. We therefore strand-specifically sequenced the transcriptomes of EHEC cultures from a highly diverse set of conditions to derive transcriptional patterns and global trends.
Results and discussion
In order to test the reproducibility of the sequencing process, two technical replicates of barcoded libraries were generated for each of two conditions, spinach medium and LB-nitrite. After cDNA synthesis the libraries were split and treated independently, and the RPKM values of the replicates were compared. The correlation coefficient R² was analyzed as described in Haas et al., using reads per gene. Since the correlation was excellent (R² = 1.0, see Figure 1A), as had also been observed for other NGS experiments, we combined those technical replicates for further expression analysis. Next, biological reproducibility was tested by sequencing replicates of the LB reference and the radish sprout condition on two different sequencing platforms, SOLiD and Illumina. Despite massive differences in library preparation and in the sequencing strategy of both platforms, we obtained a high correlation of R² = 0.72 (Figure 1B). This indicates that the observed changes in gene regulation were not due to technical or experimental artifacts.
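As a minimal sketch of how such a replicate comparison can be computed, the following Python snippet calculates the squared Pearson correlation coefficient (R²) between two vectors of reads per gene. The function name and the toy counts are hypothetical and not taken from the study, and the procedure of Haas et al. may differ in detail.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical reads per gene for two technical replicates (illustration only).
replicate_a = [12, 250, 3, 980, 47]
replicate_b = [24, 500, 6, 1960, 94]  # exactly twice replicate_a

print(round(r_squared(replicate_a, replicate_b), 3))
```

Because R² is insensitive to a constant scaling factor, replicates sequenced to different depths can still be compared directly on raw reads per gene.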
Taking all sequencing results together, 26.1 million high quality reads mapped to the EHEC genome and to the plasmid pO157 (see Table 1 for a summary of the sequencing statistics). Since total RNA contains up to 95% rRNA, this RNA species was depleted before sequencing. Nevertheless, averaged over all conditions, 26.4% of the sequenced RNA is remaining rRNA (Table 1). About 1.4% of all reads mapped to the plasmid (Table 1). The plasmid is 92,077 bp in length, which is about 1.7% of the 5,528,445 bp genome. Assuming comparable transcription of genome-encoded and plasmid-encoded genes, the similar shares of reads and of sequence length suggest that pO157 is present at roughly one copy per genome in a single bacterial cell.
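The copy-number argument above can be retraced with a short calculation. The sketch below uses only the figures quoted in the text (plasmid and genome length, 1.4% plasmid reads). It assumes equal transcription per base pair on plasmid and genome, and the function name is hypothetical.

```python
PLASMID_BP = 92_077            # length of pO157 (from the text)
GENOME_BP = 5_528_445          # length of the genome (from the text)
PLASMID_READ_FRACTION = 0.014  # about 1.4% of all reads map to the plasmid


def relative_copy_number(plasmid_read_fraction, plasmid_bp, genome_bp):
    """Ratio of read density (reads per bp) on the plasmid to that on the genome.

    Assumes equal transcription per base pair, so the density ratio is read
    as an estimate of plasmid copies per genome copy.
    """
    plasmid_density = plasmid_read_fraction / plasmid_bp
    genome_density = (1.0 - plasmid_read_fraction) / genome_bp
    return plasmid_density / genome_density


length_share = 100 * PLASMID_BP / GENOME_BP
copies = relative_copy_number(PLASMID_READ_FRACTION, PLASMID_BP, GENOME_BP)

print(f"plasmid length relative to genome: {length_share:.1f}%")  # 1.7%
print(f"estimated plasmid copies per genome: {copies:.2f}")       # 0.85
```

A value close to one is what the text describes as parity between plasmid and genome.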
Background transcription and silent genes
Random transcription, also called background transcription or transcriptional noise, has been reported in NGS studies of prokaryotes and eukaryotes (e.g., [31–33]). Single reads are distributed all over the genome and are found in coding regions, non-coding regions and antisense to annotated genes. Most of these reads apparently do not form transcriptional units, i.e., in most cases they do not originate from non-annotated genes. It is unclear whether the reads arise from background noise introduced during deep sequencing experiments or from the low information content of bacterial promoters, resulting in “sloppy” transcription [34, 35]. To test whether such reads map to the EHEC genome simply by chance, the RNA-seq data of LB medium in this study were mapped to the mouse Y-chromosome (95 Mbp). Out of 7 million reads, only one matched the mouse sequence; thus, the reads appear to be specific.
Generally, transcriptional noise is disregarded as non-functional. However, background transcription interferes with the detection of weakly transcribed genes. Several attempts were made to estimate a threshold to consider a gene as being active. Filiatrault et al. used proteomics data to estimate a threshold for an active gene in comparison with RNA-seq. Mortazavi et al. already estimated an upper bound of background noise in mouse transcriptomes by estimating the RPKM of all regions outside of exons or other transcribed regions, but this inevitably includes non-annotated genes causing a higher upper bound. However, mostly cut-off values have been selected intuitively. For instance, Beaume et al. defined a gene as being significantly active if its transcription is higher than 0.5 of the average sequencing coverage. The disadvantage of all methods applied hitherto is that weakly transcribed genes are below the threshold.
To detect weakly transcribed genes, we estimated the level of background transcription in EHEC as a basis for defining a gene as active. To derive such a threshold value, the background transcription level under the different conditions was determined using manually selected regions of the genome that are devoid of annotated genes and of any conspicuous transcriptional patterns. These regions comprise a total of 104,192 bp or about 2% of the genome (see Methods). Table 1 lists the RPKM values of background transcription for each condition. The average RPKM value for all conditions, including the biological replicates, is 0.14 (±0.13, standard deviation). To see whether the background RPKM depends on the sequencing technology used (Illumina or SOLiD), we analyzed an additional data set from EHEC prepared according to the Illumina technology (data not shown). Its average background RPKM of 0.13 is in a similar range to that of the eleven conditions sequenced with the SOLiD technology. Thus, the mean level of background transcription in EHEC corresponds to a 750 bp stretch of DNA covered by one read in a sequenced library of 10 million reads.
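The equivalence stated in the last sentence follows directly from the definition of RPKM (reads per kilobase per million mapped reads). A minimal sketch, with a hypothetical function name:

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    kilobases = region_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# One read on a 750 bp stretch in a library of 10 million reads.
background_like = rpkm(1, 750, 10_000_000)
print(f"{background_like:.3f}")  # 0.133
```

The result, about 0.13, lies in the range of the background values reported above (0.13 to 0.14).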
For each gene, we calculated the probability that its reads result from background transcription rather than from activity above background (see Additional file 1: Table S2). Of 5,379 annotated genes, the activity of 5,142 was found to be significantly above background (p ≤ 0.05), i.e., their reads do not originate from noise (Table 1).
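The statistical procedure itself is described in the Methods and is not reproduced in this section. Purely as an illustration of the idea, the sketch below tests a gene's read count against a Poisson background whose rate is derived from a background RPKM. The Poisson model, the function names and the example gene are assumptions made here and should not be read as the authors' method.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def expected_background_reads(background_rpkm, gene_length_bp, total_mapped_reads):
    """Reads expected on a gene if it were transcribed at background level only."""
    return background_rpkm * (gene_length_bp / 1_000) * (total_mapped_reads / 1_000_000)


def is_active(read_count, gene_length_bp, total_mapped_reads,
              background_rpkm, alpha=0.05):
    """True if the read count is unlikely (p <= alpha) under the background model."""
    lam = expected_background_reads(background_rpkm, gene_length_bp, total_mapped_reads)
    return poisson_upper_tail(read_count, lam) <= alpha


# Hypothetical 1,000 bp gene in a library of 10 million reads, background RPKM 0.14:
# about 1.4 background reads are expected on this gene.
print(is_active(5, 1_000, 10_000_000, 0.14))  # True
print(is_active(3, 1_000, 10_000_000, 0.14))  # False
```

Because the expected background count scales with library size, a test of this kind takes the sequencing depth of each experiment into account, unlike a fixed RPKM cut-off.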
Filtering for genes that are transcriptionally inactive under all conditions studied, we found only 144 genes, which is about 2.7% of the annotated genes (Additional file 2: Table S3). These genes are covered by no read under any of the conditions investigated. 69.4% of the silent genes are hypothetical genes, indicating a potential over-annotation. On the other hand, some hypothetical genes might only be active at conditions not yet probed. We considered a gene as being regulated if its logFC was ≥ 3 or ≤ −1 under at least one condition. Accordingly, the number of regulated genes is about 4% higher for the known genes compared to the hypothetical genes (Figure 2).
Overall comparison of transcriptomes
The number of active genes differs between conditions (Table 1). In feces, the number of active genes is more than 1,000 lower than on sprouts, although both conditions have about the same sequencing depth. This is important, since differences in the number of active genes could result from different sequencing depths, as depth influences the chance of finding a transcript. Sequencing depth does indeed influence the number of genes that are defined as active (Figure 3): the number of active genes asymptotically reaches saturation with increasing sequencing depth. The same pattern was observed by Haas et al., also for EHEC EDL933. Vivancos et al. show a similar effect for RNA-seq in Mycoplasma pneumoniae and Mus musculus. However, since the sequencing depth for EHEC grown on radish sprouts and in feces is about the same, the major difference observed must be of biological significance. We assume that survival of EHEC on radish sprouts requires a larger number of active genes than persistence in cattle feces since the cells have to deal with many environmental factors such as differing water activities, osmotic stress, radiation, temperature changes and low nutrient contents which are not present in cattle feces.
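Why the number of detected genes saturates with depth can be illustrated with a toy model. The sketch below computes the expected number of genes that receive at least one read when each read is assigned independently to a gene with a fixed probability. This "at least one read" criterion is a simplification and is not the significance-based definition of an active gene used in this study. The expression fractions are invented for illustration.

```python
def expected_detected_genes(read_fractions, depth):
    """Expected number of genes with at least one read among `depth` reads.

    read_fractions: per-gene probability that a single read maps to the gene.
    """
    return sum(1.0 - (1.0 - f) ** depth for f in read_fractions)


# Hypothetical expression profile: a few strongly and a few weakly transcribed genes.
toy_fractions = [0.5, 0.3, 0.15, 0.04, 0.009, 0.001]

for depth in (10, 100, 1_000, 10_000, 100_000):
    detected = expected_detected_genes(toy_fractions, depth)
    print(f"{depth:>7} reads: {detected:.2f} of {len(toy_fractions)} genes expected")
```

In this model the weakly transcribed genes are the last to be detected, so the curve flattens once only rare transcripts remain undetected.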
With only 2892 active genes, LB-antibiotics has the lowest number of active genes of all. In comparison, the reference condition LB displays around 4500 active genes (Table 1). Admittedly, LB-antibiotics has the lowest sequencing depth of all. However, as can be seen from Figure 3, the number of active genes is disproportionately low. After antibiotic treatment the cells elongate to several times their original length. The indirect block of DNA synthesis influences their regulatory pattern, and genes of many different pathways are turned off. We visualized this transcriptional pattern of LB-antibiotics in the heat map distance tree (Figure 4). The up-regulated genes (colored in blue) and down-regulated genes (colored in red) do not form the regulatory clusters observed in the other ten conditions: LB-antibiotics forms an outer group (antib in Figure 4). The extreme stress leads to the most severe transcriptomic differences. Interestingly, it is the only LB condition not clustering together with the other LB-based experiments. Three of the four conditions that do not originate from LB medium, i.e. spinach medium, minimal medium, and feces, show a more closely related regulatory pattern, whereas radish sprouts are closer to the conditions which originated from LB medium. We assume that the high similarity of minimal medium and spinach medium is due to a low nutrient content in both conditions. LB-pH9 and LB-nitrite have the most similar transcriptomic pattern, despite LB-nitrite being slightly acidic (Figure 4).
Transcriptional activity of hypothetical genes
We examined the transcriptional regulation of the 5379 protein-coding genes (GenBank and RefSeq) for the genome and plasmid in EHEC (Additional file 3: Table S4). Out of these genes, 2266 are not in COGs (cluster of orthologous genes), have a general function prediction only or are annotated as hypothetical (completely unknown function). Of the annotated genes on the genome, 32.9% are hypothetical. Table 1 shows a summary of active hypotheticals for each condition. In total, 91.5% of them are active in at least one condition (Table 1). To date, most experiments using E. coli have relied on standard LB at pH7 or minimal medium. We hypothesized that additional, uniquely up-regulated hypothetical genes would be found under non-standard laboratory conditions. Concentrating on highly regulated genes by using very stringent cut-off thresholds only (logFCs ≥ 5 at a single condition), we found 26 hypothetical genes in LB with antibiotics, 14 in minimal medium, 13 in feces, nine on radish sprouts, and nine in spinach medium. In contrast, three hypothetical genes are active in LB at 15°C, three in LB at pH4, two on solid LB, one in LB with nitrite, and none in LB at pH9 (Table 2, graphic version in Additional file 4). We performed a BLAST search (blastp) to evaluate the taxonomic distribution of these genes. Hits with an E value of 10⁻⁵ or lower were taken as an indicator of the maximal taxonomic distribution of a gene. According to this definition, 35 hypothetical genes are present only within the genus Escherichia, 17 within Enterobacteriaceae, 19 within proteobacteria, 7 within bacteria, and 2 within “cellular organisms”, respectively.
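The per-condition counts and the taxonomic-range counts quoted above describe the same set of highly induced hypothetical genes, so their totals should agree. The following sketch simply tallies the numbers given in the text. The dictionary keys are shorthand labels chosen here.

```python
# Highly induced hypothetical genes (logFC >= 5 at a single condition), as quoted above.
hypotheticals_by_condition = {
    "LB-antibiotics": 26,
    "minimal medium": 14,
    "feces": 13,
    "radish sprouts": 9,
    "spinach medium": 9,
    "LB-15C": 3,
    "LB-pH4": 3,
    "solid LB": 2,
    "LB-nitrite": 1,
    "LB-pH9": 0,
}

# Widest taxonomic range with a blastp hit at the stated E value threshold.
hypotheticals_by_taxonomic_range = {
    "Escherichia": 35,
    "Enterobacteriaceae": 17,
    "proteobacteria": 19,
    "bacteria": 7,
    "cellular organisms": 2,
}

total_by_condition = sum(hypotheticals_by_condition.values())
total_by_range = sum(hypotheticals_by_taxonomic_range.values())

print(total_by_condition, total_by_range)  # 80 80
```

Both tallies give 80 genes, so every gene counted per condition has been assigned exactly one taxonomic range.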
Transcription of virulence genes
The LEE (Locus of Enterocyte Effacement) pathogenicity island comprises 41 genes responsible for the attachment of EHECs to mammalian host cells and effacing lesions [57, 58]. Table 3 (for a graphic version see Additional file 4) summarizes their regulation. The most prominent up-regulated LEE gene is the secreted effector protein gene espZ (Z5122) in minimal medium (logFC > 5 compared to LB medium). It interacts with several host proteins (see [59–61]). The extremely high transcription level of espZ in minimal medium is quite surprising since it is the only medium completely lacking host cell related compounds. Most other LEE genes encoding the type III secretion system (TTSS) (e.g. Z5132 – Z5135), some translocated proteins like EspG (Z5142), EspH (Z5115), intimin (eae, Z5110), transcriptional regulators (e.g. ler, Z5140), and the chaperone CesD (Z5127) also display high transcript levels (RPKMs) in minimal medium and are also active in LB-antibiotics (Table 3).
Furthermore, 62 non-LEE encoded, virulence associated genes [39, 62] were found to be up-regulated in the absence of a host (Additional file 5: Table S5). Several of them are located on prophages and encode secreted effector proteins. Similar to the LEE encoded genes, expression levels of most of these 62 additional genes are highest in LB medium. The remaining genes, especially in feces, have logFCs between 1 and 8 under other conditions compared to LB. We assume that LB’s ingredients, a tryptic digest of casein and yeast extract from autolysates, mimic vertebrate host-like conditions. Stress, like alkaline pH and nitrite, completely represses the induction of all LEE genes and many other virulence genes. Furthermore, these virulence-associated genes appear not to be active on radish sprouts or in spinach medium at the time point of harvest. Though EHECs are known to proliferate on plant surfaces, the TTSS seems to play no role in a prolonged EHEC-plant interaction.
Gene expression in the presence of antibiotics
LB-antibiotics is the condition with the lowest number of active genes. Among the highly up-regulated genes (logFC ≥ 5), 70% originate from prophages CP-933H, CP-933K, BP-933W, CP-933C, CP-933X, CP-933U, and CP-933V. Interestingly, LB-antibiotics is the condition with the highest number of hypothetical genes being induced. Sixteen of the 26 highly antibiotic-induced hypotheticals are encoded by prophages. Z0314 and Z0316 are from prophage CP-933H and have high similarities to phage tail fiber proteins. The other 14 genes originate from different prophages and their function is unknown. However, they were also active after treatment with norfloxacin. Z1434 was also identified after a human infection using the in vivo-induced antigen technology (IVIAT). For the other hypotheticals, no experimental data exist. By a bioinformatic approach, Z5214 was identified as a secreted effector protein, espY5’. While most prophages of Escherichia coli O157:H7 are regarded as defective, Asadulghani et al. reported that these phages are still inducible. Antibiotics activate the SOS response, thereby inducing phage replication. Therefore, it is not surprising that a higher number of phage-borne hypothetical genes are active.
It is known that the treatment of an EHEC infection with antibiotics may potentiate the severity of the disease. Among clinically applied antibiotics, the combination of trimethoprim with sulfamethoxazole seems to be the worst choice. Interestingly, this antibiotic mixture strongly induces transcription of CP-933V and BP-933W. These two phages encode the shiga-toxins which contribute essentially to the clinical symptoms of an infection. Their activation provides a direct explanation for the high rate in clinical complications. Furthermore, the LEE pathogenicity island is also active in LB-antibiotics (see Table 3). In some studies, a connection of the regulation of phages and the LEE pathogenicity island was found (e.g., [66, 67]). Phage-encoded regulators have effects on the activity of TTSS and LEE genes respectively.
Transcription of genes in cattle feces
Annotated genes active in cattle feces
The gastrointestinal tract of ruminants is considered a major reservoir of Escherichia coli O157:H7. However, no transcriptomes under this condition have been reported. We detected several genes that are up-regulated in feces compared to LB (Table 4, for a graphic version see Additional file 4). A highly up-regulated gene in cattle feces is glgS with a logFC of 6.6. It is a central gene in glycogen metabolism: this metabolite accumulates under starvation. Other highly active genes for metabolic enzymes are idi (Z4227, isopentenyl-diphosphate delta-isomerase), encoding a key enzyme of isoprenoid pathways, and caiA (Z0045, crotonobetainyl-CoA dehydrogenase). The latter is involved in the metabolism of L-carnitine, a ubiquitous compound in eukaryotic tissues, which is metabolized to γ-butyrobetaine in E. coli.
Many up-regulated genes are either involved in macromolecule protection or associated with membrane stress. One example is the up-regulated phage shock regulon pspEDCBA (Z2477 – Z2482, logFCs between 2 and 8), which is known to respond to certain stress conditions such as phage attack, heat shock, hyperosmotic stress, or exposure to hydrophobic organic solvents. Further, the co-chaperones dnaK (Z0014, Table 4) and dnaJ (Z0015) are active in feces with logFCs of 3.7 and 4.2, respectively. These chaperones are essential for the folding of newly synthesized proteins or refolding of misfolded proteins [72, 73]. The chaperone ClpB, which is additionally active in LB-pH4 and spinach juice, has a similar function in disaggregation and reactivation of proteins. A high logFC of these chaperone genes should indicate cellular stress. Other active stress related genes include tus, encoding a DNA replication termination protein, yebG, which is involved in DNA-damage repair, and the ibpAB operon, which plays a role in the recognition of aggregated proteins.
Membrane stress is indicated by CpxP (Z5458, formerly YiiO), a small protein located in the periplasm. The protein interacts with the cpx-regulon, a two component signal transduction system responsible for sensing envelope stress . HtpX, a member of the σ32 heat-shock regulon, is involved in the degradation and dislocation of unassembled membrane proteins . The highest up-regulation of this gene in feces indicates the presence of membrane stress. Interestingly, many of the up-regulated hypothetical genes in cattle feces also contain membrane domains.
Hypothetical genes active in cattle feces
Thirteen hypotheticals are only induced in cattle feces with a logFC higher than 5 (see Table 2). Z0387 and Z3722 are unknown genes which have never been reported to be active under any condition before. As in radish sprouts, several up-regulated genes are involved in biofilm formation, e.g. ycdT. Interestingly, the hypothetical gene Z2619 is similar to membrane proteins, probably involved in the uptake of host derived compounds. Z2619 has high similarities to UidC of Escherichia coli E101, belonging to the uidRABC operon which is involved in the metabolism of glucuronate, a molecule present in the gut . Furthermore, there is experimental evidence based on in vivo-induced antigen technology (IVIAT) for Escherichia coli O157:H7, that Z2619 is also active during human infection .
In summary, most of the highly active genes in cattle feces are connected to membrane stress or involved in the protection or reactivation of proteins. Based on these findings we suggest that EHEC may be under considerable environmental stress in the colon of ruminants.
Gene expression on radish sprouts
Utilization of carbon sources
After growth on radish sprouts (Figure 5A), 997 genes have significantly different transcript levels (478 up/519 down) compared to LB medium. A distinctive pattern of genes with high transcription levels includes genes active in the degradation of fructose fruAKB (Z3425-Z3427; logFCs between 5 and 8), trehalose otsAB (Z2949, Z2950; logFCs between 2 and 4), and arabinose araAHGF (Z0070, Z2951, Z2953, Z2954), including Z3511-Z3513/Z3515 (Table 5, for a graphic version see Additional file 4). EHECs are able to utilize these plant-specific carbon sources. Plants are known to exude certain carbon sources and other substances from their roots to maintain a certain microbiome, which in turn provides the plants with micronutrients.
Response to stress
We assign azoreductase azoR (Z2315, Table 5, Figure 5B-C) to the stress related genes. Azo dyes are a class of colorants used in chemical, pharmaceutical and food industries. They are carcinogenic and can cause severe environmental problems . Bacterial azoreductases can reduce these dyes in a NAD(P)H dependent reaction . However, azo dyes are human made compounds. The environmental role of azoreductase is unknown . As we measured high levels of transcripts on sprouts (logFC = 4.1, RPKM = 190), we speculate on a role of this enzyme in detoxification of secondary plant metabolites directed against, or modulating, the bacterial microbiome. Indeed, Liu et al. found that azoR protects E. coli against thiol-specific stresses caused by electrophilic quinones.
Up-regulation (logFC = 3.8) of aquaporin aqpZ (Z1109, Table 5) on radish sprouts may indicate hypoosmotic stress, since aquaporins are proteins conducting water (or glycerol). However, only about one quarter of the bacterial species possess an aqpZ homolog, and the role of aqpZ in osmotic regulation is under debate due to conflicting data. Tanghe et al. hypothesize that transport of other small uncharged molecules besides water may play a role associated with certain lifestyles or ecological niches.
A membrane stress response of EHEC on sprouts is supported by the high activity of the phage shock genes pspABC and pspG (Z2479, Z2480, Z2482, Z5648, Table 5) with logFCs between 3 and 8 on radish sprouts, perhaps indicating that secondary plant metabolites secreted by the radish sprouts may impair membrane integrity. Further, we identified an up-regulated membrane protein (YhdV, Z4628) and a quercetinase homolog (YhhW, Z4807). The flavonoid quercetin is widely distributed in plants and potentially toxic. Thus, YhhW may be involved in its detoxification.
Another up-regulated gene (logFC of 7.5) indicative of a stress response is the acid shock protein precursor AsrA (Z2591, Table 5). This small protein localizes in the periplasm and is further processed to an 8 kDa fragment, which is the active form of this proposed chaperone. It appears that low pH is only a necessary, but not a sufficient, condition to induce asrA, as it is not active in acidified LB-nitrite. In addition, osmotic stress also induces asrA [25, 87, 88].
Finally, narU (Z2243) encodes a protein forming a single channel for nitrate uptake and nitrite extrusion . It is strongly up-regulated (logFC of 7.0) on radish sprouts and only to a logFC of 4.4 in LB-nitrite.
Adhesion to the plant surface
Curli fiber genes are associated with adhesion to plants (e.g., ). These fimbriae-like structures are a major factor for the formation of biofilms and adhesion to surfaces . The highest activity of these six genes csgGFEDBA (Z1670 – Z1676) was determined on radish sprouts (Table 5). An additional indicator for adhesion to radish sprouts is the up-regulation of bssS (Z1697, Table 5), a regulatory gene for biofilm formation . The increased transcription level of curli-related genes together with bssS corroborates the hypothesis of Fink et al.  that lettuce leaves are colonized by using curli fibers and by fine tuning biofilm formation.
We identified nine hypothetical genes in radish sprouts with a logFC higher than 5, which are only active on sprouts (summarized in Table 2 and visualized in Figure 6B-E). One of those, yjfY, was already found induced on lettuce leaves . This gene is also active in biofilm growth [48, 49]. We found additional hypotheticals that play a role in biofilm formation, which are summarized in Table 2 including references for them.
Radish sprouts as a reservoir of EHEC?
Sprouts were inoculated with 4 × 10² cfu/g plant EHEC and grown for several days. The growth curve in Figure 5A illustrates that EHEC grows very well on the plants, reaching 2 × 10⁷ cfu/g plant, apparently without affecting the plant phenotype. As shown above, EHEC expresses many unique genes when it thrives on the plant surface, including adhesin, membrane proteins, transport proteins, metabolic proteins and a variety of stress response proteins. We conclude that radish sprouts are a suitable habitat for EHEC to proliferate. However, this experiment reflects a mono-association of EHEC and radish sprouts and, therefore, does not yet allow a conclusion whether plants in general serve as a natural reservoir of EHEC.
EHEC as “vegetarian”?
Obviously, EHEC is able to survive and proliferate on and in plants. This has now been shown several times by different groups. However, after EHEC had been described as a pathogen in 1982, the infection was dubbed “hamburger disease”, since many outbreaks were related to undercooked minced meat. For quite some time, more or less the only reservoirs considered for pathogenic enterobacteria were meat, milk, and products thereof. However, in hindsight, a possible “vegetarian” life style of EHEC should have been considered years ago, since EHEC contains genes to metabolize different sugars (some of which are exclusively produced by plants): fruAKB for fructose, otsAB for trehalose, and araAHGF for arabinose. Using BLAST, we found that plant pathogens or plant associated genera, such as Ralstonia, Xanthomonas, Erwinia, Rhizobium, and Dickeya also contain such operons. Next, EHEC forms biofilms on plant surfaces using curli. Again, species of Rahnella and Serratia contain csgA. The quercetinase homolog yhhW found in EHEC is also present in Pectobacterium carotovorum and Serratia proteamaculans. Stress related EHEC genes induced while growing on sprouts, such as asr and pspABCG, are found in Burkholderia gladioli and Pectobacterium species. Finally, as shown in this paper, azoR is induced in EHEC when growing on radish, and azoR homologs are found in species of Serratia, Erwinia, Pectobacterium, and Dickeya. Taken together, it would be quite interesting to scan the EHEC genome for genes homologous to those of other bacteria that are known to be induced in the respective niche of each bacterium, and to see whether EHEC can thrive in this niche as well and which genes are induced. Strand-specific transcriptomes supply an excellent technique to substantiate such hypotheses.
Distinguishing weakly transcribed genes from background transcription is a general problem in NGS transcriptomics. Our proposed statistical method is based on the data of the actual experiment, thus also takes the sequencing depth into account. Genes are classified into “active” or “inactive”, based on a sound statistical evaluation and not on arbitrarily chosen threshold values of reads or RPKM. We sequenced biological replicates of transcriptomes using the SOLiD and the Illumina system and showed a high correlation between both approaches, confirming that the SOLiD and Illumina system produce equivalent data. This is interesting insofar as PCR-artifacts and other biased reactions during library preparation are a possible source of the uneven coverage of a given gene with reads. However, when comparing relative transcription (hence, regulation), these effects apparently tend to cancel each other out. Otherwise, there would be no or only weak correlation between data gained with the SOLiD and the Illumina system.
We discovered a unique set of active genes for each condition tested and, remarkably, most genes of EHEC appear to be active under at least one condition. Indeed, under environmental conditions more hypothetical genes were found to be active than in standard lab media. This is not too surprising, since growth of E. coli in standard medium has been examined over and over again. Interestingly, only a minority of genes (2.7%) were not active under any condition tested by us. We therefore suggest that the general assumption that large numbers of genes are over-annotated in bacterial genomes may be wrong. In addition, such genes might be active in habitats not yet probed. Finally, azoR exemplarily shows that transcriptome profiling still is and will be a powerful technique to find new roles for genes. azoR was formerly only known to destroy artificial azo-dyes, but its high induction on plants suggests a detoxification role in nature. This finding provides an entry point to test natural plant substances for azoR induction and to observe growth (impairments?) of an ΔazoR mutant to further elucidate the behavior of EHEC and other pathogens in nature. Similarly, other highly induced or repressed genes are now new candidates for a detailed functional description.
Strains and culture conditions
If not stated otherwise, E. coli O157:H7 EDL933 (EHEC) (Collection de l’Institut Pasteur: CIP 106327) was incubated in liquid medium at 37°C with shaking (180 rpm) by adding 1 ml overnight culture (about 10^9 cfu) to 100 ml medium. Growth curves were measured either by optical density (OD600nm) or by counting colony forming units (cfu) after serial platings. Before harvesting, samples were plated on CHROMagar O157 (CHROMagar, France) to confirm identity. In all cases, bacterial cells were harvested at the transition from late exponential to early stationary phase by centrifugation (20,000 × g, 1°C, 3 min) and frozen in liquid nitrogen for storage.
LB: Tenfold diluted lysogeny broth was used as reference medium. Cells were harvested after 3.5 h at about 3.1 × 10^8 cfu/ml.
LB-15°C: Transcription was determined at 15°C in tenfold diluted LB medium and cells were harvested at 3.1 × 10^8 cfu/ml.
MM: M9 minimal medium was prepared as described and cells were harvested after 12 h at about 2.5 × 10^9 cfu/ml.
LB-pH9: Tenfold diluted LB medium was buffered with 10 mM CHES, adjusted to pH 9.0 at 37°C, and filter sterilized. After 7 h, the cells reached 1.5 × 10^8 cfu/ml and were harvested.
LB-pH4: Tenfold diluted LB medium was adjusted to pH 4.0 at 37°C and filter sterilized. Cells were harvested at 2.0 × 10^8 cfu/ml.
LB-nitrite: We added 200 mg/L sodium nitrite to tenfold diluted LB and adjusted the medium to pH 6. Cells were harvested after 6.5 h at 2.9 × 10^8 cfu/ml.
Spinach: For spinach medium, whole spinach leaves (Agienda Agricola Pistelle, Kaufland, Germany) were homogenized on ice using an Ultraturrax D50. The mush was centrifuged (1 h, 30,000 × g, 5°C), decanted, filtered (2.5 μm pore size), centrifuged again (2 h, 30,000 × g, 5°C), decanted and sterile filtered (0.2 μm). After 5 h of growth, we harvested the cells at 6.0 × 10^8 cfu/ml.
LB-antibiotics: Tenfold diluted LB was supplemented with 2 μg/ml sulfamethoxazole and 0.4 μg/ml trimethoprim. This medium was inoculated with 2 ml of overnight culture. Cells cannot divide anymore in this medium and the increase in OD600nm is due to massive cell elongation. We harvested the cells at the peak of OD600nm at 0.194.
LB-solid: For growth on solid medium, about 500 colonies were grown on undiluted LB agar plates and harvested after 17 h at 37°C. Colonies were transferred directly to Trizol (see below) for RNA extraction.
Sprouts: Radish sprout seeds were sterilized (5 min 70% ethanol, 10 min 1% NaOCl with 0.1% Tween), then washed five times with sterile water and subsequently incubated in sterile MS medium without glucose in sterile plastic boxes (1 L total volume, passively aerated). After germination, seedlings were tested for sterility by plating a sample on LB agar. After 5 days of growth, the shoots were inoculated for 10 min with 1 L ¼-concentrated Ringer solution containing 10^3 cfu/ml EHEC. The superfluous medium was decanted and cfu/g was periodically determined as follows: infected shoots were washed and bacterial numbers in the washing liquid were determined by serial dilution platings. After 120 hours, the transition from exponential to stationary phase could be determined (see Figure 5A). Bacteria were harvested by gently shaking the seedlings in cold ¼-concentrated Ringer (+1% Tween-20; 4°C) for 1 min. Bacteria were collected by centrifugation from the decanted Ringer as above.
Cattle feces: The number of cultivable bacteria in cattle feces was determined by serial platings on LB-agar plates after 12 h at 37°C. The cattle feces were subsequently inoculated with a 1000-fold higher number of EHEC, pre-grown in 1 L LB to stationary phase. When the bacteria had reached stationary phase, they were harvested by centrifugation and re-suspended in 7 ml ¼-concentrated Ringer. We added this suspension to 10 g of cattle feces and mixed it thoroughly. After 6 h at 37°C, bacterial cells were harvested by adding 90 ml cold ¼-concentrated Ringer, shaking for 10 s, sedimentation for 30 s, decanting, and centrifugation.
RNA isolation and propagation
RNA was isolated with Trizol (Invitrogen, USA). One ml Trizol and about 200 μl of 0.1 mm zirconia beads were added to 50 μl cell pellet. The cells were disrupted by bead-beating (FastPrep-24, MP Biomedicals, USA), thrice for 45 s at 6.5 m/s, and cooled for 5 min on ice in between. Subsequently, the Trizol manual was followed and the RNA pellet was dissolved in RNase-free water. Since 90-95% of the total RNA consists of ribosomal RNA, we applied the Ribominus Transcriptome Isolation Kit (Yeast and Bacteria, Invitrogen, USA). The manufacturer’s manual was followed but the RNA was co-precipitated with 1 μl glycogen, using 2.5 volumes 100% ethanol and 0.1 volumes 3 M sodium acetate, instead of the concentration modules included. Residual DNA was removed with the TURBO DNA-free Kit (Applied Biosystems, USA).
Whole transcriptome RNA library preparation – SOLiD system
Fragmentation, hybridization, ligation, reverse transcription of enriched total RNA and amplification of the cDNA were carried out using the SOLiD Total RNA-seq Kit (Applied Biosystems, USA). Briefly, RNA was fragmented with RNase III for 9 min. We purified the reaction mixture with the miRNeasy Mini Kit (Qiagen, Germany). This returns high amounts of RNA and removes proteins from the RNase treatment. Hybridization and ligation were performed using the SOLiD Adaptor Mix at 65°C for 10 min and the Ligation Enzyme Mix at 16°C for 16 h following the manufacturer’s instructions. The ligation reaction was directly added to the RT reaction mix containing SOLiD RT Primer and ArrayScript Reverse Transcriptase. The mixture was incubated at 42°C for 30 min. After purification using the MinElute® PCR Purification Kit (Qiagen, Germany), the cDNA was size selected for 150–250 nt cDNA with Novex® 6% TBE-Urea Gels. The selected cDNA was directly amplified from the gel in 15 PCR cycles. Here, we used the SOLiD Transcriptome Multiplexing Kit. SOLiD 3′ PCR primers were replaced by different barcoded SOLiD 3′ PCR primers for different conditions. Two libraries, spinach and LB-nitrite, were split beforehand and then treated independently to obtain technical replicates. The amplified DNA was purified with the PureLink PCR Micro Kit (Invitrogen, USA). The amounts of RNA/DNA were measured with a NanoDrop spectrophotometer. The quality and size distribution of the isolated and depleted RNA was assessed on the Agilent 2100 Bioanalyzer with Agilent DNA 1000 Kit and RNA 6000 Pico Kit. SOLiD System templated bead preparation and sequencing on the SOLiD 4.0 system was conducted by CeGaT GmbH (Tübingen, Germany).
Whole transcriptome RNA library preparation – Illumina system
Biological replicates of LB medium and radish sprouts were sequenced on an Illumina MiSeq sequencer. One μg RNA was fragmented as described in Flaherty et al. using a Covaris sonicator and the RNA-fragments precipitated with glycogen and 2.5 volumes 100% ethanol. RNA fragments were dephosphorylated using Antarctic phosphatase (10 units per 300 ng RNA, supplemented with 10 units Superase, 37°C for 30 min). The fragments were recovered using the miRNeasy Mini Kit (Qiagen, Germany). Subsequent phosphorylation was carried out using 20 units T4 polynucleotide kinase, supplemented with 10 units RNase inhibitor Superase (Life Technologies, USA) at 37°C for 60 min, and recovered using the miRNeasy Mini Kit. The prepared RNA was processed further with the TruSeq Small RNA Sample Preparation Kit (Illumina, USA): The whole sample was concentrated in a Speedvac (Eppendorf, Germany) at 30°C for 1 hour to 5 μl final volume. The RNA 3′ and 5′ adapters were ligated to the fragments strand specifically. The ligated fragments were reverse transcribed using the SuperScript II Reverse Transcriptase kit (Life Technologies, USA). The subsequent PCR reaction was run in 11 cycles at an annealing temperature of 60°C. Amplified cDNA was purified on 6% Novex TBE polyacrylamide gels. For this, each complete sample was loaded into three wells. The gel was run for 45 minutes at 145 V in Novex TBE buffer. Afterwards, the DNA was stained with SYBR Gold. Fragments were size selected between 190 and 300 base pairs according to the ladder. The chosen length corresponds to an insert length of 50 to 100 base pairs. The gel pieces were transferred to a pierced 0.5 ml micro-centrifuge tube, placed in a 1.5 ml tube and centrifuged at 13,000 × g for 5 min at room temperature. The gel debris was eluted in 300 μl ddH2O for three hours under intense rotation. The eluate was filtered in a 0.22 μm Spin-X spin filter (Corning, USA) and the debris was discarded. 
The solution was ethanol precipitated with glycogen and sodium acetate and re-suspended in 10 μl elution buffer. The library was quantified using a Qubit (Life Technologies, USA), and denatured in 0.1 N NaOH. Next, it was diluted with the supplied HT1 buffer to an end concentration of 8 pM. The sequencing was conducted on a MiSeq sequencer with 50 cycles of library sequencing.
SOLiD output as QUAL and CSFASTA files was converted to FASTQ with Galaxy [99, 100]. We mapped SOLiD and Illumina FASTQ files to the reference genome of EHEC [GenBank:NC_002655] and to the plasmid pO157 [GenBank:NC_007414] using Bowtie (settings for SOLiD data: 28 nt seed length, maximal two mismatches in the seed, a maximal threshold of 70 for the sum of the quality values at mismatched positions; Illumina data: 20 nt seed length, 0 mismatches in the seed) implemented in Galaxy. Using Samtools, the output SAM files were filtered for mapped reads only. We further converted SAM files to BAM files and indexed them to create BAM.BAI files. The data were visualized with BamView implemented in Artemis 13.0. Raw data have been uploaded to the Gene Expression Omnibus [GEO:GSE48199].
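Since the mapping was run inside Galaxy, the exact command lines are not part of the text. As a rough sketch only, the settings listed above might translate into Bowtie 1 and Samtools calls as follows. The file names, the index name and both helper functions are hypothetical, and the flag mapping (`-l` seed length, `-n` seed mismatches, `-e` quality-sum threshold, `-C` colorspace input, `-S` SAM output) is an assumption about how the Galaxy wrapper settings correspond to the command-line tools. The `samtools sort -o` form assumes a recent Samtools release.

```python
def bowtie_command(index, reads, out_sam, platform):
    """Build a Bowtie 1 argument list (assumed CLI; the paper used Galaxy)."""
    if platform == "solid":
        # 28 nt seed, at most 2 seed mismatches,
        # quality sum at mismatched positions at most 70, colorspace reads
        opts = ["-C", "-l", "28", "-n", "2", "-e", "70"]
    elif platform == "illumina":
        # 20 nt seed, no mismatches allowed in the seed
        opts = ["-l", "20", "-n", "0"]
    else:
        raise ValueError(f"unknown platform: {platform!r}")
    return ["bowtie", *opts, "-S", index, reads, out_sam]


def samtools_commands(in_sam, prefix):
    """Keep mapped reads only, convert to BAM, sort, and index."""
    mapped = f"{prefix}.mapped.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # -F 4 drops reads carrying the "unmapped" flag; -b writes BAM
        ["samtools", "view", "-b", "-F", "4", "-o", mapped, in_sam],
        ["samtools", "sort", "-o", sorted_bam, mapped],
        # indexing writes the .bam.bai file next to the sorted BAM
        ["samtools", "index", sorted_bam],
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed
    commands = [bowtie_command("EDL933_index", "lb_solid.fastq", "lb_solid.sam", "solid")]
    commands += samtools_commands("lb_solid.sam", "lb_solid")
    for cmd in commands:
        print(" ".join(cmd))
```

The functions only assemble argument lists, so the sketch can be inspected without either tool installed; passing a list to `subprocess.run` would execute it.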
Normalizing to RPKM values
The number of reads was normalized to reads per kilobase per million mapped reads (RPKM). Using this method, the number of reads is normalized with respect to the sequencing depth and the length of a given gene. For determination of counts and RPKM values, BAM files were imported into R using Rsamtools. For further processing, the Bioconductor packages GenomicRanges and IRanges were used. Gene locations were determined by RefSeq and GenBank PTT files. The locations of the 16S rRNA and 23S rRNA are given by the RNT file from RefSeq. The method countOverlaps of IRanges was used to determine the remaining reads overlapping a 16S or 23S rRNA gene. We discarded these reads from further analysis due to the artificial removal of these rRNAs using the Ribominus kit as described above. countOverlaps was also used to determine the number of reads overlapping a gene on the same strand (counts). With these counts we generated the RPKM values. For the value “million mapped reads”, the number of reads mapped to the genome, minus the reads overlapping a 16S or 23S rRNA gene, was used (see above). The differential gene expression was analyzed with the Bioconductor package edgeR (version 3.2.3) using the counts.
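The counting and normalization steps can be illustrated with a minimal Python sketch. It is not the R/Bioconductor code used in the study: the function names, the tuple layout for reads and genes, and the example numbers are assumptions made for illustration. Coordinates are treated as 1-based and inclusive.

```python
def count_same_strand(reads, gene):
    """Count reads that overlap a gene on the same strand.

    reads: iterable of (start, end, strand) tuples
    gene:  (start, end, strand) tuple
    """
    g_start, g_end, g_strand = gene
    return sum(
        1
        for start, end, strand in reads
        if strand == g_strand and start <= g_end and end >= g_start
    )


def rpkm(count, gene_length_bp, mapped_reads, rrna_reads=0):
    """Reads per kilobase per million mapped reads.

    The depth term excludes reads overlapping 16S/23S rRNA genes,
    as described in the text.
    """
    effective_depth = mapped_reads - rrna_reads
    if gene_length_bp <= 0 or effective_depth <= 0:
        raise ValueError("gene length and effective depth must be positive")
    return count * 1e9 / (gene_length_bp * effective_depth)


if __name__ == "__main__":
    # Hypothetical reads and gene
    reads = [(100, 149, "+"), (140, 189, "-"), (500, 549, "+")]
    gene = (120, 300, "+")
    print(count_same_strand(reads, gene))

    # Hypothetical numbers: 50 reads on a 1 kb gene,
    # 2.5 million mapped reads of which 0.5 million hit rRNA genes
    print(rpkm(50, 1000, 2_500_000, 500_000))
```

The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings in a single step.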
Differential expression analysis
The Bioconductor package edgeR uses an overdispersed Poisson model to estimate biological variability. Such empirical Bayes methods diminish variances across the genes. The dispersion of the data was analyzed by sequencing biological replicates of the LB reference medium and the radish sprouts condition using two different NGS platforms (SOLiD and Illumina). After confirming by statistical analysis that both sequencing platforms showed the same results for the biological replicates (see Results and Discussion), the data of the experiments were merged. We present the data as a log2-fold change (logFC) of a gene in each condition compared to LB medium as basis. log2 was chosen since the cDNA is amplified using the non-linear process of a PCR reaction in which, to a first approximation, the number of fragments grows exponentially with each cycle. In the result tables, values in parentheses are RPKM values.
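To make the logFC scale concrete, the following sketch computes a plain log2 ratio between a condition and the LB reference. It is a simplified illustration and not the edgeR procedure, which estimates fold changes within its own statistical model. The pseudocount, which avoids division by zero for genes without reads, is an assumption that does not come from the text, as are the function name and the example values.

```python
import math


def log2_fold_change(value_condition, value_reference, pseudocount=1.0):
    """Plain log2 ratio of two normalized expression values.

    The pseudocount is an illustrative assumption, not an edgeR setting.
    """
    if value_condition < 0 or value_reference < 0:
        raise ValueError("expression values must be non-negative")
    return math.log2(
        (value_condition + pseudocount) / (value_reference + pseudocount)
    )


if __name__ == "__main__":
    # Hypothetical values: condition 7, reference 1 -> (7+1)/(1+1) = 4-fold
    print(log2_fold_change(7.0, 1.0))
    # Swapping the two gives the same magnitude with opposite sign
    print(log2_fold_change(1.0, 7.0))
```

On this scale, induction and repression of the same magnitude are symmetric around zero, and one unit corresponds to a doubling, which matches the doubling per cycle of an idealized PCR.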
Determination of background transcription
Evidence for the transcription of a gene is given by a transcription level higher than a supposedly random transcription. This pervasive transcription is distributed all over the genome, including non-coding regions. We determined random transcription by manually selecting regions of the genome that are obviously free of annotated genes. Some regions are antisense to annotated genes. We also analyzed these regions visually for the absence of non-coding RNAs or any other conspicuous transcription patterns. Figure 6A shows an example screenshot of one region. The genome positions (matching [GenBank:NC_002655]) of the regions used are: complement(264387 – 269904), c(430056 – 435429), c(524890 – 530056), c(613235 – 620336), 2293616 – 2309141, 3707862 – 3711921, 3840351 – 3844419, 4121574 – 4126839, 4144776 – 4149762, 4298037 – 4302222, 4494846 – 4501272, 4615115 – 4619372, c(4635078 – 4639956), 5199469 – 5210215, 5263170 – 5266662, c(5277831 – 5281854), c(5282151 – 5286750), and c(5294994 – 5299602). Taken together, all regions comprise a virtually “empty” part of the genome of 104,192 base pairs in length (~2%), which is supposed to be transcribed only randomly. We calculated the RPKM value for these parts for every condition in the same manner as for the annotated genes. Genes were defined as being active or turned “on” if the probability that the signal is due to the background is significantly low (p ≤ 0.05). We consider a gene as silent if it is not covered by a read in any of the conditions.
Statistical analysis of active genes
Reads observed over a gene may be solely attributable to background noise or background transcription (see Results and Discussion). We therefore employ a background model, as explained in the following. We assume that on average a background read will start at a position i with a given rate λ per base, and that the starts of background reads are mutually independent. Hence, a reasonable model for the read starts is a Poisson process with rate λ. Suppose we observe m reads over a gene of length g. Under the background hypothesis, the number X of reads starting within the gene is Poisson distributed with mean λg, and the P-value of the hypothesis that the reads are solely due to the background is then given by
$$P(X \geq m) = 1 - \sum_{k=0}^{m-1} \frac{(\lambda g)^{k}}{k!}\, e^{-\lambda g} \qquad (1)$$
Equation (1) can be numerically evaluated given the gene length and the corresponding λ. To estimate the parameter λ we used the data of all regions with no transcription (see above) separately for each experimental condition (Additional file 6: Table S1).
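A minimal numerical sketch of this test is given below, assuming that the P-value is the upper tail of a Poisson distribution with mean λg, i.e. the probability of seeing at least m background reads over a gene of length g. The function names, the simple estimator for λ (background reads divided by the total length of the background regions) and the example numbers are illustrative assumptions, not values from Additional file 6.

```python
import math


def estimate_lambda(background_reads, background_length_bp):
    """Background read-start rate per base for one condition (assumed estimator)."""
    if background_length_bp <= 0:
        raise ValueError("background length must be positive")
    return background_reads / background_length_bp


def background_p_value(m, gene_length_bp, lam):
    """P(X >= m) for X ~ Poisson(lam * gene_length_bp)."""
    if m <= 0:
        return 1.0  # zero reads are always compatible with background
    mu = lam * gene_length_bp
    if mu <= 0:
        return 0.0  # no background expected, so any read is unexplained
    # Terms of the Poisson probability mass are computed in log space
    # so that large means do not underflow
    cdf = math.fsum(
        math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1)) for k in range(m)
    )
    return min(1.0, max(0.0, 1.0 - cdf))


def is_active(m, gene_length_bp, lam, alpha=0.05):
    """Classify a gene as active if the background P-value is at most alpha."""
    return background_p_value(m, gene_length_bp, lam) <= alpha


if __name__ == "__main__":
    # Hypothetical: 100 background reads over 100,000 bp of empty regions
    lam = estimate_lambda(100, 100_000)
    for m in (1, 3, 4):
        print(m, background_p_value(m, 1000, lam), is_active(m, 1000, lam))
```

Because λ is estimated separately for each condition, the number of reads needed to call a gene active scales with the sequencing depth of that condition. For very small P-values the subtraction from 1 loses precision, which does not matter at a threshold of 0.05.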
Heat map generation
The generation of heat maps allows analysis of the data for similar global response patterns. We visualized the logFC values of all conditions sequenced on the SOLiD system, including LB medium as reference, with heat maps using the R method heatmap.2 of the package gplots. Hierarchical complete linkage clustering was applied to rows and columns with Euclidean distance as distance measure. The color map was linearly interpolated in RGB with the colorRampPalette method of R from the RColorBrewer color palette RdBu with eleven colors.
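The clustering step can be sketched in plain Python to show what complete linkage with Euclidean distance does. This is an illustration of the technique only, not the heatmap.2 implementation; the function names and the small logFC-like vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(vectors):
    """Agglomerative clustering with complete linkage.

    Returns the merges in order as (members_a, members_b, distance).
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is the
                # largest distance between any pair of their members
                d = max(
                    euclidean(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    # Hypothetical logFC profiles of four genes across two conditions
    profiles = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    for members_a, members_b, distance in complete_linkage(profiles):
        print(members_a, members_b, round(distance, 3))
```

Applied once to the rows (genes) and once to the columns (conditions) of the logFC matrix, the merge order defines the two dendrograms that determine the ordering in the heat map.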
Availability of supporting data
The RNA-seq raw data were deposited to NCBI GEO with the accession number GSE48199 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48199).
Croxen MA, Finlay BB: Molecular mechanisms of Escherichia coli pathogenicity. Nat Rev Microbiol. 2010, 8 (1): 26-38.
Kaper JB, Nataro JP, Mobley HL: Pathogenic Escherichia coli. Nat Rev Microbiol. 2004, 2 (2): 123-140. 10.1038/nrmicro818.
Wong CS, Jelacic S, Habeeb RL, Watkins SL, Tarr PI: The risk of the hemolytic-uremic syndrome after antibiotic treatment of Escherichia coli O157:H7 infections. N Engl J Med. 2000, 342 (26): 1930-1936. 10.1056/NEJM200006293422601.
Tilden J, Young W, McNamara AM, Custer C, Boesel B, Lambert-Fair MA, Majkowski J, Vugia D, Werner SB, Hollingsworth J, Morris JG: A new route of transmission for Escherichia coli: infection from dry fermented salami. Am J Public Health. 1996, 86 (8): 1142-1145.
Erickson MC, Doyle MP: Food as a vehicle for transmission of Shiga toxin-producing Escherichia coli. J Food Prot. 2007, 70 (10): 2426-2449.
Ferens WA, Hovde CJ: Escherichia coli O157:H7: animal reservoir and sources of human infection. Foodborne Pathog Dis. 2011, 8 (4): 465-487. 10.1089/fpd.2010.0673.
Semenov AM, Kuprianov AA, van Bruggen AH: Transfer of enteric pathogens to successive habitats as part of microbial cycles. Microb Ecol. 2010, 60 (1): 239-249. 10.1007/s00248-010-9663-0.
Watanabe Y, Ozasa K, Mermin JH, Griffin PM, Masuda K, Imashuku S, Sawada T: Factory outbreak of Escherichia coli O157:H7 infection in Japan. Emerg Infect Dis. 1999, 5 (3): 424-428. 10.3201/eid0503.990313.
Bielaszewska M, Mellmann A, Zhang W, Kock R, Fruth A, Bauwens A, Peters G, Karch H: Characterisation of the Escherichia coli strain associated with an outbreak of haemolytic uraemic syndrome in Germany, 2011: a microbiological study. Lancet Infect Dis. 2011, 11 (9): 671-676. 10.1016/S1473-3099(11)70165-7.
Rosner B, Bernard H, Werber D, Faber M, Stark K, Krause G: Epidemiologie des EHEC O104:H4/HUS-Ausbruchs in Deutschland, Mai bis Juli. J Verbr Lebensm. 2011, 2011: 1-8.
Duffitt AD, Reber RT, Whipple A, Chauret C: Gene Expression during Survival of Escherichia coli O157:H7 in Soil and Water. Int J Microbiol. 2011, 2011: doi:10.1155/2011/340506.
Barker J, Humphrey TJ, Brown MW: Survival of Escherichia coli O157 in a soil protozoan: implications for disease. FEMS Microbiol Lett. 1999, 173 (2): 291-295. 10.1111/j.1574-6968.1999.tb13516.x.
Kus JV, Gebremedhin A, Dang V, Tran SL, Serbanescu A, Barnett Foster D: Bile salts induce resistance to polymyxin in enterohemorrhagic Escherichia coli O157:H7. J Bacteriol. 2011, 193 (17): 4509-4515. 10.1128/JB.00200-11.
Kyle JL, Parker CT, Goudeau D, Brandl MT: Transcriptome analysis of Escherichia coli O157:H7 exposed to lysates of lettuce leaves. Appl Environ Microbiol. 2010, 76 (5): 1375-1387. 10.1128/AEM.02461-09.
Lee JH, Kim YG, Cho MH, Wood TK, Lee J: Transcriptomic analysis for genetic mechanisms of the factors related to biofilm formation in Escherichia coli O157:H7. Curr Microbiol. 2011, 62 (4): 1321-1330. 10.1007/s00284-010-9862-4.
Matkovich SJ, Zhang Y, van Booven DJ, Dorn GW: Deep mRNA sequencing for in vivo functional analysis of cardiac transcriptional regulators: application to Gαq. Circ Res. 2010, 106 (9): 1459-1467. 10.1161/CIRCRESAHA.110.217513.
Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, He M, Croucher NJ, Pickard DJ, Maskell DJ, Parkhill J, Choudhary J, Thomson NR, Dougan G: A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet. 2009, 5 (7): e1000569. 10.1371/journal.pgen.1000569.
Filiatrault MJ, Stodghill PV, Bronstein PA, Moll S, Lindeberg M, Grills G, Schweitzer P, Wang W, Schroth GP, Luo S, Khrebtukova I, Yang Y, Thannhauser T, Butcher BG, Cartinhour S, Schneider DJ: Transcriptome analysis of Pseudomonas syringae identifies new genes, noncoding RNAs, and antisense activity. J Bacteriol. 2010, 192 (9): 2359-2372. 10.1128/JB.01445-09.
Weissenmayer BA, Prendergast JG, Lohan AJ, Loftus BJ: Sequencing illustrates the transcriptional response of Legionella pneumophila during infection and identifies seventy novel small non-coding RNAs. PLoS One. 2011, 6 (3): e17570. 10.1371/journal.pone.0017570.
Elias DA, Mukhopadhyay A, Joachimiak MP, Drury EC, Redding AM, Yen HC, Fields MW, Hazen TC, Arkin AP, Keasling JD, Wall JD: Expression profiling of hypothetical genes in Desulfovibrio vulgaris leads to improved functional annotation. Nucleic Acids Res. 2009, 37 (9): 2926-2939. 10.1093/nar/gkp164.
Delcher AL, Bratke KA, Powers EC, Salzberg SL: Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007, 23 (6): 673-679. 10.1093/bioinformatics/btm009.
Borodovsky M, Lomsadze A: Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite. Current protocols in bioinformatics / editorial board, Andreas D Baxevanis [et al.]. 2011, Chapter 4 (Unit 4 5): 1-17.
Thomas GH: Completing the E. coli proteome: a database of gene products characterised since the completion of the genome sequence. Bioinformatics. 1999, 15 (10): 860-861. 10.1093/bioinformatics/15.10.860.
Kolker E, Makarova KS, Shabalina S, Picone AF, Purvine S, Holzman T, Cherny T, Armbruster D, Munson RS, Kolesov G, Frishman D, Galperin MY: Identification and functional analysis of ‘hypothetical’ genes expressed in Haemophilus influenzae. Nucleic Acids Res. 2004, 32 (8): 2353-2361. 10.1093/nar/gkh555.
Šeputienė V, Motiejūnas D, Sužiedėlis K, Tomenius H, Normark S, Melefors Ö, Sužiedėlienė E: Molecular characterization of the acid-inducible asr gene of Escherichia coli and its role in acid stress response. J Bacteriol. 2003, 185 (8): 2475-2484. 10.1128/JB.185.8.2475-2484.2003.
Carruthers MD, Minion C: Transcriptome analysis of Escherichia coli O157:H7 EDL933 during heat shock. FEMS Microbiol Lett. 2009, 295 (1): 96-102. 10.1111/j.1574-6968.2009.01587.x.
Kudva IT, Griffin RW, Krastins B, Sarracino DA, Calderwood SB, John M: Proteins other than the locus of enterocyte effacement-encoded proteins contribute to Escherichia coli O157:H7 adherence to bovine rectoanal junction stratified squamous epithelial cells. BMC Microbiol. 2012, 12: 103. 10.1186/1471-2180-12-103.
Haas BJ, Chin M, Nusbaum C, Birren BW, Livny J: How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes?. BMC Genomics. 2012, 13: 734. 10.1186/1471-2164-13-734.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.
Jansohn M: Gentechnische Methoden. 2007, München: Elsevier, Spektrum Akademischer Verlag, 4
Clark MB, Amaral PP, Schlesinger FJ, Dinger ME, Taft RJ, Rinn JL, Ponting CP, Stadler PF, Morris KV, Morillon A, Rozowsky JS, Gerstein MB, Wahlestedt C, Hayashizaki Y, Carninci P, Gingeras TR, Mattick JS: The reality of pervasive transcription. PLoS Biol. 2011, 9 (7): e1000625. 10.1371/journal.pbio.1000625. discussion e1001102
Bruno VM, Wang Z, Marjani SL, Euskirchen GM, Martin J, Sherlock G, Snyder M: Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq. Genome Res. 2010, 20 (10): 1451-1458. 10.1101/gr.109553.110.
Vivancos AP, Guell M, Dohm JC, Serrano L, Himmelbauer H: Strand-specific deep sequencing of the transcriptome. Genome Res. 2010, 20 (7): 989-999. 10.1101/gr.094318.109.
Raghavan R, Sloan DB, Ochman H: Antisense transcription is pervasive but rarely conserved in enteric bacteria. mBio. 2012, 3 (4): e00156-12.
Mendoza-Vargas A, Olvera L, Olvera M, Grande R, Vega-Alvarado L, Taboada B, Jimenez-Jacinto V, Salgado H, Juárez K, Contreras-Moreira B, Huerta AM, Collado-Vides J, Morett E: Genome-wide identification of transcription start sites, promoters and transcription factor binding sites in E. coli. PLoS One. 2009, 4 (10): e7526. 10.1371/journal.pone.0007526.
Beaume M, Hernandez D, Farinelli L, Deluen C, Linder P, Gaspin C, Romby P, Schrenzel J, Francois P: Cartography of methicillin-resistant S. aureus transcripts: detection, orientation and temporal expression during growth phase and stress conditions. PLoS One. 2010, 5 (5): e10725. 10.1371/journal.pone.0010725.
Herold S, Siebert J, Huber A, Schmidt H: Global expression of prophage genes in Escherichia coli O157:H7 strain EDL933 in response to norfloxacin. Antimicrob Agents Chemother. 2005, 49 (3): 931-944. 10.1128/AAC.49.3.931-944.2005.
John M, Kudva IT, Griffin RW, Dodson AW, McManus B, Krastins B, Sarracino D, Progulske-Fox A, Hillman JD, Handfield M, Tarr PI, Calderwood SB: Use of in vivo-induced antigen technology for identification of Escherichia coli O157:H7 proteins expressed during human infection. Infect Immun. 2005, 73 (5): 2665-2679. 10.1128/IAI.73.5.2665-2679.2005.
Tobe T, Beatson SA, Taniguchi H, Abe H, Bailey CM, Fivian A, Younis R, Matthews S, Marches O, Frankel G, Hayashi T, Pallen MJ: An extensive repertoire of type III secretion effectors in Escherichia coli O157 and the role of lambdoid phages in their dissemination. Proc Natl Acad Sci U S A. 2006, 103 (40): 14941-14946. 10.1073/pnas.0604891103.
van Passel MW, Marri PR, Ochman H: The emergence and fate of horizontally acquired genes in Escherichia coli. PLoS Comput Biol. 2008, 4 (4): e1000059. 10.1371/journal.pcbi.1000059.
Sankar TS, Neelakanta G, Sangal V, Plum G, Achtman M, Schnetz K: Fate of the H-NS-repressed bgl operon in evolution of Escherichia coli. PLoS Genet. 2009, 5 (3): e1000405. 10.1371/journal.pgen.1000405.
Weber MM, French CL, Barnes MB, Siegele DA, McLean RJ: A previously uncharacterized gene, yjfO (bsmA), influences Escherichia coli biofilm formation and stress response. Microbiology. 2010, 156 (Pt 1): 139-147.
Zhang XS, García-Contreras R, Wood TK: YcfR (BhsA) influences Escherichia coli biofilm formation through stress response and surface hydrophobicity. J Bacteriol. 2007, 189 (8): 3051-3062. 10.1128/JB.01832-06.
Zhang XS, García-Contreras R, Wood TK: Escherichia coli transcription factor YncC (McbR) regulates colanic acid and biofilm formation by repressing expression of periplasmic protein YbiM (McbA). ISME J. 2008, 2 (6): 615-631. 10.1038/ismej.2008.24.
Hancock V, Klemm P: Global gene expression profiling of asymptomatic bacteriuria Escherichia coli during biofilm growth in human urine. Infect Immun. 2007, 75 (2): 966-976. 10.1128/IAI.01748-06.
Adams M, Jia Z: Structural and biochemical analysis reveal pirins to possess quercetinase activity. J Biol Chem. 2005, 280 (31): 28675-28682. 10.1074/jbc.M501034200.
Fink RC, Black EP, Hou Z, Sugawara M, Sadowsky MJ, Diez-Gonzalez F: Transcriptional responses of Escherichia coli K-12 and O157:H7 associated with lettuce leaves. Appl Environ Microbiol. 2012, 78 (6): 1752-1764. 10.1128/AEM.07454-11.
Schembri MA, Kjaergaard K, Klemm P: Global gene expression in Escherichia coli biofilms. Mol Microbiol. 2003, 48 (1): 253-267. 10.1046/j.1365-2958.2003.03432.x.
Hancock V, Vejborg RM, Klemm P: Functional genomics of probiotic Escherichia coli Nissle 1917 and 83972, and UPEC strain CFT073: comparison of transcriptomes, growth and biofilm formation. Mol Genet Genomics. 2010, 284 (6): 437-454. 10.1007/s00438-010-0578-8.
Augustus AM, Spicer LD: The MetJ regulon in gammaproteobacteria determined by comparative genomics methods. BMC Genomics. 2011, 12: 558. 10.1186/1471-2164-12-558.
Kang L, Shaw AC, Xu D, Xia W, Zhang J, Deng J, Woldike HF, Liu Y, Su J: Upregulation of MetC is essential for D-alanine-independent growth of an alr/dadX-deficient Escherichia coli strain. J Bacteriol. 2011, 193 (5): 1098-1106. 10.1128/JB.01027-10.
Augustus AM, Reardon PN, Spicer LD: MetJ repressor interactions with DNA probed by in-cell NMR. Proc Natl Acad Sci U S A. 2009, 106 (13): 5065-5069. 10.1073/pnas.0811130106.
Ryjenkov DA, Tarutina M, Moskvin OV, Gomelsky M: Cyclic diguanylate is a ubiquitous signaling molecule in bacteria: insights into biochemistry of the GGDEF protein domain. J Bacteriol. 2005, 187 (5): 1792-1798. 10.1128/JB.187.5.1792-1798.2005.
Hengge R: Principles of c-di-GMP signalling in bacteria. Nat Rev Microbiol. 2009, 7 (4): 263-273. 10.1038/nrmicro2109.
Jonas K, Edwards AN, Ahmad I, Romeo T, Romling U, Melefors O: Complex regulatory network encompassing the Csr, c-di-GMP and motility systems of Salmonella typhimurium. Environ Microbiol. 2010, 12 (2): 524-540. 10.1111/j.1462-2920.2009.02097.x.
Ramos JL, Martinez-Bueno M, Molina-Henares AJ, Teran W, Watanabe K, Zhang X, Gallegos MT, Brennan R, Tobes R: The TetR family of transcriptional repressors. Microbiol Mol Biol Rev. 2005, 69 (2): 326-356. 10.1128/MMBR.69.2.326-356.2005.
Iyoda S, Koizumi N, Satou H, Lu Y, Saitoh T, Ohnishi M, Watanabe H: The GrlR-GrlA regulatory system coordinately controls the expression of flagellar and LEE-encoded type III protein secretion systems in enterohemorrhagic Escherichia coli. J Bacteriol. 2006, 188 (16): 5682-5692. 10.1128/JB.00352-06.
Lim JY, Yoon J, Hovde CJ: A brief overview of Escherichia coli O157:H7 and its plasmid O157. J Microbiol Biotechnol. 2009, 20 (1): 5-14.
Kanack KJ, Crawford JA, Tatsuno I, Karmali MA, Kaper JB: SepZ/EspZ is secreted and translocated into HeLa cells by the enteropathogenic Escherichia coli type III secretion system. Infect Immun. 2005, 73 (7): 4327-4337. 10.1128/IAI.73.7.4327-4337.2005.
Shames SR, Deng W, Guttman JA, de Hoog CL, Li Y, Hardwidge PR, Sham HP, Vallance BA, Foster LJ, Finlay BB: The pathogenic E. coli type III effector EspZ interacts with host CD98 and facilitates host cell prosurvival signalling. Cell Microbiol. 2010, 12 (9): 1322-1339. 10.1111/j.1462-5822.2010.01470.x.
Berger CN, Crepin VF, Baruch K, Mousnier A, Rosenshine I, Frankel G: EspZ of enteropathogenic and enterohemorrhagic Escherichia coli regulates type III secretion system protein translocation. mBio. 2012, 3 (5): e00317-12.
Nicholls L, Grant TH, Robins-Browne RM: Identification of a novel genetic locus that is required for in vitro adhesion of a clinical isolate of enterohaemorrhagic Escherichia coli to epithelial cells. Mol Microbiol. 2000, 35 (2): 275-288. 10.1046/j.1365-2958.2000.01690.x.
Patel J, Sharma M, Millner P, Calaway T, Singh M: Inactivation of Escherichia coli O157:H7 attached to spinach harvester blade using bacteriophage. Foodborne Pathog Dis. 2011, 8 (4): 541-546. 10.1089/fpd.2010.0734.
Asadulghani M, Ogura Y, Ooka T, Itoh T, Sawaguchi A, Iguchi A, Nakayama K, Hayashi T: The defective prophage pool of Escherichia coli O157: prophage-prophage interactions potentiate horizontal transfer of virulence determinants. PLoS Pathog. 2009, 5 (5): e1000408. 10.1371/journal.ppat.1000408.
Saenz JB, Li J, Haslam DB: The MAP kinase-activated protein kinase 2 (MK2) contributes to the Shiga toxin-induced inflammatory response. Cell Microbiol. 2010, 12 (4): 516-529. 10.1111/j.1462-5822.2009.01414.x.
Flockhart AF, Tree JJ, Xu X, Karpiyevich M, McAteer SP, Rosenblum R, Shaw DJ, Low CJ, Best A, Gannon V, Laing C, Murphy KC, Leong JM, Schneiders T, La Ragione R, Gally DL: Identification of a novel prophage regulator in Escherichia coli controlling the expression of type III secretion. Mol Microbiol. 2012, 83 (1): 208-223. 10.1111/j.1365-2958.2011.07927.x.
Bender JK, Praszkier J, Wakefield MJ, Holt K, Tauschek M, Robins-Browne RM, Yang J: Involvement of PatE, a prophage-encoded AraC-like regulator, in the transcriptional activation of acid resistance pathways of enterohemorrhagic Escherichia coli strain EDL933. Appl Environ Microbiol. 2012, 78 (15): 5083-5092. 10.1128/AEM.00617-12.
Laegreid WW, Elder RO, Keen JE: Prevalence of Escherichia coli O157:H7 in range beef calves at weaning. Epidemiol Infect. 1999, 123 (2): 291-298. 10.1017/S0950268899002757.
Wilson WA, Roach PJ, Montero M, Baroja-Fernandez E, Munoz FJ, Eydallin G, Viale AM, Pozueta-Romero J: Regulation of glycogen metabolism in yeast and bacteria. FEMS Microbiol Rev. 2010, 34 (6): 952-985.
Bernal V, Sevilla Á, Cánovas M, Iborra JL: Production of L-carnitine by secondary metabolism of bacteria. Microb Cell Fact. 2007, 6: 31. 10.1186/1475-2859-6-31.
Jovanovic G, Lloyd LJ, Stumpf MP, Mayhew AJ, Buck M: Induction and function of the phage shock protein extracytoplasmic stress response in Escherichia coli. J Biol Chem. 2006, 281 (30): 21147-21161. 10.1074/jbc.M602323200.
Ullers RS, Ang D, Schwager F, Georgopoulos C, Genevaux P: Trigger Factor can antagonize both SecB and DnaK/DnaJ chaperone functions in Escherichia coli. Proc Natl Acad Sci U S A. 2007, 104 (9): 3101-3106. 10.1073/pnas.0608232104.
Winter J, Linke K, Jatzek A, Jakob U: Severe oxidative stress causes inactivation of DnaK and activation of the redox-regulated chaperone Hsp33. Mol Cell. 2005, 17 (3): 381-392. 10.1016/j.molcel.2004.12.027.
Doyle SM, Hoskins JR, Wickner S: Collaboration between the ClpB AAA+ remodeling protein and the DnaK chaperone system. Proc Natl Acad Sci U S A. 2007, 104 (27): 11138-11144. 10.1073/pnas.0703980104.
Neylon C, Kralicek AV, Hill TM, Dixon NE: Replication termination in Escherichia coli: structure and antihelicase activity of the Tus-Ter complex. Microbiol Mol Biol Rev. 2005, 69 (3): 501-526. 10.1128/MMBR.69.3.501-526.2005.
Gaubig LC, Waldminghaus T, Narberhaus F: Multiple layers of control govern expression of the Escherichia coli ibpAB heat-shock operon. Microbiology. 2011, 157 (Pt 1): 66-76.
Miot M, Betton JM: Optimization of the inefficient translation initiation region of the cpxP gene from Escherichia coli. Protein Sci. 2007, 16 (11): 2445-2453. 10.1110/ps.073047807.
Nonaka G, Blankschien M, Herman C, Gross CA, Rhodius VA: Regulon and promoter analysis of the E. coli heat-shock factor, σ32, reveals a multifaceted cellular response to heat stress. Genes Dev. 2006, 20 (13): 1776-1789. 10.1101/gad.1428206.
Berg G, Eberl L, Hartmann A: The rhizosphere as a reservoir for opportunistic human pathogenic bacteria. Environ Microbiol. 2005, 7 (11): 1673-1685. 10.1111/j.1462-2920.2005.00891.x.
Morrison JM, Wright CM, John GH: Identification, isolation and characterization of a novel azoreductase from Clostridium perfringens. Anaerobe. 2012, 18 (2): 229-234. 10.1016/j.anaerobe.2011.12.006.
Feng J, Cerniglia CE, Chen H: Toxicological significance of azo dye metabolism by human intestinal microbiota. Front Biosci (Elite Ed). 2012, 4: 568-586.
Liu G, Zhou J, Fu QS, Wang J: The Escherichia coli azoreductase AzoR is involved in resistance to thiol-specific stress caused by electrophilic quinones. J Bacteriol. 2009, 191 (20): 6394-6400. 10.1128/JB.00552-09.
Tanghe A, van Dijck P, Thevelein JM: Why do microorganisms have aquaporins? Trends Microbiol. 2006, 14 (2): 78-85. 10.1016/j.tim.2005.12.001.
Joly N, Engl C, Jovanovic G, Huvet M, Toni T, Sheng X, Stumpf MP, Buck M: Managing membrane stress: the phage shock protein (Psp) response, from molecular mechanisms to physiology. FEMS Microbiol Rev. 2010, 34 (5): 797-827.
Darwin AJ: The phage-shock-protein response. Mol Microbiol. 2005, 57 (3): 621-628. 10.1111/j.1365-2958.2005.04694.x.
Armalytė J, Šeputienė V, Melefors Ö, Sužiedėlienė E: An Escherichia coli asr mutant has decreased fitness during colonization in a mouse model. Res Microbiol. 2008, 159 (6): 486-493. 10.1016/j.resmic.2008.06.003.
Olesen I, Jespersen L: Relative gene transcription and pathogenicity of enterohemorrhagic Escherichia coli after long-term adaptation to acid and salt stress. Int J Food Microbiol. 2010, 141 (3): 248-253. 10.1016/j.ijfoodmicro.2010.05.019.
Bergholz TM, Vanaja SK, Whittam TS: Gene expression induced in Escherichia coli O157:H7 upon exposure to model apple juice. Appl Environ Microbiol. 2009, 75 (11): 3542. 10.1128/AEM.02841-08.
Jia W, Tovell N, Clegg S, Trimmer M, Cole J: A single channel for nitrate uptake, nitrite export and nitrite uptake by Escherichia coli NarU and a role for NirC in nitrite export and uptake. Biochem J. 2009, 417 (1): 297-304. 10.1042/BJ20080746.
Saldana Z, Sanchez E, Xicohtencatl-Cortes J, Puente JL, Giron JA: Surface structures involved in plant stomata and leaf colonization by shiga-toxigenic Escherichia coli O157:H7. Front Microbiol. 2011, 2: 119.
Brombacher E, Baratto A, Dorel C, Landini P: Gene expression regulation by the Curli activator CsgD protein: modulation of cellulose biosynthesis and control of negative determinants for microbial adhesion. J Bacteriol. 2006, 188 (6): 2027-2037. 10.1128/JB.188.6.2027-2037.2006.
Domka J, Lee J, Wood TK: YliH (BssR) and YceP (BssS) regulate Escherichia coli K-12 biofilm formation by influencing cell signaling. Appl Environ Microbiol. 2006, 72 (4): 2449-2459. 10.1128/AEM.72.4.2449-2459.2006.
Simon S, Oelke D, Landstorfer R, Neuhaus K, Keim D: Visual analysis of next-generation sequencing data to detect overlapping genes. IEEE Symposium on Biological Data Visualization. 2011, 1: 47-54.
Hou Z, Fink RC, Radtke C, Sadowsky MJ, Diez-Gonzalez F: Incidence of naturally internalized bacteria in lettuce leaves. Int J Food Microbiol. 2013, 162 (3): 260-265. 10.1016/j.ijfoodmicro.2013.01.027.
Callaway TR, Elder RO, Keen JE, Anderson RC, Nisbet DJ: Forage feeding to reduce preharvest Escherichia coli populations in cattle, a review. J Dairy Sci. 2003, 86 (3): 852-860. 10.3168/jds.S0022-0302(03)73668-6.
Sambrook J, Russell DW: Molecular cloning. A laboratory manual. 2001, New York: Cold Spring Harbor Laboratory Press, 3rd edition.
Murashige T: A revised medium for rapid growth and bioassays with tobacco tissue cultures. Physiol Plant. 1962, 15 (3): 473-497. 10.1111/j.1399-3054.1962.tb08052.x.
Flaherty BL, van Nieuwerburgh F, Head SR, Golden JW: Directional RNA deep sequencing sheds new light on the transcriptional response of Anabaena sp. strain PCC 7120 to combined-nitrogen deprivation. BMC Genomics. 2011, 12: 332. 10.1186/1471-2164-12-332.
Goecks J, Nekrutenko A, Taylor J: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010, 11 (8): R86. 10.1186/gb-2010-11-8-r86.
Blankenberg D, von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J: Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol. 2010, Chapter 19: Unit 19.10.1-21.
Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10 (3): R25. 10.1186/gb-2009-10-3-r25.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.
Carver T, Bohme U, Otto TD, Parkhill J, Berriman M: BamView: viewing mapped read alignment data in the context of the reference sequence. Bioinformatics. 2010, 26 (5): 676-677. 10.1093/bioinformatics/btq010.
Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 16 (10): 944-945. 10.1093/bioinformatics/16.10.944.
R Development Core Team: R: A language and environment for statistical computing. 2011, Vienna, Austria: R Foundation for Statistical Computing, [http://www.R-project.org]. ISBN 3-900051-07-0.
Morgan M, Pagès H, Obenchain V: Rsamtools: Binary alignment (BAM), variant call (BCF), or tabix file import. [http://www.bioconductor.org/packages/release/bioc/html/Rsamtools.html]
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5 (10): R80. 10.1186/gb-2004-5-10-r80.
Aboyoun P, Pages H, Lawrence M: GenomicRanges: Representation and manipulation of genomic intervals. [http://www.bioconductor.org/packages/release/bioc/html/GenomicRanges.html]
Pages H, Aboyoun P, Lawrence M: IRanges: Infrastructure for manipulating intervals on sequences. [http://www.bioconductor.org/packages/release/bioc/html/IRanges.html]
Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007, 35 (Database issue): D61-D65.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucleic Acids Res. 2009, 37 (Database issue): D26-D31.
Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2009, 26 (1): 139-140.
Feller W: An Introduction to Probability Theory and Its Applications. 1968, New York, London: John Wiley & Sons, Inc., vol. 1, 3rd edition.
Warnes GR, Bolker B, Bonebakker L, Gentleman R, Huber W, Liaw A, Lumley T, Maechler M, Magnusson A, Moeller S, Schwartz M, Venables B: gplots: Various R programming tools for plotting data. [http://cran.r-project.org/web/packages/gplots/index.html]
Neuwirth E: RColorBrewer: ColorBrewer palettes. [http://cran.r-project.org/web/packages/RColorBrewer/index.html]
This work was funded by grants from the Deutsche Forschungsgemeinschaft, No. BO867/23-1, KE740/13-1, and SCHE316/3-1 under the direction of the SPP-1395 “Informations- und Kommunikationstheorie in der Molekularbiologie (InKoMBio)”.
The authors declare that they have no competing interests.
RL conducted the transcriptome experiments, analyzed the data and wrote main parts of the manuscript. SvS analyzed the transcriptome data by contributing bioinformatics. StS provided the statistical analysis of active genes. DK and SiS initiated and conceived the study. KN supervised the study, participated in its design, contributed to the biological interpretation and helped to draft the manuscript. All authors helped in the writing of the manuscript and approved the final manuscript.
Electronic supplementary material
Additional file 1: Table S2: Probability of each gene under each condition, whether its reads result from background or from an activity above background. (XLSX 336 KB)
Additional file 3: Table S4: The transcriptional regulation of all 5379 protein-coding genes for the genome and plasmid for each condition. (XLSX 703 KB)
Cite this article
Landstorfer, R., Simon, S., Schober, S. et al. Comparison of strand-specific transcriptomes of enterohemorrhagic Escherichia coli O157:H7 EDL933 (EHEC) under eleven different environmental conditions including radish sprouts and cattle feces. BMC Genomics 15, 353 (2014). https://doi.org/10.1186/1471-2164-15-353