Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs

Background: Identification of specific genes and gene expression patterns important for bacterial survival, transmission and pathogenesis is critically needed to enable development of more effective pathogen control strategies. The stationary phase stress response transcriptome, including many σB-dependent genes, was defined for the human bacterial pathogen Listeria monocytogenes using RNA sequencing (RNA-Seq) with the Illumina Genome Analyzer. Specifically, bacterial transcriptomes were compared between stationary phase cells of L. monocytogenes 10403S and an otherwise isogenic ΔsigB mutant, which does not express the alternative σ factor σB, a major regulator of genes contributing to stress response, including stresses encountered upon entry into stationary phase.

Results: Overall, 83% of all L. monocytogenes genes were transcribed in stationary phase cells; 42% of currently annotated L. monocytogenes genes showed medium to high transcript levels under these conditions. A total of 96 genes had significantly higher transcript levels in 10403S than in ΔsigB, indicating σB-dependent transcription of these genes. RNA-Seq analyses indicate that a total of 67 noncoding RNA molecules (ncRNAs) are transcribed in stationary phase L. monocytogenes, including 7 previously unrecognized putative ncRNAs. Application of a dynamically trained Hidden Markov Model, in combination with RNA-Seq data, identified 65 putative σB promoters upstream of 82 of the 96 σB-dependent genes and upstream of the one σB-dependent ncRNA. The RNA-Seq data also enabled annotation of putative operons as well as visualization of 5'- and 3'-UTR regions.

Conclusions: The results from these studies provide powerful evidence that RNA-Seq data combined with appropriate bioinformatics tools allow quantitative characterization of prokaryotic transcriptomes, thus providing exciting new strategies for exploring transcriptional regulatory networks in bacteria. See minireview http://jbiol.com/content/8/12/107.


Background
The development of powerful new DNA sequencing technologies has yielded new tools with the potential to revolutionize scientific approaches to biological questions [1]. These new technologies can be used for a variety of applications, including genome sequencing, identification of DNA-methylation sites, population studies, chromatin immunoprecipitation (ChIP-Seq), and transcriptome studies (RNA-Seq). For RNA-Seq, cDNA is generated from an mRNA-enriched total RNA preparation and sequenced using high-throughput technology. Here, we used the Illumina Genome Analyzer to characterize the transcriptome of stationary phase Listeria monocytogenes 10403S and its isogenic ΔsigB mutant, which lacks the general stress response sigma factor, σB.
L. monocytogenes, a Gram-positive foodborne pathogen of the phylum Firmicutes, is the etiological agent of the disease known as listeriosis. As 20% of listeriosis cases result in death in humans, with an estimated annual human death toll of ~500 in the US alone [2], this disease is a considerable public health concern. As a foodborne pathogen (with 99% of human illnesses caused by a foodborne route of infection [2]), this bacterium also presents challenging food safety concerns due to its ability to survive and grow under many conditions that are typically applied to control bacterial populations in foods, such as low pH, low temperature and high salt conditions [3][4][5]. The alternative general stress response sigma factor, σB, is an essential component of a regulatory mechanism that contributes to the ability of L. monocytogenes to respond to and survive exposure to harsh environmental conditions [6].
Sigma factors are dissociable subunits of prokaryotic RNA polymerase responsible for enzyme recognition of a conserved DNA sequence encoding a transcriptional promoter site. Promoter recognition specificities of bacterial RNA polymerase are determined by the transient association of an appropriate sigma factor with core polymerase in response to conditions affecting the cell [7]. The regulon of a single alternative sigma factor can include hundreds of transcriptional units, thus sigma factors provide an effective mechanism for simultaneously regulating large numbers of genes under appropriate conditions [7]. Critical phenotypic functions regulated by alternative sigma factors range from bacterial sporulation [8] to stress response systems [6,9].
Through microarray analyses, the σB regulon in L. monocytogenes has been reported to encompass more than 200 genes, including both virulence and stress response genes, many of them up-regulated upon entry into stationary phase [10][11][12]. However, interpretation of microarray analyses is dependent on the quality of existing genome annotations, which are rarely experimentally verified. Further, transcripts that do not correspond to annotated features (e.g., noncoding RNA transcripts) cannot be identified. In addition, the utility of microarrays is limited by the genomic variation that exists among bacterial strains (i.e., ideally, a unique microarray should be constructed for each strain to be analyzed) and by technical biases such as cross-hybridization. Hence, microarray data can be difficult to analyze and are occasionally misleading [13,14]. Although interpretation of RNA-Seq data also relies on the availability of a genome sequence, it is probe- and annotation-independent and is therefore free of cross-hybridization and low-hybridization biases, enabling genome-wide identification of all transcripts, including small noncoding RNAs (ncRNAs). Moreover, because RNA-Seq technology can generate multiple reads corresponding to each transcribed nucleotide on the genome, it is usually possible to identify 5' and 3' transcript ends with high resolution [15]. Therefore, in combination with bioinformatics tools, RNA-Seq data can be used to identify transcriptional promoters and terminators. We used L. monocytogenes as a model system to explore application of RNA-Seq for the dual purposes of genome-wide transcriptome characterization in a bacterial pathogen and comprehensive quantification of target gene expression for the alternative sigma factor, σB.
include those matching rRNA genes as the 10403S pseudochromosome created for this study was designed with only one unique rRNA gene sequence.
To allow for quantitative comparisons among genes and runs, the coverage for each run was normalized for the total number of reads in each run and for gene size. The normalized data are presented as the Gene Expression Index (GEI), which is expressed as the number of reads per 100 bases [16]. Although in silico analyses suggested that the sequencibility (i.e., the portion of the pseudochromosome that could yield unique 32 nt reads) of the 10403S pseudochromosome was 99.6% (Additional file 1: Sequencibility text file), approximately 77.5% of the genome was covered by reads from at least one of the four runs, suggesting that more than 20% of the genome is not transcribed or is transcribed at low levels.
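The normalization described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the text specifies only that coverage is normalized for the total number of reads per run and for gene size and reported as reads per 100 bases, so the function name `gene_expression_index`, the common library size `reference_total`, and the example numbers are assumptions introduced here.

```python
def gene_expression_index(reads_in_gene, gene_length_bp, total_reads_in_run,
                          reference_total=1_000_000):
    """Reads per 100 bases, rescaled to a common library size.

    reference_total is an assumed scaling constant (not taken from the paper);
    it only makes runs with different sequencing depth comparable.
    """
    if gene_length_bp <= 0 or total_reads_in_run <= 0:
        raise ValueError("gene length and total reads must be positive")
    # Normalize for gene size: reads per 100 bases of the gene.
    reads_per_100_bases = 100.0 * reads_in_gene / gene_length_bp
    # Normalize for sequencing depth of the run.
    return reads_per_100_bases * reference_total / total_reads_in_run


# Hypothetical example: 450 reads on a 900 bp gene in a run of 2,000,000 reads.
example_gei = gene_expression_index(450, 900, 2_000_000)
print(example_gei)
```

Averaging the value over replicate runs would then give an average GEI per gene of the kind used in the comparisons that follow.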

RNA-Seq coverage correlated with qRT-PCR transcript levels indicating that RNA-Seq data are quantitative
We evaluated whether average GEI for specific genes correlated with transcript levels that had been measured using TaqMan qRT-PCR, the current gold standard for quantification of mRNA [17]. Based on transcript levels for 9 and 5 genes in 10403S and ΔsigB, respectively, log-transformed average GEI and log-transformed TaqMan qRT-PCR absolute copy numbers were correlated (p-value < 0.001; adj. R² = 0.83; Figure 1; Additional file 2: RNA-Seq average GEI and TaqMan qRT-PCR absolute copy number of select genes), supporting that RNA-Seq provides reliable quantitative estimates of transcript levels in L. monocytogenes. RNA-Seq was previously reported to provide quantitative data on transcript levels in yeast [15] and, more recently, in Burkholderia cenocepacia [16]; thus, our findings extend this important correlation to a new prokaryotic system.
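A correlation of this kind can be illustrated with an ordinary least-squares fit on log-transformed values. The sketch below is not the authors' analysis script; the function name `log_log_fit` and the synthetic input values are assumptions, and the choice of base-10 logarithms is arbitrary. With one predictor, the adjusted R² is computed as 1 − (1 − R²)(n − 1)/(n − 2).

```python
import math


def log_log_fit(x_values, y_values):
    """Least-squares fit of log10(y) on log10(x).

    Returns (slope, intercept, r_squared, adjusted_r_squared).
    """
    if len(x_values) != len(y_values) or len(x_values) < 3:
        raise ValueError("need at least three paired observations")
    lx = [math.log10(v) for v in x_values]
    ly = [math.log10(v) for v in y_values]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    syy = sum((b - mean_y) ** 2 for b in ly)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, ly))
    r_squared = 1.0 - ss_res / syy
    # Adjusted R^2 for a model with a single predictor.
    adjusted = 1.0 - (1.0 - r_squared) * (n - 1) / (n - 2)
    return slope, intercept, r_squared, adjusted


# Synthetic, perfectly proportional values (not data from the study).
copies = [20.0, 200.0, 2000.0, 20000.0]   # stand-in for qRT-PCR copy numbers
gei = [1.0, 10.0, 100.0, 1000.0]          # stand-in for average GEI
slope, intercept, r2, adj_r2 = log_log_fit(copies, gei)
```

In the study, the 14 paired observations (9 genes in 10403S and 5 in ΔsigB) would take the place of the synthetic lists.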

Stationary phase L. monocytogenes transcribed at least 83% of annotated genes
Figure 1. Correlation between qRT-PCR and RNA-Seq. Correlation between qRT-PCR and RNA-Seq data for selected genes in L. monocytogenes 10403S (red) and the ΔsigB strain (blue). The selected genes are: ctc, gadA, gap, opuCA, rpoB (qRT-PCR data from both strains were available for these 5 genes), flaA, inlA, plcA and sigB (only qRT-PCR data from 10403S were available for these 4 genes).

Among the 2888 annotated coding sequences (CDS) in the 10403S pseudochromosome, 2417 (83.7%) showed an average GEI ≥ 0.7 in 10403S (average of two biological replicates), suggesting that at least 83% of the annotated L. monocytogenes genes are transcribed in stationary phase (Additional file 3: Cumulative frequency of average GEI in L. monocytogenes 10403S; see Materials and Methods for calculation of coverage, rationale for defining transcribed genes, and criteria for classifying transcript levels as low, medium or high). Of these 2417 genes, 654 (22%) had high transcript levels, 586 (20.0%) had medium transcript levels, and 1177 (41.0%) had low transcript levels (percentages are relative to all 2888 annotated CDS).
A total of 471 genes (17%) had GEI < 0.7 and were considered "not transcribed". RNA-Seq data allowed visual examination of transcript units, aiding in identification of genes that are transcribed monocistronically or as part of an operon (Figure 2). A total of 355 transcription units appeared to represent operons; these units were identified and annotated (Additional file 4: Access database). A total of 1107 (38.3%) of the annotated 10403S CDS were located in these putative operons. Further experimental data are necessary to validate our predictions of transcription unit structure as some genes may have rho-dependent terminators that were not identified in this study and, therefore, they may be transcribed monocistronically despite the observation of GEI similar to those of their neighboring genes.
The three genes with the highest average GEI in 10403S all encoded predicted ncRNAs, including tmRNA, 6S and LhrA (Table 2). The annotated CDS (as annotated in EGD-e [18]) with the highest average GEI were lmo2257, fri, and lmo1847, which encode a hypothetical CDS, iron-binding ferritin, and an ABC transporter, respectively.
Other genes with well-defined functions and high average GEI include flaA, which encodes a flagellin protein, sod, which encodes a superoxide dismutase involved in detoxification, and cspB and cspL, which encode cold-shock proteins involved in adaptation to atypical conditions (Table 2).
Both positive and negative associations were observed between GEI and the TIGR classification of sets of genes to physiological role categories (http://cmr.jcvi.org/cgi-bin/CMR/RoleIds.cgi; Table 3). For example, genes involved in protein synthesis and protein fate showed higher average GEI in stationary phase 10403S as compared to genes involved in other functions, while genes involved in viral functions and amino acid biosynthesis were significantly associated with low average GEI in 10403S. Moreover, a significant positive association was observed between codon bias and the average GEI in 10403S (p-value < 0.001; linear regression analysis).

Identification and annotation of noncoding RNAs (ncRNAs)
Overall, we identified 67 ncRNAs (Additional file 5: ncRNAs identified by RNA-Seq) that showed average GEI ≥ 0.7 in 10403S, indicating that these ncRNAs are transcribed in stationary phase L. monocytogenes (see Materials and Methods for more details on ncRNA annotation). Among the 67 ncRNAs identified as transcribed in the present study, 60 matched ncRNAs previously described in L. monocytogenes (Additional file 5: ncRNAs identified by RNA-Seq) [19][20][21][22]. These 60 ncRNAs included 6S RNA, tmRNA, several S-box RNA and T-box leader RNA molecules. A total of 7 putative ncRNAs identified here were not previously identified in L. monocytogenes and did not match ncRNA entries in Rfam (Table 4). The regions representing these putative ncRNAs showed contiguous coverage by RNA-Seq reads (i.e., at least 100 bp completely covered by RNA-Seq reads), but did not fully match annotated genes. Overall, 36 of the ncRNAs recently identified by tiling microarray analyses in L. monocytogenes strain EGD-e [20] were not identified in this study (see Additional file 6: ncRNAs previously described in L. monocytogenes strain EGD-e but not identified in this study for a list of these EGD-e ncRNAs). The most likely explanations for the absence of these EGD-e ncRNAs in 10403S are one or more of the following: (i) low (<0.7 GEI) or no RNA-Seq coverage in 10403S (indicating no transcription in stationary phase 10403S or loss of small RNAs during RNA isolation); (ii) the homolog may be absent in the L. monocytogenes 10403S genome (e.g., for EGD-e RliC; Table S3); (iii) ncRNAs determined to be antisense RNA in EGD-e [20] were not identified in 10403S, as the RNA-Seq protocol did not provide for directional reads; (iv) the corresponding 10403S genome region has not been completely sequenced and closed (e.g., for EGD-e LhrC, which falls in a repetitive region in the EGD-e chromosome [19]); and (v) the EGD-e ncRNA did not meet our criterion of 100 bases of contiguous coverage.

(Figure caption: View of RNA-Seq data using the Artemis genome browser.)
Three putative ncRNAs with high GEI covered either part or all of each of three annotated CDS, suggesting that ncRNAs overlap with these CDS or that some putative CDS actually encode ncRNAs rather than proteins. Specifically, LMRG_01574 (lmo2257), LMRG_02926 (no homolog in EGD-e), and LMRG_1986 (lmo2711) overlapped with lhrA (partial overlap), with the bacterial RNase P class B ncRNA (full overlap), and with the bacterial signal recognition particle RNA (partial overlap), respectively. Consistent with our findings, lmo2257 was previously hypothesized not to be a CDS [19,21].
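The contiguous-coverage criterion used for putative ncRNAs (at least 100 bases completely covered by reads) can be sketched as a scan over per-base coverage. This is an illustrative sketch only: the function name `contiguous_covered_regions` and the toy coverage vector are assumptions, and the comparison of candidate regions against annotated genes and Rfam is not shown.

```python
def contiguous_covered_regions(coverage, min_length=100):
    """Return half-open (start, end) intervals in which every base has
    coverage > 0 and the interval spans at least min_length bases."""
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth > 0:
            if start is None:
                start = pos  # a covered run begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None  # the run was interrupted by an uncovered base
    # Handle a covered run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_length:
        regions.append((start, len(coverage)))
    return regions


# Toy per-base coverage: a 120-base covered run and a 50-base covered run.
toy_coverage = [0] * 10 + [3] * 120 + [0] * 5 + [2] * 50
candidate_regions = contiguous_covered_regions(toy_coverage)
```

Only the 120-base run passes the 100-base criterion; the 50-base run is discarded, which mirrors explanation (v) for EGD-e ncRNAs that were not recovered here.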

RNA-Seq identified 96 annotated CDS and one ncRNA as σB-dependent and provided comprehensive data on transcript levels for genes in the σB regulon
Our RNA-Seq data analyses identified a total of 96 genes as up-regulated by σB (Additional file 7: Genes up-regulated by σB). No annotated genes were identified as significantly down-regulated by σB in this study. Although various genes have been identified previously as down-regulated by σB [10,12,20], we have observed that genes with significantly higher transcript levels in the ΔsigB strain (i.e., genes identified as down-regulated by σB): (i) are likely to be indirectly regulated by σB, as σB is a transcriptional activator, (ii) generally show a lower fold-difference in transcript levels between the parent strain and the ΔsigB strain as compared to genes identified as up-regulated by σB [10], and (iii) have not been consistently identified as down-regulated by σB between different studies, even in microarray studies using the same strain and condition (see Figure 3, which indicates that only 7 genes were identified as down-regulated by σB in both of two separate studies with strain 10403S). Down-regulation of genes by σB thus appears stochastic as compared to up-regulation by σB. Overall, our findings suggest that RNA-Seq combined with stringent criteria for detection of statistically significant differences in transcript levels (i.e., the requirement for statistical significance for all four binomial comparisons) may generate fewer false positives as compared to some microarray-based approaches.
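The stringent rule described here, significance in all four parent-versus-mutant replicate comparisons, can be sketched as below. The thresholds (q < 0.05 and fold change ≥ 2.0) follow the values stated in the Figure 3 legend; the function name `passes_all_comparisons`, the data layout, and the example values are assumptions, and the q-values are taken as given rather than computed from a binomial test.

```python
from itertools import product


def passes_all_comparisons(parent_gei, mutant_gei, q_values,
                           q_cutoff=0.05, min_fold=2.0):
    """True only if every parent-replicate vs mutant-replicate comparison
    has q < q_cutoff and a parent/mutant fold change >= min_fold.

    q_values maps (parent_replicate_index, mutant_replicate_index) to a
    precomputed q-value for that comparison.
    """
    for i, j in product(range(len(parent_gei)), range(len(mutant_gei))):
        parent, mutant = parent_gei[i], mutant_gei[j]
        if mutant == 0:
            # No coverage in the mutant: treat as unbounded fold change
            # when the parent is covered, and as no change otherwise.
            fold = float("inf") if parent > 0 else 0.0
        else:
            fold = parent / mutant
        if q_values[(i, j)] >= q_cutoff or fold < min_fold:
            return False
    return True


# Hypothetical gene: two 10403S replicates and two ΔsigB replicates.
parent = [12.0, 10.0]
mutant = [1.0, 2.0]
all_significant = {(0, 0): 0.001, (0, 1): 0.001, (1, 0): 0.001, (1, 1): 0.001}
one_failure = dict(all_significant)
one_failure[(1, 1)] = 0.20

call_strict = passes_all_comparisons(parent, mutant, all_significant)
call_failed = passes_all_comparisons(parent, mutant, one_failure)
```

A gene that passes three of the four comparisons is rejected under this rule, which is how a gene that is σB-dependent by microarray can fail to be called here.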
As illustrated in Figure 4A, RNA-Seq data are useful for predicting multi-gene operons controlled by a given regulator such as σB. Thirty-eight of the 96 genes up-regulated by σB are organized into a total of 20 operons, including (i) opuCABCD, which encodes the subunits of a glycine betaine/carnitine/choline ABC transporter, (ii) lmo0781-lmo0784, which encode the four subunits of a putative mannose-specific phosphotransferase system, (iii) lmo2484-lmo2485, which encode a putative membrane-associated protein and a putative transcriptional regulator similar to PspC, respectively, and (iv) lmo0133 and lmo0134 (Figure 4A), which encode proteins similar to E. coli YjdI and YjdJ, respectively.
One-sided Fisher's exact tests were used to determine if σB-dependent genes are over-represented within specific TIGR role categories. Genes identified as σB-dependent were over-represented among genes involved in cellular functions (q-value = 0.045). σB-dependent genes in this category include genes involved in pathogenesis (inlA, inlB, inlH), adaptation to atypical conditions (lmo0515, lmo0669, lmo2673, lrtC), detoxification (lmo1433, lmo2230), cell division (lmo1624) and an unknown protein that may be involved in toxin production and resistance (lmo0321).
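A one-sided Fisher's exact test for over-representation is equivalent to an upper-tail hypergeometric probability, which can be sketched with the standard library alone. The function name `fisher_exact_greater` and the small counts in the example are assumptions chosen for illustration; the multiple-testing correction that converts p-values to q-values is not shown.

```python
from math import comb


def fisher_exact_greater(k, n_category, n_hits, n_total):
    """Upper-tail hypergeometric probability P(X >= k).

    k          : σB-dependent genes observed in the role category
    n_category : all genes assigned to the role category
    n_hits     : all σB-dependent genes
    n_total    : all genes considered
    """
    largest_possible = min(n_category, n_hits)
    denominator = comb(n_total, n_hits)
    numerator = sum(
        comb(n_category, i) * comb(n_total - n_category, n_hits - i)
        for i in range(k, largest_possible + 1)
    )
    return numerator / denominator


# Toy numbers (not from the study): 10 genes, 4 in the category,
# 3 σB-dependent genes, 2 of which fall in the category.
p_toy = fisher_exact_greater(2, 4, 3, 10)
```

For the toy table the tail probability is (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120.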
We evaluated RNA-Seq transcript levels for the 96 σB-dependent genes identified here (Additional file 7: Genes up-regulated by σB). The average fold change (10403S GEI/ΔsigB GEI) for the 96 σB-dependent genes ranged from 2.6 to 479.4. The σB-dependent genes with the highest average GEI in 10403S were lmo2158, lmo1602, and lmo0539, which encode a protein similar to B. subtilis YwmG, an unknown protein, and a tagatose-1,6-diphosphate aldolase, respectively (Table 5).
An ~500 nt σB-dependent ncRNA was identified between lmo2141 and lmo2142 (Figure 4B); this ncRNA was recently designated rli47 [20]. To be consistent with the nomenclature for other σB-dependent ncRNAs [21], we propose that rli47 be named sbrE (sigma B-dependent RNA). Although BLASTX searches (using 6 possible reading frames) and searches against the Pfam database did not yield significant matches, a σB-dependent promoter was identified upstream of the transcript and a Rho-independent terminator was found by TransTermHP (Figure 4B). The sequence for this putative ncRNA was also present in 17 other L. monocytogenes genomes, including EGD-e (GenBank accession no. NC_003210), F2365 (GenBank accession no. NC_002973), and 15 unfinished genome sequences by the Broad Institute (http://www.broad.mit.edu/annotation/genome/listeria_group/MultiHome.html), as well as in one L. innocua (GenBank accession no. NC_003212) and one L. welshimeri (GenBank accession no. NC_008555) genome. The 514 nt sbrE (rli47) sequence was 96.6% conserved among the 18 L. monocytogenes genomes.

HMM showed that 84% of σB-dependent genes and operons identified by RNA-Seq are preceded by σB promoters and therefore appear to be directly regulated by σB
An HMM representing L. monocytogenes σB-dependent promoters was dynamically created by using an initial training set of experimentally verified L. monocytogenes σB-dependent promoters to search the RNA-Seq data. The final model yielded a total of 5,387 motifs with scores > 5.00 bits throughout the pseudochromosome sequence. Among these motifs, we identified 65 possible σB-dependent promoter sequences upstream of genes and operons identified as σB-dependent based on RNA-Seq data (see Figure 5 for the L. monocytogenes σB promoter sequence logo). Because some of the genes with experimentally validated σB promoters were not found to be significantly up-regulated by σB in our study (e.g., prfA and the rsbV operon) and because the ltrC promoter, which was in the initial training set, had a score below our threshold of 5.00 bits in the final search, our annotation does not include all promoters present in the training set (i.e., only promoters identified upstream of genes that were significantly up-regulated by σB in the present study were annotated). Specifically, σB-dependent promoter sequences were found upstream of 15 of the 20 putative σB-dependent operons, 49 of the 58 monocistronic σB-dependent genes, and the one σB-dependent ncRNA identified here (Figure 4B). We compared RNA-Seq defined transcriptional start sites for 8 genes with σB promoters to transcriptional start sites determined by Rapid Amplification of cDNA Ends PCR (RACE-PCR) in a previous study [23]. Transcriptional start sites identified with RNA-Seq were located between 0 and 29 bases downstream (and therefore sometimes 3') of start sites determined by RACE-PCR (see Figure 4C for LMRG_01602 transcriptional start site mapped by RACE-PCR and RNA-Seq), indicating that RNA-Seq successfully approximates transcriptional start sites, but sometimes does not provide full sequence coverage to the 5' end of a transcript. Some transcriptional start sites could not be specifically mapped to a σB promoter site using RNA-Seq as some genes (e.g., opuCA) have multiple promoters. A dendrogram of the putative σB promoter sequences showed no apparent clustering of these promoter sequences by either average GEI in 10403S or by σB-dependence (average fold change). These results suggest that additional regulatory elements or mechanisms other than promoter sequence per se (e.g., RNA stability) also influence transcript levels and/or σB-dependence for these genes (data not shown).

Figure 3. σB-dependent genes identified by RNA-Seq and microarray analyses. Venn diagram of σB-dependent genes identified in stationary phase cells in this study and in previous microarray studies of stationary phase L. monocytogenes [10,12]. Numbers in bold are the number of up-regulated annotated CDS identified as σB-dependent in each study; numbers followed by down arrows are down-regulated σB-dependent genes. No down-regulated σB-dependent genes were identified by RNA-Seq. The 13 genes identified as σB-dependent in stationary phase only by RNA-Seq, but not by previous microarray studies of L. monocytogenes 10403S, include 5 genes that had been found to be σB-dependent by microarray studies [10] in salt-stressed cells (see Table 5). In a number of instances (e.g., opuCB, rsbX; see Additional file 8: Comparison of genes found to be σB-dependent by microarray analysis and not by RNA-Seq), genes with significantly different transcript levels in both microarrays [10,12] had significant binomial probabilities (q < 0.05) and a fold change ≥ 2.0 for most of the possible combinations (i.e., 10403S replicate 1 vs ΔsigB replicate 1; 10403S replicate 1 vs ΔsigB replicate 2; 10403S replicate 2 vs ΔsigB replicate 1; 10403S replicate 2 vs ΔsigB replicate 2), but not for all four comparisons, and these genes were therefore not identified as showing significant differences in normalized RNA-Seq coverage (based on our conservative definition of genes with significant differences in normalized RNA-Seq coverage); see Additional file 8: Comparison of genes found to be σB-dependent by microarray analysis and not by RNA-Seq for detailed RNA-Seq data for genes identified as σB-dependent by microarrays, but not by RNA-Seq.

Figure 5. Logo of the σB promoter. This logo was created from the alignment of 65 σB promoters identified in this study.
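The idea of scoring candidate promoter motifs in bits against a threshold can be illustrated with a much simpler model than the one used in the study. The sketch below substitutes a fixed-width position weight matrix for the dynamically trained HMM, so it is a simplification and not the authors' method; the function names, the uniform background of 0.25 per base, the pseudocount, and the toy training sites are all assumptions. Only the 5.00-bit threshold is taken from the text.

```python
import math

BASES = "ACGT"


def build_log_odds(aligned_sites, pseudocount=1.0, background=0.25):
    """Column-wise log2 odds (bits) from equal-length, aligned sites."""
    width = len(aligned_sites[0])
    matrix = []
    for col in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in aligned_sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return matrix


def scan(sequence, matrix, threshold_bits=5.0):
    """Return (start, score) for every window scoring above the threshold.

    Assumes an upper-case sequence containing only A, C, G and T.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(matrix, window))
        if score > threshold_bits:
            hits.append((start, round(score, 2)))
    return hits


# Toy training sites and toy sequence (illustrative, not verified promoters).
toy_sites = ["GGGTAT", "GGGAAT", "GGGTAT", "GGGAAT"]
toy_matrix = build_log_odds(toy_sites)
toy_hits = scan("ACGTGGGTATCCA", toy_matrix)
```

In an iterative scheme of the kind described in the text, hits found upstream of regulated genes would be added to the training set and the model rebuilt; that loop is omitted here.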

RNA-Seq successfully identifies a number of previously identified as well as novel σB-dependent genes
To evaluate the ability of RNA-Seq to identify L. monocytogenes σB-dependent genes, we compared the σB-dependent genes identified here with those identified in two independent microarray studies by our research group. Specifically, we compared our results with microarray data reported by (i) Raengpradub et al. [10], who identified σB-dependent genes using L. monocytogenes strains and growth conditions identical to those in this study, and by (ii) Ollinger et al. [12], who identified σB-dependent genes by comparing transcripts from L. monocytogenes 10403S with a PrfA* (G155S) allele [24], which constitutively expresses the PrfA-regulated virulence genes [24][25][26], with those from an isogenic ΔsigB mutant grown to stationary phase under the same conditions used here. Further, we compared our results with those from a microarray study using another L. monocytogenes strain (EGD-e) and its isogenic ΔsigB mutant, grown under similar conditions (i.e., growth to early stationary phase [11]). Among the 96 σB-dependent annotated CDS identified in the present study, 72 were also identified as σB-dependent in previous microarray studies of stationary phase L. monocytogenes 10403S cells [10,12] (Figure 3). In addition, 64 (66.7%) of the 96 σB-dependent genes identified here were identified as positively regulated by σB in L. monocytogenes strain EGD-e cells grown to early stationary phase (8 h growth in BHI) [11]. Overall, 12 genes identified as σB-dependent in stationary phase cells in both previous microarray studies by our group [10,12] were not identified as σB-dependent by the RNA-Seq experiments reported here (Figure 3); 9 of these genes showed a σB-dependent promoter based on the HMM analyses in this study and are likely to be directly regulated by σB (see Additional file 8: Comparison of genes found to be σB-dependent by microarray analysis and not by RNA-Seq for further details on these genes).
Finally, a total of 13 annotated CDS identified as σB-dependent by RNA-Seq (including 9 genes that also showed a σB-dependent promoter in our HMM analysis) had not been identified as σB-dependent in either of the previous microarray studies with strain 10403S grown to stationary phase [10,12] (see Table 3). Among these 13 genes not previously identified as σB-dependent in stationary phase L. monocytogenes 10403S, five had previously been identified as σB-dependent in salt-stressed cells [10], including the well-characterized virulence genes inlA and inlB, which have also been shown by qRT-PCR and promoter mapping to be directly regulated by σB [27]. In addition, two of these 13 genes had been identified as positively regulated by σB in L. monocytogenes strain EGD-e [11], even though they had not been identified as σB-dependent in previous microarray studies of strain 10403S [10,12]. For one of these genes (i.e., lmo0265), the microarray probe (designed based on the genome of L. monocytogenes strain EGD-e) showed a low hybridization index (HI; % match between strain-specific sequence and oligonucleotide probe) to 10403S (< 80%). Interestingly, lmo2003, which encodes a transcription regulator similar to the GntR family, was identified as σB-dependent by RNA-Seq, but had not been previously identified as σB-dependent in either 10403S or EGD-e.

Discussion
In this study, we used deep RNA sequencing to define and characterize the transcriptomes of L. monocytogenes strain 10403S and an otherwise isogenic ΔsigB mutant, which does not express the general stress-response sigma factor, σB. The data generated using this approach showed that (i) at least 83% of annotated L. monocytogenes genes are transcribed in stationary phase cells; and (ii) stationary phase L. monocytogenes transcribes 67 ncRNAs, including one σB-dependent ncRNA and seven ncRNAs that, to our knowledge, have not previously been identified in L. monocytogenes. Additionally, RNA-Seq data provided for quantitation of transcript levels and approximate identification of transcriptional start sites on a genome scale. Use of a novel, iterative, dynamic HMM, in combination with RNA-Seq data, identified putative σB-dependent promoters and further defined the L. monocytogenes σB regulon.

The majority of annotated L. monocytogenes genes are transcribed in stationary phase cells
While genome sequencing and microarray approaches have provided important insight into the biology of prokaryotic organisms, including a number of human bacterial pathogens, identification of all genes and their transcriptional patterns remains a major challenge in all areas of biology. Our results demonstrate that global probe-independent approaches for transcriptome characterization are valuable tools for analyzing bacterial transcriptomes [16,28,29]. A major challenge that currently hinders analysis of transcriptomic data generated by approaches such as RNA-Seq is the ability to differentiate between genes with low levels of transcription and background levels of coverage. Several approaches have been used to define cut-off values between background GEI and GEI indicative of low transcript levels (e.g., [15,30,31]). We chose a comparative analysis of L. monocytogenes 10403S transcript levels with those of a mutant strain that does not express a transcription factor (i.e., the alternative sigma factor σB) as a novel approach for robustly defining background RNA-Seq coverage. Our results show that a number of σB-dependent genes were solely σB-dependent (at least under the conditions used here), as supported by the lack of detectable RNA-Seq coverage in the ΔsigB strain, despite considerable RNA-Seq coverage of the same genes in the isogenic parent strain 10403S. This is an important observation as a number of σB-dependent L. monocytogenes genes are also activated by other sigma factors (e.g., σA [32,33]). Using the average GEI for L. monocytogenes genes that were solely σB-dependent in the ΔsigB strain as a conservative cut-off value for transcribed genes, we found that approximately 83% of L. monocytogenes 10403S annotated CDS were transcribed in stationary phase cells. These transcribed genes include 355 putative operons, which cover a total of 1,107 genes, indicating that a considerable proportion of L. monocytogenes genes appear to be transcribed polycistronically. In comparison, a recent study using a tiling microarray identified 517 polycistronic operons that encompass 1,719 genes in L. monocytogenes EGD-e [20]. Taken

Our results also demonstrate that RNA-Seq coverage levels (generated with the Illumina Genome Analyzer System) correlate well with quantitative RT-PCR-based mRNA transcript level data. Therefore, in combination with results from previous studies (e.g., in yeast [15,31], human cell lines [35], human tissue [36], murine tissue [30]), our findings indicate that RNA-Seq tools can be broadly applied in biological studies to enable quantitative analysis of transcript levels. We also found a positive correlation between RNA-Seq-based transcript levels and codon bias, consistent with the well-documented observation that genes with high codon bias are often highly expressed [37][38][39]. Genes in four role categories, including (i) signal transduction, (ii) viral functions, (iii) amino acid biosynthesis, and (iv) transport and binding, were significantly associated with lower transcript levels. These categories include a number of genes that encode proteins predominantly required for growth and survival under specialized environmental conditions (e.g., viral replication genes) or under conditions other than stationary phase (e.g., amino acid biosynthesis may be less important in stationary phase than during exponential growth as sufficient amino acids from dead bacteria are likely to be available for scavenging), and/or proteins that may only be required in small amounts. On the other hand, we found that genes in seven role categories, including (i) cellular processes, (ii) DNA metabolism, (iii) protein fate, (iv) protein synthesis, (v) purines, pyrimidines, nucleosides, and nucleotides, (vi) transcription, and (vii) genes encoding proteins with unknown functions, showed, on average, higher transcript levels in stationary phase L. monocytogenes. These findings suggest that genes in these particular categories are important for bacterial cells transitioning from exponential growth to stationary phase.
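The background cut-off approach described in this section (using the average GEI in the ΔsigB strain of genes that are solely σB-dependent) can be sketched as follows. The function names, gene labels, and GEI values are hypothetical and chosen only so that the toy cut-off lands at the 0.7 value used in the Results; the real calculation is described in Materials and Methods.

```python
def background_cutoff(mutant_gei_by_gene, solely_dependent_genes):
    """Mean GEI in the mutant across genes assumed to be transcribed
    only from the deleted regulator's promoters (i.e., pure background)."""
    values = [mutant_gei_by_gene[gene] for gene in solely_dependent_genes]
    if not values:
        raise ValueError("need at least one solely dependent gene")
    return sum(values) / len(values)


def classify_transcribed(parent_gei_by_gene, cutoff):
    """Label each gene as transcribed when its GEI reaches the cut-off."""
    return {gene: gei >= cutoff for gene, gei in parent_gei_by_gene.items()}


# Hypothetical values: gene_a and gene_b stand in for solely σB-dependent genes.
mutant_gei = {"gene_a": 0.5, "gene_b": 0.9, "gene_c": 4.0}
parent_gei = {"gene_a": 30.0, "gene_b": 12.0, "gene_c": 4.2, "gene_d": 0.2}

cutoff = background_cutoff(mutant_gei, ["gene_a", "gene_b"])
transcribed = classify_transcribed(parent_gei, cutoff)
```

Because residual coverage of solely σB-dependent genes in the ΔsigB strain cannot reflect σB-driven transcription, its mean serves as an empirical estimate of background rather than an arbitrary threshold.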
Overall, the L. monocytogenes genes with the highest transcript levels were ncRNAs, specifically the transfer-messenger RNA (tmRNA) and 6S RNA, consistent with the observation that tmRNAs are involved with bacterial recovery from a variety of stresses including entry into stationary phase, amino acid starvation, and heat shock [40]. 6S RNA accumulates in cells during stationary phase; cells lacking 6S RNA have reduced fitness relative to wild-type stationary phase cells [41]. In addition to down-regulating some housekeeping genes, 6S RNA has been shown to up-regulate expression of some σS-dependent genes in Gram-negative bacteria [41]. σS is the stationary phase stress response alternative sigma factor in E. coli [42]. Taken together, these observations lead us to hypothesize that 6S RNA plays a critical role in the ability of L. monocytogenes to survive stationary phase-associated stress conditions.
Specific protein-encoding genes with very high transcript levels in stationary phase L. monocytogenes include fri, sod, cspB, and cspL, all genes with some previous evidence for contributions to L. monocytogenes stationary phase and stress survival [43][44][45][46][47][48][49]. flaA, which encodes a flagellin protein, was also highly transcribed in stationary phase cells at 37°C. Although L. monocytogenes has been reported to show flagellar motility only when grown at ≤ 30°C [50,51], our results are consistent with the observation that strain 10403S, which was used in this study, has been shown to express flagellin at 37°C [51]. Interestingly, we also found some annotated CDS without known function to be highly transcribed, including lmo1847 and lmo1849, which encode putative ABC transporters based on BLAST and Pfam [52] searches, respectively, and lmo1468, which encodes an unknown protein.

RNA-Seq identifies ncRNA molecules in L. monocytogenes, including a σ B -dependent ncRNA, in 10403S
Using RNA-Seq, we found 67 previously identified or putative ncRNAs that were transcribed in stationary phase L. monocytogenes. Of these, 7 represent ncRNAs that have not been identified previously as transcribed in L. monocytogenes. Sixty of the ncRNAs identified here have previously been reported by Toledo-Arana et al. [20], Nielsen et al. [53], Mandin et al. [22] and Christiansen et al. [19]. Interestingly, 16 L. monocytogenes ncRNAs with similarities to ncRNAs identified in other bacterial organisms are putative riboswitches. We also found that sbrE (rli47), which has no homology to ncRNA entries in Rfam, appears to be directly regulated by σ B , based on the considerably higher transcript levels (186-fold) present in the parent strain as compared to the sigB-null mutant, consistent with results from a recent tiling microarray study [20]. As the RNA isolation procedure used here selected against small RNA molecules (see Materials and Methods for details), it is likely that additional small ncRNAs not detected here (e.g., some small ncRNAs identified by Toledo-Arana et al. [20]) are also transcribed in stationary phase L. monocytogenes 10403S.

The L. monocytogenes σ B regulon is composed of at least 96 genes, including 82 genes and 1 ncRNA that are preceded by putative σ B promoters
As alternative sigma factors, such as σ B , are known to play critical roles in gene regulation across bacterial genera [33], we used L. monocytogenes 10403S and an isogenic ΔsigB null mutant as (77.1%) [10] and 81 (84.4%) [12] were also identified as σ B -dependent in stationary phase cells in two previous microarray studies using the same strain background. Also, 63 of the 96 σ B -dependent genes identified here were reported as positively regulated by σ B in another L. monocytogenes strain (EGD-e) grown to early stationary phase [11]. Twelve genes were identified as σ B -dependent in both previous microarray studies performed with the same L. monocytogenes strain background and the same conditions used here, but were not identified as σ B -dependent by RNA-Seq in this study. This disparity is likely due to the fact that the thresholds and statistical cutoffs used to define σ B -dependent genes were very stringent in the present study (e.g., a q-value < 0.05 in all four comparisons).
Overall, in addition to confirming a previously identified σ B -dependent ncRNA [20], RNA-Seq identified 13 genes that had not been defined as σ B -dependent in previous microarray studies of stationary phase L. monocytogenes 10403S cells [10,12], including 5 genes that had been identified as σ B -dependent in salt stressed cells, but not in stationary phase cells. One gene not previously identified as σ B -dependent was lmo2003, which encodes a transcription regulator similar to the GntR family. The GntR family of regulators has been characterized as global regulators of primary metabolism in a number of bacteria [56][57][58]. This finding further supports the view that L. monocytogenes σ B is involved in a number of transcriptional regulatory networks [6]. Increasing evidence indicates that regulatory RNAs also contribute to regulatory networks that involve L. monocytogenes σ B . For example, in addition to the σ B -dependent SbrE ncRNA described here, tiling array analyses also identified additional σ B -dependent ncRNAs.
While previous in silico studies in L. monocytogenes strain EGD-e [53] identified four putative σ B -dependent ncRNAs (i.e., SbrA, SbrB, SbrC, SbrD), only SbrA was confirmed in vivo as σ B -dependent in EGD-e [20,53]. Even though our RNA-Seq analyses in 10403S identified SbrA transcripts, transcript levels for this ncRNA were not σ B -dependent under the conditions used in our study. The fact that SbrA was not found to be σ B -dependent in 10403S may be due to differences in strains or growth conditions used (e.g., Nielsen et al. [53] and Toledo-Arana et al. [20] used strain EGD-e, while we used strain 10403S). Further studies in different L. monocytogenes strains will thus be needed to understand the full complexity of regulatory networks in this pathogen, including those involving σ B and ncRNAs.
The quantitative nature of RNA-Seq allowed us to also identify highly transcribed σ B -dependent genes, including lmo2158 (which encodes a protein similar to the B. subtilis YwmG), lmo1602 (which encodes an unknown protein), and lmo0539 (which encodes a tagatose-1,6-diphosphate aldolase). Interestingly, none of these genes encode proteins that appear to contribute to any of the presently recognized σ B -dependent phenotypes in L. monocytogenes, such as acid resistance [9,59], oxidative stress resistance [59,60], or virulence [27,33,61,62]. As there are no published reports of construction and characterization of null mutations in these highly transcribed σ B -dependent genes, our data clearly suggest that σ B and the σ B regulon make additional important contributions to L. monocytogenes physiology that remain to be characterized.
In conjunction with appropriate bioinformatics tools, such as the iterative, dynamic HMM developed in this study to identify putative σ B promoters, RNA-Seq data also allowed mapping of approximate transcriptional start and termination sites. Specifically, putative σ B -dependent promoters were identified upstream of (i) 49 monocistronic σ B -dependent genes, (ii) 15 σ B -dependent operons (covering a total of 40 genes), and (iii) 1 σ B -dependent ncRNA. By comparison, in the absence of genome-wide transcriptional start site data, a previous study that relied solely on HMM and genome sequence data identified putative σ B -dependent promoters upstream of only 40 genes that had been identified as σ B -dependent by microarray analyses [10]. The data reported here show that the majority of σ B -dependent genes are directly regulated by σ B and illustrate the power of combining RNA-Seq data and bioinformatics approaches for characterizing transcriptional regulatory systems. Specifically, combining transcriptional start site information with an HMM that identifies promoter motifs (e.g., the motif for σ B -dependent promoters) provides a powerful approach for identifying genes directly regulated by a given transcription factor. This approach facilitates rapid genome-wide identification of putative transcriptional start sites, which currently represents a critical bottleneck in genome-wide characterization of transcriptional regulation and regulatory networks, as many current strategies for promoter mapping (e.g., primer extension, rapid amplification of cDNA ends (RACE-PCR), RNase protection assays) are time- and labor-intensive.

Conclusions
Using the human foodborne pathogen L. monocytogenes as a model system, we have shown that RNA-Seq provides a powerful approach to (i) rapidly, comprehensively, and quantitatively characterize prokaryotic genome-wide transcription profiles without hybridization bias, and (ii) characterize putative transcriptional start sites and operon structures. We also show that RNA-Seq transcriptomic evaluation of a bacterial strain bearing a deletion in a transcriptional regulator in comparison with its parent strain can provide rapid, comprehensive insights into the blueprints of prokaryotic transcriptional regulation. Such tools and approaches will revolutionize our ability to characterize genome-wide transcriptional regulatory networks, with wide-ranging applications from medicine to ecology, e.g., by providing a means to quickly characterize transcriptional networks contributing to pathogen transmission and virulence as well as environmental growth and gene expression in bacteria used for specific purposes, such as bioremediation. When applied to both genome and transcriptome sequencing, novel high-throughput sequencing approaches can also provide rapid and comprehensive characterization of bacterial genomes, representing an important tool for initial rapid characterization of novel and emerging bacterial pathogens.

Strains and growth conditions
RNA-Seq was performed on the L. monocytogenes parent strain 10403S and a previously described [9] isogenic mutant (ΔsigB, FSL A1-254) with an internal non-polar deletion of sigB, which encodes the stress response alternative sigma factor σ B .
Prior to RNA isolation, bacteria were grown in 5 ml Brain Heart Infusion (BHI) broth (BD Difco, Franklin Lakes, NJ) at 37°C with shaking (230 rpm) for 15 h, followed by transfer of a 1% inoculum to 5 ml pre-warmed BHI. After growth to OD 600 ~ 0.4, a 1% inoculum was transferred to a 300 ml nephelo flask (Bellco, Vineland, NJ) containing 50 ml pre-warmed BHI. This culture was incubated at 37°C with shaking until cells reached stationary phase (defined as growth to OD 600 = 1.0, followed by incubation for an additional 3 h). Two independent growth replicates and RNA isolations were performed for each strain.

RNA isolation, integrity and quality assessment
RNA isolation was performed as previously described [10]. Briefly, RNAProtect bacterial reagent (Qiagen, Valencia, CA) was added according to the manufacturer's instructions to the cultures grown to stationary phase; treated cells were stored at -80°C (for no longer than 24 h) until RNA isolation was performed. Bacterial cells were treated with lysozyme followed by 6 sonication cycles at 18W on ice for 30 s. Total RNA was isolated and purified using the RNeasy Midi kit (Qiagen) according to the manufacturer's protocol; RNA molecules <200 nt in length are not recovered well with this procedure, according to the manufacturer. RNA was eluted from the column using RNase-free water. Total RNA was incubated with RQ1 DNase (Promega, Madison, WI) in the presence of RNasin (Promega) to remove remaining DNA. Subsequently, RNA was purified using two phenol-chloroform extractions and one chloroform extraction, followed by RNA precipitation and resuspension of the RNA in RNase-free TE (10 mM Tris, 1 mM EDTA; pH 8.0; Ambion, Austin, TX). UV spectrophotometry (Nanodrop, Wilmington, DE) was used to quantify and assess purity of the RNA.
Efficacy of the DNase treatment was assessed by TaqMan qPCR analysis of DNA levels for two housekeeping genes, rpoB [63] and gap [33]. qPCR was performed using TaqMan One-Step RT-PCR Master Mix Reagent and the ABI Prism 7000 Sequence Detection System (all from Applied Biosystems, Foster City, CA). Each RNA sample was run in duplicate and standard curves for each target gene were included for each assay to allow for absolute quantification of residual DNA. Data were analyzed using the ABI Prism 7000 Sequence Detection System software as previously described [64]. Normalization and log transformation were performed as described by Kazmierczak et al. [23]. All samples showed log copy numbers ≤ 1.5 and C t values > 35 for both rpoB and gap, indicating negligible levels of DNA contamination. As a final step, RNA integrity was assessed using the 2100 Bioanalyzer (Agilent, Foster City, CA).

mRNA enrichment
Removal of 16S and 23S rRNA from total RNA was performed using the MicrobExpress™ Bacterial mRNA Purification Kit (Ambion) according to the manufacturer's protocol with the exception that no more than 5 μg total RNA was treated per enrichment reaction. Each RNA sample was divided into multiple aliquots of ≤ 5 μg RNA and separate enrichment reactions were performed for each sample. Enriched mRNA samples were pooled and run on the 2100 Bioanalyzer (Agilent) to confirm reduction of 16S and 23S rRNA prior to preparation of cDNA fragment libraries.

Preparation of cDNA fragment libraries
Ambion RNA fragmentation reagents were used to generate 60-200 nucleotide RNA fragments with an input of 100 ng of mRNA. Following precipitation of fragmented RNA, first strand cDNA synthesis was performed using random N 6 primers and Superscript II Reverse Transcriptase, followed by second strand cDNA synthesis using RNaseH and DNA pol I (Invitrogen, CA). Double-stranded cDNA was purified using Qiaquick PCR spin columns according to the manufacturer's protocol (Qiagen).

RNA-Seq using the Illumina Genome Analyzer
The Illumina Genomic DNA Sample Prep kit (Illumina, Inc., San Diego, CA) was used according to the manufacturer's protocol to process double-stranded cDNA for RNA-Seq, including end repair, A-tailing, adapter ligation, size selection, and pre-amplification. Amplified material was loaded onto independent flow cells; sequencing was carried out by running 36 cycles on the Illumina Genome Analyzer.
The quality of the RNA-Seq reads was analyzed by assessing the relationship between the quality score and error probability; these analyses were performed on Illumina RNA-Seq quality scores that were converted to phred format (http://www.phrap.com/phred/). Quality scores are reported in Additional file 9: Distribution of quality scores for all RNA-Seq runs.
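The relationship between a phred quality score Q and the base-call error probability P is P = 10^(-Q/10). The following is a minimal sketch of that relationship and of one possible score conversion. It assumes that the raw Illumina scores were on the Solexa (log-odds) scale, which the text does not state; the function names are illustrative and not part of any pipeline used in this study.

```python
import math

def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa-scaled (log-odds) quality score to the phred scale.

    Assumption: the raw scores are Solexa-scaled, i.e.
    Q_solexa = -10 * log10(p / (1 - p)).
    """
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)

def phred_to_error_probability(q_phred: float) -> float:
    """Probability that a base call is wrong, given its phred score."""
    return 10.0 ** (-q_phred / 10.0)

if __name__ == "__main__":
    # Print the error probability implied by a few phred scores.
    for q in (10, 20, 30):
        print(q, phred_to_error_probability(q))
```

On the phred scale, a score of 20 corresponds to an expected error rate of 1 in 100 base calls, and a score of 30 to 1 in 1,000.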

RNA-Seq alignment and coverage
The program nucmer, which is part of the  10: Genbank (gbk) file with ncRNAs identified here). The 5S, 16S and 23S rRNA genes as well as the various tRNA genes in 10403S were identified using blastn and the EGD-e annotated rRNA and tRNA genes as a reference (Genbank ID: AL591824).
Based on quantitative analyses of RNA-Seq data, throughout this manuscript, transcript levels of a given gene are reported as the Gene Expression Index (GEI), which is expressed as number of reads per 100 bases. To obtain the GEI, the 10403S pseudochromosome was used to align Illumina RNA-Seq reads. These alignments were performed using the whole genome alignment software Eland (Illumina), which reports unique alignments of the first 32 bases of each read, allowing up to 2 mismatches. Coverage at each base position along the pseudochromosome was calculated by enumerating the number of reads that align to a given base. The coverage for each base from the first to last nt in an annotated CDS was summed then divided by 32 (i.e., the length of each aligned read) to obtain the RNA-Seq coverage for that gene before normalization. The following data were discarded prior to further analyses: (i) reads with more than 2 mismatches, (ii) reads that matched to multiple locations, (iii) reads that did not map to the chromosome, and (iv) reads that mapped to the 16S or 23S genes (Table 1). Reads identified as "matching two locations" did not include those matching rRNA genes as the 10403S pseudochromosome created for this study was designed with only one unique rRNA gene sequence. Reads matching the 16S and 23S genes were removed prior to normalizing the total number of aligned reads across the four samples because of the technical bias introduced by our deliberate partial removal of 16S and 23S transcripts from the samples. Despite removal of 16S and 23S rRNA, in a given run, between 1,860,817 and 3,138,329 reads aligned to the 23S gene and between 434,263 and 760,863 reads aligned to the 16S gene. In a given run, between 101,419 and 242,246 reads matched the 5S rRNA gene and between 7,778 and 62,699 reads matched the various tRNA genes present in the pseudochromosome.
Because of the inherent differences in the total number of reads among the four runs, the total number of reads for each run was normalized to the run with the highest coverage (i.e., ΔsigB replicate 2, Table 1). The ratio of total number of reads for ΔsigB replicate 2 to the total number of reads for 10403S replicate 1, 10403S replicate 2, or ΔsigB replicate 1 was used as a multiplier to normalize the approximate number of reads matching a given gene (Table 1). The GEI was then obtained by dividing the normalized number of reads matching each gene by the gene length. The average GEI was the number of reads that match each nt in a given gene after normalization; this value represented the average of the 2 biological replicates for a given strain and is presented as reads per 100 bases (as opposed to reads per 1 base) to simplify identification of differences. The distribution of the coefficient of variation for each gene between replicates is depicted in Additional file 11: Coefficient of variation among RNA-Seq replicates by strain.
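The GEI calculation described above can be sketched as follows. This is a minimal illustration and not the code used in this study; the function names, the run labels, and the read totals in the usage note are hypothetical, and the per-base coverage is assumed to be available as a simple list indexed along the pseudochromosome.

```python
READ_LENGTH = 32  # number of aligned bases per read, as stated in the text

def raw_gene_coverage(per_base_coverage, start, end):
    """Approximate number of reads matching a gene before normalization.

    Sums the per-base coverage from the first to the last nt of the gene
    (1-based, inclusive coordinates) and divides by the aligned read length.
    """
    return sum(per_base_coverage[start - 1:end]) / READ_LENGTH

def normalization_factors(total_reads_by_run):
    """Multiplier for each run: reads in the largest run / reads in this run."""
    largest = max(total_reads_by_run.values())
    return {run: largest / total for run, total in total_reads_by_run.items()}

def gene_expression_index(raw_coverage, factor, gene_length):
    """GEI: normalized reads matching a gene per 100 bases of gene length."""
    return raw_coverage * factor / gene_length * 100.0

def average_gei(replicate_geis):
    """Average GEI over the biological replicates of one strain."""
    return sum(replicate_geis) / len(replicate_geis)
```

As a usage example, a 300 nt gene covered 16-fold at every base corresponds to 16 × 300 / 32 = 150 reads; with a hypothetical multiplier of 2.0 for that run, the GEI is 150 × 2.0 / 300 × 100 = 100 reads per 100 bases.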

Identification of transcribed annotated CDS
Sequence reads matching annotated CDS in the 10403S genome were used to identify those annotated CDS that were transcribed under the experimental conditions used. As our RNA-Seq analyses included both a wildtype strain and an isogenic mutant with a deletion in a transcriptional regulator (i.e., the alternative sigma factor σ B ), our data also provide a novel approach for characterizing background RNA-Seq coverage for genes that are not transcribed, similar to a previous approach that used background RNA-Seq coverage of so-called "gene deserts" in human chromosomes to characterize background average GEI [65]. The observations that (i) eight genes that showed average GEI between 8.64 reads and 96.43 reads per 100 bases in the parent strain showed 0 reads per 100 bases in the ΔsigB strain; (ii) 42 genes with average GEI of 1.21 to 73.81 reads per 100 bases in the parent strain showed between 0.01 and 0.7 reads per 100 bases in the ΔsigB strain; and (iii) 0.7 reads per 100 bases is the approximate median of the average GEI in σ B -dependent genes in the ΔsigB strain, clearly indicate that extremely low background RNA-Seq coverage is expected for genes that are not transcribed. Overall, 50/96 σ B -dependent genes show an average GEI < 0.7 in the ΔsigB strain (Additional file 7: Genes up-regulated by σ B ); genes with GEI < 0.7 reads are overrepresented in the ΔsigB strain (Figure 6). It is not unexpected that some σ B -dependent genes showed average GEI ≥ 0.7 as a number of genes are not solely dependent on σ B and will still be transcribed in the absence of σ B (e.g., opuCABCD operon [32,66,67]). Based on these observations, we set an average GEI ≥ 0.7 as a conservative cut-off to identify genes that are transcribed (i.e., we define genes with average GEI ≥ 0.7 as being transcribed as the RNA-Seq data indicate that non-specific reads [e.g., from DNA] are highly unlikely to provide average GEI ≥ 0.7).
Depending on RNA-Seq coverage, genes were classified into four categories, including (i) not transcribed (average GEI < 0.7), (ii) low transcript levels (average GEI ≥ 0.7 and < 10), (iii) medium transcript levels (average GEI ≥ 10 and < 25), and (iv) high transcript levels (average GEI ≥ 25). While cut-offs between low, medium, and high transcript level categories were somewhat arbitrary, they were chosen to yield a relative distribution of genes into these categories similar to the distribution of yeast genes into low, medium, and high expression categories reported previously by Nagalakshmi et al. [15].
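The four transcript level categories can be expressed as a simple threshold function. This is a minimal sketch using the cut-offs given above; the function name and the category labels returned are illustrative.

```python
def classify_transcript_level(average_gei: float) -> str:
    """Assign a gene to a transcript level category from its average GEI."""
    if average_gei < 0.7:
        return "not transcribed"
    if average_gei < 10:
        return "low"
    if average_gei < 25:
        return "medium"
    return "high"
```

Because the lower bounds are inclusive, a gene with an average GEI of exactly 0.7 is classified as transcribed at a low level, and a gene with an average GEI of exactly 25 as highly transcribed.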

Annotation of Rho-independent terminators and putative operons
Potential operons were manually annotated based on the continuity of a similar level of RNA-Seq coverage across consecutive genes and the (i) absence of putative Rho-independent terminators between genes, and/or (ii) presence of a putative Rho-independent terminator at the end of a putative operon. Putative Rho-independent terminators in the 10403S pseudochromosome were identified using the program TransTermHP v2.04 [68].

Discovery and annotation of regions transcribing ncRNAs
To aid in identification of transcribed ncRNAs, ncRNAs previously identified in L. monocytogenes EGD-e [19][20][21][22] were mapped onto the 10403S pseudochromosome and were identified as transcribed in 10403S in this study.
New putative ncRNAs (i.e., ncRNAs not previously reported or previously identified by Rfam) were manually identified using the genome browser Artemis [69]. Specifically, regions not matching annotated genes, but showing contiguous coverage by RNA-Seq reads (i.e., regions that contain at least 100 bp completely covered by RNA-Seq reads) were designated putative ncRNAs. Further, RNA-Seq reads that did not cover an entire annotated CDS, but showed partial contiguous coverage within a CDS, were also designated as putative ncRNAs. All ncRNAs, including those reported in previous publications [19,20,22,53], those identified by Rfam, and those with no matches to the Rfam database were annotated into a Genbank (gbk) file that is available as Additional file 10: Genbank (gbk) file with ncRNAs identified here. ncRNAs identified by RNA-Seq, but with no matches to the Rfam database were designated "putative ncRNA" and received designations from rli64 to rli70. The presence of Rho-independent transcriptional terminators was used to assign the strand of putative ncRNAs. For two instances where terminators were not observed, the ncRNAs were annotated on both strands.
Figure 6. Average gene expression indices for σ B -dependent genes. The histogram shows the average GEI of σ B -dependent genes in 10403S (red) and the ΔsigB (blue) strains. GEIs were grouped in intervals of 0.7, i.e., the first bar represents genes with GEIs between 0 and 0.7; the second bar represents GEIs between > 0.7 and ≤ 1.4, etc. Genes with average GEI ≥ 50 were grouped together.
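Although putative ncRNAs were identified manually in this study, the contiguous-coverage criterion (at least 100 bp completely covered by reads, outside annotated genes) lends itself to an automated pre-screen. The following is a hypothetical sketch of such a pre-screen and not the procedure used here; all function names are illustrative. Its output would still need manual review, for example to separate candidate ncRNAs from 5'- and 3'-UTRs of adjacent genes.

```python
def mask_annotated(coverage, annotated):
    """Set coverage to 0 inside annotated genes (1-based, inclusive coordinates)."""
    masked = list(coverage)
    for start, end in annotated:
        for i in range(start - 1, end):
            masked[i] = 0
    return masked

def contiguous_covered_regions(coverage, min_length=100):
    """Return (start, end) of runs in which every base has coverage > 0.

    Coordinates are 1-based and inclusive; runs shorter than min_length
    are discarded.
    """
    regions = []
    run_start = None
    for i, depth in enumerate(coverage):
        if depth > 0:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_length:
                regions.append((run_start + 1, i))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(coverage) - run_start >= min_length:
        regions.append((run_start + 1, len(coverage)))
    return regions

def candidate_ncrna_regions(coverage, annotated, min_length=100):
    """Covered stretches of at least min_length bp outside annotated genes."""
    return contiguous_covered_regions(mask_annotated(coverage, annotated), min_length)
```

Masking annotated genes before searching for covered runs means that a transcribed region directly adjacent to a gene is still reported, at the cost of also reporting long untranslated regions.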

Differential expression analysis
To identify genes that showed significantly different transcript levels in the parent strain (10403S) and the ΔsigB strain, statistical analyses were performed using the normalized RNA-Seq coverage of each coding gene (as annotated by the Broad Institute). Normalized RNA-Seq coverage (i.e. the number of reads that match an annotated CDS after normalization across runs) was used in lieu of the GEI (in which the normalized RNA-Seq coverage number is divided by the gene length) for statistical analyses. Corresponding analyses were also performed for each region encoding a putative ncRNA transcript identified as described above. A coverage file of normalized RNA-Seq coverage is available in Additional file 12: Coverage file with the normalized RNA-Seq coverage for the 4 RNA-Seq runs.
For each gene, a binomial probability was calculated for the normalized RNA-Seq coverage, using each of the four possible comparisons between the 10403S and ΔsigB transcripts (i.e. 10403S replicate 1 vs ΔsigB replicate 1; 10403S replicate 1 vs ΔsigB replicate 2; 10403S replicate 2 vs ΔsigB replicate 1; 10403S replicate 2 vs ΔsigB replicate 2). The binomial probability was calculated under the hypothesis that genes that are not regulated by σ B will show the same normalized number of reads in the two strains (p = 0.5 and q = 0.5). For a gene to be considered up-regulated by σ B , the binomial probability of observing as many reads in the ΔsigB strain as those observed for 10403S had to be < 0.05 for each of the four possible combinations. Conversely, for a gene to be considered down-regulated by σ B , the binomial probability of observing as many reads as those observed for ΔsigB had to have q-values < 0.05 for each of the four possible combinations. To control for multiple comparisons, a False Discovery Rate (FDR) approach was used. q-values (representing the FDR) were calculated using the program Q-Value [70] for R. Only genes with q-values < 0.05 and fold change ≥ 2 or ≤ 0.5 among all four possible comparisons between 10403S and ΔsigB were considered significantly up-regulated or down-regulated by σ B .
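The test for σ B -dependent up-regulation can be sketched as follows. This is an illustration under stated assumptions and not the analysis code used in this study: (i) the binomial probability is taken as the lower tail, i.e., the probability of observing at most as many reads in ΔsigB as were observed, given equal expected reads in both strains; (ii) normalized coverage values are rounded to integers; and (iii) q-values are computed with the Benjamini-Hochberg procedure as a stand-in for the Q-Value program [70], which uses a different estimator. All function names are hypothetical.

```python
import math
from itertools import product

def log_binom_pmf(k, n, p=0.5):
    """Log of the binomial probability of exactly k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def binom_lower_tail(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p); each term is computed via log-gamma."""
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k + 1))
    return min(1.0, total)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (stand-in for Q-Value q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def upregulated_genes(coverage, q_cutoff=0.05, min_fold=2.0):
    """Genes passing the q-value and fold-change cut-offs in all four comparisons.

    coverage maps gene name -> ((wt_rep1, wt_rep2), (mut_rep1, mut_rep2)),
    holding normalized RNA-Seq coverage (not GEI).
    """
    names = list(coverage)
    passing = set(names)
    for wt_idx, mut_idx in product((0, 1), (0, 1)):
        pvals, folds = [], []
        for name in names:
            wt = round(coverage[name][0][wt_idx])
            mut = round(coverage[name][1][mut_idx])
            pvals.append(binom_lower_tail(mut, wt + mut))
            folds.append(float("inf") if mut == 0 else wt / mut)
        qvals = benjamini_hochberg(pvals)
        passing &= {n for n, q, f in zip(names, qvals, folds)
                    if q < q_cutoff and f >= min_fold}
    return sorted(passing)
```

Requiring the cut-offs to be met in each of the four replicate comparisons, rather than on pooled counts, is what makes the criterion stringent: a gene is excluded if any single pairing of replicates fails.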

Iterative HMM-based promoter identification
An initial training set containing 17 experimentally validated σ B -dependent promoter motifs was used to build a Hidden Markov Model (HMM) of these motifs (Additional file 13: σ B -dependent promoters used for HMM search). HMM construction and searches were performed using the program hmmer version 1.8.5. The HMM was constructed from unaligned sequences (using hmmt) and then used to search the 10403S pseudochromosome (using the hmmls tool). The null frequencies of each nucleotide used were those observed in the L. monocytogenes genome (i.e., A/T = 0.31 and G/C = 0.19).
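To illustrate how the null nucleotide frequencies enter a motif score, the following sketch uses a position weight matrix with log-odds scores as a deliberately simplified stand-in for the profile HMM built with hmmer. Unlike the HMM used in this study, it requires aligned motifs of equal length and does not model a variable-length spacer. The training motifs in the usage note are toy sequences, not σ B promoter sequences, and all function names are illustrative.

```python
import math

# Null (background) nucleotide frequencies as given in the text.
BACKGROUND = {"A": 0.31, "T": 0.31, "G": 0.19, "C": 0.19}

def build_log_odds_matrix(motifs, pseudocount=1.0):
    """Per-position log2 odds of each base relative to the background."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all training motifs must have the same length")
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in BACKGROUND}
        for motif in motifs:
            counts[motif[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: math.log2((counts[base] / total) / BACKGROUND[base])
                       for base in BACKGROUND})
    return matrix

def score(matrix, sequence):
    """Sum of per-position log-odds scores for a window of matching length."""
    return sum(column[base] for column, base in zip(matrix, sequence))

def scan(matrix, genome):
    """Score every window of the genome; returns (1-based start, score) pairs."""
    width = len(matrix)
    return [(i + 1, score(matrix, genome[i:i + width]))
            for i in range(len(genome) - width + 1)]
```

For example, a matrix built from the toy motifs GTTT, GTTT and GTTA gives its highest score in the sequence AAAGTTTAAA to the window starting at position 4. Because G and C are rarer than A and T in the background, a conserved G contributes more to the score than an equally conserved T.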
To identify new promoter motifs that could be added to the training set, we used an iterative HMM approach. In each given HMM iteration, the only hits added to the training set were those that met four conservative criteria, including (i) location within 100 bp upstream of the start codon of an annotated CDS (or 100 bp upstream of the first nt for the manually annotated noncoding genes), (ii) q-values < 0.05 (from the binomial probabilities) for σ B dependence of a given gene (based on RNA-Seq data), (iii) fold change ≥ 2 among all possible comparisons between 10403S and ΔsigB, and (iv) a score higher than the lowest score for which 50% of the motifs fall in noncoding regions (i.e., for each iteration, we adaptively chose a threshold score such that 50% of the motifs that score higher than this threshold lie in noncoding regions). After adding all hits that met these criteria (in a given iteration) to the training set, a new model was built and used to search the 10403S pseudochromosome. This process was repeated until no new motifs could be added to the training set; the final training set can be found in Additional file 13: σ B -dependent promoters used for HMM search. When no new motifs that matched our criteria were discovered, the model was considered complete and the results from the last search were used for promoter identification. The final model was used to search the 10403S pseudochromosome for potential σ B promoters. Potential σ B promoters identified by this HMM upstream of σ B -dependent genes and the σ B -dependent putative ncRNA were visually evaluated. Potential σ B promoters identified by HMM were considered probable σ B promoters if the promoter was within 50 bp upstream of the transcriptional start site (as identified by RNA-Seq).
In some instances, the transcriptional start site was not discernible due to an upstream gene transcript that overlapped with a σ B -dependent gene transcript or because the gene had a low average relative normalized RNA-Seq coverage. For these instances, putative promoters were considered if they were located within 200 bp of the start codon of the σ B -dependent gene. σ B -dependent genes with probable σ B promoters are described in Figure 7; the σ B promoter sequence logo is presented in Figure 5 (http://weblogo.berkeley.edu/ [71]).
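The iterative training procedure and its adaptive score threshold can be sketched as follows. This is a schematic illustration and not the implementation used in this study: the callables build_model, search and accept are hypothetical placeholders for HMM construction (hmmt), the genome search (hmmls), and the four acceptance criteria, respectively. The threshold function assumes that "50%" means "at least 50%" and that hits scoring at or above the threshold are counted; tied scores are not treated specially.

```python
def adaptive_threshold(hits, min_noncoding_fraction=0.5):
    """Lowest score at which enough of the higher-ranked hits are noncoding.

    hits is a list of (score, is_noncoding) pairs. Returns the lowest score
    such that, among hits scoring at or above it, at least
    min_noncoding_fraction lie in noncoding regions; None if no score qualifies.
    """
    ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
    threshold = None
    noncoding = 0
    for n, (hit_score, is_noncoding) in enumerate(ranked, start=1):
        noncoding += bool(is_noncoding)
        if noncoding / n >= min_noncoding_fraction:
            threshold = hit_score
    return threshold

def iterative_training(initial_set, build_model, search, accept):
    """Rebuild the model until a search yields no new acceptable motifs.

    build_model(training_set) returns a model, search(model) returns an
    iterable of hits, and accept(hit) applies the acceptance criteria.
    Returns the final model and the final training set.
    """
    training = set(initial_set)
    while True:
        model = build_model(training)
        new_hits = {hit for hit in search(model) if accept(hit)} - training
        if not new_hits:
            return model, training
        training |= new_hits
```

Tying the score threshold to the fraction of hits in noncoding regions exploits the expectation that true promoters lie between genes, so the cut-off adapts as the model changes from one iteration to the next rather than being fixed in advance.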
Figure 7. Alignment of the 65 putative σ B -dependent promoters identified in this study. EGD-e homologs of genes or operons downstream of a given promoter are indicated on the left. Positions 3 to 6 in the alignment represent the -35 region while positions 24 to 29 represent the -10 region. Darker nucleotides are more conserved than lighter nucleotides in the alignment. Boxed gene names indicate promoters that have been experimentally validated (e.g., by RACE-PCR).