- Research article
- Open Access
RNA-Seq of Bacillus licheniformis: active regulatory RNA features expressed within a productive fermentation
BMC Genomics volume 14, Article number: 667 (2013)
The production of enzymes by an industrial strain requires a complex adaption of the bacterial metabolism to the conditions within the fermenter. Regulatory events within the process result in a dynamic change of the transcriptional activity of the genome. This complex network of genes is orchestrated by proteins as well as regulatory RNA elements. Here we present an RNA-Seq based study considering selected phases of an industry-oriented fermentation of Bacillus licheniformis.
A detailed analysis of 20 strand-specific RNA-Seq datasets revealed a multitude of transcriptionally active genomic regions. 3314 RNA features encoded by such active loci have been identified and sorted into ten functional classes. The identified sequences include the expected RNA features like housekeeping sRNAs, metabolic riboswitches and RNA switches well known from studies on Bacillus subtilis as well as a multitude of completely new candidates for regulatory RNAs. An unexpectedly high number of 855 RNA features are encoded antisense to annotated protein and RNA genes, in addition to 461 independently transcribed small RNAs. These antisense transcripts contain molecules with a remarkable size range variation from 38 to 6348 base pairs in length. The genome of the type strain B. licheniformis DSM13 was completely reannotated using data obtained from RNA-Seq analyses and from public databases.
The hereby generated data-sets represent a solid amount of knowledge on the dynamic transcriptional activities during the investigated fermentation stages. The identified regulatory elements enable research on the understanding and the optimization of crucial metabolic activities during a productive fermentation of Bacillus licheniformis strains.
Bacillus licheniformis is a spore-forming soil bacterium closely related to the Gram-positive model organism Bacillus subtilis. The species’ saprophytic life style, based on the secretion of biopolymer-degrading enzymes, predestinates strains of B. licheniformis as ideal candidates for the large-scale industrial production of exoenzymes, such as amylases and peptide antibiotics . Especially its high capacity of secreting overexpressed alkaline serine proteases has made B. licheniformis one of the most important bacterial workhorses in industrial enzyme production . Due to their high stability and relatively low substrate specificity, alkaline serine proteases like subtilisins are crucial additives to household detergents and the greatest share on the worldwide enzyme market [2, 3]. Attempts to optimize the productivity have addressed the fermentation process [4, 5], protein-engineering [3, 6, 7], and cellular influences on protein quality and quantity [2, 8]. Since the 4.2 Mb circular genome of the type strain B. licheniformis DSM13 was published in 2004 [1, 9], several genome-based studies targeting strain improvement have been performed successfully [10, 11]. However, genome-based studies are limited to information directly accessible from the DNA sequence and cannot benefit from knowledge of the active transcriptome. Considering that the regulatory network represented by protein- and RNA-based regulators determines the performance of an industrial-oriented fermentation process  RNA-Seq data might contribute to further optimization approaches.
RNA-based regulatory elements are involved in the regulation of metabolism, growth processes, the adaptation to stress and varying culture conditions  and can be divided into two main categories. The first category comprises non-coding RNAs (ncRNAs). Trans-encoded ncRNAs, often referred to as small RNAs (sRNAs), are encoded independently from protein genes and are able to modulate the mRNA of genes located at distant chromosomal loci or to interact with target proteins . Upon formation of secondary structures, trans-encoded ncRNAs interact with their target RNAs by imperfect base pairing, which is triggered by the binding of a seed region of at least six contiguous nucleotides [15, 16]. This mechanism allows the sRNA to address multiple targets, thus orchestrating different members of one regulon [14, 17]. It has been shown that sRNAs affect mRNA degradation and translation and modulate protein activity [14, 16]. A second class of regulatory ncRNAs is encoded in cis, which means that these ncRNAs are transcribed from the antisense strand of protein-coding genes . Hence, they are complementary in full-length and can therefore form RNA duplexes with the mRNA of the targeted genes . Most described examples of these cis-encoded antisense RNAs (asRNA) range from 100 to 300 nt in size, but some asRNAs are also shown to be substantially longer [18, 20]. Antisense RNAs have been proven to either positively or negatively affect transcription, translation and mRNA stability . In addition, a cis-encoded asRNA might work as a trans-encoded sRNA for another target . Antisense transcription has been detected in multiple organisms  and, with the growing number of explored species, it is assumed that antisense transcripts can be found for ~10 to 20% of the bacterial genes . A second class of RNA-based regulators encompasses cis- regulatory elements, mainly present at the 5′ untranslated region (5′UTR) of mRNA transcripts, e.g. riboswitches, T-boxes or thermosensors . Whereas both 5′ as well as 3’untranslated regions can bear signals for the initiation and termination of translation [24, 25], respectively, 5′UTRs additionally have the ability to fine-tune translation by cis-regulatory elements. They can be prone to RNA-binding proteins or antisense RNAs, carry attenuation systems [14, 23] and play a role in mRNA stability . In contrast, 3’UTRs are not as well understood and have escaped the attention of most transcriptomic studies . It is known that long UTRs can be localized antisense to adjacent genes on the opposite strand; in fact some of these overlapping UTRs have been demonstrated to act as negative regulators for genes encoded on the opposite strand .
The development of next-generation sequencing techniques including RNA sequencing (RNA-Seq) enabled the genome-wide identification of RNA-based regulatory elements in an unprecedented depth. The high dynamic range of RNA-Seq allows the identification of transcripts which are expressed at vastly different levels. Also, this method does not exhibit background noise and is therefore appropriate for the identification of lowly abundant transcripts . RNA-Seq analyses targeting ncRNA in particular, have been published for e.g. Mycobacterium tuberculosis, Streptomyces coelicolor and Sinorhizobium meliloti.
The major goal of the project in which this study is embedded is the improvement of production strains and thus ultimately the enhancement of enzyme production. This study is targeted on the identification of active regulatory RNA elements within the different phases of a productive fermentation process. Therefore samples from crucial stages of an industrial-oriented B. licheniformis subtilisin fermentation process have been examined by strand-specific RNA-Seq and differential RNA-Seq (dRNA-Seq) . A comprehensive analysis of the data revealed a multitude of RNA features which correlate to the physiology and the growth phases during the process. The combination of genomic data and RNA features provides an excellent basis to understand the regulatory events within an industrial fermentation process.
Results and discussion
B. licheniformis MW3Δspo, a germination deficient mutant of B. licheniformis DSM13, transformed with an expression plasmid encoding an alkaline serine protease, was grown in fed-batch mode in 6 L cultures. The fermentations were carried out in complex amino acid broth under conditions resembling the parameters used in industrial fermentation processes (Figure 1). To enhance the reliability of the analysis, the experiments were carried out in triplicate (L, R and M). Samples were taken at five selected time points of the fermentation process, which were chosen to follow the initial cell growth (sampling points I, II and III) and to determine the decisive changes within the early (IV) and the late stage (V) of the protease-producing states (Figure 1). Total RNA from each sample was prepared for strand-specific whole transcriptome sequencing . RNA from samples L-I to L-V was additionally prepared for differential RNA-Seq for determination of transcription start sites (TSS), as described by Sharma et al. .
Whole transcriptome sequencing
Strand-specific deep sequencing of the whole transcriptome of 15 B. licheniformis samples yielded more than 500 million reads with a specific length of 50 nucleotides. The number of reads for each library ranged from 2.4 × 107 to 4.3 × 107.
After the application of a strict quality processing (see Methods), 77.3 to 93.9% of these reads have been found to map to the chromosome and the expression plasmid used in this study (for details see Additional file 1: Figure S1 and Additional file 2: Table S1). Due to repeat regions, 1.45% of the B. licheniformis genome is not precisely mappable when considering the applied read length of 50 nucleotides. Thus, all reads mapping completely to such repeat regions have been excluded from further analysis. This pertains mainly to those 68.5 to 88.8% of reads which map to tRNA and rRNA genes. The majority of these rRNA matching reads can be assigned to 5S rRNA genes, which is in accordance with the fact that the applied depletion targets especially 16S and 23S rRNAs. Also, all reads mapping to the plasmid were removed from the dataset, as this analysis is focused on the transcriptional activity of the chromosome. Finally, 4.4 to 12.0% of the initial reads were taken for further analyses. These reads enabled the identification of transcriptional units and the determination of their boundaries to assign the transcriptional activity of coding as well as non-coding regions of the chromosome (see Methods).
To facilitate the comparison of different transcription levels between samples, we introduce the n ucleotide activity p er k ilobase of exon model per m illion mapped reads (NPKM) value as single nucleotide-resolution measure of transcriptional activity (see Methods). NPKMs for each RNA feature and for every gene were calculated and are available at Additional file 2: Table S2 and Table S3.
Transcription start site determination and operon prediction
Differential RNA-Seq (dRNA-Seq) has been designed by Sharma et al.  to allow selective enrichment of native 5′ ends of transcripts for the determination of transcription start sites (TSS). The method is based on the observation that 5′ triphosphorylated RNA fragments are originating from native 5′ ends. In contrast, 5′ monophosphorylated RNAs are products of RNA decay or processing and do not contain information of transcription initiation. The dRNA-Seq approach includes a treatment with 5′ phosphate-dependent exonuclease (TEX), which results in the depletion of all monophosphorylated transcripts. It has been shown that TSS identification based on dRNA-Seq data is superior to an estimation of transcript boundaries based on whole transcriptome RNA-Seq reads .
The differential sequencing of samples L-I to L-V resulted in 22,047,373 reads (Additional file 2: Table S4). A total of 2522 putative TSS was predicted (see Methods), 1500 of which were detected in at least two samples (Additional file 2: Table S5). A comparison of the latter with the transcript boundaries obtained by whole transcriptome sequencing (Additional file 2: Table S6) shows that 412 identified TSS confirm the RNA-Seq data, whereas the other findings introduce TSS not detectable by conventional RNA-Seq. To allow the assignment of the identified TSS to their putative origin, an allocation to four different classes was accomplished (Figure 2A) . Naturally, the affiliation of TSS according to this schema is ambiguous as some TSS sort to multiple classes, e. g. some TSS are located in a promoter region and within the upstream gene as well. The distribution of the identified TSS to each class is shown in Figure 2B. 1092 TSS were detected in promoter regions, 72 genes are bearing more than one putative TSS in this region. The dRNA-Seq data enabled conclusions for TSS determination in cases in which read-through transcription of the upstream gene caused by leaky termination prohibits the identification of downstream TSS by conventional RNA-Seq data. Of the identified 456 intragenic TSS, 267 are not located in the 500 bp promoter region of the downstream gene, reflecting a high number of putative internal promoters. Orphan TSS may indicate potential start sites of yet unknown genes or non-coding RNAs, this is supported by the finding that 76 of the 141 detected orphan TSS could be allocated to identified ncRNAs.
Operon prediction based on RNA-Seq and dRNA-Seq data resulted in 2510 putative operons structuring the genome of B. licheniformis (Additional file 2: Table S7). While most operons are monocistronic (66.8%) or bicistronic (18.3%), seven operons seem to encompass more than ten genes (Additional file 1: Figure S2). This small number of long operons is not in accordance with the operon prediction made by Kristoffersen et al.  for B. cereus. The difference is due to the varying operon concept employed here. Especially the consideration of internal TSS in combination with distinct shifts of expression resulted in an increase of shortened operons in this study.
The first genome annotation of B. licheniformis DSM13 has been published in 2004 . It has been shown previously that mapping of RNA-Seq data to genomes allows the correction of open reading frames and supports the identification of not-annotated protein genes . Therefore, we performed a complete reannotation of the genome in order to integrate the RNA-Seq data provided by this study as well as the progress in gene prediction and annotation of the recent years. Distinct transcription start sites determined by dRNA-Seq and RNA-Seq-based whole transcriptome data have been used to identify putative mis-annotated genes (Figure 3A). These findings were validated by length comparisons to genes deposited in public databases and confirmation of ribosomal binding sites and -10 and -35 promoter regions. This approach enabled the correction of reading frames of 23 protein genes, 25 pseudo genes, 21 rRNA genes and two tRNA genes (Additional file 2: Table S8). Moreover, 60 previously not-annotated protein genes were identified based on transcriptional activity and protein conservation (Figure 3B, Additional file 2: Table S9). 52 genes (Additional file 2: Table S10) were removed from the annotation as these previously predicted ORFs could not be verified by detailed genome analysis and comparisons to public databases. In total, the reannotation approach resulted in a dataset containing 4297 ORFs. Comparisons to the annotation of B. licheniformis DSM13 by Rey et al.  showed that 16 of the newly annotated genes have not been described for this organism before. 18 of the removed genes were annotated in both genomes. More than 2000 gene annotations have been improved. These improvements mainly comprise former hypothetical proteins now assigned to a function and proteins with altered gene symbols. In addition to gene-associated improvements, seven genomic regions were identified as prophage regions based on GC content deviations, significant similarities to known prophage genes and the presence of insertion repeats. The transcriptional activity of the prophage regions was rather low, which is consistent with the observation that many prophages are induced during SOS response, which should not occur within a fermentation process .
5′ and 3′untranslated regions
In this study, 1433 5′untranslated regions (Figure 4) with a mean length of 117 nt (Figure 5A, Additional file 2: Table S6) could be identified. Thirty of these 5′UTRs are shorter than 11 nt, implicating that leaderless transcription, commonly found in many bacteria , is not an abundant mechanism in B. licheniformis. Correspondingly, low occurrence of leaderless transcription has also been suggested for other members of the phylum Firmicutes. The most strongly transcribed 5’UTRs ≥150 nt and the 5’UTRs discussed in the following passage are listed in Table 1.
At sampling points I, II, and III, the gene of the sporulation inhibitor KapD (BLi03329) reveals a 5’UTR of 113 nt (Figure 6A), whereas an alternative, dRNA-Seq-supported 5’UTR (BLi_r2510) with a length of 2226 nt is present at the later stages of the fermentation process. The TSS of both 5’UTRs seem to be preceded by a σA recognition site. At sampling point IV, the transcriptional activity of the gene is higher than the activity of the 5’UTR region. In B. subtilis, growth phase-dependent differentiation into subpopulations of distinct cell types with different gene expression patterns is well described [40, 41]. The divergent expression levels of kapD and the long 5’UTR in B. licheniformis might therefore result from different usage of promoter sites dependent on the respective cell type. However, the observed effect could also derive from slow decay rates of the short form of the kapD mRNA transcribed earlier. 52 further 5’UTRs exhibited antisense activity towards upstream genes (A 5’UTR ; Figure 4), as shown for the untranslated region BLi_r1609 upstream of the glutamate synthase operon gltAB (BLi02161/62; Figure 6B). The observed 5’UTR is completely antisense to the gene of the corresponding transcriptional activator GltC (BLi02163). The dRNA-Seq data suggest the presence of only one TSS. This finding might be an example for a regulatory linkage between adjacent genes localized on different strands. This concept has recently been termed the excludon by Sesto et al. , who demonstrated that long 5’UTRs can act negatively on the transcription of the opposite gene. Following this idea in the case of the glutamate synthase operon, the preceding 5’UTR might establish a negative feedback regulation of the transcriptional activator GltC. A corresponding elongated UTR of the gltAB operon has not been found in B. subtilis[42, 43], which indicates different regulations of glutamate homeostasis in the two species.
Next to regulatory effects based on antisense orientation, 5’UTRs can bear intrinsic, so-called cis-regulatory elements . At the time of this study, 62 cis-regulatory elements have been predicted for B. licheniformis DSM13 by covariance models [45, 46]. All elements have been shown to be transcriptionally active during the fermentation process (Additional file 2: Table S11), although some are not located in 5’UTRs but in intergenic read-through regions. Three new T-boxes, located upstream of the serine acetyltransferase gene cysE and the tRNA ligase subunit genes glyQ and pheS, could be identified by comparison to the Rfam database. In B. subtilis, 92 cis-regulatory elements have been described , comprising RNA switches as well as protein-binding RNAs. For 76 of these instances, transcription could be shown at orthologous loci in B. licheniformis (Additional file 2: Table S12).
1365 3’UTRs (Additional file 2: Table S6) with an average length of 276 nt have been identified according to Figure 4, the most strongly transcribed 3’UTRs ≥150 nt and the 3’UTR discussed in this chapter are listed in Table 2. Of the identified 3’UTRs, 42% exceed 100 nt and 16% even exceed 500 nt in length (Figure 5B). In total, 338 3’UTRs are localized antisense to adjacent genes (A 3’UTR ; Figure 4). A detailed manual inspection revealed that all 3’untranslated regions longer than 1000 nt seem to be protruding after incomplete termination [18, 32]. Altogether, 684 3’UTRs with internal termination sites could be determined, whereas 511 3’UTRs end at predicted termination sites. These findings suggest that the effect of fading-out at the end of operons due to imperfect termination might be a common effect in B. licheniformis. An example is the 3965 nt 3’UTR (BLi_r2654) downstream of the cell envelope stress response operon liaIHGFSR (BLi03492-97; Figure 6C). The mRNA transcript of this operon protrudes beyond a termination signal, which is located directly behind the stop codon of liaR. This protruding mRNA sequence is antisense to the next four genes which comprise the germination receptor operon gerAAABAC (BLi03488-90) and a hypothetical protein (BLi03491). A second terminator structure can be found 370 nt upstream of the end of the transcript.
Non-coding RNA features
Non-coding RNAs were identified in non-coding regions of the chromosome, for example in intergenic regions or localized in antisense direction to protein genes (see Methods). The boundaries of the identified transcripts were determined by upshifts or downshifts of transcriptional activity. All identified RNA features were checked for similarities to complete protein genes as well as protein domains to ensure that they indeed represent non-coding RNAs.
To extract the basic types of ncRNA expression profiles during the examined fermentation process, cluster analysis based on the k-means algorithm  was applied to those 273 ncRNAs with highly reliable replicates. In total, 15 clusters of divergent expression profiles were generated (Figure 7, Additional file 2: Table S13). Cluster 1 contains 36% of the applied ncRNAs and 50% of all ncRNAs >1000 nt. It displays a strong up-shift of transcriptional activity at sampling point IV followed by a decrease at sampling point V. The high portion of transcripts in this cluster prompts the conclusion that RNA-based regulation is especially important during the later stages of the fermentation process. Other ncRNAs exhibiting up-shifts of transcriptional activity are displayed in clusters 2 to 4, whereas clusters 5 to 8 include transcripts with activity down-shifts. The further clusters comprise ncRNAs with expression shifts during the early fermentation process, as well as an activity up-shift at sampling point V in clusters 10 to 12.
All assigned non-coding RNAs were categorized according to the scheme displayed in Figure 4 and subdivided into the classes A 5 , A 3 , A I , A misc and indep. Selected ncRNAs are listed in Table 3, whereas an overview of all identified features is given in Additional file 2: Table S6. Several ncRNAs have been selected for validation by Northern blotting (Additional file 1: Figure S3). The analyzed ncRNAs were chosen as they are exemplarily for their respective class. The occurrence of eight ncRNAs could be verified, especially ncRNAs <500 nt are in good accordance with the results gained by RNA-Seq. The results for transcripts >2000 nt are indicative for RNA degradation or processing and leaky transcription termination. However, three ncRNAs could not be validated, which is most probably due to their low expression levels.
As depicted in Figure 4, indep transcripts are defined as non-coding RNAs not localized antisense to any mRNA. Instead they can be found in intergenic regions or any other position of the chromosome. In total, 53 indep RNAs with sizes between 51 and 602 nt have been identified, of which 40 have TSS verified by dRNA-Seq. Within this group five housekeeping sRNAs could be annotated: the tmRNA SsrA (BLi_r2758), the 6S RNAs BsrA (BLi_r2163) and BsrB (BLi_r1454), the RNA component of RNase P RnpB (BLi_r1808) and the signal recognition particle Scr (BLi_r0016) . 87% of the indep transcripts exhibited NPKM values ≥100 in at least three samples, reflecting a strong transcriptional activity of the encoding genomics regions. For example the sRNA Scr, an essential part of the protein secretion system , reaches a maximal NPKM value of almost 400,000. This is in perfect accordance with the fact that the cells are derived from a fermentation process optimized for protein secretion. Interestingly, 39 indep ncRNAs seem to be transcribed constitutively under the examined conditions (Figure 8A), whereas only thirteen indep RNAs show differential expression (likelihood value ≥0,99) ), as illustrated exemplarily for BLi_r1424 (Figure 8B).
In contrast to the class of indep ncRNAs, the antisense ncRNAs (asRNAs) A I , A 3 , A 5 , and A misc comprise non-coding transcripts localized antisense to annotated protein-coding genes. They either target the protein-coding region of a gene (A I ) or the 5’ and 3’ untranslated regions (A 5 and A 3 ). Furthermore, ncRNAs that target more than one gene or that are only partially antisense are classified (A misc ). In this study, 242 A misc RNAs (Figure 8C/D) could be identified. Approximately 150 of them (and also all A 5 ncRNAs) are located opposite to ribosome binding sites and could therefore function as inhibitors of translation, a very common mechanism of cis-encoded asRNAs . The length distribution of the non-coding RNA features is shown in Figure 5C, and illustrates that 42% of the A misc RNAs are less than 400 nt in length. Twenty-seven of these short A misc RNAs reach maximal NPKM values ≥100, suggesting putative sRNA mechanisms. However, some A misc RNAs are much longer, i.e. BLi_r2246 is 6348 nt in length and spans six genes. The occurrence of antisense transcripts of such length is not unexpected, as asRNAs with very diverse sizes, reaching more than 7000 nt, have been described for several species . Furthermore, 146 A I transcripts (Figure 8E) could be assigned, ranging in size from 54 to 1572 nt. Over 95% of the A I transcripts exhibit maximal NPKM values ≤100, 68% even ≤20, due to the low coverage only 20 TSS could be verified by dRNA-Seq for these asRNAs. In total, 408 non-coding asRNAs were determined, comprising 89% of all identified non-coding RNA transcripts and targeting 15% of all genes. The number of identified antisense ncRNAs is in accordance to previous studies which assume that antisense transcription concerns ~10 to 20% of the bacterial genes .
Antisense transcripts with putative impact on productivity
The general aim of our group is the identification of productivity related features. Thus, a special focus has been set on the identification of antisense transcripts with a putative impact on protease production as targets for strain improvement.
The alkaline serine protease Subtilisin Carlsberg (apr, BLi01109) represents the major secreted feeding protease of B. licheniformis. Thus, it competes for energetic and secretory resources with the production protease. We identified a cis-encoded 144 nt A I asRNA (BLi_r0872) which is located at the 3’end of the apr mRNA (Figure 9). A highly active TSS determined from the dRNA-Seq data and a terminator structure downstream of the adjacent gene yhfN confirm the characterization of the transcript as independently transcribed asRNA. BLi_r0872 is highly expressed at all fermentation stages, whereas the transcriptional activity of the Subtilisin Carlsberg gene increases at the productive stages of the process. The presence of the cis-encoded asRNA opposite to the 3’end of the target mRNA resembles the B. subtilis RatA/txpA toxin/antitoxin system or the Escherichia coli GadY/gadX system in which an antisense RNA promotes either mRNA degradation or stability . To elucidate the impact of the detected asRNA, further analyses will be necessary, especially as a corresponding transcript is absent in the transcriptome of B. subtilis.
Further antisense transcripts against genes involved in cell differentiation, cell stress response, and thiamine and folate biosynthesis could be observed and are presented in Additional file 1: Figure S4.
It is exciting to think about a regulatory impact of the mentioned ncRNAs, but there are also some noteworthy limitations to putative effects. (i) The completed RNA-Seq experiments cannot discern if the sense and the antisense transcript are transcribed in the same type of differentiated cells, which especially challenges stoichiometric estimations of asRNAs and their mRNA targets. Whether they can influence each other or fulfill different purposes in different cell types has to be a topic of single cell targeted investigations. (ii) It has been reported that functional sRNAs are produced in excess amounts over the targeted mRNA [16, 51]. Therefore, a regulatory mechanism of poorly transcribed antisense RNA cannot be assumed bona fide, but has to be evaluated carefully. Nonetheless, our data implicate that there might be a biological function assignable to the RNA features, especially when they are conserved within related species as B. subtilis. (iii) At last, it has to be experimentally excluded, especially for low abundant instances, that the found ncRNAs originate from spurious transcriptional events, for instance driven by alternative sigma factors .
In total, we determined 461 candidate non-coding RNA transcripts, including antisense, as well as indep ncRNAs (see Non-coding RNA features). For Synechocystis sp. PCC6803, Sinorhizobium meliloti and the archaea Sulfolobus solfataricus P2 and Methanosarcina mazei Gö1 between 50 and 107 non-coding RNAs per Mb were identified [31, 52–54], matching our result of 109 ncRNAs/Mb. For B. subtilis, the close relative of B. licheniformis, Nicolas et al.  have found 472 non-coding RNA features in a tiling array-based, condition-dependent transcriptome study. The majority (68%) of these features are intergenic transcripts determined by promoter analysis, whereas only 32% are derived from independently transcribed (antisense) RNAs. In contrast, the majority of ncRNAs identified in B. licheniformis are antisense RNAs (89%), transcribed independently from protein-coding genes. The identification of more antisense transcripts in B. licheniformis might be accounted to the reduced background noise in RNA-Seq in comparison to tiling arrays, which allows a better detection of low abundant transcripts . 167 of the B. licheniformis ncRNAs are located in regions with high sequence similarity to B. subtilis and 126 ncRNAs are encoded at the frontiers of conserved and not conserved regions of the two genomes. Based on sequence similarity, only 43 (Additional file 2: Table S14) out of the, in total, 293 ncRNAs located in these regions seem to occur in the B. subtilis transcriptome , emphasizing the differences of the two closely related species. Comparisons to two earlier B. subtilis transcriptome studies show similar low levels of accordance [56, 57]. However, as mentioned above, it is also possible that the identified antisense ncRNAs partly derive from spurious transcription events , and hence do not introduce a species-specific effect.
For B. subtilis, 22 sRNAs have been validated experimentally [43, 58]. Comparison to Rfam and/or comparison of genomic locations facilitated the detection of eleven of these sRNAs in the transcriptome of B. licheniformis (Additional file 2: Table S15). These include, in addition to the mentioned five housekeeping sRNAs , two regulatory RNAs with well-known function in B. subtilis: SR1 and RnaA [59, 60]. The other RNAs found in B. licheniformis are BsrI, CsfG, SurC and RsaE [61–64]. The B. subtilis sRNAs which could not be confirmed in B. licheniformis originate from loci with no conserved gene pattern in this organism and thus may contribute to the differences between the two species. Jahn et al.  described the toxin-antitoxin system BsrG/SR4 located in the SPβ prophage region of B. subtilis. Although B. licheniformis does not harbor a homolog of the SPβ prophage, two distinct transcripts were found to encode peptides similar to the BsrG toxin (Additional file 1: Figure S5). Additionally, the transcriptional activity of the corresponding loci revealed pairs of overlapping transcripts from both strands (Figure 3 and Additional file 1: Figure S5) as shown for the BsrG/SR4 type toxin-antitoxin system. Therefore both newly identified ORFs were annotated as BsrG-like peptides (BLi05015 and BLi05038). Furthermore, the antisense transcripts (indep RNAs BLi_r1034 and BLi_r2780) resemble the SR4 antitoxin, especially in stem loops SL3, SL4 and TSL  directly antisense to the BsrG-encoding mRNA.
The presented study generated substantial data on the transcriptional activity of B. licheniformis within five relevant growth stages of an industrial-oriented fermentation process. A detailed analysis of the transcriptome data enabled us to accomplish a high quality functional genome reannotation of B. licheniformis DSM13 (Figure 10, Ring 4). The integration of the reannotation and the transcriptionally active regions (Figure 10, Ring 1&2) resulted in the identification and quantification of hundreds of RNA based regulatory elements as well as protein encoding genes.
In total, 3314 RNA features have been sorted into ten functional classes (Figure 4). 1433 5’UTRs and 1365 3’UTRs (Figure 10, Ring 8) as well as 461 ncRNAs (Figure 10, Ring 7) and 55 antisense intergenic read-through (A rt ) transcripts have been identified. A striking observation was the identification of 855 RNA features, which mapped antisense to annotated genomic features (Figure 10, Ring 6). Notably antisense RNA features have been found in each of the functional classes and include transcripts of a size range from 38 to 6348 base pairs in length. We have identified both: constitutively as well as growth phase dependently expressed RNA features.
Our data represent a solid amount of knowledge on regulatory elements which orchestrate the cellular activities of B. licheniformis during the succession of growth phases within a productive fermentation. To generate an overview of the functional diversity of the identified RNA features, all instances have been screened against the Rfam database. This approach resulted in hits to experimentally well characterized RNA features known from B. subtilis and other relatives, as well as in a multitude of so far unknown RNA features without any Rfam hit. The knowledge on genes and regulatory RNA features which are transcriptionally active during an industrial-oriented fermentation enables an excellent access to a rational strain design approach for the optimization of B. licheniformis as industrial workhorse. Especially the regulatory features which represent differences to the model organism B. subtilis give new insights to the still open question what makes strains of the species B. licheniformis superior to B. subtilis strains in terms of protease production capacity in industrial applications . In the future it may be promising to correlate the transcriptional activity of the RNA features to the corresponding protein expression patterns.
Bacterial strain and fermentation conditions
Bacillus licheniformis MW3Δspo (kindly provided by F. Meinhardt and St. Wemhoff, University of Münster) was used for the fermentation experiments. B. licheniformis MW3Δspo is a derivate of the B. licheniformis wild type strain DSM13, bearing three deletions: ΔhsdR and ΔhsdR2 coding for restriction endonucleases  and ΔyqfD to prevent the production of viable spores and thus the long-term contamination of the used fermenters .
Fermentation was carried out for 46 h in aerated 16 L fermenters with a culture volume of 6 L at 39°C. Medium contained 12% w/v of a complex nitrogen source, 57 mM KH2PO4, 21 mM (NH4)2SO4, 0.53 mM Mn(II)SO4, 0.17 mM Fe(II)SO4, 2.0 mM CaCl2 * 2 H2O, 5.7 mM MgSO4, 0.4% v/v PPG200, 0.03 mM tetracycline and 3% w/v glucose. The pH value was regulated to a set point of 7.9 with sodium hydroxide solution. Glucose-feed was started after exceeding the point of biphasic growth.
RNA isolation and preparation
5 mL of the harvested cells were mixed with 5 mL of RNAprotect Bacteria Reagent (Qiagen) directly upon sampling. After 10 min incubation at room temperature the samples were centrifuged at 4500× g, the supernatant was removed, the sample was snap-frozen in liquid nitrogen and finally stored at -80°C. The cells were separated from the remainders of the fermentation broth by washing repeatedly with Buffer RLT (Qiagen). Subsequent RNA isolation was carried out with a modified protocol of the RNeasy Midi Kit (Purification of RNA including small RNAs using the RNeasy Midi Kit RY39 Apr-09, Qiagen) to retain short RNAs. The cells were disintegrated with the ball mill Mikro-Dismembrator U (B. Braun Biotech) in 400 μL Buffer RLT and afterwards resuspended in 1.4 mL Buffer RLT and 2.7 mL pure ethanol. The initial washing step of the column was done using 4 mL Buffer RWT (Qiagen). The DNA was digested successively with two different DNases (TURBO™ DNase, Ambion and DNase I recombinant, Roche), with a purification step after the first treatment. Purification was performed with a protocol adapted for small RNA purification of the RNeasy MinElute Cleanup Kit (Qiagen). Instead of 250 μL, 675 μL pure ethanol were added to the RNA before binding to the column to shift the binding capacity of the column. A control PCR with 35 cycles was conducted to confirm complete DNA removal. Depletion of rRNA was obtained using the MICROBExpress™ Bacterial mRNA Enrichment Kit (Ambion) according to manufacturer’s instructions. The following purification step was also carried out with the described adaption to the RNeasy MinElute Cleanup Kit.
Library construction and sequencing
cDNA libraries were prepared by vertis Biotechnologie AG, Germany (http://www.vertis-biotech.com). For whole transcriptome libraries [28, 33] the RNA samples were fragmented by ultrasound and dephosphorylated with Antarctic phosphatase. After polynucleotide kinase treatment the RNA was poly(A)tailed and an RNA adapter was ligated to the 5’phosphate. cDNA synthesis was accomplished by the use of poly(T) adapters and M-MLV reverse transcriptase. The subsequent PCR was carried out with cycle numbers between nine and twelve. The construction of the libraries for the dRNA-Seq was performed as described by Sharma et al. , supplemented by an additional treatment with polynucleotide kinase after the fragmentation step to allow removal of fragments previously not phosphorylated. The samples were incubated with Terminator™ 5’-Phosphate-Dependent Exonuclease (Epicentre) and a poly(A) tail was ligated to the 3’end of the transcripts. Hereafter an incubation step with tobacco acid pyrophosphatase and the ligation of an RNA adapter to the 5’end was conducted. Reverse transcription was processed as described above, the cycle numbers of the following PCR were 14 or 15. The RNA-Seq libraries as well as the libraries for dRNA-Seq were size fractioned in the range of 200 to 400 nt on agarose gels and then sequenced on an Illumina HiSeq 2000 machine with a read length of 50 nt.
In silico sequence read processing
Initially, all sequence reads mapping to B. licheniformis rRNA and tRNA genes according to BLAST analysis were removed. The remaining reads were processed in a multi-step procedure to ensure the reliability of the read mappings used for the analysis of the transcriptional activity of the genome and to estimate the quality of the RNA-Seq data. All reads which mapped over the full read length of 50 bases with 98% or sequence identity were used for further analyses. Additionally, a distinct bit score was required to ensure an unambiguous assignment to one locus. All discarded reads were screened with relaxed similarity quality criteria vs. the B. licheniformis genome. 75% of these reads generated hits and were therefore assigned as bad quality B. licheniformis reads. The remaining reads (approximately 3% of the total generated sequence) cannot be mapped on the genome. A detailed sequence analysis of these unmappable reads revealed that they mainly contain poly(A) tails or concatenated adapters and therefore represent methodic artifacts. All datasets were depleted for plasmid-mapping reads and have been deposited in NCBI Sequence Read Archive database under accession number SRP018744 (Additional file 2: Table S16).
To obtain the maximal number of features, a dataset containing all reads of the 15 samples was prepared and is referred to as pooled RNA-Seq data. To reduce putative background noise, all reads with coverage of one and no intersecting or adjacent reads were omitted prior to combination of the datasets. This is done to reduce transcriptional activity that was not replicated within a dataset in order to avoid incorrect extension of predicted features (e.g. transcriptional activity from leaky termination masking or extending 5’UTR extensions due to overlap). The generation of five datasets describing each sampling point was processed accordingly.
Expression strength values
The analytical methods used to process the 15 generated RNA-Seq datasets require the use of single nucleotide activities instead of read mappings. This makes RPKMs  inapplicable as a measure of transcriptional activity. Instead, we defined the n ucleotide activity p er k ilobase of exon model per m illion mapped reads (NPKM) value. An NPKM is defined as:
Where n and m are the start and stop of the region of interest, f(i) is the base activity of base i on a specific strand and g(i) is the sum of the activities of base i of positive and negative strands.
NPKM values are a derivate of RPKMs , adapted to per base nucleotide activities. They are designed to be functionally equivalent to RPKMs, albeit they are more accurate due to the single base-resolution. We are aware that RPKMs and therefore NPKMs do not account for sequencing-based bias . Although sequencing-based bias produces some local errors, the overall comparability of active genomic regions is still possible.
5’ and 3’ UTRs were considered as regions of continuous, non-interrupted transcriptional activity upstream or downstream of annotated genomic features, respectively. The boundary of an identified 5’UTRs was set at the point of the rising of the continuous transcript from zero transcriptional activity. The boundary of a 3’ UTRs was accordingly set at the point of the downshift of the continuous transcript to zero transcriptional activity.
The analysis of 5’ and 3’untranslated regions was aimed to find the longest UTR, as the longest transcript should cover all possible alternative UTRs and contain all transcribed regulatory elements. Therefore, the computational analysis was based on the pooled RNA-Seq data. Few 5’ and 3’UTRs were manually extended on account of adjacent transcripts which are only separated from the UTR by a very short downshift and potentially are part of the UTR. To exclude that the resulting UTRs correspond to previously not annotated protein genes, searches versus the InterPro and the UniProtKB/Swiss-Prot databases were performed [70, 71].
5’ and 3’UTRs which are antisense to an adjacent gene on the opposite strand were classified as A 5’UTR and A 3’UTR . The respective UTRs were computationally examined and assigned to be antisense when their overlap to an opposite gene exceeded 100 nt in length.
Intergenic read-through transcripts localized antisense to an opposite gene were determined manually and classified as A rt .
Non-coding RNA features
The RNA-Seq data were scanned for transcriptionally active regions that were clearly separated from the transcripts corresponding to any annotated gene or its untranslated regions. This primary computational search identified transcripts which were either located on the opposite strand of a protein-coding gene, in intergenic regions or any other region of the chromosome. The boundaries of the identified transcripts were set to those nucleotides with the first and last occurrence of transcriptional activity higher than zero of the corresponding transcriptional unit. NPKM values for the resulting loci were generated from each of the 15 datasets. Subsequently, all results from the computational search were evaluated as depicted in Additional file 1: Figure S6A to approve the reliability of the identified ncRNAs. Searches vs. the InterPro and the UniProtKB/Swiss-Prot databases were performed to exclude the possibility that the resulting non-coding RNA features correspond to non-annotated protein genes [70, 71]. Subsequently, the non-coding transcripts were subdivided into the ncRNA classes described in Figure 4 (A 3 , A 5 , A I , A misc , and indep).
The class indep comprises all identified ncRNAs that are not located antisense to any protein-coding gene or its respective untranslated regions. Several transcripts of this category were added manually as this class comprises RNA transcripts which could not clearly be distinguished from surrounding mRNAs by complete down-shifts of transcriptional activity, but were detected by their remarkably higher abundance.
The categories A 3 , A 5 , A I and A misc comprise ncRNAs which are localized antisense to protein-coding genes or their respective untranslated regions. The class A I contains all ncRNAs with an antisense localization solely towards a protein-coding gene. The class A 5 contains all ncRNAs with an antisense localization solely towards the 5’UTR of an opposite mRNA. The class A 3 contains all ncRNAs with an antisense localization solely towards the 3’UTR of an opposite mRNA. The class A misc contains all ncRNAs with an antisense localization towards more than one protein-coding gene and all ncRNAs which are only partially antisense to an mRNA transcript.
Analysis of dRNA-Seq reads
Transcriptional start sites were determined by the identification of significant increases of the log-scaled expression strength of the dRNA-Seq data from succeeding bases greater than ln 4. The reference value of ln 4 was empirically determined based on the observation that ln 4 represents the smallest expression strength increase for TSS present across all samples of one sampling point. In a second step, all TSS in promoter regions of rRNA or tRNA genes and all TSS being apart less than 20 bp were excluded. TSS matching the boundaries of RNA-Seq predicted 5’UTRs or ncRNAs were determined accordingly to the flow chart depicted in Additional file 1: Figure S6B.
Additionally, the gained RNA-Seq data were used to generate logarithmic scaled, color coded graphs representing strand-specific transcription.
Operon predictions based on whole transcriptome sequencing, dRNA-Seq transcription start sites, and operon and transcription terminator site determination with DOOR , OperonDB , and TransTermHP . Operon predictions were curated manually as described by Sharma et al. , regarding especially level shifts in transcriptional activity.
Functional reannotation was carried out using the ERGO software tool (Integrated Genomics, Chicago, USA)  and the IMG/ER (Integrated Microbial Genomes/Expert Review) system . Subsequent manual curation was based on the results of a bidirectional BLAST analysis comprising B. subtilis, B. pumilus and related, manually annotated organisms, the comparisons to UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases  and the analysis of functional domains with InterProScan . The annotation of new genes and the correction of reading frames was based on transcriptional activity and was performed upon analysis of GC frame plots, ribosome-binding sites and -10 and -35 promoter regions using Artemis v12  and comparisons to UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, and InterProScan [70, 71]. The removal of gene annotations relied on the combined evaluation of GC frame plots, ribosome-binding sites and -10 and -35 promoter regions using Artemis v12  and comparisons to UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, and InterProScan [70, 71]. The absence of transcriptional activity was not used to support the removal of gene annotations. Prophage regions have been annotated by an initial bioinformatic search using Prophagefinder  followed by manual evaluation of the candidate regions. Based on the existence of GC content deviations, genes in these regions with significant similarities to known prophages and the identification of insertion repeats, genomic regions were assigned as prophages. The annotation followed the principles of prophage annotation outlined by Casjens . The reannotated data set has been used to update the B. licheniformis DSM13 genome data initially submitted by Veith et al.  and is now available at NCBI under accession number AE017333.1.
Clustering of ncRNAs
Cluster analysis to elucidate the fundamental types of ncRNA expression profiles was performed based on the respective NPKM values (Additional file 2: Table S2). To ensure that the data of each replicate are sufficiently reliable, t-tests were performed with MeV . For at least three out of the five samples, the respective ncRNA had to have a P value <0.15 to be taken into further analysis, as described by Koburger et al. . Furthermore, all ncRNAs taken into analysis had to have a minimal NPKM value >10. Means of the replicates of each sampling point were built and z-score transformation was performed. The number of clusters was determined by Figure of merit (FOM) analysis, which basically is an estimate of the predictive power of a clustering algorithm . Clusters were generated by employing k-means clustering  with Euclidian distances in the MeV software  and subsequent manual curation.
Utilized software and databases
ACT and Mauve
The comparison of RNA features from B. licheniformis with the reference genome B. subtilis was based on sequence similarity analyzed with ACT v11, the Artemis comparison tool . Quantification of ncRNAs located in conserved or not-conserved loci, was done employing the progressive Mauve alignment tool .
Determination of constitutive or differential expression of the RNA features was employed with baySeq , which uses an empirical Bayes approach assuming a negative binomial distribution and is capable of dealing with multi-group experimental designs. Input data were generated by counting the reads referring to every gene.
DOOR and OperonDB
The determination of the genome mappability was calculated for a read length of 50 nt with the Gem mappability program .
Cluster analysis was performed using the Multiexperiment Viewer v4.8 .
Transcription terminators pre-computed with TransTermHP v2.07 were gratefully downloaded from transterm.cbcb.umd.edu . 3’UTRs were checked for terminators as described by Martin et al. . Terminators were considered as internal if they were located at least 50 nt upstream of the end of the transcript.
Northern blot analysis
B. licheniformis DSM13 was cultivated at 37°C and 160 rpm in a 5 L Erlenmeyer flask on defined minimal medium . Cells were harvested at OD600 1 and 4.5 and after having reached the stationary phase for at least 2 h. Escherichia coli DH5α was cultivated in Luria broth at 37°C and 180 rpm to an OD600 of 2. RNA was isolated as described in RNA isolation and preparation. Digoxigenin-labeled RNA probes were prepared by in vitro transcription with T7 RNA polymerase (DIG Northern Starter Kit, Roche). Templates for in vitro transcription were generated by PCR using primer pairs (Additional file 2: Table S17) containing a primer flanked with the T7 promoter sequence. Gel electrophoresis of the RNA was carried out using a 1% agarose formaldehyde MOPS gel  with 100 V applied for 2,5 h. RNA was transferred to the membrane (Nylon Membranes, positively charged, Roche) via vacuum blotting with the Amersham VacuGene XL Vacuum Blotting System (GE Healthcare) using the recommended protocol. The RNA probe hybridization pro-cedure was performed following the manufacturer’s instructions (DIG Northern Starter Kit, Roche). Detection was accomplished with ChemoCam Imager (Intes). RiboRuler High Range RNA Ladder (Thermo Scientific) ranging from 200 to 6000 nt was used as RNA marker.
Differential RNA sequencing
Nucleotide activity per kilobase of exon model per million mapped reads
Open reading frame
Polymerase chain reaction
Revolutions per minute
Tobacco Acid Pyrophosphatase
Terminator™ 5′-Phosphate-Dependent Exonuclease
Transcription start sites
Volume per volume
Weight per volume.
Veith B, Herzberg C, Steckel S, Feesche J, Maurer K-H, Ehrenreich P, Bäumer S, Henne A, Liesegang H, Merkl R, Ehrenreich A, Gottschalk G: The complete genome sequence of Bacillus licheniformis DSM13, an organism with great industrial potential. J Mol Microbiol Biotechnol. 2004, 7: 204-211. 10.1159/000079829.
Schallmey M, Singh A, Ward O: Developments in the use of Bacillus species for industrial production. Can J Microbiol. 2004, 50: 1-17. 10.1139/w03-076.
Maurer K-H: Detergent proteases. Curr Opin Biotechnol. 2004, 15: 330-334. 10.1016/j.copbio.2004.06.005.
Çalık P, Takac S, Çalık G, Özdamar T: Serine alkaline protease overproduction capacity of Bacillus licheniformis. Enzyme Microb Tech. 2000, 26: 45-60. 10.1016/S0141-0229(99)00126-X.
Çalık P, Çalık G, Takaç S, Ozdamar TH: Metabolic flux analysis for serine alkaline protease fermentation by Bacillus licheniformis in a defined medium: effects of the oxygen transfer rate. Biotechnol Bioeng. 1999, 64: 151-167. 10.1002/(SICI)1097-0290(19990720)64:2<151::AID-BIT4>3.0.CO;2-U.
Degering C, Eggert T, Puls M, Bongaerts J, Evers S, Maurer K-H, Jaeger K-E: Optimization of protease secretion in Bacillus subtilis and Bacillus licheniformis by screening of homologous and heterologous signal peptides. Appl Environ Microbiol. 2010, 76: 6370-6376. 10.1128/AEM.01146-10.
Gupta R, Beg QK, Lorenz P: Bacterial alkaline proteases: molecular approaches and industrial applications. Appl Microbiol Biotechnol. 2002, 59: 15-32. 10.1007/s00253-002-0975-y.
Nielsen AK, Breüner A, Krzystanek M, Andersen JT, Poulsen TA, Olsen PB, Mijakovic I, Rasmussen MD: Global transcriptional analysis of Bacillus licheniformis reveals an overlap between heat shock and iron limitation stimulon. J Mol Microbiol Biotechnol. 2010, 18: 162-173. 10.1159/000315457.
Rey M, Ramaiya P, Nelson N, Brody-Karpin S: Complete genome sequence of the industrial bacterium Bacillus licheniformis and comparisons with closely related Bacillus species. Genome Biol. 2004, 5: R77-10.1186/gb-2004-5-10-r77.
Waschkau B, Waldeck J, Wieland S, Eichstädt R, Meinhardt F: Generation of readily transformable Bacillus licheniformis mutants. Appl Microbiol Biotechnol. 2008, 78: 181-188. 10.1007/s00253-007-1278-0.
Nahrstedt H, Waldeck J, Gröne M, Eichstädt R, Feesche J, Meinhardt F: Strain development in Bacillus licheniformis: Construction of biologically contained mutants deficient in sporulation and DNA repair. J Biotechnol. 2005, 119: 245-254. 10.1016/j.jbiotec.2005.04.003.
Madan Babu M, Teichmann SA, Aravind L: Evolutionary dynamics of prokaryotic transcriptional regulatory networks. J Mol Biol. 2006, 358: 614-633. 10.1016/j.jmb.2006.02.019.
Romby P, Charpentier E: An overview of RNAs with regulatory functions in gram-positive bacteria. Cell Mol Life Sci. 2010, 67: 217-237. 10.1007/s00018-009-0162-8.
Storz G, Vogel J, Wassarman KM: Regulation by small RNAs in bacteria: expanding frontiers. Mol Cell. 2011, 43: 880-891. 10.1016/j.molcel.2011.08.022.
Gottesman S, Storz G: Bacterial small RNA regulators: versatile roles and rapidly evolving variations. Cold Spring Harb Perspect Biol. 2011, 3: a003798-10.1101/cshperspect.a003798.
Waters L, Storz G: Regulatory RNAs in bacteria. Cell. 2009, 136: 615-628. 10.1016/j.cell.2009.01.043.
Schmiedel JM, Axmann IM, Legewie S: Multi-target regulation by small RNAs synchronizes gene expression thresholds and may enhance ultrasensitive behavior. PLoS One. 2012, 7: e42296-10.1371/journal.pone.0042296.
Georg J, Hess WR: cis-antisense RNA, another level of gene regulation in bacteria. Microbiol Mol Biol R. 2011, 75: 286-300. 10.1128/MMBR.00032-10.
Brantl S: Regulatory mechanisms employed by cis-encoded antisense RNAs. Curr Opin Microbiol. 2007, 10: 102-109. 10.1016/j.mib.2007.03.012.
Sesto N, Wurtzel O, Archambaud C, Sorek R, Cossart P: The excludon: a new concept in bacterial antisense RNA-mediated gene regulation. Nat Rev Microbiol. 2013, 11: 75-82.
Sorek R, Cossart P: Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nat Rev Genet. 2010, 11: 9-16.
Güell M, Yus E, Lluch-Senar M, Serrano L: Bacterial transcriptomics: what is beyond the RNA horiz-ome?. Nat Rev Microbiol. 2011, 9: 658-669. 10.1038/nrmicro2620.
Breaker RR: Prospects for riboswitch discovery and analysis. Mol Cell. 2011, 43: 867-879. 10.1016/j.molcel.2011.08.024.
Araujo PR, Yoon K, Ko D, Smith AD, Qiao M, Suresh U, Burns SC, Penalva LOF: Before It Gets Started: Regulating Translation at the 5’ UTR. Comp Funct Genom. 2012, 2012: 475731-
Zoll J, Heus H, van Kuppeveld F, Melchers W: The structure-function relationship of the enterovirus 3’-UTR. Virus Res. 2009, 139: 209-216. 10.1016/j.virusres.2008.07.014.
Kaberdin VR, Bläsi U: Translation initiation and the fate of bacterial mRNAs. FEMS Microbiol Rev. 2006, 30: 967-979. 10.1111/j.1574-6976.2006.00043.x.
Evguenieva-Hackenberg E, Klug G: New aspects of RNA processing in prokaryotes. Curr Opin Microbiol. 2011, 14: 587-592. 10.1016/j.mib.2011.07.025.
Vivancos AP, Güell M, Dohm JC, Serrano L, Himmelbauer H: Strand-specific deep sequencing of the transcriptome. Genome Res. 2010, 20: 989-999. 10.1101/gr.094318.109.
Arnvig KB, Comas I, Thomson NR, Houghton J, Boshoff HI, Croucher NJ, Rose G, Perkins TT, Parkhill J, Dougan G, Young DB: Sequence-based analysis uncovers an abundance of non-coding RNA in the total transcriptome of Mycobacterium tuberculosis. PLoS Pathog. 2011, 7: e1002342-10.1371/journal.ppat.1002342.
Vockenhuber M-P, Sharma CM, Statt MG, Schmidt D, Xu Z, Dietrich S, Liesegang H, Mathews DH, Suess B: Deep sequencing-based identification of small non-coding RNAs in Streptomyces coelicolor. RNA Biol. 2011, 8: 468-477. 10.4161/rna.8.3.14421.
Schlüter J-P, Reinkensmeier J, Daschkey S, Evguenieva-Hackenberg E, Janssen S, Jänicke S, Becker JD, Giegerich R, Becker A: A genome-wide survey of sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti. BMC Genomics. 2010, 11: 245-10.1186/1471-2164-11-245.
Sharma CM, Hoffmann S, Darfeuille F, Reignier J, Findeiss S, Sittka A, Chabas S, Reiche K, Hackermüller J, Reinhardt R, Stadler PF, Vogel J: The primary transcriptome of the major human pathogen Helicobacter pylori. Nature. 2010, 464: 250-255. 10.1038/nature08756.
Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, Bergman NH: Structure and complexity of a bacterial transcriptome. J Bacteriol. 2009, 191: 3203-3211. 10.1128/JB.00122-09.
Dötsch A, Eckweiler D, Schniederjans M, Zimmermann A, Jensen V, Scharfe M, Geffers R, Häussler S: The Pseudomonas aeruginosa transcriptome in planktonic cultures and static biofilms using RNA sequencing. PLoS One. 2012, 7: e31092-10.1371/journal.pone.0031092.
Kristoffersen SM, Haase C, Weil MR, Passalacqua KD, Niazi F, Hutchison SK, Desany B, Kolsto A-B, Tourasse NJ, Read TD, Okstad OA: Global mRNA decay analysis at single nucleotide resolution reveals segmental and positional degradation patterns in a Gram-positive bacterium. Genome Biol. 2012, 13: R30-10.1186/gb-2012-13-4-r30.
Wurtzel O, Sesto N, Mellin JR, Karunker I, Edelheit S, Bécavin C, Archambaud C, Cossart P, Sorek R: Comparative transcriptomics of pathogenic and non-pathogenic Listeria species. Mol Syst Biol. 2012, 8: 1-14.
Lemire S, Figueroa-Bossi N, Bossi L: Bacteriophage crosstalk: coordination of prophage induction by trans-acting antirepressors. PLoS Genet. 2011, 7: e1002149-10.1371/journal.pgen.1002149.
Zheng X, Hu G-Q, She Z-S, Zhu H: Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes. BMC Genomics. 2011, 12: 361-10.1186/1471-2164-12-361.
Nakagawa S, Niimura Y, Miura K, Gojobori T: Dynamic evolution of translation initiation mechanisms in prokaryotes. Proc Natl Acad Sci U S A. 2010, 107: 6382-6387. 10.1073/pnas.1002036107.
Veening J-W, Igoshin OA, Eijlander RT, Nijland R, Hamoen LW, Kuipers OP: Transient heterogeneity in extracellular protease production by Bacillus subtilis. Mol Syst Biol. 2008, 4: 184-
Shank EA, Kolter R: Extracellular signaling and multicellularity in Bacillus subtilis. Curr Opin Microbiol. 2011, 14: 741-747. 10.1016/j.mib.2011.09.016.
Gunka K, Commichau FM: Control of glutamate homeostasis in Bacillus subtilis: a complex interplay between ammonium assimilation, glutamate biosynthesis and degradation. Mol Microbiol. 2012, 85: 213-224. 10.1111/j.1365-2958.2012.08105.x.
Nicolas P, Mäder U, Dervyn E, Rochat T, Leduc A, Pigeonneau N, Bidnenko E, Marchadier E, Hoebeke M, Aymerich S, Becher D, Bisicchia P, Botella E, Delumeau O, Doherty G, Denham EL, Fogg MJ, Fromion V, Goelzer A, Hansen A, Hartig E, Harwood CR, Homuth G, Jarmer H, Jules M, Klipp E, Le Chat L, Lecointe F, Lewis P, Liebermeister W: Condition-Dependent Transcriptome Reveals High-Level Regulatory Architecture in Bacillus subtilis. Science. 2012, 335: 1103-1106. 10.1126/science.1206848.
Brantl S: Acting antisense: plasmid- and chromosome-encoded sRNAs from Gram-positive bacteria. Future Microbiol. 2012, 7: 853-871. 10.2217/fmb.12.59.
Markowitz VM, Mavromatis K, Ivanova NN, Chen I-MA, Chu K, Kyrpides NC: IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics. 2009, 25: 2271-2278. 10.1093/bioinformatics/btp393.
Gardner PP, Daub J, Tate J, Moore BL, Osuch IH, Griffiths-Jones S, Finn RD, Nawrocki EP, Kolbe DL, Eddy SR, Bateman A: Rfam: Wikipedia, clans and the “decimal” release. Nucleic Acids Res. 2011, 39: D141-D145. 10.1093/nar/gkq1129.
Sherlock G: Analysis of large-scale gene expression data. Curr Opin Immunol. 2000, 12: 201-205. 10.1016/S0952-7915(99)00074-6.
Pool MR: Signal recognition particles in chloroplasts, bacteria, yeast and mammals (Review). Mol Membr Biol. 2005, 22: 3-15. 10.1080/09687860400026348.
Hardcastle TJ, Kelly KA: baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics. 2010, 11: 422-10.1186/1471-2105-11-422.
Kiley Thomason M, Storz G: Bacterial antisense RNAs: how many are there, and what are they doing?. Annu Rev Genet. 2010, 44: 167-188. 10.1146/annurev-genet-102209-163523.
Levine E, Zhang Z, Kuhlman T, Hwa T: Quantitative characteristics of gene regulation by small RNA. PLoS Biol. 2007, 5: e229-10.1371/journal.pbio.0050229.
Mitschke J, Georg J, Scholz I, Sharma CM, Dienst D, Bantscheff J, Voß B, Steglich C, Wilde A, Vogel J, Hess WR: An experimentally anchored map of transcriptional start sites in the model cyanobacterium Synechocystis sp. PCC6803. Proc Natl Acad Sci U S A. 2011, 108: 1-6.
Wurtzel O, Sapra R, Chen F, Zhu Y, Simmons B, Sorek R: A single-base resolution map of an archaeal transcriptome. Genome Res. 2010, 20: 133-141. 10.1101/gr.100396.109.
Jäger D, Sharma C, Thomsen J, Ehlers C, Vogel J, Schmitz R: Deep sequencing analysis of the Methanosarcina mazei Gö1 transcriptome in response to nitrogen availability. Proc Natl Acad Sci U S A. 2009, 106: 21878-21882. 10.1073/pnas.0909051106.
Barbe V, Cruveiller S, Kunst F, Lenoble P, Meurice G, Sekowska A, Vallenet D, Wang T, Moszer I, Médigue C, Danchin A: From a consortium sequence to a unified sequence: the Bacillus subtilis 168 reference genome a decade later. Microbiology. 2009, 155: 1758-1775. 10.1099/mic.0.027839-0.
Rasmussen S, Nielsen H, Jarmer H: The Transcriptionally Active Regions in the Genome of Bacillus subtilis. Mol Microbiol. 2009, 73: 1043-1057. 10.1111/j.1365-2958.2009.06830.x.
Irnov I, Sharma C, Vogel J, Winkler W: Identification of regulatory RNAs in Bacillus subtilis. Nucleic Acids Res. 2010, 38: 6637-6651. 10.1093/nar/gkq454.
Mäder U, Schmeisky AG, Flórez LA, Stülke J: SubtiWiki–a comprehensive community resource for the model organism Bacillus subtilis. Nucleic Acids Res. 2012, 40: D1278-D1287. 10.1093/nar/gkr923.
Eiamphungporn W, Helmann JD: Extracytoplasmic function sigma factors regulate expression of the Bacillus subtilis yabE gene via a cis-acting antisense RNA. J Bacteriol. 2009, 191: 1101-1105. 10.1128/JB.01530-08.
Gimpel M, Heidrich N, Mäder U, Krügel H, Brantl S: A dual-function sRNA from B. subtilis: SR1 acts as a peptide encoding mRNA on the gapA operon. Mol Microbiol. 2010, 76: 990-1009. 10.1111/j.1365-2958.2010.07158.x.
Geissmann T, Chevalier C, Cros M-J, Boisset S, Fechter P, Noirot C, Schrenzel J, François P, Vandenesch F, Gaspin C, Romby P: A search for small noncoding RNAs in Staphylococcus aureus reveals a conserved sequence motif for regulation. Nucleic Acids Res. 2009, 37: 7239-7257. 10.1093/nar/gkp668.
Marchais A, Duperrier S, Durand S, Gautheret D, Stragier P: CsfG, a sporulation-specific, small non-coding RNA highly conserved in endospore formers. RNA Biol. 2011, 8: 358-364. 10.4161/rna.8.3.14998.
Saito S, Kakeshita H, Nakamura K: Novel small RNA-encoding genes in the intergenic regions of Bacillus subtilis. Gene. 2009, 428: 2-8. 10.1016/j.gene.2008.09.024.
Silvaggi J, Perkins J, Losick R: Genes for Small, Noncoding RNAs under Sporulation Control in Bacillus subtilis. J Bacteriol. 2006, 188: 532-541. 10.1128/JB.188.2.532-541.2006.
Jahn N, Preis H, Wiedemann C, Brantl S: BsrG/SR4 from Bacillus subtilis–the first temperature-dependent type I toxin-antitoxin system. Mol Microbiol. 2012, 83: 579-598. 10.1111/j.1365-2958.2011.07952.x.
Feucht A, Evans L, Errington J: Identification of sporulation genes by genome-wide analysis of the σE regulon of Bacillus subtilis. J Mol Biol. 2003, 149: 3023-3034.
Wemhoff S: Master thesis. Deletionsmutagenese des yqfCD/phoH-Operons in Bacillus licheniformis - Untersuchungen zur Auswirkung auf die Sporulation. 2008, Westfälische Wilhelms-Universität Münster, Institut für Molekulare Mikrobiologie und Biotechnologie
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5: 621-628. 10.1038/nmeth.1226.
Risso D, Schwartz K, Sherlock G, Dudoit S: GC-content normalization for RNA-Seq data. BMC Bioinformatics. 2011, 12: 480-10.1186/1471-2105-12-480.
Zdobnov EM, Apweiler R: InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17: 847-848. 10.1093/bioinformatics/17.9.847.
The UniProt Consortium: The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res. 2009, 37: D169-D174.
Mao F, Dam P, Chou J, Olman V, Xu Y: DOOR: a database for prokaryotic operons. Nucleic Acids Res. 2009, 37: D459-D463. 10.1093/nar/gkn757.
Pertea M, Ayanbule K, Smedinghoff M, Salzberg SL: OperonDB: a comprehensive database of predicted operons in microbial genomes. Nucleic Acids Res. 2009, 37: D479-D482. 10.1093/nar/gkn784.
Kingsford CL, Ayanbule K, Salzberg SL: Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake. Genome Biol. 2007, 8: R22-10.1186/gb-2007-8-2-r22.
Overbeek R: The ERGOTM genome analysis and discovery system. Nucleic Acids Res. 2003, 31: 164-171. 10.1093/nar/gkg148.
Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream M-A, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 16: 944-945. 10.1093/bioinformatics/16.10.944.
Bose M, Barber RD: Prophage Finder: a prophage loci prediction tool for prokaryotic genome sequences. In Silico Biol. 2006, 6: 223-227.
Casjens S: Prophages and bacterial genomics: what have we learned so far?. Mol Microbiol. 2003, 49: 277-300. 10.1046/j.1365-2958.2003.03580.x.
Saeed A, Bhagabati N, Braisted J, Liang W, Sharov V, Howe E, Li J, Thiagarajan M, White J, Quackenbush J: TM4 microarray software suite. Methods Enzymol. 2006, 411: 134-
Koburger T, Weibezahn J, Bernhardt J, Homuth G, Hecker M: Genome-wide mRNA profiling in glucose starved Bacillus subtilis cells. Mol Genet Genomics. 2005, 274: 1-12. 10.1007/s00438-005-1119-8.
Hahne H, Mäder U, Otto A, Bonn F, Steil L, Bremer E, Hecker M, Becher D: A comprehensive proteomics and transcriptomics analysis of Bacillus subtilis salt stress adaptation. J Bacteriol. 2010, 192: 870-882. 10.1128/JB.01106-09.
Carver TJ, Rutherford KM, Berriman M, Rajandream M-A, Barrell BG, Parkhill J: ACT: the Artemis Comparison Tool. Bioinformatics. 2005, 21: 3422-3423. 10.1093/bioinformatics/bti553.
Darling AE, Mau B, Perna NT: ProgressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One. 2010, 5: e11147-10.1371/journal.pone.0011147.
Marco-Sola S, Sammeth M, Guigó R, Ribeca P: The Gem mapper: fast, accurate and versatile alignment by filtration. Nat Methods. 2012, 9: 1186-1188.
Nawrocki EP, Kolbe DL, Eddy SR: Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009, 25: 1335-1337. 10.1093/bioinformatics/btp157.
Martin J, Zhu W, Passalacqua K, Bergman N, Borodovsky M: Bacillus anthracis genome organization in light of whole transcriptome sequencing. BMC Bioinformatics. 2010, 11: S10-
Schwarzer M: PhD thesis. Physiologische Untersuchungen zur Regulation des Aminosäure-Stoffwechsels von Bacillus licheniformis DSM13. 2010, Georg-August-Universität Göttingen, Institut für Mikrobiologie und Genetik
Sambrook J, Fritsch EF, Maniatis T: Molecular Cloning - A Laboratory Manual. 1989, Cold Spring Harbor: Cold Spring Harbor Laboratory Press
This study was funded by the Bundesministerium für Bildung und Forschung (FKZ-0315387).
The authors would like to thank the Henkel Company for kind access to their fermentation facility. We are grateful for expert technical assistance by Ayhan Aydemir and Maik Schlieper.
The authors declare that they have no competing interests.
SW performed the experiments, analyzed data and wrote paper, SD developed the analysis tools, RH performed northern blots, JB and SE provided industrial fermentation facilities and performed the fermentation, SV prepared submission of genome and transcriptome data, RD wrote paper and provided research facilities, HL wrote paper, designed research and analyzed data. All authors read and approved the final version of the manuscript.
Electronic supplementary material
Additional file 1: Figure S1.: Distribution of whole transcriptome sequencing reads. Comparative operon prediction. Figure S3: Northern blot confirmation of non-coding RNAs. Figure S4: Antisense RNAs with putative impact on productivity. Figure S5: BsrG/SR4-like loci in B. licheniformis. Figure S6: Work flow charts. (PDF 818 KB)
Additional file 2: Table S1.: Whole transcriptome sequencing reads. NPKM values of RNA features. Table S3: NPKM values of all genes. Table S4: Differential RNA-Seq reads. Table S5: Transcription start sites. Table S6: Identified RNA features. Table S7: Predicted operons. Table S8: Corrected genes. Table S9: New genes. Table S10: Removed genes. Table S11: Predicted cis-regulatory elements. Table S12: Comparison of cis-regulatory elements known from B. subtilis. Table S13: Cluster analysis of ncRNA expression profiles. Table S14: Comparison of ncRNAs to B. subtilis. Table S15: Comparison of small RNAs from B. subtilis to identified ncRNAs. Table S16: Sequence Read Archive accession. Table S17: Primer pairs for Northern blots. (XLS 3 MB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
About this article
Cite this article
Wiegand, S., Dietrich, S., Hertel, R. et al. RNA-Seq of Bacillus licheniformis: active regulatory RNA features expressed within a productive fermentation. BMC Genomics 14, 667 (2013). https://doi.org/10.1186/1471-2164-14-667
- RNA-based regulation
- Antisense RNA
- Transcription start site
- Operon prediction