Bacterial growth conditions and RNA isolation
RNA was isolated from exponentially growing cultures of Staphylococcus aureus MW2 (kindly provided by Dr. William Kelley, University of Geneva, Switzerland), Staphylococcus epidermidis ATCC 12228 (kindly provided by Dr. Arnaud Riat, Geneva University Hospital, Switzerland), Enterobacter aerogenes KCTC 2190 (kindly provided by Dr. Thilo Köhler, University of Geneva, Switzerland), and Acinetobacter baumannii ATCC 17978 (kindly provided by Prof. Gottfried Wilharm, Robert Koch Institute, Berlin, Germany) in the following manner.
Over-night cultures of E. aerogenes and A. baumannii were diluted 1:100 in LB medium (Merck) and the cultures were agitated at 37 °C until OD600 reached 0.5, whereupon 4 ml of the culture was added to 20 ml ice-cold ethanol/acetone (1:1 mix), which immediately kills the cells and inactivates all enzymes. The cells were pelleted by 5 min of 4000 g centrifugation at 4 °C, the supernatant removed, and 1 ml ethanol/acetone was added before the pellet was stored at -80 °C. To prepare for lysis, the pellet was centrifuged again at 4 °C for 5 min at 4000 g, and the ethanol/acetone supernatant removed. The pellet was then resuspended in 1 ml 1xTE buffer (10 mM Tris-HCl pH 8, 1 mM EDTA), transferred to a 1.5 ml microtube and re-pelleted by 2 min of 17000 g centrifugation. The cells were lysed in 100 μl TE buffer for 10 min at 37 °C with 100 μg lysozyme and 40 U RNasin Plus RNase A inhibitor (Promega), and pure RNA was immediately isolated using the ReliaPrep RNA Tissue Miniprep System (Promega).
An over-night culture of S. aureus MW2 was diluted 1:100 in RPMI medium (RPMI1640 with HEPES buffer, from Sigma-Aldrich R7388) or MH (cation adjusted Mueller-Hinton, from Becton Dickinson), or 50 μl was spread on an MH-agar plate (13 % agar). One set of MH cultures were agitated at 30 °C (MH30) and at OD600 = 0.4 harvested as described above, the remaining were incubated at 37 °C and either harvested at OD600 = 0.4 (MH37 and RPMI) or after 24 h (MH_Agar). The lawn of MH_Agar cells were scraped off the plate and plunged into ice cold ethanol/acetone. The RNA was isolated as for E. aerogenes and A. baumannii, but lysed with 100 μg lysostaphein instead of lysozyme. S. epidermidis was treated identically to the S. aureus MH37 culture as described above.
TSS-EMOTE protocol
The TSS-EMOTE protocol is a further development of the original EMOTE protocol, which was designed to detect mono-phosphorylated 5’-ends, and details about the rationale behind the various oligo designs can be found in [18]. All water used in this protocol is molecular biology grade RNase-Free water (Amimed, Bioconcept, Allschwil, Switzerland). Micro tubes are also RNase-Free (Treff, Degersheim, Switzerland), as are the filtered micro-pipette tips used (Biotix, VWR International, Nyon, Switzerland and Sarstedt, Nümbrecht, Germany).
XRN1 digestion
10 μg total RNA from each sample was treated with 5 U XRN-1 (New England Biolabs) in a 80 μl volume of buffer NEB3, with 80 U RNasin Plus RNase inhibitor. After 4 h of incubation at 37 °C, 220 μl water was added and the enzymes were removed with two sequential phenol/chloroform/isoamyl (Sigma) extractions, using Phase Lock Gel (www.5prime.com) to facilitate separation of the phases. 20 μg glycogen (Roche) was added, and the salt concentration adjusted to 0.3 M sodium acetate, whereupon 3 volumes of 96 % ethanol were added. After thorough vortexing, the samples were stored over night at -80 °C, pelleted at 17000 g for 45 min at 4 °C, and washed twice in 750 μl cold 75 % ethanol. Finally, the XRN1 digested RNA was resuspended in 20 μl 0.1x TE buffer.
Ligation step
Two tubes for each RNA sample were prepared with 5 μl XRN-treated RNA and 50 pmol Rp6 oligo (Table 2). The tubes are heated to 70 °C for 5 min and then immediately flash-cooled in icewater. To each tube was added 10 μl of either “-RppH mix” or “+RppH mix”, which were then incubated at 37 °C. The “-RppH mix” consisted of 3.5 μl water, 2 μl RNA ligase buffer, 2 μl 10 mM ATP, 1 μl Murine RNase inhibitor (New England Biolabs), and 2 μl RNA ligase 1 (New England Biolabs). The “+RppH mix” consisted of 1.5 μl water, 2 ul RNA ligase buffer, 2 μl 10 mM ATP, 1 μl Murine RNase inhibitor, 2 μl RNA ligase 1, and 2 μl RppH (New England Biolabs). After 30 min of incubation, 1 μl 10 mM ATP was added, and the reaction was incubated over night at 16 °C. 70 μl water, 10 μl 3 M sodium acetate and 0.5 μl glycogen (20 μg/μl) was added and the tubes were vortexed. Then 300 μl ethanol was added, and the tubes were vortexed again, to precipitate the ligated RNA over night at -80 °C. The ligated RNA was pelleted for 45 min at 17000 g and 4 °C, then washed once with 900 μl cold 75 % ethanol, and resuspended in 20 μl 0.1xTE. The samples were heated to 70 °C for 10 min to dissolve RNA well, and to inactivate any residual enzymatic activity.
Reverse transcription
The DROAA oligo (Table 2) used to prime the formation of cDNA is designed to hybridise almost completely randomly on the RNA, however two A’s have been added at the extreme 3’-end of the DROAA oligo to reduce the number of possible priming sites approximate 16 times (it varies according to the G + C content of the organism). This semi-random design furthermore serves to prevent the DROAA from priming on the Rp6 oligo (which contains no uridine residues), which is present in excess at this step of the protocol.
For each ligation reaction, 8 μl Rp6-ligated RNA, was mixed with 1 μl 100 mM DTT and 1 μl 20 μM DROAA oligo, then heated to 75 °C for 3 min and the tubes were allowed to cool slowly to room temperature. An RT-mix was prepared for each tube, with 3 μl water, 4 μl 5x Reverse Transcriptase buffer (New England Biolabs), 1 μl 100 mM DTT, 1 μl 10 mM dNTP, 0.5 μl Murine RNase Inhibitor Murine and 1 μl M-MLV (-H) Reverse Transcriptase (New England Biolabs). 10 μl of the RT-mix was added to each tube of RNA + primer, which was incubated 10 min at room temperature, then 50 min at 42 °C, and finally 30 min at 65 °C (to inactivate enzymes). Finally, the cDNA was purified using a PCR-purification kit (GeneJET Gel Extraction Kit, Thermo Scientific, Milian, Vernier, Switzerland), and eluted in 50 μl Elution Buffer.
Second-strand PCR
To generate double stranded DNA, with the appropriate adaptors for Illumina sequencing, as well as addition of a four nucleotide EMOTE barcode which serves to identify the RNA sample and pool, a PCR reaction was prepared for each RT-reaction: 10 μl PCR-purified RT-reaction, 27 μl water, 10 μl Q5 Polymerase buffer, 1.5 μl dNTP (2.5 mM each), 1.5 μl 100nM of one of primers D6A, D6B, D6C, etc. (each with a unique EMOTE barcode, Table 2), 1.5 μl 10 uM Primer B-PE-PCR20, and 0.5 μl Q5® Hot Start High-Fidelity DNA Polymerase (NEB).
These 50 μl PCR reactions with low D6x primer concentration was run for the 5 first cycles of the following program: 2 min @98 °C, (10s @98 °C, 20s @50 °C, 2 min @ 72 °C) 31 cycles, 5 min @72 °C and finally 4 °C. At the end of cycle 5, the PCR machine was paused, and 1.5 μl 10 μM A-PE-PCR10 was added (Table 2), and the tubes replaced in the PCR machine for the program to continue for another 25 cycles. To visually verify that the yields were similar, 10 μl of the PCR reactions was loaded on an agarose gel, and the remaining 40 μl from each PCR were mixed with 40 μl Binding Buffer (GeneJET Gel Extraction Kit) and 20 μl Isopropanol, to be purified according to protocol and eluted in 40 μl elution buffer.
Size-selection of PCR-products
The Illumina technology gives poor results with inserts above 800 bp, and the TSS-EMOTE protocol frequently yields low molecular weight products (probably a mixture of unincorporated primers, false priming products and Rp6-concatamers). A 1 % agarose gel was therefore used to select PCR-products in the size-range 300 bp to 1000 bp, which were subsequently extracted from the gel with GeneJET Gel Extraction Kit, adding an equal volume of isopropanol during binding (to ensure that DNA below 500 bp was not lost), and eluted in 50 μl Elution Buffer. The upper limit in PCR-product length should not bias the TSS-EMOTE assay towards shorter transcripts, since the Revers Transcription reaction is based on random priming of the DROAA oligo. Thus each transcript will give rise to a random range of cDNA molecules, all ending in a sequence complementary to the Rp6 oligo, but with different start positions and therefore different lengths.
Illumina sequencing
The concentration of the DNA recovered from the agarose gel (see above) were quantified with a Qubit Fluorometer, diluted appropriately, and loaded onto a HiSeq 2500 machine (Illumina, San Diego, USA). Between 6 and 14 million reads were obtained for each RNA sample, and on average about 3 million of these could be mapped onto their respective genomes.
Additionally, the Illumina TruSeq total RNA stranded protocol was used to prepare a library of the same RNA that was used for the TSS-EMOTE of RNA from S. aureus grown in RPMI medium. The resulting 100 nt paired-end reads were mapped to the reference genome with Bowtie [50]. However, for further analyses (for example in Figs. 2d and 4), only the upstream reads were counted in order to maximise the signal near the 5’-ends.
In silico TSS identification
The raw sequence read data from the Illumina sequencing was converted to EMOTE table-format according to the principles described in [18], using the EMOTE-conv software package [51], which lists all genomic positions for which an RNA 5'-end has been detected, with the number of Illumina reads and the number of Unique Molecular Identifiers (UMIs) (i.e. bona fide unique ligation events) that correspond to each position (examples of read-counts plotted against UMIs are shown in Additional file 7: Figure S3). With the Rp6 oligo used in this study, the maximum UMI-count for a given position is 2187 (=37). This value was not reached in the data presented here, however if saturation becomes a problem in future projects, then the EMOTE-conv software permits down-sampling of the Illumina fastq-file, to get below saturation level.
The data in the EMOTE tables were then analysed as described in the results section to identify the TSSs. The reference sequences used were: NC_003923 (S. aureus MW2 chromosome), CP000521 (Acinetobacter baumannii ATCC 17978 chromosome), CP000522 (Acinetobacter baumannii ATCC 17978 plasmid pAB1), CP000523 (Acinetobacter baumannii ATCC 17978 plasmid pAB2), CP012004.1 (Acinetobacter baumannii ATCC 17978-mff chromosome), CP012005.1 (Acinetobacter baumannii ATCC 17978-mff plasmid pAB3), NC_004461 (Staphylococcus epidermidis ATCC 12228 chromosome), NC_005008 (Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-01), NC_005007 (Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-02), NC_005006 (Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-03), NC_005005 (Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-04), NC_005004 (Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-05), NC_005003 (Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-06), and NC_015663 (Enterobacter aerogenes KCTC 2190 chromosome).
A number of statistical tools have been developed for mRNA-seq differential expression analysis [52, 53], however they are designed to compare mRNA levels of two separate RNA samples (each sample representing multiple biological replicates). In contrast, in TSS-EMOTE data, the comparison is within the same RNA sample, only treated with slightly different enzyme mixtures. Therefore, we favour a simple statistical test that rely on a beta-binomial distribution to integrate this information, and is able to assign a p-value to each potential TSS:
Let N+ and N- be the number of reads that align onto the genome in the + RppH and the -RppH pools. Let also q+ and q- be the UMIs obtained for a given position on the genome in the two pools. Our model assumes that in absence of TSS, q+ follows a beta-binomial distribution parameterized by q-, N- and N+ as follows,
$$ P\left({q}^{+}\left|{N}^{+},\kern0.5em {q}^{-},\kern0.5em {N}^{-}\right.\right)\kern0.5em =\kern0.5em \beta \_ binomial\begin{array}{c}\hfill \beta =1+{N}^{-}-{q}^{-}\hfill \\ {}\hfill a=1+{q}^{-}\kern3.5em \hfill \end{array}\left({q}^{+},\kern0.5em {N}^{+}\right). $$
The area under the upper tail of the distribution reflects how the data deviate from the model, and reveal our confidence in the position to be a TSS. Once the model for each -RppH/+RppH pair has been calculated, p-values of biological replicates of the TSS-EMOTE assay are further combined with Fisher’s method. Probabilities are adjusted for multiple testing by computing the False Discovery Rate (FDR), and requiring TSS sites to have a FDR < 0.01. Only candidates with q+ ≥ 5 reads in one of the replicates are considered in this TSS detection process. An example of the UMI-corrected read-counts mapping to the sarA locus of S. aureus are shown in Additional file 8: Figure S4B, together with the TSSs previously identified by others (Additional file 1: Table S1) and the TSSs identified in this study (Additional file 3: Table S2).
All computations are performed with the R programming language, making use of VGAM package for the beta binomial distribution [54, 55], and the R-scripts used are available upon request.
TSS annotation
Identified TSSs further enter an annotation process where we determine: 1) if the TSS fall inside an annotated ORF; 2) the name of the first annotated ORF following the TSSs position in a strand specific manner; 3) the 50 bp surrounding sequence (between positions -45 and +5 around the TSS), into which we look for a match of the sigma-factor A recognition pattern TATAAT and TTGACA around positions -11 and -37 respectively. Additionally, TSSs that are less than 5 bp apart from each other are clustered into a TSS cluster, and the TSS with the smallest p-value is considered as the representative TSS (rTSS) of this cluster.
Finally, S. aureus TSSs are annotated with the closest predicted non-coding RNA for MW2 strain taken from the staphylococcal regulatory RNA database [22]. We also make use of the work of [4] on the sigma-factor B regulon to flag rTSSs that immediately precede an ORF upregulated by sigma-factor B in S. aureus.
Transcription terminator prediction
Transcription terminators are predicted using the software TransTermHP v2.09 [39]. The program is run with standard parameter values, and an annotation file containing coordinates of all annotated genes.
Sequence logos
All sequence logos were generated with the R package motifStack from Bioconductor, and using the background correction to adjust the signal according to GC content of the organism [56].