Fig. 1 | BMC Genomics

Fig. 1

From: TSS-EMOTE, a refined protocol for a more complete and less biased global mapping of transcription start sites in bacterial pathogens


TSS-EMOTE flowchart. The TSS-EMOTE assay consists of a wet-lab library preparation (panels a to g) and in silico analyses (panels h to n). Throughout, an asterisk marks the original 5’ base of tri-phosphorylated RNA (thin red line). a Total RNA is purified and digested with the 5’→3’ exoribonuclease XRN1, which removes the vast majority of 5’ mono-phosphorylated RNA from the sample (including 16S and 23S rRNA). b and c The XRN1-treated RNA is mixed with a large excess of a synthetic RNA oligo (Rp6, shown in blue) and split into two pools. Both pools receive T4 RNA ligase, but only the “+RppH” pool is co-treated with RppH, an enzyme that converts 5’ tri-phosphorylated ends to mono-phosphorylated ends and thereby makes them substrates for the ligase. d and e After the ligation reaction, a semi-random primer is used to reverse-transcribe the RNA and simultaneously add a 2.0 Illumina adaptor (black “B”). The resulting cDNA carries the 2.0 Illumina adaptor (for reverse reads in paired-end sequencing) at its 5’ end and, if the template RNA was ligated to an Rp6 oligo, a sequence complementary to Rp6 (cRp6) at its 3’ end. f PCR is used to specifically amplify cDNAs that carry both the 2.0 Illumina adaptor and the cRp6 sequence. This step also adds a 1.0 Illumina adaptor (for forward reads in paired-end sequencing) and a sample-specific 4-base EMOTE barcode (blue line and “XXX”, respectively) to index the molecules, with different barcodes for the -RppH and +RppH pools. The barcode of the -RppH pool designates molecules for which the XRN1 treatment was incomplete, and this information is incorporated into the in silico analysis (see below). g The barcoded DNA from various samples (and pools) can be mixed and loaded directly onto an Illumina HiSeq machine.
Millions of 50-nt reads are obtained. Each read spans the EMOTE barcode and both the known and random sections of the Rp6 oligo (see Methods), and reveals the first 20 nt of the native 5’ end of the ligated RNA molecule. These 20 nt are sufficient to map the vast majority of 5’ ends to a unique position in the small genomes of the bacteria in this study. However, longer Illumina reads (and thus longer mapping sequences) can be used if the TSSs lie in repeated regions or if organisms with large genomes, such as humans, are being examined. h The input to the in silico pipeline consists of stranded RNA-seq reads in FASTQ format for one or more biological replicates. Each replicate includes one FASTQ file for the -RppH pool and another for the +RppH pool. i The FASTQ files are processed by the EMOTE-conv software [51], which parses the reads, aligns them to the genome, and performs the quantification. For each genomic position, this yields the number of reads whose first nucleotide aligns at that position, together with the strand to which they map. The counts are further corrected for PCR bias using the unique molecular identifier (UMI) sequences available in the unaligned part of the EMOTE read. j The quantification counts obtained for the +RppH and -RppH pools are compared with a beta-binomial model that tests, at each position, whether the 5’ ends identified in the +RppH pool are significantly enriched over those identified in the -RppH pool. This yields a p-value reflecting the confidence that the genomic position is enriched in the +RppH pool of that biological replicate. k The p-values of all biological replicates are combined into a single p-value with Fisher’s method. l and m To correct for multiple testing across all genomic positions, the false discovery rate (FDR) is evaluated, and only positions with an FDR ≤ 0.01 are considered to be TSSs.
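The counting and testing steps in panels i to k can be sketched in Python. This is a minimal illustration, not EMOTE-conv [51] or the authors' statistical code, and it rests on several assumptions. Alignments are taken to be already reduced to hypothetical (position, strand, UMI) tuples, and "correcting for PCR bias" is taken to mean counting distinct UMIs per position and strand. The beta-binomial null treats the +RppH share of the combined +RppH and -RppH counts at a position as BetaBinomial(n, alpha, beta), with `alpha` and `beta` supplied by the caller, because the caption does not state the exact model or its fitted parameters. Fisher's method uses the closed-form chi-square survival function for even degrees of freedom, so no statistics library is needed.

```python
import math
from collections import defaultdict
from typing import Iterable, Sequence


def count_unique_5prime_ends(
    alignments: Iterable[tuple[int, str, str]],
) -> dict[tuple[int, str], int]:
    """Panel i (sketch): count distinct UMIs per (5'-end position, strand).

    Assumption: reads sharing position, strand and UMI are PCR duplicates
    of one ligation event, so each such group is counted once.
    """
    umis: dict[tuple[int, str], set[str]] = defaultdict(set)
    for position, strand, umi in alignments:
        umis[(position, strand)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


def _log_beta(a: float, b: float) -> float:
    # log B(a, b) via log-gamma, to avoid overflow for large counts
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_upper_tail(k: int, n: int, alpha: float, beta: float) -> float:
    """P(X >= k) for X ~ BetaBinomial(n, alpha, beta)."""
    log_norm = _log_beta(alpha, beta)
    total = 0.0
    for x in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + _log_beta(x + alpha, n - x + beta) - log_norm
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def enrichment_pvalue(plus: int, minus: int, alpha: float, beta: float) -> float:
    """Panel j (sketch): one-sided p-value that the +RppH count is high.

    Hypothetical null: out of plus + minus UMI counts at a position, the
    +RppH share follows BetaBinomial(plus + minus, alpha, beta).
    """
    return betabinom_upper_tail(plus, plus + minus, alpha, beta)


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Panel k: Fisher's method, X = -2 * sum(ln p_i) ~ chi-square(2k)."""
    k = len(pvalues)
    # half_x = X / 2; clamp p-values away from 0 so log() is defined
    half_x = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # Chi-square survival function for even df 2k:
    # exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term, acc = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        acc += term
    return min(1.0, math.exp(-half_x) * acc)
```

As a sanity check, with `alpha = beta = 1` the null reduces to a uniform distribution over 0..n, so `betabinom_upper_tail(3, 9, 1, 1)` is approximately 0.7, and Fisher's method applied to a single p-value returns that p-value unchanged. In practice the prior parameters would have to be estimated from the data, which this sketch does not attempt.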
Note also that the FDR is only calculated for genomic positions with at least 5 detected ligation events in at least one of the +RppH pools (UMI ≥ 5). n The TSSs then enter an annotation process that retrieves their surrounding sequences and downstream ORFs. TSSs separated by fewer than 5 bp are clustered together. Finally, to draw a global picture of operon structures, transcription terminators are detected independently with the software TransTermHP [39]. o Sequence of the RNA oligo Rp6 and a typical Illumina sequencing read from a TSS-EMOTE experiment. The Recognition Sequence serves as the priming site for the PCR in panel f. UMI: the randomly incorporated nucleotides in the Rp6 oligo, which serve to determine whether Illumina reads with identical Mapping Sequences originate from separate ligation events. CS: Control Sequence. EB: EMOTE barcode used to index the Illumina reads. An asterisk indicates the 5’ nucleotide of the original RNA molecule.
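The calling and clustering steps in panels l to n can be sketched the same way. The caption does not say which FDR procedure is used, so the Benjamini-Hochberg procedure here is a swapped-in standard choice, not necessarily the authors' method. The `Candidate` record and its fields are hypothetical, and "separated by fewer than 5 bp" is interpreted as a difference of fewer than 5 between coordinates on the same strand, with clusters chained through neighbouring TSSs. The UMI ≥ 5 filter is applied before the FDR is computed, following the statement that the FDR is only calculated for those positions.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """One genomic position/strand after panel k (hypothetical record layout)."""
    position: int
    strand: str
    combined_p: float                 # Fisher-combined p-value
    plus_umi_counts: tuple[int, ...]  # UMI counts, one per +RppH pool


def benjamini_hochberg(pvalues: Sequence[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_tss(
    candidates: Sequence[Candidate], min_umi: int = 5, fdr: float = 0.01
) -> list[tuple[int, str]]:
    """Panels l and m (sketch): keep positions with UMI >= min_umi in at
    least one +RppH pool, then apply the FDR threshold to those only."""
    tested = [c for c in candidates if max(c.plus_umi_counts, default=0) >= min_umi]
    qvalues = benjamini_hochberg([c.combined_p for c in tested])
    return [(c.position, c.strand) for c, q in zip(tested, qvalues) if q <= fdr]


def cluster_tss(
    tss: Sequence[tuple[int, str]], max_gap: int = 5
) -> list[tuple[str, list[int]]]:
    """Panel n (sketch): group same-strand TSSs whose consecutive
    coordinates differ by less than max_gap bp."""
    clusters: list[tuple[str, list[int]]] = []
    for strand in sorted({s for _, s in tss}):
        positions = sorted(p for p, s in tss if s == strand)
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] < max_gap:
                current.append(pos)  # chain onto the open cluster
            else:
                clusters.append((strand, current))
                current = [pos]
        clusters.append((strand, current))
    return clusters
```

Filtering before the correction reduces the number of tests m, which makes the Benjamini-Hochberg thresholds less stringent for the positions that remain. ORF annotation and terminator detection with TransTermHP [39] are left out, since they depend on external tools and genome annotations.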
