The Mycobacterium tuberculosis transcriptional landscape under genotoxic stress

Background As an intracellular human pathogen, Mycobacterium tuberculosis (Mtb) is facing multiple stressful stimuli inside the macrophage and the granuloma. Understanding Mtb responses to stress is essential to identify new virulence factors and pathways that play a role in the survival of the tubercle bacillus. The main goal of this study was to map the regulatory networks of differentially expressed (DE) transcripts in Mtb upon various forms of genotoxic stress. We exposed Mtb cells to oxidative (H2O2 or paraquat), nitrosative (DETA/NO), or alkylation (MNNG) stress or mitomycin C, inducing double-strand breaks in the DNA. Total RNA was isolated from treated and untreated cells and subjected to high-throughput deep sequencing. The data generated was analysed to identify DE genes encoding mRNAs, non-coding RNAs (ncRNAs), and the genes potentially targeted by ncRNAs. Results The most significant transcriptomic alteration with more than 700 DE genes was seen under nitrosative stress. In addition to genes that belong to the replication, recombination and repair (3R) group, mainly found under mitomycin C stress, we identified DE genes important for bacterial virulence and survival, such as genes of the type VII secretion system (T7SS) and the proline-glutamic acid/proline-proline-glutamic acid (PE/PPE) family. By predicting the structures of hypothetical proteins (HPs) encoded by DE genes, we found that some of these HPs might be involved in mycobacterial genome maintenance. We also applied a state-of-the-art method to predict potential target genes of the identified ncRNAs and found that some of these could regulate several genes that might be directly involved in the response to genotoxic stress. Conclusions Our study reflects the complexity of the response of Mtb in handling genotoxic stress. In addition to genes involved in genome maintenance, other potential key players, such as the members of the T7SS and PE/PPE gene family, were identified. This plethora of responses is detected not only at the level of DE genes encoding mRNAs but also at the level of ncRNAs and their potential targets. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3132-1) contains supplementary material, which is available to authorized users.


Background
Tuberculosis (TB) remains the leading cause of mortality due to a bacterial pathogen worldwide. The causative agent of TB, Mycobacterium tuberculosis (Mtb), infects approximately one third of the world's human population [1]. The main route of Mtb infection is from the respiratory tract to the lung. After entering the lung, the first immune cell type encountered by the bacteria is the alveolar macrophage. Mtb has an extraordinary ability to persist and replicate inside the macrophages, which represent a hostile environment where most other pathogens perish. This ability is a prerequisite for Mtb to maintain infection. During latency, Mtb is viable but non-replicating and the bacteria are confined to granulomas [2]. The granuloma is a unique ecosystem where Mtb growth remains most of the time in balance with host antibacterial effector mechanisms [3].
After phagocytosis, Mtb experiences endogenous and exogenous oxidative stress [4]. The endogenous stress is the result of the incomplete reduction of molecular oxygen during aerobic respiration while the exogenous stress is due to the oxidative burst generated by NADPH oxidase in the phagosome [5]. The reactive oxygen species (ROS) that are created in these processes include peroxides, hydroxyl radicals and superoxide. In the macrophage, Mtb also faces nitrosative stress from reactive nitrogen species (RNS), such as nitric oxide, nitrite, nitrogen dioxide and nitrates [6]. ROS and RNS can be bactericidal as they react with a wide range of macromolecules, like nucleic acids, proteins, lipids and carbohydrates [7][8][9]. The tubercle bacillus counteracts the effects of ROS and RNS by producing several enzymes, including catalase, peroxidase, superoxide dismutase, and nitrosothiol reductase, to ensure its intracellular survival and persistence [10].
DNA double strand breaks (DSBs) are critical lesions that can result in cell death or a wide variety of genetic alterations. In mycobacteria, DSBs are repaired by three distinct repair mechanisms. These are homologous recombination (HR), dependent on AdnAB nucleases and RecA, non-homologous end-joining (NHEJ), involving KU and LigD [11], and single strand annealing (SSA), which is achieved through RecBCD in a RecAindependent manner [11].
Regulatory non-coding RNAs (ncRNAs) play important roles in bacterial gene regulation at the posttranscriptional level [12]. They are functional RNA molecules that are not translated into proteins. They usually have a length of 50-400 nucleotides and base-pair with mRNA to regulate mRNA stability or the efficiency of mRNA translation [13]. NcRNAs can be 5′ or 3′ untranslated regions (UTRs), antisense or cis-encoded small RNAs, and intergenic or trans-encoded transcripts [14]. Most ncRNAs regulate gene expression and can provide a quick response to changing environmental conditions such as genotoxic stress, nutrient deprivation or infection [15][16][17]. Studies on infectious processes have demonstrated that ncRNAs can coordinate precise gene expression and thus play an important role in microbial pathogenesis [16]. While the number of characterised ncRNAs steadily increases, only a limited number of the corresponding mRNA targets have been identified.
Until now, only a few studies reported on the global mycobacterial responses to genotoxic stress [6,18] and, to our knowledge, there were no reports addressing the complexity of the mycobacterial response to several forms of genotoxic stress. Our main goal in this study was to explore the Mtb responses to various forms of genotoxic stress conditions by analysing the whole transcriptome. Mtb responses to oxidative agents (H 2 O 2 and paraquat) and the nitrosative agent diethylenetriamine nitric oxide adduct (DETA/NO) were investigated. As one of the cellular consequences of nitrosative stress is alkylation and thereby damage of the DNA, we also investigated the response of Mtb to the alkylating agent methylnitronitrosoguanidine (MNNG). Furthermore, we studied the transcriptomic response of Mtb to mitomycin C (MMC), a substance known to induce DNA crosslinking leading to DSBs [19]. In addition to identifying the DE genes encoding mRNAs, this study aims to detect ncRNA transcripts which are expressed under different stress conditions and to predict their potential target genes. This analysis will enable improved understanding of Mtb regulatory networks that operate under various forms of genotoxic stress.

Nitrosative stress induced the most prominent transcriptomic response
Several DE genes were identified for each genotoxic stress condition investigated. Treatment of Mtb with paraquat, H 2 O 2 and MNNG caused only the upregulation of genes. Oxidative stress generated by paraquat led to the up-regulation of five genes namely, katG, kmtR, Rv3269, Rv1405c, and mmsA ( Up-regulation of 128 genes and the down-regulation of the rrs gene, encoding 16S rRNA, and the two ncRNAs MTS2823 and ssr were observed under MMC treatment (Additional file 1). Nitrosative stress in the form of DETA/NO resulted in the up-regulation of 383 genes and the down-regulation of 340 genes with fold changes ranging from +257 to -27 (Additional file 1).
In order to validate the RNA-seq data analysis, a small subset of the DE genes identified from each genotoxic stress was selected based on the level of their expression fold change compared to the control. The expression levels were tested using quantitative real-time PCR (qPCR) (Additional file 2). The qPCR results confirmed the results from the deep sequencing approach (Additional file 3: Figure S1).
Involvement of 3R genes, the type VII secretion system, and PE/PPE genes in the genotoxic response A Clusters of orthologous group (COG) analysis of the DE genes from each genotoxic stress was carried out to identify any bias in gene-functional-category distribution.
Oxidative stress inflicted by H 2 O 2 induced the expression of genes belonging to 8 COG categories (Fig. 2a). The top three of these categories includes replication, recombination and repair (3R) genes (COG L), post-translational modifications, protein turnover and chaperones (COG O) and secondary metabolites biosynthesis, transport and catabolism (COG Q) (Fig. 2a). Among the 128 up-regulated genes due to the MMC treatment, 38 come under the COG L category (Fig. 2b). For the DETA/NO stress, the DE genes were distributed in 17 COG categories whereby nine of them contained more down-regulated genes than up-regulated genes (Fig. 2c). The energy production and conversion category (COG C) included the most genes affected with 28 up-and 35-down-regulated genes ( Fig. 2c; Additional file 1). The two categories transcription (COG K) and signal transduction mechanisms (COG T) Genes with a fold change greater than 2 and a p-value ≤ 0.05 were considered as DE. Genes that do not fulfil these criteria are shown in black. Green dots correspond to up-regulated genes while red dots correspond to down-regulated genes under stress encompassed mostly up-regulated genes. The genes in the categories of nucleotide transport and metabolism (COG F) and cell wall/membrane/envelope biogenesis (COG M) were mostly down-regulated. In COG K, genes encoding sigma factors and SOS response genes were found to be DE. The cell motility category (COG N) contained only 5 up-regulated genes. Four out of these five genes belong to the PE/PPE family (ppe1, ppe4, ppe20, and ppe37). Furthermore, the DE genes include members of the T7SS, namely eccA3, eccC3, eccD1, eccD3 and eccE3. Interestingly, among the up-regulated genes, one transcription factor (Rv3249c) was already predicted to regulate the expression of multiple genes [20], including alkB, rubA and rubB, but also eccB1 eccB2, eccB3, and eccB5. The proteins EccB1, EccB2, EccB3 and EccB5 are all part of the T7SS.
Among the DE 3R genes, predominantly affected by MMC and DETA/NO treatments, are mostly genes that encode helicases, endonucleases and resolvases and have transcription fold changes ranging from -5 to +18 (Fig. 3). The proteins encoded by these 3R genes are directly involved in genome maintenance.

Identification of new potential 3R genes
For all DE genes annotated as coding for HPs (Additional file 4), we performed 3D structure predictions using Phyre 2 [21,22]. By considering only proteins modelled with more than 70 % identity and 30 % coverage, we detected proteins that are predicted to be hydrolases/endonucleases, metal binding proteins, and DNA binding proteins. Some proteins were predicted with a high level of confidence to contain a DNA binding domain

Clustering of DE genes into transcription units
By analysing the distribution of DE genes throughout the genome, we found that several DE genes were organized in clusters, defined as at least two neighbouring genes according to their genomic location ( Fig. 4; Additional file 5). Genes in clusters may be part of the same transcriptional unit. No clusters were identified for the DE genes detected in the sample from the paraquat treatment (Fig. 4a). Regarding H 2 O 2 stress, 5 clusters were identified corresponding to 17 DE genes. The largest cluster (cluster 2) includes 6 genes (mbtG, mbtF, mbtE, mbtD, mbtC, mbtB) belonging to the secondary metabolites biosynthesis, transport and catabolism category (COG G). They are involved in the biosynthesis of mycobactin, a mycobacterial siderophore with high affinity for environmental iron [23]. Of the 15 clusters identified for MMC treatment, nine of them contain 3R genes. The cluster 10 is the largest with 4 genes (Rv2734, Rv2735c, recX, and recA). Among the 723 DE genes due to the DETA/NO stress, 294 (40.6 %) of them were organized in clusters. The three biggest clusters, cluster 15, cluster 73 and cluster 90 include 19, 13 and 11 DE genes, respectively. The cluster 15 contains ribosomal binding proteins (rpsJ -rplO), the cluster 73 includes genes encoding for the subunits of the NADH dehydrogenase I (nuoB to nuoN) and the cluster 90 contains the genes encoding Esx-3 and its secreted substrates. All genes in the cluster 15 and the cluster 73 were down-regulated and the genes in cluster 90 were up-regulated under nitrosative stress. The mbt genes were also found to be highly upregulated under nitrosative stress (Cluster 57 and Cluster 89). As already reported by Yang and colleagues [24], the two up-regulated genes alkA and ogt due to MNNG stress are located in one cluster (Fig. 4e). Sigma factor binding site analysis in the 100 bp upstream of all 110 identified clusters revealed the presence of binding sites for SigB, SigC, SigE and SigF. The most common was the SigB binding site, found for 21gene clusters, while the binding sites for the other sigma factors were only found once each (Additional file 5).

Potential role of ncRNA for genotoxic stress response
Thorough bioinformatics analysis of the sequencing data from the samples treated with H 2 O 2 and paraquat, 44 and 17 ncRNAs were predicted, respectively (Additional file 6). Out of these, 16 ncRNAs were found under both oxidative stress conditions. Only one ncRNA was The data from the DETA/NO stressed samples revealed 36 ncRNAs. From the 120 ncRNAs identified in the data from the MMC stressed sample, we found 93 to be novel (Additional file 6). Among all the 202 ncRNAs identified 44 were previously reported (Additional file 6, highlighted in yellow) [25][26][27][28][29]. The length of the ncRNAs varied between 50 and 270 bases. A subset of the identified ncRNAs was experimentally verified by northern blot analyses confirming the predicted transcript sizes (Fig. 5). By using intaRNA, the potential gene targets of the ncRNAs were predicted. A total of 76 ncRNAs were predicted to regulate 80 genes under paraquat (6), H 2 O 2 (17), MMC (51) and DETA/NO (13) treatments ( Fig. 6; Additional file 6) with 5 genes targeted by more than one ncRNA.

Discussion
In this study, we determined the transcriptomic responses of the Mtb to various forms of genotoxic stress. Our results show massive and complex gene regulation especially for the nitrosative stress inflicted by DETA/ NO. Alkylation damage due to MNNG treatment led to the up-regulation of only two genes, alkA and ogt that are part of the ada operon of Mtb and this result corroborates a previous study [24].
When facing oxidative stress by H 2 O 2 or by paraquat, Mtb responded differently in terms of the number of DE genes. However, regarding the ncRNAs, 16 ncRNAs are common under both treatments. Regarding gene regulation, the catalase-peroxidase gene katG was found to be the only common gene up-regulated under both treatments. This was also the case under nitrosative stress, confirming the importance of KatG in stress responses [30]. The enzyme KatG, together with the metalloenzyme KmtR, is involved in the oxidative stress response by playing a role in the detoxification of host-generated free radicals [31][32][33]. In addition, furA is up-regulated under oxidative stress due to H 2 O 2 (fold change = 2.87) and nitrosative (fold change = 4.5) stress conditions. Both oxidizing treatments caused the up-regulation of genes that are mainly taken part in metal transport. The correlation between oxidative stress and expression of metal transport proteins has already been observed [34,35]. FurA is known to be involved in iron homeostasis and stress response in many bacteria [36,37]. We suggest that FurA might contribute as an oxidative stress sensing regulator. Our observation of the up-regulation of mbt genes under both oxidative and nitrosative stress corroborate the study reported by Voskuil and colleagues [6]. The mbt genes participate in the biosynthesis of the hydroxyphenyloxazolinecontaining mycobactins, a class of siderophores involved in iron acquisition [38][39][40]. Iron homeostasis must be finely controlled as both excess and limitation of iron may cause oxidative stress [41][42][43]. Under normal conditions, the regulation of the siderophore biosynthesis genes is dependent on the iron concentration in the environment which is fluctuating due to a continuous oxidation/reduction of Fe 2+ /Fe 3+ [44]. The up-regulation of the mbt genes demonstrates that Mtb is using this pathway to limit the oxidative stress damage by scavenging more iron from its environment. In addition, the genes mmpS4, mmpS5, mmpL3-L5, whose products are needed to transport and export siderophores [40], were up-regulated by more than two fold changes. Concomitantly, the gene bfrB, which encodes the iron storage protein bacterioferritin [43], was found to be down-regulated nearly 27 fold under nitrosative stress. These observations indicate that, especially under nitrosative stress, Mtb strongly reacts by acquiring more iron from its environment.
The high number of DE genes under nitrosative stress is probably linked to the up-regulation of several transcription factors from COG K category. Several genes that are part of the T7SS were also up-regulated. A genome-wide binding approach of all predicted transcription factors showed that these genes could be regulated by the transcription factor encoded by Rv3249c [20] that was up-regulated by 7 fold under nitrosative stress. The T7SS, a mycobacterial specialised secretion machinery, translocates different effectors throughout the cell wall [45,46] and plays a major role in the virulence of Mtb [45,47]. Of the five T7SSs in Mtb (ESX-1 to ESX-5), ESX-1 [48] and ESX-3 [49,50] are known to be essential for virulence and growth of Mtb. Components of the T7SS might represent good targets for antituberculosis drug discovery [51]. The newly discovered up-regulation of several ESX-3 core channel genes and the genes encoding for ESX-3 secreted substrates, pe5, and ppe4 (Cluster 90), might be a consequence of the iron depletion under nitrosative stress. To the best of our knowledge this is the first observation that nitrosative stress affects the expression of T7SS genes in Mtb. Interestingly, the secreted substrates PE5 and PPE4 are found to be critical for the siderophore-mediated ironacquisition functions of ESX-3 while ESXG and EsxH play an essential role in virulence [52]. Eight pe/ppe genes were up-regulated with ppe37 being the most upregulated one (131 fold). This result corroborates previous studies reporting the over-expression of ppe37 under nitrosative and oxidative growth conditions [53,54]. In murine macrophages infected with M. smegmatis, PPE37 induced the low production of tumour necrosis factor alpha (TNFα) and interleukin 6 (IL6) [55]. Some of the gene products of the up-regulated pe/ppe genes might therefore be involved in regulating cytokine production of the host cell. It is also important to highlight that under DETA/NO stress dosR and dosS in cluster 71 were up-regulated 30 and 18 fold, respectively. The genes Fig. 5 Validation of selected ncRNAs by northern blot. Four newly discovered ncRNAs (ncRNA34, ncRNA83, ncRNA192, ncRNA200) were randomly selected for northern blot validation. For each selected ncRNA, the normalized coverage plots from RNA-seq data of aligned reads is shown on the top with the genomic context identified from the Mtb H37Rv (NC_000962.3) genome and the position of the ncRNA is indicated by coloured arrows. The coverage data is indicated with the following colours: no stress -black; H 2 O 2 -red; MMC -blue; DETA/NO -green. RNA samples for the northern blots are from the correspondent samples. Control RNA is from non-stressed cells. For each northern blot, the loading control (5S rRNA) and decade marker are shown dosR and dosS are conserved in many mycobacterial species and DosS/DosR represents the two-component system that regulates the dormancy regulon in Mtb [56]. The up-regulation of dosS/dosR would lead to slower bacterial growth and lower metabolic activity. By reducing its metabolic activity, Mtb could circumvent the genotoxic stress. While the DosS/DosR two component system is extensively studied [56][57][58][59] its selective involvement in only some stress reactions needs further investigation.
Notably, DSB damage inflicted by MMC treatment led especially to the up-regulation of 3R genes, predominantly DNA repair genes ( Fig. 3 and Additional file 6). Among the up-regulated 3R genes recA and lexA were the highest up-regulated genes which are in agreement with the fact that bacteria respond to DNA damage by mounting a coordinated cellular SOS response, governed by the RecA and LexA. During normal bacterial growth, the LexA repressor down-regulates its own expression and regulates, in addition, the expression of more than 40 unlinked genes [60]. RecA protein, upon DNA damage, binds to single-stranded DNA and induces the selfcleavage of LexA [61]. The dissociation of LexA from its DNA targets causes the de-represseion of genes for DNA damage repair and tolerance. Among the upregulated genes, we found nearly 20 genes (Additional file 1, highlighted in blue) that harbour a putative SOS box [62] and thereby could be regulated by LexA. Taken together, this shows that MMC induces a classic SOS response in Mtb. Several DE genes that encode for HPs were found in our analyses. The 3D structural modelling of these proteins showed that some of them may code for DNA binding proteins, metal binding proteins, or endonucleases. They could play an important role in genome maintenance and stability during genotoxic stress as some of them were strongly induced (more than 20 fold) (Additional file 4). These HPs will be the subjects of future functional characterisation.
While the DETA/NO stress gave the highest number of DE mRNAs, the highest number of stress induced ncRNAs was predicted for MMC stress when compared to the other treatments. Collectively, these results show that Mtb responds very differently to the various forms of stress [4]. Among the cumulative 202 ncRNAs, 19 (9.4 %) 3′ UTRs and 34 (16.8 %) 5′ UTRs were identified. The rest of the ncRNAs (73.8 %) were located in coding sequences. This distribution could be explained by the fact that more than 90 % of the Mtb genome is occupied by coding sequence. The ncRNAs identified are distributed throughout the genome, while there seems to be some ncRNA clustering (red dots in Fig. 6). Local genome content and architecture could be involved [63]. The classification of the predicted target genes (triangles in Fig. 6) showed that six belong to the 3R category indicating an involvement in DNA repair. Among the genes targeted under oxidative stress by ncRNAs are mutM (fpg), encoding a DNA glycosylase [64], and Rv2090, encoding an exonuclease. Another targeted gene, Rv3202c (adnA), was found to be upregulated under oxidative, nitrosative and MMC stress. This reflects the importance of adnA, which was previously reported to be up-regulated together with Rv3201c (adnB) under MMC treatment [65]. AdnAB is a helicase-nuclease which is involved in RecA-dependent homologous recombination [11].

Conclusions
This study demonstrated the distinct transcriptomic responses Mtb exerts when subjected to genotoxic stress. The data analyses revealed that in addition to the already reported DE genes, genes involved in the T7SS and the pe/ppe gene families were found to be upregulated under genotoxic stress. The structural analysis of the hypothetical proteins from DE genes suggested that some of them are involved in genome maintenance. We also identified several ncRNAs expressed under stress conditions and their potential gene targets thereby unravelling new regulatory networks of Mtb. The insights into the networks that operate under genotoxic stress will enable deeper understanding of Mtb survival strategies and in turn will help to gain better insight into the mycobacterial response to the host defense. Our results furthermore provide a list of genome maintenance, survival and virulence factors that potentially could be considered as targets in drug discovery and vaccine development. . Total RNA was isolated using TRIzol (Invitrogen) followed by RNeasy spin columns (Qiagen) as previously described [64] and treated with Turbo DNase (Life Technologies) following the manufacturer's instruction. Three independent experiments were conducted for each genotoxic stress and the controls. The workflow of methods employed to identify DE genes and noncoding RNAs are depicted in Additional file 7: Figure S2.

Library preparation for cDNA and sequencing
The cDNA library construction and pyrosequencing were performed as previously described [67]. RNA sequencing was carried out using the HiSeq 2000 sequencing system (Illumina) at the Max Planck Genome Centre, Cologne, Germany. For each sample a read depth ranging from 5 to 10 million reads was obtained.

Processing of RNAseq data and bioinformatics analysis
The workflow for bioinformatics data processing of RNAseq data is outlined in Additional file 7: Figure S2. Quality control was performed using the fastx-toolkit [68] and FastQC [69]. Reads with mean quality score lower than 30 were excluded from downstream analysis. Reads were aligned against the Mtb H37Rv genome sequence (version NC_000962.3) as a reference, using segemehl [70]. The segemehl implements an enhanced suffix array (ESA) matching strategy, yielding higher sensitivity and lower false positive rate than the Burrows-Wheeler Aligner (BWA) when aligning RNA-seq data [71]. The number of reads corresponding to each annotated gene was determined using the HTSeq python package by applying the 'union' overlap resolution mode [72]. Data were normalized and DE genes were identified using DESeq2 package [73] from the Bioconductor framework release 3.0 [74,75]. For normalization, a scaling factor for each RNA-seq data was calculated as the median of the ratio of read count (for each gene) divided by the geometric mean across all generated RNA-seq data. Raw read counts were divided by the scaling factor. A fold-change in expression equal or higher than 2 with a p-value < 0.05 was considered as significant, and genes fulfilling these criteria were considered to be DE genes. For DE genes coding for HPs, structural modelling was done using Phyre 2 [21,22].

Sigma factors binding sites distribution and clustered genes
The presence/absence of binding sites for all known sigma factors, SigA-SigH and SigJ-SigM [76][77][78][79][80][81] were identified. For each binding site, the corresponding consensus sequence was retrieved and converted in python regular expressions. These regular expressions were used to check the presence/absence of each binding site in a region of 100 bp upstream of each identified genes organized in clusters.

Identification of non-coding RNAs
Putative ncRNAs were identified with Rockhopper [82,83] using default parameters for strand-specific reads, and then manually verified after visualization using Rockhopper viewer and COV2HTML [84]. In addition, a python script written using the two libraries matplotlib [85] and seaborn [86] to analyse and plot the coverage rate using different sizes of sliding windows across the reads alignment for the control and treated samples. All scripts for data management and analyses were written in-house in Python or R.