Bacterial small RNAs in the Genus Rickettsia

Background Rickettsia species are obligate intracellular Gram-negative pathogenic bacteria and the etiologic agents of diseases such as Rocky Mountain spotted fever (RMSF), Mediterranean spotted fever, epidemic typhus, and murine typhus. Genome sequencing revealed that R. prowazekii has ~25 % non-coding DNA, the majority of which is thought to be either “junk DNA” or pseudogenes resulting from genomic reduction. These characteristics also define other Rickettsia genomes. Bacterial small RNAs, whose biogenesis is predominantly attributed to either the intergenic regions (trans-acting) or to the antisense strand of an open reading frame (cis-acting), are now appreciated to be among the most important post-transcriptional regulators of bacterial virulence and growth. We hypothesize that intergenic regions in rickettsial species encode for small, non-coding RNAs (sRNAs) involved in the regulation of its transcriptome, leading to altered virulence and adaptation depending on the host niche. Results We employed a combination of bioinformatics and in vitro approaches to explore the presence of sRNAs in a number of species within Genus Rickettsia. Using the sRNA Identification Protocol using High-throughput Technology (SIPHT) web interface, we predicted over 1,700 small RNAs present in the intergenic regions of 16 different strains representing 13 rickettsial species. We further characterized novel sRNAs from typhus (R. prowazekii and R. typhi) and spotted fever (R. rickettsii and R. conorii) groups for their promoters and Rho-independent terminators using Bacterial Promoter Prediction Program (BPROM) and TransTermHP prediction algorithms, respectively. Strong σ70 promoters were predicted upstream of all novel small RNAs, indicating the potential for transcriptional activity. Next, we infected human microvascular endothelial cells (HMECs) with R. prowazekii for 3 h and 24 h and performed Next Generation Sequencing to experimentally validate the expression of 26 sRNA candidates predicted in R. prowazekii. Reverse transcriptase PCR was also used to further verify the expression of six putative novel sRNA candidates in R. prowazekii. Conclusions Our results yield clear evidence for the expression of novel R. prowazekii sRNA candidates during infection of HMECs. This is the first description of novel small RNAs for a highly pathogenic species of Rickettsia, which should lead to new insights into rickettsial virulence and adaptation mechanisms. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-2293-7) contains supplementary material, which is available to authorized users.


Background
Small noncoding RNAs (sRNAs) were first identified and described in the 1960s, but remained largely ignored until recently when they were recognized as important post-transcriptional regulators in both eukaryotic and prokaryotic organisms [1]. These sRNAs have been found in a number of pathogenic bacteria belonging to family Enterobacteriaceae, Pseudomonas aeruginosa, Listeria monocytogenes, Streptococcus pyogenes, Clostridium perfringens, and Staphylococcus aureus [2]. For example, Helicobacter pylori, the causative agent of chronic active, chronic persistent, and atrophic gastritis in adults and children, and implicated in a majority of duodenal and gastric ulcers, carries a repertoire of at least one anti-sense transcriptional start site on approximately 46 % of its open-reading frames, 28 % of tRNAs, and the 5′ leader sequences for both 16S rRNA and 23S rRNA [3]. Barring a few exceptions, sRNAs are typically 50 to 500 nucleotides in length and do not code for proteins [4][5][6]. Regulatory functions can predominantly be attributed to their interactions with proteins or target mRNA transcripts [7,8]. The sRNAs in the latter category manipulate RNA transcription through either cis-acting or trans-acting mechanisms [4,9]. By definition, cis-acting sRNAs are encoded on the anti-sense strand and generally display perfect nucleotide complementarity with the target sequence in the open reading frame. Trans-acting sRNAs, on the other hand, are encoded within the intergenic regions, act on the targets elsewhere in the genome, and possess short segments of partial nucleotide complementarity to their target genes [6][7][8]. Accordingly, they require a known RNA chaperone, namely Hfq, encoded by nearly 50 % of all bacterial species to facilitate their binding interactions with an mRNA transcript [2,7]. In other organisms such as Listeria monocytogenes, however, most trans-acting small RNAs function independent of the chaperone activity of Hfq [10].
The genus Rickettsia includes obligate, intracellular Gram-negative bacteria belonging to the class Alphaproteobacteria. Based on the transmitting natural vector, disease presentation, and antigenicity, this genus was traditionally divided into spotted fever and typhus as two major groups, but sophisticated molecular phylogenetic analysis now classifies rickettsiae into four groups, namely ancestral (R. bellii and R. canadensis), typhus (R. prowazekii and R. typhi), transitional (R. australis, R. akari, and R. felis), and spotted fever (R. rickettsii, R. conorii, R. massiliae, and numerous more) [11]. Upon transmission into humans from the arthropod vector, vascular endothelial cells are the primary targets of rickettsial infections, with the notable exception of R. akari, which primarily invades macrophages [12,13]. Among well-known human rickettsioses, Rocky Mountain spotted fever (RMSF) due to R. rickettsii and epidemic typhus caused by R. prowazekii are considered to be the most severe forms of disease. Without proper antibiotic treatment, mortality rate for RMSF is approximately 20 % [14,15], while the same for epidemic typhus reportedly ranges from 10 % to 60 % [12,16]. Also, R. prowazekii is unique in that patients can harbor sub-clinical infections after successful treatment of the primary infection and later develop recrudescent typhus, also known as Brill-Zinsser disease, despite being symptom-free for years [16,17]. On the other hand, Mediterranean spotted fever and endemic typhus, caused respectively by R. conorii and R. typhi, generally represent relatively milder forms of spotted fever and typhus [18,19].
Due to the historic importance of rickettsial diseases, their global distribution and associated morbidity or mortality, and potential implications in bioterrorism, the genome of R. prowazekii was the first to be sequenced and published [20]. Unlike other intracellular bacteria, whose genomes have very high coding densities, R. prowazekii was found to have 24 % non-coding DNA [20,21]. Such large amount of non-coding DNA was projected to be the consequence of genomic reduction and pseudogenization due to the loss or degradation of genes involved in several mechanisms leading to obligate intracellular lifestyle of this pathogen. A number of other rickettsial genomes, including those of R. rickettsii, R. conorii, R. typhi, and other notable species have since been sequenced and either published or made available in biomedical databases [22][23][24]. However, the presence of small, non-coding RNAs in different Rickettsia species still remains undetermined. With an aim to address this important knowledge gap, we used SIPHT/sRNAPredict2 to identify candidate novel sRNAs within the intergenic regions of all four rickettsial groups, leading to the prediction of a total of 1,785 novel sRNAs within 16 different strains representing 13 rickettsial species. We further analyzed the predicted sRNAs in R. prowazekii strain Brienl using other bioinformatic tools and experimentally validated their expression using reverse transcriptase-polymerase chain reaction (RT-PCR) and deep sequencing approaches. In tandem, these analyses constitute the very first evidence documenting the presence of sRNAs in rickettsial genomes.

Bioinformatic prediction of Small RNAs
Ready availability of complete genome sequences render computational approaches a widely acceptable first step for identification of sRNAs [25][26][27]. Such bioinformatic approaches search the intergenic regions (IGRs) of a bacterial genome for specific sRNA features. In general, the strategy involves an in-depth search of IGRs for the presence of Rho-independent terminators, promoters, and transcription factor binding sites, followed by the analysis of their secondary structure and comparison of such IGRs with closely related species [3,26,28]. In this study, we employed the web-based program SIPHT to examine the genomes of 16 rickettsial strains, which represent a total of 13 species spanning all four rickettsial groups. Four known plasmids, representing three from the spotted fever group strains and one from a transitional group strain, were also included. The R. felis pRFdelta plasmid (NC_007111.1) was excluded from the analysis as it was found to be an artifact from genome assembly [11]. We primarily employed recommended default settings to identify most sRNA features. To perform a more stringent search, however, we chose to decrease the Expectation Value (E value) from the default 5e −3 to 1e −15 to minimize false positives. As a result, we identified a total of 1,785 candidate rickettsial sRNAs (Additional file 1). On average, we predicted 74 candidates per ancestral strain, 21 candidates per typhus strain, 152 candidates per transitional strain, and 158 candidates per spotted fever strain. Table 1 categorizes the number of predictions by nucleotide size and rickettsial strain. Of the four plasmids examined, R. peacockii plasmid RPR had a single sRNA prediction.

Computational analysis of sRNA predictions
Based on the predictive analysis suggesting sRNAs in all rickettsial groups, we set out to analyze each of the candidate sRNAs. In order to examine a common set of bacterial sRNAs within Rickettsia species, we first investigated five well-known bacterial sRNAs, namely 6S RNA (ssrS), α-tmRNA (ssrA), RNaseP_bact_a, rpsL_ricks, and 4.5S RNA (ffs), and confirmed their presence in R. prowazekii. Using BLAST with an E-value cut-off of 1e −5 , we compared the prediction for each strain against others included in the study to find shared sRNA candidates [29]. As expected, we noted that rickettsial strains closely related to each other phylogenetically had a greater number of shared sRNA candidates when compared to those that are distantly related (Table 2). For example, R. rickettsii strain Sheila Smith (virulent) and strain Iowa (avirulent), which share 96.6 % homology [30,31], had 115 sRNAs in addition to ten other sRNA predictions in Iowa, and 20 present only in Sheila Smith.
We next employed the web-based program, BPROM, to determine promoter motifs for predicted sRNAs in the spotted fever and typhus group of rickettsiae, since they represent the most prominent groups of human pathogens. While it is still undetermined whether the ancestral group causes human disease, the transitional group species are established pathogens, but they account for a small fraction of reported rickettsiosis cases. Using all sRNA predictions within the typhus group (3 strains) and the spotted fever group (8 strains), we searched 150 nucleotides upstream of the predicted sRNA start site for the -10 and -35 promoter motifs because nearly 80 % of known σ 70 promoters in E. coli, considered to be a model organism, fall within 150 bp of the transcription start site [32]. While BPROM successfully predicted the -10 and -35 promoter sites for all candidate typhus group sRNAs, it was unable to predict a promoter site for one sRNA candidate (#132) belonging to R. rickettsii strain Sheila Smith.
Using the data obtained from the BPROM software, we calculated the average distance for the predicted -10 motif and -35 motif for both spotted fever and typhus groups of rickettsiae. For the typhus group, -10 and -35 motifs were an average of 67 and 88 (Stan. Dev. = ±22) nucleotides upstream of the sRNA start site, respectively. The spotted fever group had similar nucleotide distances at 70 and 91 (Stan. Dev = ±22) nucleotides upstream. The average distance between the -10 and -35 motifs      was 21 nucleotides for both groups, slightly longer than the reported 17 ± 1 nucleotide distance optimal for Escherichia coli genes [33,34]. Typhus group and spotted fever nucleotide frequencies for the -10 and -35 motifs were plotted using WebLogo3 (Fig. 1) In addition, the second nucleotide position in typhus is most conserved with a thymine in nearly 100 % of predicted sites, while it is approximately 90 % for spotted fever. This is in contrast to E. coli, which has a thymine with 79 % probability at the same nucleotide position.
In an attempt to explain the differences in sRNA predictions between Sheila Smith and Iowa strains of R. rickettsii, we performed a comparative analysis by mapping the sRNAs present only in one of the strain but absent in another. Accordingly, we compared 20 predictions from Sheila Smith strain and their corresponding 150bp up-and downstream sequences to the strain Iowa genome. All "prediction ± 150 bp" sequences were >99 % identical with the exception of two (#23 and #71) sRNAs. For instance, sRNA candidate #71 had a 20 bp sequence absent from the predicted sRNA sequence in the Iowa genome (Fig. 2). The same analysis was conducted for the 10 predictions present only in R. rickettsii strain Iowa but absent in strain Sheila Smith. Again, all but two "prediction ± 150 bp" sequences were nearly identical (>99 %). In this case, prediction #118 for strain Iowa had a 46 bp sequence that was absent in strain Sheila Smith (Fig. 3). Also, an ORF was annotated in corresponding genomic regions in strain Sheila Smith for nearly 30 % of sRNAs predicted only in strain Iowa, while two sRNAs had SNPs and indels in the Sheila Smith sequences potentially leading to altered thermostability of secondary structures. Similar observations were made for sRNAs present only on Sheila Smith but absent in Iowa strain. Since SIPHT relies on the conserved intergenic regions, secondary structures, presence of promoters, and terminator sequences, it is likely that the sRNAs predicted only in one strain, but not the other, result from these stringent criteria.

Candidate sRNA target identification
Although spotted fever rickettsiae cause disease worldwide, we chose to initially focus on R. prowazekii due to the high public health threat. In addition, due to bioweapon testing during World War II and the development of antibiotic resistant strains during the Cold War, R. prowazekii remains on the list of select agents with potential for bioterrorism [35]. To identify potential mRNA targets for predicted sRNAs within the R. prowazekii genome, we chose two independent web-based programs, TargetRNA2 and CopraRNA, to predict sRNA:mRNA interactions by assessing the base pairing potential based on a Smith-Waterman dynamic and conservation profile, respectively [36][37][38]. The search parameters were set to default and a p-value threshold of ≤0.05. A total of 393 potential targets were identified by TargetRNA2, whereas CopraRNA predicted 1154 protein coding genes to be regulated by sRNAs. A detailed comparative analysis revealed that 16 sRNA candidates had common target genes predicted by both programs and the remaining 10 candidates had independent predictions (Additional file 2). Two sRNA candidates (#6 and #22) had the highest number of 6 common targets predicted by TargetRNA2 and CopraRNA, in contrast to only one commonly predicted target for candidates #12 and #19. In summary, a total of 51 target genes were predicted by both programs, of which only 9 were categorized as hypothetical proteins. Of note, target genes such as virB10, ftsL, ftsQ, secA, ruvB, and 190kDa antigen were commonly predicted, indicating the potential role of post-transcriptional regulatory mechanisms in type IV secretion, cell division, and DNA repair. Table 3 lists the number of predicted target transcripts for each predicted sRNA. Interestingly, TargetRNA2 failed to predict targets for candidate #7, but CopraRNA predicted 53 target genes (Table 3). Nevertheless, if the lack of target prediction by TargetRNA2 holds true for candidate #7, this may be due to a sRNA:protein interaction, as is the case with 6S RNA, or it may simply represent a degraded ORF with an active promoter and terminator. Using this information, we parsed the predicted mRNA targets based on their respective protein function. We selected eight categories, including a category for 'other' (annotated ORFs with known function, but not categorized into a separate class based on function) and 'hypothetical proteins' (annotated ORFs with unknown/uncharacterized function), and separated the 393 predictions into different categories. The categories included cell division, cell wall, metabolism, ribosomal functions, virulence, type IV secretion system, transport proteins, and phagosomal escape (Table 4). Our TargetRNA2 and CopraRNA results demonstrate that the majority of known targets for sRNAs are involved in metabolism (71 vs. 197), ribosomal functions (51 vs. 129), and cell division (44 vs. 30). However, both the programs predicted a large number of target genes potentially regulated by sRNAs that were categorized as 'other' (90 vs. 370) and 'hypothetical proteins' (73 vs. 232), respectively (Table 4). While TargetRNA2 uses Smith-Waterman dynamic based base pairing, CopraRNA predictions are largely based on conservation between different genomes, possibly resulting in the differences in the number of predicted targets.

RNA sequencing
Next, we set out to confirm the expression of rickettsial sRNAs during infection of cultured human microvascular endothelial cells (HMECs) with R. prowazekii. We infected HMECs with R. prowazekii strain Breinl and extracted total RNA at 3 and 24 h post-infection. Our rationale for choosing these durations was to allow ample time for rickettsial entry and establishment of infection within the host cells and sufficient time for at least two replication cycles keeping in mind that replication time for intracellular rickettsiae ranges from 9 to 11h. After removal of rRNAs and eukaryotic mRNA, the enriched bacterial RNA was reverse transcribed into cDNA libraries and subjected to next generation sequencing using the Illumina HiSeq™ 1500 system. The resulting RNA reads were mapped onto the R. prowazekii strain Breinl (NC_020993) genome. Our deep sequencing resulted in approximately 42.6 to 46.2 million total reads for RNA isolated at 3 h and 27.4 to 28.6 million total reads at 24 h post-infection. Out of these, an average of 1.4 and 2.8 million reads mapped to the R. prowazekii genome at 3 h and 24 h, respectively. Rickettsia species include obligate intracellular bacteria with fastidious growth requirements in a host cell and cannot yet be cultured in a cell-free environment. Recently, it has been reported that intracellular organisms such as Rickettsia represent only 5 % of the extracted total RNA, while the remaining 95 % belongs to the eukaryotic host. Out of approximately 5 % bacterial total RNA, 95 % is composed of ribosomal and transfer RNA, while the remaining 5 % of the transcripts correspond to bacterial mRNA and sRNA, yielding a ratio of~1:400 bacterial mRNA and sRNA in total RNA extracted during the infection [39]. Although microbe enrichment is aimed at removing most of the polyadenylated eukaryotic transcripts and ribosomal RNAs, the process often accomplishes only limited removal of other interfering eukaryotic RNAs such tRNAs, noncoding RNAs, and mitochondrial RNA. Furthermore, high abundance of rRNAs in the host cells also interferes with the efficacy of their removal from the sample preparations. Supporting our results, a recent study has reported that only 2-5 % of the total reads mapped to the intracellular bacterial genomes despite enrichment of the total RNA [39]. By analyzing the sequencing data at the genome locations predicted by SIPHT, we found that twelve out of 26 predicted sRNA had a Mean Expression Value (MEV) that was ≥1.5 times compared to their respective 50 nucleotide upstream and downstream flanking regions ( Table 5). As expected, all five well known sRNAs,

Validation of sRNA predictions via RT-PCR
Prior to validating our predicted sRNAs, we decided to investigate the expression of 6S RNA within R. prowazekii. The underlying rationale for choosing 6S RNA to begin with was its particularly high abundance in E. coli, which can reach~10,000 copies during late stationary phase [40]. Using 16S rRNA as the endogenous control and infection for 1.5 h as baseline, we demonstrated a significant (p < 0.01) increase in its expression from 6 to 72 h post-infection (n = 5) using TaqMan-based real-time RT-PCR (Fig. 4). After confirming expression of 6S RNA, we chose nine sRNA candidates (#1, #2, #5, #9, #10, #11, #21 #24, and #25) to verify their expression in R. prowazekii str. Brienl during infection of HMECs. These were chosen based on their location within the genome, orientation comparative to the neighboring genes, and potential mRNA targets (Fig. 5). Candidates #11 and #21 were not detected using RT-PCR. However, the remaining seven candidates were detected using RT-PCR (n = 3) with an amplicon near the expected size. Figure 6 shows a representative agarose gel for candidates #1, #5, #9, #10, #24, and #25. Upon cross-reference with other R. prowazekii genomes that included strains Madrid E, Dachau, BuV67, Katsinyian, Chernikova, RpGvF24, GvV257, and Rp22, it was found that candidate #2 was anti-sense to the gene rnpB (RNaseP_bact_a) annotated only in R. prowazekii strain Rp22 (NC_017560). Therefore, any amplification is likely the result of rnpB expression. The remaining predictions demonstrated no association with any other annotated open-reading frames. To further confirm this observation, each sRNA sequence was examined for its ability to code for a protein. Using the ExPASy Translate Tool (Swiss Institute of Bioinformatics), all six possible translation initiation positions (3 each on 5′ and 3′ strands) were assessed to be devoid of protein coding capacity, yielding evidence that these are indeed small non-coding RNAs expressed during rickettsial infection of host endothelium.

Discussion
In this study, we report on a genome wide computational analysis to identify novel sRNAs within the genus Rickettsia. We have identified 1,785 sRNAs in 16 rickettsial strains belonging to 13 different species and spanning across all rickettsial groups. To further confirm our sRNA  Target genes are classified into ten categories based on either known or hypothetical function for R. prowazekii strain Breinl predictions, we have validated the expression of six predicted trans-acting sRNAs in R. prowazekii strain Breinl using high throughput sequencing and RT-PCR approaches. Since the initial discovery of sRNAs in 1960s, E. coli has been shown to harbor nearly 80 to 100 small RNAs, while Salmonella enterica serovar Typhimurium genome encodes for~140 small RNAs [4,41,42]. Abundant evidence now demonstrates the ubiquitous nature of sRNAs in bacterial genomes and implicates them to play an important role in virulence, quorum sensing, survival, plasmid expression, and primary and secondary metabolism in addition to several other housekeeping functions [43][44][45][46][47][48][49]. In Vibrio cholerae and V. harveyi, quorumsensing genes hapR and luxR are under the regulatory control of four and five sRNAs, respectively [46]. Furthermore, the deletion of three sRNAs in Listeria monocytogenes results in an attenuated phenotype in mouse models and the mutant strain is unable to grow in murine macrophages [44]. Similarly, rli38 knockout mutants of Listeria were found to be attenuated in orally infected mice, suggesting a role in the pathogen's virulence [50], and the deletion of lhrA in L. monocytogenes was capable of altering the expression of over 300 genes [10]. In Salmonella enterica, the AmgR small RNA controls the expression of the mgtCBR mRNA required for survival in macrophages and its over expression leads to decreased virulence in mouse models [45]. However, studies examining the potential regulatory roles of sRNAs in obligate intracellular bacteria remain rather limited. In addition to an sRNA that regulates hctA, 16 trans-acting and 25 cis-acting sRNAs have been identified in Chlamydia trachomatis, an intracellular human pathogen [51,52]. More recently, Coxiella burnetii and Buchnera aphidicola genomes are shown to encode for 14 and 140 sRNAs, respectively, and Coxiella sRNAs exhibit differential expression at different growth stages [53][54][55]. Here, we report on the existence of novel small RNAs in rickettsial genomes and their potential roles as determinants of pathogen virulence, host adaptation, and metabolism. Several prediction programs using parameters such as comparative genomics, RNA structure, and thermodynamic stability, have been developed and utilized to identify bacterial small RNAs [37,[56][57][58][59]. In this study, we have chosen SIPHT to predict trans-acting sRNAs in rickettsial genomes as this prediction tool uses several other well established and widely used programs to identify  potential transcription factor binding sites [60][61][62]; Rhoindependent terminators using RNAMotif [63], Trans-TermHP [64], and FindTerm [65]; conserved secondary structures by QRNA [56]; and conserved nucleotide sequences by BLASTN 2.0 [60]. Further, this program has been widely applied for sRNA predictions in several other bacteria attesting to its potential for accurately predicting bacterial sRNAs and its web-based availability makes it both user-friendly and easily accessible. Additionally, unlike its counterparts such as eQRNA and RNAz, SIPHT specifically searches for Rho-independent terminators and conserved intergenic structures significantly eliminating the chances of false-positive predictions [27]. Using SIPHT, we have predicted an average of 21, 74, 152 and 158 sRNAs in typhus, ancestral, transitional, and spotted fever groups of Rickettsia species, respectively. To test if predicted sRNAs have upstream transcription factor binding sites and downstream Rho-independent terminator (two independent criteria used by SIPHT), we have further analyzed all R. prowazekii sRNAs using BPROM [66] and TransTermHP [64]. All R. prowazekii sRNAs have a predicted upstream σ 70 promoter and a Rho-independent termination confirming the results retrieved from SIPHT (Fig. 1). Our genus-wide global analysis suggests that the repertoire of predicted sRNAs is independent of the size of respective rickettsial genomes. This is exemplified by the presence of 191 sRNAs in 1.31Mbp genome of R. peacocki (spotted fever), in contrast to only 100 sRNAs in 1.54Mbp genome of ancestral R. bellii. The number of sRNAs among rickettsiae in different groups, however, tends to directly correlate with their respective genome size, while the average number of sRNAs per Mbp of the genome within a particular group varies depending on the Rickettsia species/strain. For example, R. bellii and R. canadensis, belonging to the ancestral group and carrying the genomes of 1.54Mbp and 1.15Mbp, respectively, encode for bellii and R. canadensis revealed that R. bellii has 63 % more IGRs ranging from 1-300 bp. It is, therefore, possible that the lower number of sRNAs in R. canadensis is due to the differences in the number of IGRs included in the SIPHT analysis. Another notable difference is evident between R. akari and R. felis in the transitional group. While both had over 100 predictions, the coding density of sRNAs in R. akari (94 sRNAs/Mbp of genome) was 22 % lower in comparison to R. felis (121 sRNAs/Mbp of genome). Rickettsia are generally presumed to exhibit limited horizontal gene transfer (HGT) due to their obligate intracellular lifestyle. However, recent reports document the dynamic nature of their genomes and transposable elements, palindromic repeats, and horizontally acquired genes have been identified in several Rickettsia species [11,67,68]. For example, the transposable elements in R. felis cause inactivation of genes or integration of foreign DNA resulting in changes to both genomic content and arrangement [69]. In this regard, at least 79 genes in R. felis have been suggested to be acquired through horizontal gene transfer from other proteobacteria or amoebae [70]. Additionally, R. prowazekii and R. typhi, despite having similar size genomes, encode for 26 and 15 sRNAs, respectively. Again, the number of IGRs included in the SIPHT analysis varies between R. prowazekii strain Brienl and R. typhi strain Wilmington (540 vs 504), which may explain the differences in the total number of predicted sRNAs in these typhus rickettsiae genomes.
Although computational approaches yield convincing evidence for the existence of sRNAs in Rickettsia, it is critical to validate the expression of predicted sRNAs during host-pathogen interactions via experimental strategies. In this context, we first attempted to confirm the expression of 6S RNA (ssrS), a well-characterized small, noncoding RNA ubiquitously present in most bacterial lineages, including Gammaproteobacteria and Bacillales [53,71]. Although most bacteria encode a single copy of the ssrS gene, some bacterial species including Bacillus subtilis reportedly encode for two copies that are differentially expressed depending on the stage of growth [1]. 6S RNA is most abundantly expressed during late stationary phase, where it interacts with RNA polymerase and regulates σ 70 function. These data further support possible roles of 6S RNA in long-term survival and nutrient uptake [72,73]. Also, 6S RNA is potentially involved in intracellular stress response in C. burnetii and Legionella pneumophila [53,74]. We report a significant increase in the 6S RNA expression from 6 to 72 h post-infection when compared to the basal expression level at 1.5 h. Also, based on our MEV calculations, we observed an increase of 2-and 5-fold in the expression of 6S RNA at 3 and 24 h post-infection, respectively. This is in agreement with earlier findings from other pathogenic bacteria. Specifically, a 2-fold upregulation in its expression at 72 h appears to correspond to a similar increase seen in C. burnetii [53]. Our data further suggest that the highest expression at 72 h postinfection coincides with the intracellular growth kinetics of rickettsiae [75]. A comprehensive analysis to elucidate its mechanisms of action and regulatory functions in intracellular rickettsiae is warranted and currently in progress.
Because RNA sequencing is a novel and robust methodology, which provides valuable insights into the global transcriptome [76], we subjected total RNA from R. prowazekii-infected endothelial cells to validate the presence of sRNAs and their expression during host-pathogen interactions. Since the major focus of this study was the identification and validation of intergenic trans-acting sRNAs, additional information on the cis-acting expressed by R. prowazekii strain Breinl during infection of HMECs was excluded to perform a direct comparative and confirmatory analysis of SIPHT based sRNA predictions versus their expression in vitro. Furthermore, none of the web based sRNA prediction tools have the ability to identify cis-acting sRNAs in the genomes. MEV based identification of sRNAs in bacterial genomes is a widely utilized approach that exploits the expression profile of sRNAs and their respective flanking regions to determine the biogenesis of sRNAs [53,77,78]. To this end, the reads mapping to each nucleotide of the sRNAs and their respective 50 bp flanking regions were normalized using the total number of reads mapping to the rickettsial genome (excluding those mapping to the rRNAs and tRNAs) and then to their length, and MEVs were determined to decipher the expression of sRNAs from potential readthroughs attributed to flanking ORFs due to leaky transcriptional termination in R. prowazekii [79]. Nearly 50 % of predicted sRNAs in R. prowazekii exhibited an MEV of ≥1.5 when compared to respective flanking regions indicating their biogenesis and expression independent of neighboring genes. The reads' coverage plots for 6S RNA, RNAseP_bact_a, and α-tmRNA clearly demonstrate the independent expression of these sRNAs (Additional file 3).
During invasion of epithelial cells and intracellular replication within macrophages, Salmonella expresses IsrM RNA encoded in Salmonella pathogenicity islands [80]. L. monocytogenes encodes a thermosensor sRNA, which upon encountering human body temperature (37°C) forms an alternative secondary structure and activates adhesins, phagosome escape mechanisms, and other immune-regulating factors [81]. Further assessment of the expression profile of five sRNAs (#5, 9, 10, 24, & 25) showing an MEV of ≥1.5 revealed their steady-state expression of these sRNAs from 1.5 to 72 h post-infection indicating potential roles in pathogenesis. Interestingly, despite an MEV of <1.5, candidate #1 was expressed between 6 to 72 h and its expression increased over time, most notably after 24 h. Since our in-depth transcriptome analysis was performed at 3 and 24 h post-infection, we hypothesize that the relative expression of candidate #1 is likely inadequate to generate sufficient reads to achieve an MEV greater than 1.5 fold. Alternatively, the sequencing depth may not have been high enough to achieve a 1.5fold difference. Even though 50 % of the predicted sRNAs were either not detected or expressed below the cut-off MEV in our RNA-seq analysis, it is plausible that they are bonafide sRNAs conditionally expressed during other conditions such as stress and host-vector interactions. Previous studies have shown that different environments induce specific sRNAs. Small RNA ryhB, known to downregulate genes involved in iron storage in E. coli, is induced mainly during low iron conditions [82]. In Salmonella enterica serovar Typhimurium, IsrJ sRNA is induced under low oxygen and magnesium environments and elevated levels of IsrE are observed in iron responsive environment [83]. Similarly, H. pylori is known to induce the expression of six small RNAs (IsoA1-6) associated with acid stress [3]. Alternatively, it is also plausible that SIPHT may have identified degraded ORFs [20,84]. Rickettsial genomes are known to evolve by reductive evolution (gene degradation) and transposons are known to play a pivotal role in gene inactivation [11,22,68,85,86]. R. prowazekii is known to have pseudogenes potentially resulting from gene inactivation [20]. Since SIPHT uses the presence of an upstream promoter and downstream transcriptional terminator as the main criteria for predicting sRNAs, it is possible that some of sRNA transcripts predicted by SIPHT potentially map to the degrading ORFs, which still retain conserved promoter and terminator regions.
Despite the abundance of sRNAs in all bacterial lineages, little is known about their function and mechanism of action within the bacterial genomes and only a few sRNAs have been assigned with functions till date [53]. Using TargetRNA2 and CopraRNA, we have predicted the target mRNAs regulated by R. prowazekii sRNAs. Functional categorization of the target genes regulated by sRNAs resulted in identification of genes involved in key pathways of cell division, transport, phagosomal escape, virulence, type IV secretion system, and metabolism. A majority of these pathways are critical for the growth and survival of Rickettsia in the host cytoplasm. For example, we have identified 33 genes involved in transport mechanisms and potentially regulated by sRNAs, a function important for rickettsial survival in vivo as they encode for translocases required for the exchange of ADP with ATP from host cell cytosol [11,87,88]. Following invasion into host cell, rickettsiae quickly escape escape into the host cytosol by phagosome degradation and published studies have implicated a role of rickettsial hemolysin C (tlyC) and phospholipase D (pld) in phagosomal escape [89,90]. We have identified two sRNAs, #24 and 27, with the potential to regulate tlyC and pld, respectively, suggesting an important role for these sRNA in the establishment of infection. A significant number (18 %) of predicted target genes were categorized as 'hypothetical proteins' , which is not surprising considering that nearly 26 % of the 914 R. prowazekii genes are still reported as uncharacterized ORFs. As rickettsial genes are further investigated for their functional roles, we anticipate that most of these hypothetical proteins will likely be assigned a role in virulence, survival, and pathogenesis during host-pathogen and vector-pathogen interactions.

Conclusions
Bacterial small RNAs are now well appreciated as the major post-transcriptional regulators involved in key processes such as virulence, quorum sensing, survival, plasmid expression, and primary and secondary metabolism. Rickettsia species, despite undergoing reductive evolution, generally harbor~25-30 % intergenic regions and presumably encode for trans-acting sRNAs. This study was aimed at identifying trans-acting sRNAs in rickettsial genomes and their possible roles in host-pathogen interactions. We have identified 1785 sRNAs in 13 rickettsial species spanning across all four rickettsial groups, and validated the expression of R. prowazekii sRNAs by RT-PCR and high throughput transcriptome analysis. Furthermore, using Taqman assay, we have quantified the expression of R. prowazekii 6S RNA, a small RNA known to regulate the housekeeping transcription factor σ70, during host cell infection. Our study is the first to report on the existence of small RNAs in the genus Rickettsia. Further studies are required to validate candidate trans-acting and identify cis-acting sRNAs as well as determine their functions in rickettsial physiology and pathogenesis.

Rickettsia species and strains
For this study, available genome sequences of 16 rickettsial strains, encompassing 13 species belong to the genus Rickettsia were used. Included species represent all four, namely ancestral, typhus, transitional, and spotted fever, rickettsial groups (Additional file 4). Further, four known rickettsial plasmids were also subjected to the proposed analysis (Additional file 4).

Prediction of sRNAs
We used the web-based program SIPHT available from the University of Wisconsin at Madison (http://newbio.cs. wisc.edu/sRNA/index.php) for predicting sRNAs in rickettsial genomes [26]. This program predicts sRNAs within the intergenic regions of bacterial genomes by searching for Rho-independent terminators downstream of conserved sequences, followed by an analysis of conservation with other species, potential transcription factor binding sites, the spacing between flanking genes, and homology with known sRNAs [26]. The specific parameters used for all searches were as follows. The maximum expected value for BLAST (BLAST E) was changed from the default setting of 5e-3 to a more stringent value of 5e-15 in order to eliminate the possibility of false positives. The minimum score for BLAST (BLAST S) and minimum percent identity (BLAST % identity) were set at the default 0. The maximum BLAST high-scoring segment pairs (HSP) length was set at 1000. The maximum Rho-independent terminator criteria were 86, -10, and -6 for TransTerm, FindTerm, and RNAMotif, respectively. The minimum predicted locus length was 30, while the maximum predicted locus length was 550. These scores take into account the generally defined 50 to 500nucleotide length of bacterial sRNAs [4]. The minimum distance by default for both the locus start site to ORF start site and for the locus start site to ORF end site was −65. However, the minimum distance set by default from the locus end site to ORF start site was −20. The minimum distance from the locus end site to ORF stop site was left at 35. Lastly, the minimum distance from the transcription factor-binding site to the ORF start site was 0. This value allows for the inclusion of all possible candidate sRNAs in the final report. To ensure consistency, all of these parameters were set as the default analytical criteria for the program.

Promoter prediction
For bacterial promoter predictions, the web-based software BPROM was used. This program searches for bacterial σ 70family promoter -10 box and −35 box, transcription start site, and other transcription factor binding sites in a given genomic sequence with a reported accuracy and specificity of 80 % [66]. Each promoter prediction was conducted using 150 base pairs upstream of the predicted sRNA start site. Nucleotide frequency plots were created using the −10 box and −35 box predictions. The web based program WebLogo3 from the University of California at Berkeley was used to generate the sequence logos [91].

Target prediction
Target genes for each candidate sRNA were predicted using the web based program TargetRNA2 [36]. This program searches a genome's annotated features for a statistically significant base pair-binding potential to the queried nucleotide input and calculates a hybridization score followed by a statistical significance of each potential RNA-RNA interaction [36]. The individual base pair model was used throughout our target prediction procedures. The following parameters were used for each prediction. For statistical significance, the p-value was set at ≤0.05. The program searched 80 nucleotides before the start codon and 20 nucleotides after the start codon. The selected seed length was 7 consecutive nucleotides, which corresponds to the average seed length (6 to 8 nucleotides) for trans-acting sRNAs [4]. The filter size, which corresponds to how the program filters out nontarget mRNA, was set at the default 400.

Cell culture, infection, and RNA isolation
Human dermal microvascular endothelial cells (HMECs) were cultured in MCDB131 medium with L-glutamine (10mmol/L), mouse epidermal growth factor (10ng/ml), hydrocortisone (1μg/mL), and 10 % heat-inactivated fetal bovine serum [92]. The use of HMECs as an established cell line for in vitro studies was exempt from the review and approval by the Institutional Review Board, but was approved by the Institutional Biosafety Committee at the University of Texas Medical Branch. Cells were grown at 37°C with 5 % CO 2 until approximately 80 to 90 % confluency. Rickettsia prowazekii strain Breinl was cultivated in Vero cells and purified by differential centrifugation to prepare seed stocks for host cell infection experiments. Titers were estimated by a combination of plaque formation assay and quantitative PCR (qPCR) using primer pair Rp877p-Rp1258n for citrate synthase gene (gltA) [92,93]. Infection was carried out under appropriate Biosafety Level 3 conditions using approximately 6 X 10 4 pfu of rickettsiae/cm 2 of culture surface area. These conditions yield an infection of >80 % of cells with approximately six intracellular rickettsiae per cell [92,94,95]. After 15 min of incubation with gentle rocking to allow for sufficient adhesion and invasion, the medium was removed and replaced with fresh medium. Infected cells were then further incubated at 37°C with 5 % CO 2. Total RNA was isolated at 1.5, 3, 6, 12, 24, 48, and 72 h postinfection using Tri-Reagent ® (Molecular Research Center). RNA was extracted using the Direct-zol™ RNA MiniPrep kit (Zymo Research). Column DNaseI treatment (Zymo Research) was performed on all RNA samples to eliminate contaminating genomic DNA.

RNA sequencing
In order to sequence the rickettsial transcriptome, HMECs were infected with R. prowazekii strain Breinl and total RNA was isolated at 3 and 24 h post-infection using Tri-Reagent (Molecular Research Center). Samples were treated with DNaseI (Zymo Research) to eliminate contaminating genomic DNA and subjected to the MICROBEnrich Kit (Ambion) to remove interfering eukaryotic mRNAs. Ribosomal RNA was then removed using the Ribo-Zero kit (Epicentre). RNA was quantified using the MultiSkan Go (Thermo Scientific) and analyzed using the Agilent 2100 Bioanalyzer (Agilent Technologies). For each experimental condition, two independent cDNA libraries were created and sequenced on the HiSeq 1500 (Illumina) located at the Next Generation Sequencing Core, University of Texas Medical Branch at Galveston. Enriched RNA used for cDNA synthesis was not size selected and strand-specific sequencing was performed. Each library consisted of 50bp long subsequences (reads) in a FASTQ format. Each read was assessed for its quality. Any base with a PHRED score of 15 or below was excluded. The first 14 bases of the read were trimmed and the remaining 36 bases were used for analysis. All reads mapping to human genome version GRCh38/hg38 were excluded from the analysis. The remaining rickettsial transcripts were then mapped to R. prowazekii strain Breinl genome (NC_020993) allowing up to two base mismatches using Bowtie2 [96]. For each prediction, the average read coverage for each nucleotide was normalized to the length of the predicted sRNA. The same was computed for 50 nucleotides up-and downstream of each prediction. The Mean Expression Value (MEV) was calculated by computing the ratio between the predicted sRNA and the flanking 50 nucleotides [53,77]. An MEV cutoff value of ≥1.5 was used throughout this work.

Reverse transcriptase PCR
One microgram (1μg) of DNase I treated total RNA was reverse transcribed using SuperScript ® VILO cDNA Synthesis Kit (Life Technologies) with random hexamers following manufacturer's instructions. Quantitative RT-PCR was performed on StepOnePlus™ Real-Time PCR System (Applied Biosystems) using primers designed by Primer Express 3.0.1 (Applied Biosystems). For the TaqMan ® Assay, each 20 μL reaction contained 1X TaqMan ® Universal PCR Master Mix (Life Technologies), 250 nM forward primer, 250 nM reverse primer, 250 nM TaqMan probe, and 1.1 ng/μL of cDNA. Cycler conditions were: stage 1 at 50°C for 2 min, stage 2 at 95°C for 10 min, stage 3 (40 cycles) at 95°C for 15 s and 60°C for 60 s. Each TaqMan ® technical replicate was performed in triplicate using five biological replicates. Primers are listed in Additional file 5.
Reverse transcriptase PCRs were performed using Phusion ® High-Fidelity PCR Kit (New England BioLabs). Each 20 μL reaction contained a final concentration of 1X Phusion HF Buffer, 0.2μM dNTPs, 0.5μM forward primer, 0.5 μM reverse primer, 100 ng cDNA template, and 0.4 units of Phusion DNA polymerase. Thermal cycler conditions were: stage 1 at 98°C for 30 s, stage 2 (35 cycles) at 98°C for 15 s, 60°C for 30 s, and 72°C for 30 s, and stage 3 at 72°C for 10 min. Samples were separated on a 2 % agarose gel, stained with ethidium bromide, and imaged on ChemiDoc MP imaging system (Bio-Rad). Primers are listed in Additional file 5.