Computational discovery and RT-PCR validation of novel Burkholderia conserved and Burkholderia pseudomallei unique sRNAs
© Khoo et al.; licensee BioMed Central Ltd. 2012
Published: 13 December 2012
The sRNAs of bacterial pathogens are known to be involved in various cellular roles, including environmental adaptation and the regulation of virulence and pathogenicity. sRNAs are expected to have similar functions in Burkholderia pseudomallei, a soil bacterium that can adapt to diverse environmental conditions, causes the disease melioidosis and is able to infect a wide variety of hosts.
By integrating several proven sRNA prediction programs into a computational pipeline, available Burkholderia spp. genomes were screened to identify sRNA gene candidates. Orthologous sRNA candidates were then identified via comparative analysis. From the total prediction, 21 candidates were found to have Rfam homologs. RT-PCR and sequencing of candidate sRNA genes of unknown function revealed six putative sRNAs that were highly conserved in Burkholderia spp. and two that were unique to B. pseudomallei, all of which were present in the transcriptome under normal culture conditions. The validated sRNAs include potential cis-acting elements associated with the modulation of methionine metabolism and one B. pseudomallei-specific sRNA that is expected to bind to the Hfq protein.
The use of the pipeline developed in this study and subsequent comparative analysis have successfully aided in the discovery and shortlisting of sRNA gene candidates for validation. This integrated approach identified 29 B. pseudomallei sRNA genes - of which 21 have Rfam homologs and 8 are novel.
Small RNAs (sRNAs) are known to function as regulatory or catalytic molecules in bacteria, with sequences normally ranging from ~50 to 250 nt in length and located in the intergenic regions (IGRs) [1, 2]. Although sRNAs with catalytic functions have been reported [3, 4], many of these molecules are known or believed to function as regulatory nucleic acid elements that target near, or at, the translation start site of their dedicated mRNA targets via imperfect sequence complementarity [5–7]. In E. coli, fewer than 100 sRNAs, accounting for ~0.3% of the genome, have been reported [8–10]. Although these riboregulators represent only a small fraction of the prokaryotic genome, they have been shown to play essential regulatory roles in bacteria, including cell surface modulation, plasmid number control, stress adaptation, quorum sensing and carbon storage. Other regulatory sRNAs interact with and modulate cellular protein activities.
In pathogenic bacteria, sRNAs have been associated with regulatory networks that modulate adherence to and invasion of the host cell [17, 18], environmental adaptation [19, 20], and virulence and pathogenicity [17, 18, 20–23]. In several bacterial pathogens, including Salmonella typhimurium, Vibrio cholerae, Yersinia enterocolitica, Brucella abortus and Pseudomonas aeruginosa, deletion of the hfq gene, which encodes the RNA chaperone Hfq, has been shown to severely attenuate virulence. The Hfq protein is known to facilitate the pairing interaction between sRNAs and their target mRNAs. Identification and analysis of sRNAs in pathogenic bacteria may improve current understanding of the molecular mechanisms of host adaptation and virulence. Hence, we carried out a computational analysis of available Burkholderia spp. genomes to identify potential sRNA sequences and to further delineate sRNAs that are present only in the pathogenic members.
Members of the Burkholderia genus also play important roles as environmental saprophytes. One species of this genus, B. pseudomallei, is the causative agent of melioidosis, a disease endemic to Southeast Asia and northern Australia. This species reportedly has a highly dynamic genome and versatile phenotypes [29–31], which contribute to its capability to infect nearly all cell types and result in a wide spectrum of disease symptoms that confounds diagnosis and delays prompt treatment. B. pseudomallei is an effective pathogen of a broad range of hosts (amoebae, nematodes, dolphins, birds, camels, alpacas, sheep, humans and even plants). The enigma of B. pseudomallei is further compounded by its capacity for extremely prolonged latent infection and by its demonstrated ability to survive in a nutrient-free environment for 16 years.
B. pseudomallei is believed to have an array of virulence and pathogenicity factors, including a deamidase toxin named Burkholderia Lethal Factor 1 (BLF1) that targets the translation initiation factor eIF4A. However, the regulation of BLF1 and the mechanism of its delivery to the target protein remain unclear. To date, the mechanisms of adaptation to environmental stress and changes have not been conclusively identified; however, a large number of sRNA genes have been reported for B. cenocepacia J2315, another pathogenic member of the Burkholderia genus. These sRNAs were proposed to be responsible for the bacterium's complexity, phenotypic variability and ability to survive in a remarkably wide range of environments.
At present, one can opt for either a knowledge-based approach or a de novo approach for sRNA discovery in a bacterial genome. Knowledge-based techniques search for homologues of known sRNAs based on specific features of the sequences and will usually include upstream regulatory elements, sequence and structural characteristics and downstream targets as a search profile. A number of knowledge-based programs were developed to identify particular sets of sRNAs through homology analysis. One such program, Infernal, was the workhorse used to build the Rfam database. However, predictions relying on homology information limit the applications of such programs to sRNA genes with known homologues and therefore, the methods are insufficient in situations where many if not most bacterial sRNAs remain unidentified. A de novo approach can serve a complementary role in predicting novel sRNA genes that are beyond the profile scope of knowledge-based approaches. The basis of a de novo search lies in the common features of sRNAs in the genomes - sequence and structural conservation, shared physical co-localization, structural stability, existence of transcriptional signals and GC bias - without prior knowledge of the sRNAs to be discovered. Such an approach was applied with various sRNA gene finders such as QRNA, RNAz [42, 43], sRNAPredict [44, 45] and sRNAscanner. In this paper, we report the development of a computational pipeline that integrated successful sRNA prediction programs to identify candidate sRNA genes in B. pseudomallei and subsequent validation by RT-PCR and Sanger sequencing.
The intergenic sequences (here, defined as sequences between annotated ORFs) of the replicons were extracted using Artemis v12.0.3 and searched against the Rfam database v10.0 by executing the script rfam_scan.pl v1.0. The supporting software used for the search included BLAST v2.2.22, Infernal v1.0, Perl v5.10.0 and BioPerl v1.6.0.
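The definition of an intergenic sequence used here (the sequence between annotated ORFs) can be illustrated with a minimal sketch. It is not the Artemis procedure itself: the function name `intergenic_regions`, the 1-based inclusive coordinates and the example ORF positions are assumptions made for illustration, and the sketch treats the replicon as linear and ignores strand.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); an assumed convention


def intergenic_regions(orfs: List[Interval], replicon_length: int) -> List[Interval]:
    """Return the gaps between annotated ORFs on a linear coordinate system."""
    gaps: List[Interval] = []
    cursor = 1  # first position not yet covered by any ORF
    for start, end in sorted(orfs):
        if start > cursor:
            gaps.append((cursor, start - 1))
        # Overlapping or nested ORFs must never move the cursor backwards.
        cursor = max(cursor, end + 1)
    if cursor <= replicon_length:
        gaps.append((cursor, replicon_length))
    return gaps


# Hypothetical ORF coordinates; the first two ORFs overlap.
example_orfs = [(100, 400), (350, 600), (900, 1200)]
print(intergenic_regions(example_orfs, 1500))
# [(1, 99), (601, 899), (1201, 1500)]
```

For a circular replicon the first and last gaps would need to be joined across the origin, which this sketch does not attempt.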
SIPHT searches were restricted to detect sRNA genes within the range of 30-550 nucleotides and executed via the web server (URL: http://newbio.cs.wisc.edu/sRNA/). Other parameters were optimized as suggested, i.e., maximum E value: 1e-15, minimum TransTerm confidence value: 87, maximum FindTerm score: -10, maximum RNAMotif score: -9. All replicons, except the replicon of interest, were included as a partner replicon for the search.
The program sRNAscanner_Ubuntu10 (released 31 August 2010) was used to screen both the forward and reverse strands of the query replicon. The searches were restricted to intergenic regions and the sRNA length for prediction was set to 30-550 nucleotides. All other parameters were left at their default values, i.e. 3 provided input matrices: 35box_sRNA.matrix (cut-off: 2), 10box_sRNA.matrix (cut-off: 2), terminator.txt.matrix (cut-off: 3); spacer range between [-35] & [-10] promoter boxes: 12-18; unique hit value: 200; minimum cumulative sum of score (CSS): 14.
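The two search restrictions described above (a predicted length of 30-550 nucleotides and a location within an intergenic region) can be sketched as a simple post-filter. This is only an illustration of the constraints, not the internal logic of sRNAscanner; the `Candidate` type, the candidate names and the requirement that a candidate lie entirely inside one IGR are assumptions.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str    # hypothetical identifier
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


MIN_LEN, MAX_LEN = 30, 550  # length range stated in the text


def keep_candidate(cand: Candidate, igrs: List[Tuple[int, int]]) -> bool:
    """True if the candidate is 30-550 nt long and lies fully inside one IGR."""
    length = cand.end - cand.start + 1
    if not (MIN_LEN <= length <= MAX_LEN):
        return False
    return any(s <= cand.start and cand.end <= e for s, e in igrs)


igrs = [(601, 899)]  # hypothetical intergenic region
candidates = [
    Candidate("a", 650, 700, "+"),  # 51 nt, inside the IGR
    Candidate("b", 650, 660, "-"),  # 11 nt, too short
    Candidate("c", 880, 950, "+"),  # 71 nt, extends past the IGR
]
print([c.name for c in candidates if keep_candidate(c, igrs)])
# ['a']
```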
The genome sequences of 11 Burkholderia spp. and 3 Ralstonia spp. (.fna extensions), annotation files (.gbk and .ptt extensions) and the complete genomic sequences of RefSeq release 47 (.genomic.fna extensions) were obtained from NCBI (Additional file 1). The genome sequences of five local strains of B. pseudomallei (unpublished data) were used for cross-referencing purposes. The Rfam database v10.0, both .fasta and .cm extensions for 1,446 sRNA families, was downloaded from ftp://ftp.sanger.ac.uk/pub/databases/Rfam/.
The intergenic sequences of B. pseudomallei K96243 were compared to sRNA candidates predicted in the Ralstonia and Burkholderia genomes using blastn v2.2.21 (parameters: -e 1e-5 -r 1 -q -1 -G 1 -E 2 -W 9 -F "m D"). The results were visualized using ACT v9.0.3 and the physical gene co-localization for the sRNAs of interest was investigated.
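As a reading aid for the parameter string above, the following sketch assembles an equivalent command line and annotates each flag. It assumes the search was run through the legacy NCBI `blastall` executable (`-p blastn`), which uses this single-letter flag style; the file and database names are hypothetical, and the command is only printed, not executed.

```python
import shlex
from typing import List


def legacy_blastn_args(query_fasta: str, database: str, output: str) -> List[str]:
    """Build a legacy blastall/blastn argument list with the stated parameters."""
    return [
        "blastall", "-p", "blastn",
        "-i", query_fasta,   # query sequences (hypothetical file name)
        "-d", database,      # formatted nucleotide database (hypothetical name)
        "-o", output,        # report file (hypothetical name)
        "-e", "1e-5",        # expectation value cut-off
        "-r", "1",           # reward for a nucleotide match
        "-q", "-1",          # penalty for a nucleotide mismatch
        "-G", "1",           # cost to open a gap
        "-E", "2",           # cost to extend a gap
        "-W", "9",           # word size
        "-F", "m D",         # DUST masking applied to the lookup stage only
    ]


args = legacy_blastn_args("bp_igr.fna", "srna_candidates", "igr_vs_candidates.out")
print(shlex.join(args))  # shell-quoted form, e.g. -F 'm D'
```

A word size of 9 is shorter than the blastn default of 11, which makes the search more sensitive to short, diverged matches at the cost of speed.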
The secondary structures of the sRNA transcripts were predicted using mfold (unafold v3.8) and RNAfold (ViennaRNA v1.8.4). The default parameters or standard conditions for RNA folding were accepted (37°C, 1M NaCl, no divalent ions). The predicted structures were visualized using VARNA v3.7.
Sequences for sRNAs of interest were globally aligned and consensus secondary structures were predicted using LocARNA via its web service (URL: http://rna.tbi.univie.ac.at/cgi-bin/LocARNA.cgi). The default parameters for scoring the alignments were accepted (RIBOSUM85_60 matrix, Indel-opening score: -500, Indel score: -350, structure weight: 180, avoid lonely base-pairs). Covariance models representing the alignments with consensus structures were built, calibrated and searched against complete genome sequences in the RefSeq database release 47 using Infernal v1.0 with an E-value ≤ 1e-3.
The B. pseudomallei D286 human isolate was obtained from the Pathogen Laboratory, School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Malaysia. Stock cultures were stored at -70°C and routinely cultured on brain-heart infusion agar (BHIA) (Pronadisa Hispanlab, South Africa) at 37°C. Bacteria from a stock culture were taken and streaked on Ashdown agar, and incubated at 37°C for 48 hours. A single colony was picked from the plate and inoculated into Brain Heart Infusion broth (BHIB) overnight. The following day, the culture was diluted 1:100 and grown in BHIB until the OD600 reached 0.6 - 1.0. Total RNA was extracted using TRIzol® LS Reagent (Invitrogen, Carlsbad, CA) and purified using Ambion's DNAfree™ DNase Treatment and Removal Reagents (Life Technologies, Carlsbad, CA).
The purified RNA was reverse transcribed into cDNA with an oligo(dT)18 primer using the RevertAid First Strand cDNA Synthesis Kit (Fermentas, Hamburg, Germany). The cDNA produced was used as the template for PCR together with primers that were designed based on the sequences of sRNA candidates (Additional file 2). Amplification reactions were performed in a total volume of 25 μL consisting of 10x PCR buffer, 10 mmol/L of dNTP mix, approximately 100 ng of cDNA, 25 pmol of each primer, 1.0 U Taq polymerase (Promega, Madison, WI) and distilled water. A Mastercycler® personal (Eppendorf, Hamburg, Germany) was used to perform gradient PCR, with an initial denaturation step of 2 minutes at 95°C, followed by 35 amplification cycles of 30 seconds at 95°C, 30 seconds at 54-62°C, and 30 seconds at 72°C, and a final extension of 2 minutes at 72°C. Amplified products were analyzed by 3% agarose gel electrophoresis with O'GeneRuler™ Low Range DNA Ladder (Fermentas, Vilnius, Lithuania) run in parallel. PCR products were purified with the QIAquick Gel Purification Kit (Qiagen, Germany) and used in the reaction with the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA). Three biological replicates were carried out for each RT-PCR primer set. The PCR products were then sequenced on the ABI Prism® 3100 AVANT DNA Sequencer. The sequences obtained were analyzed using BioEdit and compared with the genome sequence of the B. pseudomallei D286 human isolate.
Table: Discovery and verification of bacterial sRNAs in previous studies.
Column headings include: Computational discovery method; Number of sRNAs.
Cell entries include: Pftools2.2 & RNAMotif; BLAST & TransTermHP; RT-PCR & microarray; Salmonella enterica Typhimurium; RNAz & nocoRNAc; SIPHT, sRNAscanner & Rfam_scan.
The IGR sequences identified for B. pseudomallei were compared against the 8,920 sRNA candidates using a BLAST-based (blastn) method. The purpose of this comparative analysis was to determine the conservation of sRNA candidates among closely related bacterial species. Because mis-annotations occur in genomes and each gene predictor has its own limitations, it was no surprise that this comparison detected putative sRNAs that were not predicted by the sRNA search pipeline. A total of 1,213 out of 4,978 (approximately 24%) B. pseudomallei IGRs were predicted to contain at least one sRNA gene. The complete results list for this comparative analysis is provided as Additional file 4. As two or more sRNA genes could be predicted on the same strand at overlapping locations, the overlapping candidates were merged before further analysis. For example, if gene A (location: 100 - 200) overlaps with gene B (location: 150 - 250), the genes were merged into gene C (location: 100 - 250).
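The merging step in the example above (gene A at 100 - 200 and gene B at 150 - 250 becoming gene C at 100 - 250) amounts to a same-strand interval merge. The sketch below is one plausible way to express it; the function name `merge_overlapping` and the tuple layout are assumptions, and candidates that merely touch end-to-start are left unmerged.

```python
from typing import List, Tuple

Gene = Tuple[int, int, str]  # (start, end, strand), 1-based inclusive; assumed layout


def merge_overlapping(candidates: List[Gene]) -> List[Gene]:
    """Merge candidates that overlap on the same strand into single spans."""
    merged: List[Gene] = []
    for strand in ("+", "-"):
        spans = sorted((s, e) for s, e, st in candidates if st == strand)
        for start, end in spans:
            last = merged[-1] if merged else None
            if last is not None and last[2] == strand and start <= last[1]:
                # Overlaps the previous span on this strand: extend it.
                merged[-1] = (last[0], max(last[1], end), strand)
            else:
                merged.append((start, end, strand))
    return merged


predicted = [
    (100, 200, "+"),  # gene A
    (150, 250, "+"),  # gene B, overlaps A on the same strand
    (150, 250, "-"),  # same coordinates, opposite strand: kept separate
    (400, 450, "+"),
]
print(merge_overlapping(predicted))
# [(100, 250, '+'), (400, 450, '+'), (150, 250, '-')]
```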
List of B. pseudomallei sRNA sequences with their corresponding sRNA families as reported in Rfam.
Coordinates from Rfam
Excluding the 21 homologues to known sRNAs, 20 previously undescribed candidates (also referred to in this paper as novel sRNAs) that were conserved in at least eight of the fourteen bacterial genomes analyzed were selected for comparison of their predicted secondary structures, which were examined visually. A total of twelve sRNAs with apparently conserved secondary structures were selected for experimental validation (discussed in the next section).
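The conservation criterion stated above (present in at least eight of the fourteen genomes) can be written as a small filter. This is a sketch under assumptions: the presence/absence mapping, the candidate names and the genome labels are hypothetical, and how presence was actually called (for example, from the blastn comparison) is not modelled.

```python
from typing import Dict, List, Set

MIN_GENOMES = 8  # threshold stated in the text, out of 14 genomes analyzed


def conserved_candidates(presence: Dict[str, Set[str]],
                         min_genomes: int = MIN_GENOMES) -> List[str]:
    """Return candidates detected in at least `min_genomes` distinct genomes."""
    return sorted(name for name, genomes in presence.items()
                  if len(genomes) >= min_genomes)


# Hypothetical presence data: candidate name -> genomes in which it was found.
genomes = [f"genome_{i}" for i in range(1, 15)]  # 14 placeholder labels
presence = {
    "cand_A": set(genomes[:9]),   # found in 9 genomes: retained
    "cand_B": set(genomes[:3]),   # found in 3 genomes: dropped
    "cand_C": set(genomes[:8]),   # found in exactly 8 genomes: retained
}
print(conserved_candidates(presence))
# ['cand_A', 'cand_C']
```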
List of RT-PCR validated sRNA genes conserved in Burkholderia and unique to Burkholderia pseudomallei.
Start - end/Length
Conservation (Infernal search)
Highly conserved in Burkholderia
110185 - 110354/170
Bacteria (detected in Proteobacteria, Bacteroidetes, Firmicutes, etc)
2290411 - 2290508/98
2768674 - 2768787/114
Bacteria (detected in Actinobacteria, Cyanobacteria, Firmicutes, etc)
2887980 - 2888055/76
3154052 - 3154260/209
4031759 - 4031986/228
2326038 - 2326224/187
Proteobacteria (predominantly in Burkholderiales, detected in Deltaproteobacteria and Gammaproteobacteria)
Unique to B. pseudomallei
892370 - 892562/193
575285 - 575425/141
Bp1_Cand684_SIPHT was detected in different groups of bacteria, including Actinobacteria, Cyanobacteria and Firmicutes. Physical co-localization analysis showed that the flanking genes were not associated with the same pathways or functions (Figure 6E), suggesting a possible trans-acting role.
Bp1_Cand612_SIPHT, Bp1_Cand697_SIPHT and Bp1_Cand738_SIPHT are RT-PCR validated sRNA candidates that were found to be Burkholderia-specific. These three sRNAs were not detected in bacteria other than Burkholderia spp. during the Infernal search. From the physical co-localization analysis, each of these three sRNA genes has similar flanking genes in different Burkholderia spp. (Figure 6B-D). For Bp1_Cand612_SIPHT and Bp1_Cand697_SIPHT, although R. solanacearum has a similar gene arrangement at the equivalent regions, no such sRNA genes were predicted in that genome.
A total of 1,306 B. pseudomallei sRNA genes were predicted in this study of which: 21 have homologs in Rfam; 15 novel sRNAs were shortlisted due to their conservation in Burkholderia spp. or different B. pseudomallei strains; and 8 of these were verified experimentally. Though the functions for the novel sRNAs obtained in this study remain unknown, their presence in B. pseudomallei is evidence that sRNAs are indeed involved in this bacterium's many different cellular activities that may include regulation of pathogenesis and virulence mechanisms as well as adaptation to environmentally induced changes.
Research funding was provided by Universiti Kebangsaan Malaysia via the UKM research university grants UKM-GUP-KPB-08-33-132 and DIP-2012-13 and the Ministry of Higher Education Malaysia grant ERGS/1/2012/STG08/UKM/02/5. KJS was funded by a National Science Fellowship from the Ministry of Science, Technology and Innovation, Malaysia. CSF was funded by the MyMaster-MyBrain 15 scholarship from the Ministry of Higher Education, Malaysia and the Universiti Kebangsaan Malaysia Zamalah postgraduate research fellowship.
This article has been published as part of BMC Genomics Volume 13 Supplement 7, 2012: Eleventh International Conference on Bioinformatics (InCoB2012): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/13/S7.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.