Identification of four novel small non-coding RNAs from Xanthomonas campestris pathovar campestris

Background In bacteria, small non-coding RNAs (sRNAs) have been recognized as important regulators of various cellular processes. Approximately 200 bacterial sRNAs in total have been reported. However, very few sRNAs have been identified from phytopathogenic bacteria. Results Xanthomonas campestris pathovar campestris (Xcc) is the causal agent of black rot disease of cruciferous crops. In this study, a cDNA library was constructed from the low-molecular weight RNA isolated from the Xcc strain 8004 grown to exponential phase in the minimal medium XVM2. Seven sRNA candidates were obtained by sequencing 2,500 clones from the library, and four of them were confirmed to be sRNAs by Northern hybridization and named sRNA-Xcc1, sRNA-Xcc2, sRNA-Xcc3, and sRNA-Xcc4. The transcription start and stop sites of these sRNAs were further determined. BLAST analysis revealed that the four sRNAs are novel. Bioinformatics prediction showed that a large number of genes with various known or unknown functions in Xcc 8004 are potential targets of sRNA-Xcc1, sRNA-Xcc3 and sRNA-Xcc4. In contrast, only a few genes were predicted to be potential targets of sRNA-Xcc2. Conclusion We have identified four novel sRNAs from Xcc by a large-scale screen. Bioinformatics analysis suggests that they may perform various functions. This work provides the first step toward understanding the role of sRNAs in the molecular mechanisms of Xanthomonas campestris pathogenesis.


Background
Substantial evidence shows that small non-coding RNAs (sRNAs) exist in all three domains of life, i.e. Eukarya, Bacteria and Archaea. Bacterial sRNAs are normally between 50 and 500 nucleotides in length. It has been demonstrated that many bacterial sRNAs act as regulators of gene expression, although the function of the majority of identified bacterial sRNAs is still unknown. Recent studies have revealed that in bacteria sRNAs control various cellular processes, including acid resistance [1], iron homeostasis [2], sugar metabolism [3], envelope stress response [4,5], quorum sensing [6], as well as virulence [7,8]. Most bacterial sRNAs characterized to date regulate gene expression either by pairing to their mRNA targets and thus affecting their stability and/or translation, or by binding to proteins to modify their mRNA-binding activity [9,10].
The Gram-negative bacterium Xanthomonas campestris pathovar campestris (Xcc) is the causal agent of black rot disease of cruciferous crops worldwide [26]. This pathogen infects almost all members of the crucifer family (Brassicaceae), including important vegetables such as broccoli, cabbage, cauliflower, mustard, and radish; the major oil crop rape; and the model plant Arabidopsis thaliana. In recent decades, black rot disease has become more prevalent and has caused severe losses in vegetable and edible oil production in many countries [27]. In addition, Xcc is the producer of the acidic exopolysaccharide xanthan, which is an important industrial biopolymer and has been widely used as a viscosifier, thickener, emulsifier or stabilizer in both food and non-food industries [28]. Because of its agricultural and industrial importance, the molecular genetics of Xcc has attracted particular attention for over two decades. The entire genome sequences of three Xcc strains have been determined and many important genes implicated in pathogenicity, xanthan biosynthesis, and other cellular processes have been characterized [27,29-33]. However, no sRNA has been identified from Xcc so far. In this article, we report four sRNAs identified from the Xcc strain 8004 by generating and screening a cDNA library of low molecular weight RNAs, providing the first step towards an understanding of the function of sRNAs in Xanthomonas.

Results and discussion
Construction of a cDNA library of low molecular weight RNAs from Xcc
As mentioned above, to identify sRNAs in Xcc we employed an approach based on a cDNA library of low molecular weight RNAs (Additional file 1 Figure S1). This strategy, also known as small RNA shotgun cloning, allows detection of sRNAs that are expressed in bacterial cells grown under given conditions and does not require prior knowledge of sRNA characteristics [12,25]. This method has proven to be one of the most efficient ways of identifying sRNAs in bacteria [12,25]. We constructed a cDNA library by reverse-transcribing RNAs ranging in size from about 50 to 500 nt, which were selected from the total RNA isolated from cells of the Xcc strain 8004 [34] grown to exponential phase in the medium XVM2, a minimal medium mimicking the environment of plant cells [35]. Since RNAs in the 50 to 500 nt range overlap in length with the highly abundant 5S rRNA transcripts (119 nt), we excised the RNA band of about 110 nt from the gel after electrophoresis to deplete 5S rRNA and enrich for other sRNAs. By using the procedure described in Methods, a cDNA library containing approximately 10,000 individual clones was constructed.

Identification of sRNA candidates from the cDNA library
About 2,500 individual clones from the cDNA library were subjected to sequencing; of these, 2,104 recombinant plasmids with satisfactory cloned sequences were obtained (Additional file 2 Table S1). The insert sequences of these recombinant plasmids were individually aligned by BLASTN against the genomic sequence of Xcc strain 8004 in the NCBI GenBank database [30] (GenBank accession number CP000050), and 2,048 of them matched the genome (Additional file 2 Table S1). Of the 2,048 matched sequences, 1,274 (60.55%) were derived from tRNA genes, 444 (21.1%) from 5S rRNA genes, 67 (3.18%) from the 16S or 23S rRNA genes, 6 (0.29%) from ORF (open reading frame)-coding regions, and 257 (12.22%) from intergenic regions (IGRs) (Table 1 and Additional file 2 Table S1 to Table S4). The sequences of the 6 ORF-matched clones all correspond to the sense orientation; therefore, it is probable that they are degradation products of the full-length mRNAs encoded by the ORFs. The 257 IGR-matched clones comprise 7 species. We considered these species as potential sRNA candidates and named them sRNA-C1 to sRNA-C7, respectively (Table 2 and Additional file 2 Table S4).
It is not surprising that 60.55% of the clones were derived from tRNA genes, because the RNAs used for the cDNA library construction overlap in length with the highly abundant tRNA transcripts (about 70 nt in size), which were not removed from the RNA templates used for reverse transcription in the library construction. The genome of the Xcc strain 8004 harbours 53 copies of tRNA genes consisting of 46 species distinguishable in sequence [30]. The sequences of the 1,274 clones derived from tRNA genes match 43 different tRNA species (Table 1 and Additional file 2 Table S2). By contrast, it is surprising that 21.1% of the clones are still 5S rRNA transcripts, although the band containing 5S rRNA was removed from the RNA fractionation gel during the cDNA library construction.
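The clone tallies above can be re-derived with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable and function names are illustrative. It suggests that the quoted percentages are calculated relative to the 2,104 sequenced clones rather than the 2,048 genome-matched ones, and that the five categories sum to 2,048. With this denominator the IGR share rounds to 12.21%, marginally below the 12.22% quoted.

```python
# Clone counts as reported in the text (categories of genome-matched clones).
SEQUENCED_CLONES = 2104  # clones with satisfactory insert sequence

counts = {
    "tRNA": 1274,
    "5S rRNA": 444,
    "16S/23S rRNA": 67,
    "ORF (sense)": 6,
    "intergenic (IGR)": 257,
}


def summarize(category_counts, denominator):
    """Return {category: percentage of denominator}, rounded to 2 decimals."""
    return {
        name: round(100.0 * n / denominator, 2)
        for name, n in category_counts.items()
    }


matched_total = sum(counts.values())
percentages = summarize(counts, SEQUENCED_CLONES)

for name, pct in percentages.items():
    print(f"{name:18s} {counts[name]:5d}  {pct:6.2f}%")
print(f"{'matched total':18s} {matched_total:5d}")
```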

Identification of sRNAs from the candidates by Northern blotting
To verify whether the 7 sRNA candidates identified from the cDNA library were authentic sRNAs, we performed Northern blotting analysis using DNA probes complementary to the original cDNA clones of the candidates. A single Northern blotting signal band was clearly observed for each of the candidates sRNA-C1 and sRNA-C2, and the sizes of the bands were approximately 100 and 200 nt, respectively (Figure 1), which is consistent with the sizes of the corresponding candidate cDNAs (Table 2). We concluded that these two candidates are genuine sRNAs and named them sRNA-Xcc1 and sRNA-Xcc2, respectively (Table 2). For each of the candidates sRNA-C3 and sRNA-C4, two Northern blotting signal bands were observed: a major band of small size and a very faint band of large size (Figure 1). As shown in Figure 1, the sizes of the major bands were about 50 and 100 nt, respectively, which is consistent with the sizes of the corresponding candidate cDNAs (Table 2). We concluded that they are real sRNAs and named them sRNA-Xcc3 and sRNA-Xcc4, respectively. The faint bands might result from non-specific hybridization of the sRNA-C3 and sRNA-C4 probes with unknown transcripts. The blots of the candidates sRNA-C5, sRNA-C6 and sRNA-C7 showed signal band(s) larger than 1000 nt, much larger than the sizes of the cDNAs; thus, they are not likely to be sRNAs. A summary of the analysis of these sRNA candidates is presented in Table 2, and the locations of the four verified sRNAs in the genome of the Xcc strain 8004 are shown in Figure 2. Interestingly, the gene encoding sRNA-Xcc4 overlaps with the 5S rRNA gene.
To gain insight into the expression of the identified sRNAs, we compared their expression levels in bacterial cells grown to exponential phase in different media by Northern blotting analysis. As shown in Figure 1, the expression levels of the four sRNAs in the rich medium NYG and the minimal media MMX and XVM2 are very high and almost identical. This suggests that in exponential growth phase the expression of the four Xcc sRNAs is not nutrient dependent.

5' and 3' end mapping, secondary structure prediction, and target prediction of the identified sRNAs
Northern blots only provide information about the expression level and the approximate size of a transcript, but cannot reveal the exact positions of the 5' and 3' ends of an RNA. To precisely ascertain the transcription start and stop sites of the identified sRNAs, 5' and 3' RACE analysis was performed (see Methods for details). The results are given in Additional file 3 Tables S5 and S6. Since the 5' and 3' ends of an sRNA may vary by a few nucleotides, at least 10 clones for each 5' and 3' RACE analysis should be sequenced; the most upstream 5' nucleotide is regarded as the transcription initiation site and the most downstream 3' nucleotide is regarded as the transcription termination site. The 5' and 3' termini of the four identified sRNAs were determined by this strategy and are shown in Table 2. After the 5' and 3' termini of the transcripts were identified, we assigned the most probable boundaries for the sRNAs, and the secondary structure of each of the resulting sequences was analyzed by using SFold [36]. The predicted secondary structures of the four Xcc sRNAs are shown in Figure 3.
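The end-selection rule described above (most upstream 5' end, most downstream 3' end, from at least 10 sequenced clones) can be sketched as follows. This is an illustration only: the function name, the strand handling and all coordinates are hypothetical and are not taken from Additional file 3.

```python
def select_termini(five_prime_ends, three_prime_ends, strand, min_clones=10):
    """Pick transcript boundaries from RACE clone end coordinates.

    five_prime_ends / three_prime_ends: genome coordinates (ints), one per
    sequenced RACE clone. strand: "+" or "-".
    The most upstream 5' end and the most downstream 3' end are chosen;
    on the minus strand, "upstream" means the larger genome coordinate.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if len(five_prime_ends) < min_clones or len(three_prime_ends) < min_clones:
        raise ValueError(f"need at least {min_clones} clones per RACE analysis")

    if strand == "+":
        start = min(five_prime_ends)
        stop = max(three_prime_ends)
    else:
        start = max(five_prime_ends)
        stop = min(three_prime_ends)

    length = abs(stop - start) + 1
    return start, stop, length


# Hypothetical coordinates for a plus-strand transcript (10 clones each).
five = [1000, 1002, 1000, 1001, 1000, 1003, 1000, 1001, 1000, 1000]
three = [1098, 1100, 1099, 1100, 1100, 1097, 1100, 1099, 1100, 1100]
print(select_termini(five, three, "+"))
```

Taking the extreme ends gives the longest transcript supported by the clones; shorter clones are then interpreted as variation or partial degradation rather than as alternative start or stop sites.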
It has been demonstrated that most characterized sRNAs regulate gene expression by pairing to their mRNA targets [9,10]. As a first step in gaining an understanding of the function of the identified Xcc sRNAs, we employed the computational software sRNATarget developed by Cao and associates [37] to predict their potential targets. The results, shown in Additional file 4 Table S7, reveal that a large number of genes with various predicted or known functions, including some known virulence-related genes, are potential targets of sRNA-Xcc1, sRNA-Xcc3 and sRNA-Xcc4, suggesting that these sRNAs are probably implicated in the regulation of different cellular processes including pathogenesis. In contrast, only a few genes were predicted (at a very low score) to be potential targets of sRNA-Xcc2, implying that sRNA-Xcc2 might be a structural rather than a regulatory RNA. Further experimental investigation is needed to establish the biological significance of these sRNAs in Xcc.
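To illustrate the idea of antisense pairing between an sRNA and an mRNA, the toy sketch below finds the longest stretch of perfect Watson-Crick complementarity between two RNA sequences. It is not the sRNATarget algorithm and ignores G-U wobble pairs, mismatches and thermodynamics; the sequences are invented for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_antisense_match(srna, mrna):
    """Length of the longest perfectly paired (antiparallel) stretch.

    A segment of the sRNA pairs with a segment of the mRNA exactly when the
    mRNA segment equals the reverse complement of the sRNA segment, so this
    reduces to the longest common substring of reverse_complement(srna)
    and mrna, computed by dynamic programming.
    """
    target = reverse_complement(srna)
    best = 0
    prev = [0] * (len(mrna) + 1)
    for i in range(1, len(target) + 1):
        cur = [0] * (len(mrna) + 1)
        for j in range(1, len(mrna) + 1):
            if target[i - 1] == mrna[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Invented example sequences (not from Xcc).
print(longest_antisense_match("GGAUCCUU", "AAAGGAUCCAAA"))
```

Real target predictors score many additional features, so a long complementary stretch found this way is at most a crude first filter.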

Distribution of the identified sRNA genes in other bacteria
To determine whether the sRNAs identified above have any sequence similarity to other known bacterial sRNAs, BLAST [38] was used to search the sequences of the sRNAs against the small RNA database http://ncrnadb.trna.ibch.poznan.pl/blast.html. None of the four identified sRNAs displayed sequence similarity with any known sRNAs, indicating that these four Xcc sRNAs are novel. To further verify whether homologous DNA sequences of these sRNA coding genes exist in other microorganisms, we used the complementary DNA sequences of these sRNA genes to perform BLAST searches against the NCBI total sequence database http://www.ncbi.nlm.nih.gov/Genbank/index.html. The results, given in Table 3, showed that: (i) for sRNA-Xcc1, homologous sequences were found only in the genomes of the Xcc strains ATCC33913 and B100 but not in any other sequenced bacterial species, including the very closely related bacteria X. campestris pv. vesicatoria and X. oryzae pv. oryzae, indicating that sRNA-Xcc1 may be an Xcc-specific sRNA; and (ii) for sRNA-Xcc2, sRNA-Xcc3 and sRNA-Xcc4, highly homologous sequences were found in other species of Xanthomonas and its closely related genus Xylella, in addition to the Xcc strains

Conclusion
A cDNA library was constructed with the low molecular weight RNA prepared from Xcc cells grown to exponential phase in the minimal medium XVM2, and seven sRNA candidates were obtained by sequencing approximately 2,500 clones randomly selected from the library. Four of the candidates were confirmed to be sRNAs by Northern blotting. Bioinformatics analysis revealed that all four sRNAs are novel. Their transcription start and stop sites were further determined by 5'- and 3'-end mapping. The secondary structures and potential targets of the four sRNAs were predicted bioinformatically, suggesting that a large number of genes related to various cellular processes of Xcc may be regulated by the sRNAs. To the best of our knowledge, this is the first report on the identification of sRNAs from a plant pathogen by a large-scale screen. The results provide useful information for further studies on the molecular mechanisms of Xanthomonas campestris pathogenesis.

cDNA library construction
A cDNA library of the Xcc low molecular weight RNA was constructed using the TaKaRa small RNA cloning kit DRR065 (TaKaRa, Dalian, China), and the experimental steps were performed according to the manufacturer's instructions. A schematic diagram of the experimental procedure used for the cDNA library construction is shown in Additional file 1 Figure S1. In brief, overnight cultures of the Xcc strain 8004, grown in the rich medium NYG, were diluted 1/100 in the minimal medium XVM2 and grown at 28°C. Bacterial cells were harvested at OD600 = 0.6 (representing exponential phase) and total RNA was isolated using the hot phenol method [42]. 200 μg of RNA was subsequently fractionated by denaturing 8% polyacrylamide gel (7 M urea, 0.5 × TBE buffer) electrophoresis (PAGE). The gel region containing target RNAs ranging in size from about 50 to 500 nt was excised after removing the RNA band of about 110 nt to deplete 5S rRNA, and RNAs were extracted from the excised gel using the small RNA Gel Extraction Kit D9106 (TaKaRa, Dalian, China). The purified RNAs were then dephosphorylated by bacterial alkaline phosphatase (BAP) treatment and a biotin-tagged 3' adaptor (5' phosphorylated) (Additional file 5 Table S8) was ligated to the RNA molecules by T4 RNA ligase (Promega, Shanghai, China). The 3' adaptor-containing RNAs were purified using the streptavidin-labelled magnetic bead MAGNOTEX-SA (TaKaRa, Dalian, China), which binds specifically to biotin, and a 5' adaptor (Additional file 5 Table S8) was ligated to the small RNAs by T4 RNA ligase; the 3' and 5' adaptor-containing small RNAs were then purified again using MAGNOTEX-SA.
These RNAs were then reverse-transcribed using a primer complementary to the 3' linker sequence, and finally PCR amplified using primers on both linkers. The amplified products were gel-extracted, digested using Sse8387 I (TaKaRa, Dalian, China), cloned into the vector pUC19 [43] and transferred into E. coli JM109 by transformation. Transformed bacterial cells were plated on LB plates containing ampicillin and grown overnight. Individual transformants were picked and screened for the presence of inserts by colony PCR. Clones with inserts were used for sequencing analysis.
Figure 3 The predicted secondary structures of Xcc sRNAs. The secondary structures of the four identified Xcc sRNAs (sRNA-Xcc1, sRNA-Xcc2, sRNA-Xcc3 and sRNA-Xcc4) were predicted by using the SFold program [36].

DNA sequencing
The cDNA clones were sequenced using the M13 reverse primer and the BigDye terminator cycle sequencing reaction kit (Applied Biosystems, Foster City, CA, USA) on an ABI Prism 377 (Applied Biosystems, Foster City, CA, USA) sequencer.

Biocomputational analysis
Mapping of the cDNA clones on the genome of the Xcc strain 8004 was carried out by performing a BLASTN search against the genome sequence [30] on GenBank database (NCBI GenBank accession number CP000050). The Vector NTI (Invitrogen, Carlsbad, CA, USA) sequence analysis program package was used for sequence alignment. The SFold program [36] was used for RNA secondary structure prediction. sRNA targets were predicted by using the software developed by Cao and associates [37].
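The step from a BLASTN hit to a clone category (tRNA, rRNA, ORF or intergenic) amounts to an interval-overlap check against the genome annotation. The sketch below is a minimal illustration under simplifying assumptions: the `Feature` type, the `classify_hit` function and all coordinates are hypothetical, strand is ignored, and the first overlapping feature decides the category.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    name: str
    kind: str   # e.g. "tRNA", "rRNA", "ORF"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def classify_hit(hit_start: int, hit_end: int, features: List[Feature]) -> str:
    """Return the kind of the first annotated feature overlapping the hit,
    or "IGR" if the hit lies entirely in an intergenic region."""
    lo, hi = sorted((hit_start, hit_end))  # BLAST may report reversed coordinates
    for feature in features:
        if lo <= feature.end and hi >= feature.start:
            return feature.kind
    return "IGR"


# Hypothetical annotation, not taken from GenBank CP000050.
annotation = [
    Feature("tRNA-example", "tRNA", 100, 175),
    Feature("orf-example", "ORF", 300, 900),
]

for hit in [(120, 170), (200, 260), (880, 950)]:
    print(hit, classify_hit(*hit, annotation))
```

A production analysis would also track strand, which matters in the text for deciding that the ORF-matched clones are in the sense orientation.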

Northern blotting
Xcc overnight cultures were diluted 1/100, grown at 28°C in the rich medium NYG and the minimal media XVM2 or MMX, and bacterial cells were harvested at exponential phase (OD600 = 1.0 for NYG and 0.6 for MMX or XVM2

5' and 3' RACE
5'-rapid amplification of cDNA ends (5'-RACE) was carried out using the 5'RACE System for Rapid Amplification of cDNA ends kit (Invitrogen), following the manufacturer's instructions. After purification using the Watson Gel Extraction Mini Kit (Watson Biotechnologies, Inc), the 5'-RACE PCR products were cloned into the T-vector pMD18-T (TaKaRa, Dalian, China) and the cloned cDNA fragments were sequenced and analyzed. 3'-RACE was conducted using the TaKaRa Small RNA cloning Kit (TaKaRa, Dalian, China), following the manufacturer's instructions, and the 3'-RACE PCR products were cloned and sequenced by the same method as for 5'-RACE. Primers and RNA adaptors used for 5'- and 3'-RACE are listed in Additional file 5 Table S8.