Genomics and transcriptomics of Xanthomonas campestris species challenge the concept of core type III effectome

Background The bacterial species Xanthomonas campestris infects a wide range of Brassicaceae. Specific pathovars of this species cause black rot (pv. campestris), bacterial blight of stock (pv. incanae) or bacterial leaf spot (pv. raphani). Results In this study, we extended the genomic coverage of the species by sequencing and annotating the genomes of strains from pathovar incanae (CFBP 1606R and CFBP 2527R), pathovar raphani (CFBP 5828R) and a pathovar formerly named barbareae (CFBP 5825R). While comparative analyses identified a large core ORFeome at the species level, the core type III effectome was limited to only three putative type III effectors (XopP, XopF1 and XopAL1). In Xanthomonas, these effector proteins are injected inside the plant cells by the type III secretion system and contribute collectively to virulence. A deep and strand-specific RNA sequencing strategy was adopted in order to experimentally refine genome annotation for strain CFBP 5828R. This approach also allowed the experimental definition of novel ORFs and non-coding RNA transcripts. Using a constitutively active allele of hrpG, a master regulator of the type III secretion system, a HrpG-dependent regulon of 141 genes co-regulated with the type III secretion system was identified. Importantly, all these genes but seven are positively regulated by HrpG and 56 of those encode components of the Hrp type III secretion system and putative effector proteins. Conclusions This dataset is an important resource to mine for novel type III effector proteins as well as for bacterial genes which could contribute to pathogenicity of X. campestris. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-2190-0) contains supplementary material, which is available to authorized users.


Background
Gram-negative bacteria of the species Xanthomonas campestris are able to cause disease on Brassicaceae and are responsible for important yield and quality losses in brassica crops such as cabbage, radish, cauliflower or Chinese cabbage [1]. Interestingly, Xanthomonas campestris isolates are natural pathogens of the model plant species Arabidopsis thaliana [2]. Based on host range, mode of infection and the disease symptoms caused on plants, the species was further divided into three pathovars [3]. The pathovar campestris groups strains which are able to cause black rot on at least one cruciferous species. Strains of the pathovar incanae cause bacterial blight on garden stock and wallflower while those of pathovar raphani are the causal agent of bacterial leaf spot on both cruciferous and solanaceous plants [3]. Finally, weakly or non-pathogenic (NP) strains, which were originally assigned to the pathovars barbareae or armoraciae, were included in a fourth group of less-defined strains due to their relatedness by multilocus sequence analysis (MLSA) [3,4]. While strains of pathovars campestris and incanae use the hydathodes (and wounds) to initiate a vascular infection of the plant, strains of the pathovar raphani seem to preferentially use stomata (and wounds) to enter the leaf and colonize the mesophyll.
Extensive molecular genetics of Xanthomonas campestris has identified key pathogenicity determinants [5][6][7] such as extracellular polysaccharides, lipopolysaccharides, DSF-dependent quorum sensing, extracellular enzymes (proteases, cellulases, etc.) exported by the type II secretion systems or proteins secreted by the type III secretion (T3S) system. Type III-secreted proteins (T3SP) include type III effector (T3E) proteins which are injected inside the plant cells where many of them interfere with cell physiology and plant immunity. Because the Hrp (hypersensitive response and pathogenicity) T3S system is essential for Xanthomonas virulence, it is assumed that the type III effectome is also globally essential for any interaction with the plant. Yet, individual effectors usually make a limited or non-significant contribution to pathogenicity when studied alone, probably reflecting functional redundancy and/or additivity between effectors. hrp genes are not expressed in Xanthomonas cultivated in rich media. Yet, hrp gene expression can be induced in Xanthomonas in specific minimal media (XVM2, MME, MMX) that were thought to mimic in planta conditions [8][9][10]. Two master regulators of the hrp system, both required for virulence and for hrp gene expression in minimal medium, have been identified. HrpX is an AraC-type transcriptional activator that induces the expression of all hrp operons but hrpA upon binding to the plant-inducible promoter (PIP) box (TTCGB-N15-TTCGB; B represents C, G, or T) in the promoter region [11,12]. Expression of most T3E genes is under the control of hrpX [13][14][15]. HrpG, an OmpR-type transcriptional regulator, and its putative cognate sensor kinase HpaS control the expression of hrpX, the hrpA operon and other genes [16,17]. Interestingly, several point mutations in hrpG (hrpG*) can render its activity constitutive in the absence of inducing conditions and result in increased aggressiveness on plants [18].
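The PIP box consensus given above (TTCGB-N15-TTCGB, with B standing for C, G or T) translates directly into a pattern search. The following is a minimal sketch, assuming a plain regular-expression scan of one strand of a promoter sequence; the function name and the example sequence are hypothetical and are not taken from the study.

```python
import re

# PIP box consensus from the text: TTCGB-N15-TTCGB, where B is C, G or T.
# A lookahead is used so that overlapping occurrences are also reported.
PIP_BOX = re.compile(r"(?=(TTCG[CGT][ACGT]{15}TTCG[CGT]))")

def find_pip_boxes(seq):
    """Return (0-based start, motif) pairs for each PIP box on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in PIP_BOX.finditer(seq)]

# Hypothetical promoter fragment built for illustration only.
promoter = "GGA" + "TTCGC" + "A" * 15 + "TTCGT" + "CC"
hits = find_pip_boxes(promoter)
```

A genome-wide scan would also need to search the reverse complement and restrict hits to regions upstream of annotated genes; both steps are left out of this sketch.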
To date, four complete genome sequences are available in the X. campestris species (pathovar campestris: 8004, ATCC33913, B100; pathovar raphani: 756C) and six draft genomes (pathovar campestris: Xca5, JX, B-1459, CN14, CN15 and CN16) [7,19][20][21][22][23][24][25]. These strains have in common a ca. 5-Mb circular chromosome with approx. 65 % G + C content. The presence of plasmids has only been reported in strains B-1459, CN14, CN15 and CN16 [23,25]. No genomic information is available yet on the pathovar incanae and the non-pathogenic group of X. campestris. Comparative genomics has proven to be an important tool to mine for pathogenicity determinants, host specificity factors and in particular for T3E [21]. Due to the lack of conserved secretion/translocation signals in T3SP, these can only be predicted by indirect means: in silico proteomes can be studied by homology searches with known T3E sequences or by searching for eukaryotic signatures (nuclear localization signals, myristoylation/palmitoylation signals, F-box motifs, etc.), and/or predicted genes can be scrutinized for the presence of a PIP box. Indeed, coexpression of T3SP with the Hrp T3S system is the rule and has proven to be a powerful tool for the identification of novel T3E in Xanthomonas [13], Ralstonia solanacearum [26] and Pseudomonas syringae [27]. To this end, the analysis of Xanthomonas transcriptomes in hrp-inducing conditions or in deregulated mutants constitutively expressing the hrp regulon has been a major source for the discovery of novel effectors [13][14][15][28].
Here, we report on the genome sequencing of four X. campestris strains of the pathovars incanae, raphani and NP and their comparison with publicly available X. campestris genomes. This approach identified a very limited core type III effectome for this species of Xanthomonas. A transcriptomic analysis of the hrpG regulon was also performed in X. campestris pv. raphani strain CFBP 5828R. It provides important information for the structural annotation of the genes and valuable hints for the identification of candidate type III effector proteins as well as genes which could contribute to pathogenicity of X. campestris.

Genome sequencing and properties
Four X. campestris strains belonging to pathovars that are uncharacterized or poorly characterized at the genomic level were selected for this study (Table 1). Two strains of the pathovar incanae, CFBP 2527 and CFBP 1606, were chosen because these strains were isolated on two distinct continents 24 years apart. Notably, strain CFBP 2527, which was isolated from hoary stock (Matthiola incana) in the USA in 1950, is the pathotype strain of pathovar incanae. Strain CFBP 5828 belongs to the raphani pathovar and was isolated from radish in the USA. So far, only one complete genome has been determined in this pathovar [21]. The sequenced strain 756C was isolated from cabbage in East Asia and was classified in the raphani pathovar based on its MLSA profile and pathogenicity [3,29]. Strain CFBP 5825 was initially classified as pathovar barbareae and designated as the pathotype strain of this pathovar but has recently been assigned to the X. campestris NP ("non-pathogenic") clade [3].
Shotgun sequencing of genomic DNA of rifampicin-resistant derivatives of these strains ("R" suffix; e.g. CFBP 5825R) was performed on an Illumina HiSeq2000 platform (Table 1). Genome assembly was performed using a combination of SOAPdenovo [30] and Velvet [31] assemblers and yielded 6 to 11 contigs per genome. These contigs were further ordered into a large pseudochromosome based on the chromosomal organization of X. campestris pv. campestris strain 8004. The resulting pseudochromosomes have sizes (ca. 5 Mb) and G + C contents (ca. 65 %) which are typical for Xanthomonas. For both strains CFBP 5825R and CFBP 5828R, one small (<1 kb) contig showing no homology to Xanthomonas chromosomal sequences could not be assembled into the final pseudochromosome. Consistently, we did not detect any endogenous plasmids in those strains (data not shown) nor plasmid-like sequences in their genomes. Due to their highly repetitive nature, transcription activator-like (TAL) effector sequences present in strains CFBP 5825R and CFBP 1606R could not be assembled and are not represented in the final assemblies. Southern blot analyses of BamHI-digested genomic DNA indicate that at least one TAL gene is present in each genome (Denancé and Noël, unpublished results). With fewer than ten scaffolds per genome [GenBank: ATNN00000000, GenBank: ATNO00000000, GenBank: ATNP00000000, GenBank: ATNQ00000000], these draft genomes are sufficiently well assembled to allow gene discovery and gene functional analyses and to develop molecular typing tools.
Genome annotation and X. campestris genomic properties
De novo annotation was performed using Eugene-P. In order to determine the phylogenetic relationship of the newly sequenced strains in relation to nine published X. campestris genomes [7,19][20][21][22][23][24], all genomes were first structurally re-annotated using Eugene-P. The resulting proteomes were compared using OrthoMCL and yielded a X. campestris core genome composed of 3481 CDS which was used to perform phylogenetic analyses. As inferred from amplified fragment length polymorphism (AFLP) analyses [32], X. campestris pv. campestris is organized into at least three clades: clade XccA contains strains B100 and JX, clade XccC contains strains 8004, Xca5 and ATCC33913 and clade G contains strains CN14, CN15 and CN16 (Fig. 1a). In contrast to strains of pathovar campestris, strains of pathovars incanae and raphani did not cluster in monophyletic groups: this phylogenetic analysis could not fully discriminate between X. campestris NP and X. campestris pv. incanae (Fig. 1a). The phylogenetic relationships at the subspecies level can also be determined using CRISPR systems [33]. CRISPR loci are bacterial immune systems which store, in the so-called spacer regions, fossil records of exogenous DNA acquired during evolution upon phage infection or plasmid acquisition. Interestingly, X. campestris pv. raphani strains CFBP 5828R and 756C were the only strains carrying a CRISPR locus. These two loci harbored identical repeats and 101 and 85 spacers in strains CFBP 5828R and 756C, respectively. The lack of common spacers between these two CRISPR loci and the significant distance separating the core genomes of these two strains (Fig. 1a) support the hypothesis that both strains diverged a long time ago.
The X. campestris core ORFeome was composed of 3481 protein-coding genes as estimated using OrthoMCL (Fig. 1b and d). In particular, the X. campestris pv. campestris core ORFeome was composed of 3790 CDS. While each genome contains ca. 4400 CDS, the pan genomes of X. campestris and X. campestris pv. campestris encompass 6472 and 5439 CDS, respectively (Fig. 1c). For X. campestris pv. campestris, the number of strain-specific CDS in the pan ORFeome was low, indicating that these strains are genetically closely related. In contrast, each strain of the other X. campestris pathovars contributed more than 140 CDS to the X. campestris pan ORFeome, probably as a result of the lower number of representative genomes per pathovar available for analysis. These results are in agreement with the phylogenetic relationships observed in the X. campestris core genome (Fig. 1a), indicating that the acquisition/fixation of these accessory genes is likely the result of speciation events that were essentially vertically inherited. The core ORFeomes of these four X. campestris pathovars were compared (Fig. 1d). These analyses highlighted that the core ORFeomes of pathovars raphani and incanae are slightly larger than that of pathovar campestris, which might reflect the lower number of genomes analysed for pathovars raphani and incanae. The X. campestris NP clade was represented by a single strain, CFBP 5825R, which artificially increased the size of its core ORFeome to 4079 CDS. The identification of these core and pan ORFeomes for the species and the different pathovars is an important resource to classify new strains and to mine for determinants of pathogenicity and host specificity.
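The core and pan ORFeome definitions used here (core: orthology groups with exactly one CDS in every genome; pan: one CDS counted per orthology group) can be illustrated with a small sketch. It assumes that the orthology groups have already been computed, for instance by OrthoMCL, and have been loaded into a dictionary; the data structure, function name and toy identifiers below are hypothetical.

```python
def core_pan_specific(orthogroups, genomes):
    """Classify orthology groups.

    orthogroups: dict mapping group id -> dict mapping genome -> list of CDS ids.
    Returns (core, pan, specific) where
      core     = groups with exactly one CDS in every genome,
      pan      = all groups present in at least one genome,
      specific = dict genome -> groups found in that genome only.
    """
    core, pan = [], []
    specific = {g: [] for g in genomes}
    for group, members in orthogroups.items():
        present = [g for g in genomes if members.get(g)]
        if not present:
            continue
        pan.append(group)
        if all(len(members.get(g, [])) == 1 for g in genomes):
            core.append(group)
        if len(present) == 1:
            specific[present[0]].append(group)
    return core, pan, specific

# Toy input for illustration only (three hypothetical genomes).
genomes = ["A", "B", "C"]
orthogroups = {
    "og1": {"A": ["a1"], "B": ["b1"], "C": ["c1"]},        # single copy everywhere
    "og2": {"A": ["a2", "a3"], "B": ["b2"], "C": ["c2"]},  # paralogs in genome A
    "og3": {"A": ["a4"]},                                  # strain-specific
}
core, pan, specific = core_pan_specific(orthogroups, genomes)
```

With this definition, adding genomes can only shrink the core and enlarge the pan ORFeome, which is why a clade represented by a single strain shows an inflated core.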
Avoidance of FLS2- and EFR-mediated pattern-triggered immunity (PTI) is restricted to pathovar campestris
Perception of pathogen-associated molecular patterns (PAMP) elicits pattern-triggered immunity, which strongly restricts microbial pathogenicity. In this study, we investigated the diversity of two major bacterial PAMP proteins: the flg22 peptide from the FliC flagellin protein, which is perceived by the FLS2 (flagellin sensitive) plant immune receptor of Arabidopsis, and the elf18 peptide from the elongation factor Tu (EF-Tu), which is monitored by the Brassicaceae-specific EF-Tu receptor (EFR). The lack of recognition of the FliC flagellin of X. campestris pv. campestris by FLS2 was previously reported [32,34] and holds true for the eight strains studied here (Fig. 2a). In contrast, analysis of FliC diversity in other pathovars showed that all other FliC isoforms were predicted to be recognized by FLS2. The perception of X. campestris EF-Tu by EFR is probably not systematic either. Though essential residues K4, F5 and R7 of the elicitor peptide elf18 are conserved at the species level, five strains of the pathovar campestris (8004, ATCC33913, Xca5, JX and B100; clades XccA and XccC) express an EF-Tu with a K2R substitution (Fig. 2b). This polymorphism likely prevents recognition by EFR since elf18-K2R peptides failed to elicit EFR-dependent responses [35]. These results suggest that avoidance of PTI was acquired during the evolution of the pathovar campestris, first for FliC (FLS2-mediated) and later for EF-Tu (EFR-mediated).
Fig. 1 Comparison of 13 publicly available and newly sequenced Xanthomonas campestris proteomes. Orthologous proteins were determined using OrthoMCL software using homogeneously re-annotated genomes. a A phylogenetic tree of X. campestris core proteomes (3481 orthologous coding sequences, CDS) was generated using the PhyML software (default parameters). Bootstrap values are indicated in grey for each branch. b Size of X. campestris core ORFeome was determined considering only CDS with a single ortholog per genome. c Numbers above black bars indicate the size of the pan ORFeome. Only one CDS per orthology group was considered. The number of annotated CDS per genome is indicated (open bar). The number of isolate-specific CDS is given (green bars). d Venn diagram illustrating the number of coding sequences shared among the core ORFeomes (as defined in (b)) in the four X. campestris pathovars. Numbers in brackets indicate the number of genes in the core genome of the pathovars. Xcc (blue): X. campestris pv. campestris, Xci (red): X. campestris pv. incanae, Xcr (black): X. campestris pv. raphani, XcNP (green): X. campestris non-pathogenic
A reduced core type III secretome in X. campestris
In order to determine the type III secretome of the thirteen X. campestris strains, genomes were analyzed manually using tblastn against T3SP reported for the Xanthomonas genus ([http://www.xanthomonas.org/t3e.html] and [36]) (Table 3). Presence of genes encoding homologues of T3SP was validated if sequence identity was higher than 60 % over the full-length protein. At least 13 T3SP were present for CFBP 5828R, 18 for CFBP 5825R, 21 for CFBP 1606R, and 24 (plus one pseudogene) for CFBP 2527R. These predicted secretomes are rather small, especially considering that they include HrpW, XopA and HpaA which are likely involved in the secretion/translocation process per se. In comparison, the secretome of X. campestris pv. campestris comprised 22 T3SP on average (min 17, max 27) [32]. AvrXccA1 was not considered in these effectomes because there is no experimental evidence to support its secretion by the T3S system nor its co-regulation with the T3S system. The core type III secretomes were determined for each pathovar and compared with each other. Sizes of the core secretomes ranged from 12 for pathovar raphani to 18 for pathovar campestris (Fig. 3). Only six T3SP were conserved among all the 13 X. campestris strains. Leaving aside hrpW, xopA and hpaA, only the T3E genes xopF1, xopP and xopAL1 were detected in all genomes (Fig. 3a). The type III secretome of pathovar raphani contained three T3SP (XopAD, XopAT and AvrXccA2) absent from the core secretomes of the other pathovars and was very different from the T3SP repertoire found in the other three pathovars. Strains belonging to pathovars campestris, incanae and NP shared 15 T3SP out of the 17 or 18 present in their core secretomes. These results suggest that breeding of disease resistance should focus on the in planta recognition of the core T3E XopF1, XopP and XopAL1 in order to achieve broad-spectrum resistance in Brassicaceae against most X. campestris strains.
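The presence criterion and the core secretome comparison described above amount to a filter followed by set operations. The sketch below is illustrative only: the analysis in this study was manual, the hit records are invented, and the 0.9 query-coverage cut-off is an assumption standing in for "over the full-length protein", which the text does not quantify.

```python
def present_t3sp(hits, min_identity=60.0, min_coverage=0.9):
    """Return the T3SP names whose hit exceeds the identity threshold.

    min_coverage (fraction of the query protein aligned) is an assumed
    stand-in for the full-length requirement; it is not given in the text.
    """
    return {
        h["t3sp"]
        for h in hits
        if h["identity"] > min_identity and h["coverage"] >= min_coverage
    }

def core_and_pan_secretome(repertoires):
    """Core = T3SP found in every strain; pan = T3SP found in any strain."""
    sets = list(repertoires.values())
    if not sets:
        return set(), set()
    return set.intersection(*sets), set.union(*sets)

# Hypothetical tblastn summaries for two made-up strains.
hits_by_strain = {
    "strain_1": [
        {"t3sp": "XopP", "identity": 92.0, "coverage": 1.0},
        {"t3sp": "XopF1", "identity": 88.5, "coverage": 0.98},
        {"t3sp": "XopAD", "identity": 75.0, "coverage": 0.95},
    ],
    "strain_2": [
        {"t3sp": "XopP", "identity": 95.0, "coverage": 1.0},
        {"t3sp": "XopF1", "identity": 90.0, "coverage": 1.0},
        {"t3sp": "XopAD", "identity": 55.0, "coverage": 0.97},  # below 60 % identity
    ],
}
repertoires = {s: present_t3sp(h) for s, h in hits_by_strain.items()}
core, pan = core_and_pan_secretome(repertoires)
```

Because the core is an intersection, a single strain lacking an effector (or carrying a pseudogene that fails the filter) removes it from the core, which explains how a species with a large core ORFeome can still have a very small core effectome.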

Analysis of the transcriptome of strain CFBP 5828R by RNA sequencing improves genome annotation
In order to identify genes co-regulated with the T3S system, strain CFBP 5828R of X. campestris pathovar raphani was transformed either with an empty vector (pBBR1MCS-2) or the same vector expressing a constitutively active form of the hrp master regulator HrpG (HrpG*, mutation E44K). Total RNA was purified from cells growing exponentially in MOKA rich medium in three independent biological experiments. After removal of rRNA by oligonucleotide capture, these six samples were size-fractionated to discard the smallest RNA fraction (less than 200 bp) and were subjected to strand-specific RNA sequencing on an Illumina HiSeq2000 platform. Sequencing yielded from 12 […]
The CFBP 5828R genome sequence was first re-annotated taking advantage of this large dataset using Eugene-P software [37]. Expression of all but 28 protein-coding genes (0.6 %) could be detected by RNA sequencing. Out of the 130 ncRNAs obtained by in silico ncRNA prediction, 35 (27 %) were not supported by RNA sequencing, suggesting that these may either be artifacts or only expressed in specific conditions. This new annotation based on RNA sequencing includes 17 new coding sequences (Table 2). Based on these expression data, 1474 transcriptional starts could be determined, leading to the reannotation of 95 translational start sites. One hundred forty-five new ncRNA could also be annotated. Several ncRNA have recently been shown to be important for Xanthomonas pathogenicity [38,39]. Among the 23 ncRNA identified in X. euvesicatoria strain 85-10, twelve ncRNA (sX1, sX2, sX5, sX7, sX10-14, 6S, asX1 and asX4) are detected in all X. campestris genomes analyzed in this study, suggesting a biological significance [38]. Expression of all those ncRNA could be detected in X. campestris pv. raphani strain CFBP 5828R, though only three reads were identified for sX5 (Additional file 1).
In conclusion, the structural annotation of protein-coding genes and ncRNA was significantly improved by the use of RNA sequencing.
Transcriptomic analysis of CFBP 5828R by RNA sequencing identifies the hrpG regulon
In order to identify hrpG-regulated genes, RNA sequencing reads from the individual libraries (hrpG* or empty vector) were mapped to the re-annotated genome sequence of strain CFBP 5828R. Read counts per object were used to calculate differential gene expression [40]. This approach identified 134 genes whose expression was induced and seven genes whose expression was repressed more than five-fold (p < 0.001) in the strain ectopically expressing hrpG* when compared to the strain containing the empty vector (Fig. 4a, Table 4, Additional file 1). Notably, biological reproducibility was extremely good, though slightly more biological variability was observed among the empty vector controls (Additional file 2). To validate these data, RT-qPCR experiments were performed for 13 genes including constitutively expressed, induced and repressed genes (Fig. 4b). A positive correlation was observed between RNA sequencing and RT-qPCR results, thus validating the RNA sequencing approach with an independent method. Unsurprisingly, the dynamic range of RT-qPCR was narrower than that of RNA deep-sequencing. The vast majority of genes with a hrpG-dependent expression are positively regulated by hrpG (134 out of 141). As expected, genes encoding the Hrp T3S system are included in this list as well as all genes coding for predicted T3SP (56 out of 134). Interestingly, genes of the T3S system and its T3SP are the most highly induced, suggesting that T3SP candidates should be expected among genes with the highest induction ratios (Fig. 4c). In Xanthomonas, hrpG-dependent expression depends in large part on the HrpX regulator, which activates promoters containing the PIP box. PIP boxes are found in front of most operons encoding the T3S system and T3SP (39 out of 56; 70 %). In contrast, only 16 % of the 78 other genes also have a PIP motif in their promoter region.
HrpG-dependent induction of gene expression without a PIP box is globally less intense (Fig. 4c), suggesting that hrpX, as in other Xanthomonas, is also a major regulator of the hrpG regulon in X. campestris pv. raphani strain CFBP 5828R. Besides "protein secretion" mediated by the T3S system, the remaining hrpG-induced genes are significantly enriched in the following GO terms (Additional file 3): cytochrome complex assembly (GO:0017004); branched-chain amino acid biosynthetic process (GO:0009082); respiratory chain complex IV assembly (GO:0008535); heme transport (GO:0015886; GO:0015232) and proteolysis and peptidase activity (GO:0006508; GO:0070011). Because extracellular protease activity was shown to be important for pathogenicity in X. campestris pv. campestris strain 8004 [41] and to be dependent on hrpG in X. euvesicatoria strain 85-10 [13], we measured the global extracellular protease activity of strain CFBP 5828R and of its derivatives carrying hrpG* or the empty vector (Additional file 4). To this end, strains were spotted and grown on MOKA medium containing milk proteins. Surprisingly, the strain carrying hrpG* showed a reduced degradation of milk proteins compared to strains without hrpG*. HrpG does repress the expression of the arg-C endoprotease gene XCRCFBP 5828_m00114820 (nine-fold repression in the hrpG* strain, Table 4). Yet, its basal expression levels are low compared to those of the eight protease genes whose expression is induced in the hrpG* strain (XCRCFBP 5828_m00116210, XCRCFBP 5828_m00116450, XCRCFBP 5828_m00116470, XCRCFBP 5828_m00117350, XCRCFBP 5828_m00119460, XCRCFBP 5828_m00120960, XCRCFBP 5828_m00123610, XCRCFBP 5828_m00126780; Additional file 1). These results suggest that extracellular protease activity is regulated post-transcriptionally by hrpG.
In contrast to X. euvesicatoria strain 85-10, the expression of ncRNA sX5, sX11 and sX12 was not hrpG-dependent in X. campestris pv. raphani strain CFBP 5828R [38]. Yet, the expression of eight novel ncRNA was positively regulated by hrpG (p < 0.001) (Table 4). Four ncRNA are encoded within the hrp gene cluster. Among the eight hrpG-regulated ncRNA, three are antisense RNA to the T3E genes xopR, xopL and xopP and one is antisense to the regulatory gene hrpG. The biological functions of those ncRNAs remain to be determined experimentally in X. campestris.
As for the seven repressed genes of the hrpG regulon, these encode an endoproteinase (locus tag 14820), the Pel3 pectate lyase (37610), a small gene cluster comprising two putative transporters (29100 and 29120) and one transcriptional regulator of the MarR family (29130) and two clustered hypothetical proteins (Table 4). With such a limited number of genes, no significant enrichment in gene ontology (GO) terms could be identified among the hrpG-repressed genes.
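The two cut-offs used above to define the hrpG regulon (more than five-fold change and p < 0.001) can be expressed as a simple filter on a differential expression table. The sketch below assumes the table provides a log2 fold change and an adjusted p-value per gene; the function name and the toy rows are hypothetical, and only the counts in the last two lines come from the text.

```python
import math

def split_regulon(results, min_fold=5.0, max_padj=0.001):
    """Split (gene, log2 fold change, adjusted p) rows into induced and repressed genes."""
    threshold = math.log2(min_fold)
    induced, repressed = [], []
    for gene, log2_fc, padj in results:
        if padj >= max_padj:
            continue  # not significant at the chosen cut-off
        if log2_fc > threshold:
            induced.append(gene)
        elif log2_fc < -threshold:
            repressed.append(gene)
    return induced, repressed

# Toy rows for illustration only.
toy = [
    ("gene_a", 5.1, 1e-8),   # about 34-fold induced
    ("gene_b", -3.2, 1e-5),  # about 9-fold repressed
    ("gene_c", 1.5, 1e-6),   # significant but below five-fold
    ("gene_d", 4.0, 0.02),   # strong change but not significant
]
induced, repressed = split_regulon(toy)

# Proportions reported in the text, recomputed from the stated counts.
induced_pct = round(100 * 134 / (134 + 7))  # share of the regulon that is induced
pip_pct = round(100 * 39 / 56)              # T3S/T3SP genes preceded by a PIP box
```

Working on the log2 scale keeps induction and repression symmetric: a five-fold repression corresponds to a log2 fold change below about -2.32.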

Discussion
Genomic diversity in the X. campestris species
The 13 X. campestris genome sequences now available provide a panorama of the four major genomic groups composing this species. With a complete T3S system and 19 predicted T3E, it may not be appropriate to consider strain CFBP 5825 to be non-pathogenic as proposed [4]. The NP designation only indicates that none of the Brassicaceae tested at the time were appropriate hosts for these strains under the inoculation conditions tested. Future studies should determine, for instance, whether strain CFBP 5825 possesses a functional T3S system and whether it is virulent on at least one Brassicaceae plant.
Great genomic diversity can be observed both at the intra- and inter-pathovar levels. Previous studies already demonstrated that the pathovar campestris is composed of at least three clades [32]. This study suggests that comparable diversity is expected for pathovars raphani and incanae. The lack of common CRISPR spacers between the two strains of the pathovar raphani is particularly striking. Yet, a polyphyletic origin of this pathovar is unlikely since the CRISPR repeats are identical between the two strains. Thus, it suggests that the two strains diverged long enough ago to allow the loss of all ancestral spacers in at least one strain and the likely acquisition of a significant number of new spacers since this event. Importantly, no CRISPR loci could be identified in the other strains, so that genomic diversity within this species cannot be precisely determined using this tool and should thus rely on MLSA [4], AFLP [32] or MLVA (multilocus variable number of tandem repeats analysis) schemes as developed for other Xanthomonas species [42][43][44].
Table footnote: Only genes with both a fold change higher than five and an adjusted P-value lower than 0.001 are shown.
It is tempting to compare our results to P. syringae where a similar analysis was conducted [45]: the P. syringae core genome is composed of 3397 genes (genomes comprise from ca. 5000 to 8000 genes), which is close to the 3481 genes for X. campestris. In both instances, comparable numbers of strains were analyzed: 19 P. syringae genomes vs. 13 for X. campestris. While estimation of core genomes is rather insensitive to annotation quality or homogeneity, the pan genome size can be drastically affected. This could be one of the explanations for the rather small pan genome of the 13 X. campestris strains (6472 genes) compared to the 12,749 genes for the 19 P. syringae strains. Thus, X. campestris appears as a rather homogeneous genomic group compared to P. syringae, which is considered to be composed of several genospecies [46]. These observations are in agreement with the fact that the P. syringae species has a larger host range than X. campestris, which only infects plants of the Brassicaceae family.
X. campestris core type III effectome is reduced to three genes
Considering the high quality of the genomes obtained in the frame of this analysis, one can be rather confident in the predicted sizes of X. campestris type III secretomes. Pan type III secretomes range from 13 to 31 for pathovars raphani and campestris, respectively (Fig. 3b). Despite having a large core genome, the predicted core effectome of the 13 X. campestris strains was found to consist of only three bona fide effectors (XopF1, XopP and XopAL1) plus three T3SP (XopA, HrpW and HpaA) which are likely to be involved in the type III secretion and translocation process itself. The core effectomes of 138 X. axonopodis strains and 65 X. axonopodis pv. manihotis strains are made of ca. eight and six candidate T3Es, respectively [47,48]. Yet, these two effectomes only have XopN in common. Combined with our results, only XopF1 may still be considered as a core Xanthomonas effector, though it is pseudogenized in many X. axonopodis strains [47]. This situation is also reminiscent of type III effectome studies performed in P. syringae and R. solanacearum. In the R. solanacearum species complex, 22 out of the 94 T3E composing the pan effectome are conserved in all eleven strains analyzed [49]. The P. syringae core effectome is limited to five T3E (HopM, AvrE, HopAA, HopAH and HopI), though none is strictly conserved and intact in all 19 strains [45]. Therefore, there is no overlap between P. syringae and X. campestris core effectomes. These results challenge the concept of core effectomes as soon as increased biodiversity is analyzed at the species level or above. They also indicate that no universal set of effectors is used to infect plants, which could suggest that pathogenic bacteria use host-specific strategies to circumvent plant immunity or promote susceptibility.
Yet, the functional redundancy observed within effectomes rather suggests that bacteria may use a repertoire of unrelated effectors to target conserved plant targets. This latter hypothesis is supported by the identification of important plant susceptibility hubs such as RIN4, SWEETs or RLCK VII (for review, [50]). To date, the molecular functions of the T3E XopF1 and XopAL1 remain unknown. As for XopP, it was recently shown to block peptidoglycan- and chitin-triggered immunity in rice by inhibiting the U-box ubiquitin ligase OsPUB44, a positive regulator of basal immunity [51].
RNA sequencing refines the annotation of X. campestris pv. raphani strain CFBP 5828R T3E genes and ncRNA
RNA sequencing approaches in plant pathogens including Xanthomonas are still in their early days and were so far used to identify regulons, transcriptional start sites or ncRNAs [38,52][53][54]. In this study, the sequencing of the transcriptome of X. campestris pv. raphani strain CFBP 5828R produced ca. 111 million reads of 51 bp, resulting in 5.66 Gb of raw data. The use of a custom Xanthomonas-optimized oligonucleotide set allowed a high ribodepletion efficacy (2-13 % rRNA/tRNA/tmRNA reads) so that more than 93 % of the total reads were specifically mapped to mRNAs. A 1000-fold coverage of the genome was achieved, which is comparable to studies in X. campestris pv. campestris [54] but significantly higher than studies in X. euvesicatoria, X. citri pv. citri and X. campestris pv. campestris (10-fold, 700-fold and 400-fold, respectively) [38,52,53]. The size of this dataset is also far above the accepted limit (5-10 million non-rRNA reads per library) for correct expression profiling, gene discovery or gene reannotation [55]. The RNA reads were first used to improve the structural annotation of the genome using the Eugene-P pipeline. One important difference between our RNA sequencing approach and most of the above-mentioned reports is the use of strand-specific libraries. Such libraries make it possible to assign individual reads to a specific DNA strand in the genome and therefore to identify, for instance, overlapping or antisense RNA molecules: 72 ncRNA overlap with CDS such as the three T3E genes xopR, xopL and xopP in X. campestris pv. raphani strain CFBP 5828R. Early functional studies in X. euvesicatoria have shown that several ncRNA contribute to pathogenicity [38,39]. Only a few new mRNA were identified. Yet, the major improvement is a better characterization of transcriptional start sites (TSS).
This latter point is particularly relevant for T3E genes which are often characterized by an atypical codon usage and a lack of homology with known genes, thus preventing proper annotation of their TSS (e.g. [56]). As an illustration, three T3E proteins are likely to be longer than automatically annotated. Based on their 5'-UTR, xopAD, xopAL1 and xopAT may encode N-terminal extensions of 169, 9 and 228 amino acids, respectively. Compared to the published annotation of X. campestris pv. raphani strain 756C [21], XopAT and XopAD may be 21 and 57 amino acids shorter in strain CFBP 5828R. For XopAL1, our data support the annotation of X. campestris pv. campestris strain B100 which is 27 amino acids longer than in strain 8004.
The hrpG regulon in X. campestris pv. raphani strain CFBP 5828R encompasses all predicted T3E, T3SP and T3S system genes
The small effectome of X. campestris pv. raphani suggests that more effectors could be discovered. To date, the most productive strategy remains to determine genes that are co-regulated with the genes encoding the Hrp type III secretion apparatus [13,26,27]. In our study, we chose to determine the hrpG regulon by RNA sequencing using a constitutively active form of this regulator, HrpG*. RNA sequencing has a higher dynamic range than microarrays and also offers full genome coverage. The hrpG* mutant allele was previously used successfully [13] and permits the growth of all bacterial strains in a single medium, thus minimizing the noise due to metabolic responses unrelated to hrp gene regulation. For instance, RNA sequencing of X. campestris pv. campestris grown in synthetic hrp gene-inducing medium MMX vs. rich medium resulted in the identification of a regulon of more than 600 genes mostly involved in bacterial metabolic adaptation [52]. In addition, increased expression in MMX was observed in only five out of 12 T3E genes, resulting in poor predictive potential for T3E gene discovery [52].
Comparing the size and composition of hrp regulons is difficult because it depends on the biological system, the experimental design, the statistical analyses and the chosen cut-off values. For X. campestris pv. raphani, we intentionally selected stringent values (>5-fold induction/repression and p < 0.001) so that the resulting regulon is limited to 141 genes (3% of the genes), 95% of which are induced. This regulon size is comparable to the reported R. solanacearum hrpB regulon [26] and the P. syringae hrpL regulon [57] but smaller than the regulon determined in X. campestris pv. campestris grown in synthetic hrp gene-inducing medium MMX [52]. In these two latter examples, only 80% of the genes were positively co-regulated with the T3S system genes. Genes of the hrpG regulon are well conserved in X. campestris since 74% of those are detected in the 13 genomes inspected (Table 4). Importantly, all known T3E, T3SP and T3S system genes were found to belong to the X. campestris pv. raphani hrpG regulon. For instance, the 34-fold induction of avrXccA2 expression in X. campestris pv. raphani strain CFBP 5828R upon hrpG* expression (Table 4) provides the first experimental hint for AvrXccA2 being a T3E candidate. One expects to find unknown T3E genes among the genes whose expression is highly upregulated by HrpG (13 genes with an induction higher than 50-fold, Table 4) and among those with PIP promoter motifs (12 genes, Table 4). This repertoire of 22 hrpG-induced genes, once processed with T3E prediction tools [36,58], constitutes a manageable list to mine experimentally for novel type III effectors in X. campestris pv. raphani.
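The cut-offs described above can be written as a simple filter. The sketch below is only illustrative: it assumes that fold change is expressed as an expression ratio (hrpG* over control), so that a more than 5-fold repression corresponds to a ratio below 1/5. Apart from avrXccA2 and its 34-fold induction, all gene names, fold changes and p-values are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class GeneResult(NamedTuple):
    gene: str
    fold_change: float  # expression ratio hrpG* / control (assumed convention)
    p_value: float      # p-value reported by the differential expression test


def split_regulon(
    results: List[GeneResult],
    min_fold: float = 5.0,   # ">5 fold induction/repression"
    max_p: float = 0.001,    # "p < 0.001"
) -> Tuple[List[str], List[str]]:
    """Return (induced, repressed) gene names passing both cut-offs."""
    induced, repressed = [], []
    for r in results:
        if r.p_value >= max_p:
            continue  # not significant at the chosen threshold
        if r.fold_change > min_fold:
            induced.append(r.gene)
        elif r.fold_change < 1.0 / min_fold:
            repressed.append(r.gene)
    return induced, repressed


# Hypothetical example values, except the 34-fold induction of avrXccA2.
example = [
    GeneResult("avrXccA2", 34.0, 1e-6),
    GeneResult("geneA", 4.0, 1e-6),    # significant but below 5-fold
    GeneResult("geneB", 0.1, 1e-5),    # 10-fold repression
    GeneResult("geneC", 60.0, 0.01),   # strong induction, not significant
]
induced, repressed = split_regulon(example)
```

Relaxing either threshold enlarges the candidate list, which is one reason why regulon sizes reported with different cut-offs are hard to compare.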

Conclusions
A deep knowledge of the genomic diversity of X. campestris is needed to develop effective molecular typing schemes. This study presents a first genomic coverage of the pathovars of X. campestris. Core- and pathovar-specific proteomes were determined as well as the repertoire of Xop effector proteins that are used by bacteria to subvert plant immunity. Using RNA sequencing, a set of genes co-regulated with the T3S system including non-coding RNAs was identified which should contribute to our understanding of the virulence strategies of this important species of phytopathogens.

Bacterial strains, plasmids and growth conditions
Strains and plasmids used in this study are listed in Table 1. X. campestris strains were grown at 28°C in MOKA medium [59]. Escherichia coli cells were grown on LB (lysogeny broth) medium at 37°C. For solid media, agar was added at a final concentration of 1.5% (w/v). Antibiotics were used at the following concentrations: 50 μg/ml kanamycin, 50 μg/ml rifampicin and 40 μg/ml spectinomycin. To select for spontaneous rifampicin-resistant X. campestris mutants, overnight cultures in liquid MOKA were plated on MOKA-Rif medium at high density. The CFBP strains are available from the CIRM-CFBP collection of plant-associated bacteria (Angers, France, http://www6.inra.fr/cirm_eng/CFBP-Plant-Associated-Bacteria).

RNA extraction, rRNA depletion and pyro-sequencing
For each genotype of X. campestris pv. raphani strain CFBP 5828R, three independent cultures in MOKA medium were harvested at mid-exponential phase (OD600nm = 0.5) and subjected to RNA extraction as described [37]. After TurboDNAse (Ambion) treatment and quality control using the Bioanalyzer RNA6000 Nano kit (Agilent Technologies Genomics), depletion of ribosomal RNA and selected tRNAs was performed as described [37] using a custom set of oligonucleotides optimized for the Xanthomonas genus (Additional file 5). Single-end RNA sequencing (51-bp reads) was performed on the Illumina HiSeq2000 platform (Fasteris SA, Geneva, Switzerland) as described [37].

Genome structural annotation
Structural annotation was done using Eugene-P software [37]. This modular software allows the integration of several sources of high-throughput data such as protein similarities, DNA homologies, predicted transcription terminators and transfer RNA genes and others. We trained Eugene-P with the public annotation of X. campestris pv. campestris strain B100 (available on NCBI website under the accession number NC_010688), X. campestris pv. campestris strain 8004 (NC_007086) and X. campestris pv. campestris strain ATCC 33913 (NC_003902) and at a lower weight with the Swissprot database (version of 04 February 2013). For strain CFBP 5828R, all RNA sequencing libraries were merged and used by Eugene-P to predict structural annotations of mRNA as described [37].

Identification of core and accessory genes in X. campestris pathovars
Identification of orthologous groups between genomes was achieved by OrthoMCL analyses (Li et al., [52]) with the 13 genomes. In order to prevent annotation biases during downstream analyses, all published genomes were re-annotated with Eugene-P as described above. OrthoMCL clustering analyses were performed using the following parameters: p-value cut-off = 1 × 10^-5; Percent Identity cut-off = 0; Percent match cut-off = 80; MCL Inflation = 1.5; Maximum weight = 316. We modified the OrthoMCL analysis by inactivating the filter query sequence during the BLASTP pre-process. Groups of orthologs corresponding to CDS present in one copy in at least two genomes were extracted from OrthoMCL output files. For the group of strains considered, the core proteome was defined as the OrthoMCL groups represented by a single protein in each strain. The pan proteome was defined as all the OrthoMCL groups present in the group of strains considered plus single-copy strain-specific proteins.
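The core and pan proteome definitions above can be illustrated with a short sketch. The data layout (a dictionary mapping a group identifier to (strain, protein) pairs, plus a list of ungrouped strain-specific proteins) is an assumption made for the example and is not the OrthoMCL output format; strain and protein names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Member = Tuple[str, str]  # (strain, protein identifier)


def core_groups(groups: Dict[str, List[Member]], strains: Sequence[str]) -> List[str]:
    """Groups represented by exactly one protein in each strain considered."""
    core = []
    for group_id, members in groups.items():
        counts = Counter(strain for strain, _ in members)
        if all(counts.get(s, 0) == 1 for s in strains):
            core.append(group_id)
    return sorted(core)


def pan_proteome_size(
    groups: Dict[str, List[Member]],
    singletons: List[Member],
    strains: Sequence[str],
) -> int:
    """Groups present in the strains considered plus their strain-specific proteins."""
    selected = set(strains)
    n_groups = sum(
        1 for members in groups.values()
        if any(strain in selected for strain, _ in members)
    )
    n_singletons = sum(1 for strain, _ in singletons if strain in selected)
    return n_groups + n_singletons


# Hypothetical toy data set with three strains.
toy_groups = {
    "g1": [("A", "a1"), ("B", "b1"), ("C", "c1")],              # core
    "g2": [("A", "a2"), ("A", "a3"), ("B", "b2"), ("C", "c2")],  # paralogs in A
    "g3": [("A", "a4"), ("B", "b3")],                            # absent from C
}
toy_singletons = [("C", "c9")]
```

With this toy input, only g1 is core for strains A, B and C, whereas g3 joins the core when the comparison is restricted to A and B. This illustrates why the core shrinks as more genomes are added.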

Phylogeny of X. campestris genomes
Phylogenetic analysis was performed based on OrthoMCL analyses. Only groups composed of one single protein in each strain were used to build what we defined as the core proteome. For each of these groups, we aligned the protein sequences using MAFFT software [65] and cleaned the alignment using trimAl software [66] to remove all positions in the alignment with gaps in 10% or more of the sequences. Alignment files were converted to PHYLIP format. The phylogenetic tree was constructed using PhyML [67].
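The gap filter applied to each alignment can be illustrated as follows. This is not trimAl itself but a plain re-implementation of the stated rule (a column is removed when 10% or more of the sequences carry a gap), written under the assumption that gaps are encoded as "-"; the sequences are hypothetical.

```python
from typing import List


def trim_gappy_columns(alignment: List[str], max_gap_fraction: float = 0.10) -> List[str]:
    """Drop alignment columns whose gap fraction is >= max_gap_fraction."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    keep = [
        col for col in range(length)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs < max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Three sequences: a single gap already represents 33% of the column.
small = trim_gappy_columns(["MK-LV", "MKALV", "MKAL-"])

# Thirteen sequences: a single gap represents 7.7% of the column, which is kept.
large = trim_gappy_columns(["MKA"] * 12 + ["MK-"])
```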
Identification of candidate type III-secreted proteins in X. campestris strains CFBP 5825R, CFBP 1606R, CFBP 2527R and CFBP 5828R
Protein sequences of T3SP from strains X. campestris pv. campestris 8004 and B100, X. euvesicatoria 85-10, X. campestris pv. raphani 756C or X. axonopodis pv. citri 306 (http://www.xanthomonas.org/t3e.html) were used as queries for a tblastn analysis on the genomes of the different X. campestris strains. T3SP genes were considered as present in the genome sequences when protein alignments shared at least 60% identity over the full length of the reference proteins. Core type III secretomes were defined for each pathovar as proteins present in all sequenced strains of a given pathovar.

Analysis of RNA sequencing results and statistical analyses
Mapping of RNA sequencing reads on the CFBP 5828R genomic sequence was done using the Glint software (Faraut T. and Courcelle E.; http://lipm-bioinfo.toulouse.inra.fr/download/glint/, unpublished) integrated in the Eugene-P pipeline. Parameters used were the following: matches allow no gap, a minimum length of 40 nucleotides and no more than one mismatch. When a read mapped to two different positions with different scores, only the position with the best score was kept. Reads mapping at two different positions with the same best score were not considered. Differential expression of genes was calculated with R (v2.13.0) using DESeq (v1.4.1) [40]. Variance was estimated using the per-condition argument. p-values were adjusted for multiple testing using the Benjamini and Hochberg method [68].
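Two of the rules above lend themselves to a short illustration: the assignment of multi-mapped reads and the Benjamini and Hochberg adjustment. The sketch below is neither the Glint nor the DESeq implementation. It assumes that a higher alignment score is better and that the hits of a read are available as (position, score) pairs; function names and values are hypothetical.

```python
from typing import List, Optional, Tuple


def assign_read(hits: List[Tuple[int, float]]) -> Optional[int]:
    """Return the unique best-scoring position of a read, or None if ambiguous."""
    if not hits:
        return None
    best_score = max(score for _, score in hits)  # assumption: higher is better
    best_positions = [pos for pos, score in hits if score == best_score]
    # Reads with several equally good positions are not considered.
    return best_positions[0] if len(best_positions) == 1 else None


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


unique_hit = assign_read([(1200, 51.0), (88000, 49.0)])  # best score is unique
tied_hit = assign_read([(1200, 51.0), (88000, 51.0)])    # tie: read discarded
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Discarding tied reads avoids attributing expression arbitrarily to one copy of a repeated sequence, at the cost of under-counting reads from duplicated genes.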

Quantitative RT-PCR analyses
Two micrograms of the total RNA used for RNA sequencing were subjected to reverse transcription using Transcriptor reverse transcriptase (RT, Roche Applied Science) and 1 μM Random Primer 6 (New England BioLabs). Quantitative PCR was performed using diluted cDNA with 1 μM gene-specific oligonucleotides (Additional file 5) and the LightCycler 480 SYBR Green I Master kit on a LightCycler 480 (Roche Applied Science) machine with the following cycling parameters: 9 min at 95°C; 50 cycles of 15 sec at 95°C, 10 sec at 60°C and 10 sec at 72°C. Technical and biological triplicates were performed. Efficiencies of the primer pairs were determined on diluted genomic DNA and were greater than 1.8. Two reference genes expressed constitutively in the RNA sequencing experiment (CFBP 5828R_m00134650 and CFBP 5828R_m00117870) were used for normalization. Expression levels were calculated using the ΔΔCt method.
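A minimal sketch of the ΔΔCt calculation is given below. It assumes an amplification efficiency of 2 per cycle (the measured efficiencies were greater than 1.8) and assumes that the two reference genes are combined by averaging their Ct values; neither choice is specified in the text. Function names and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    ct_target_test: float,
    ct_refs_test: Sequence[float],
    ct_target_control: float,
    ct_refs_control: Sequence[float],
    efficiency: float = 2.0,  # assumed amplification factor per cycle
) -> float:
    """Fold change of a target gene in the test vs. control sample (ΔΔCt)."""
    # ΔCt: target Ct normalized to the mean Ct of the reference genes.
    delta_test = ct_target_test - mean(ct_refs_test)
    delta_control = ct_target_control - mean(ct_refs_control)
    # ΔΔCt: difference between test and control conditions.
    delta_delta = delta_test - delta_control
    return efficiency ** (-delta_delta)


# Hypothetical Ct values: the target crosses the threshold 5 cycles earlier
# in the test sample while the reference genes are unchanged.
fold = relative_expression(20.0, [18.0, 18.0], 25.0, [18.0, 18.0])
unchanged = relative_expression(22.0, [18.0, 19.0], 22.0, [18.0, 19.0])
```

With an efficiency of exactly 2, each cycle of difference corresponds to a two-fold change, so a ΔΔCt of -5 gives a 32-fold induction.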

Identification of PIP boxes
The PIP-box motif (TTCGB-N15-TTCGB; B represents C, G, or T) was searched in X. campestris pv. raphani strain CFBP 5828R in JBrowse. Genes with PIP boxes were manually inspected with the following criteria: presence of the PIP-box motif in the sense orientation relative to the transcriptional unit and situated in the 500-bp region upstream of the transcriptional start. Presence of operons was estimated from RNA sequencing data.
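The motif and the positional criteria can be expressed as a regular expression search. The search in this study was carried out in JBrowse followed by manual inspection; the sketch below is only an equivalent illustration, with hypothetical function names, 0-based coordinates and toy sequences.

```python
import re
from typing import List, Tuple

# TTCGB-N15-TTCGB with B = C, G or T; the lookahead reports overlapping matches.
PIP_BOX = re.compile(r"(?=(TTCG[CGT][ACGT]{15}TTCG[CGT]))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, tss: int, strand: str, length: int = 500) -> str:
    """Sequence upstream of a 0-based TSS, written 5'->3' in the gene orientation."""
    if strand == "+":
        return genome[max(0, tss - length):tss]
    # On the minus strand, the upstream region lies to the right of the TSS.
    return reverse_complement(genome[tss + 1:tss + 1 + length])


def find_pip_boxes(upstream: str) -> List[Tuple[int, str]]:
    """Return (offset, motif) for each sense-orientation PIP box in the region."""
    return [(m.start(), m.group(1)) for m in PIP_BOX.finditer(upstream.upper())]


# Toy promoter carrying one PIP box 10 nucleotides from its 5' end.
box = "TTCGC" + "A" * 15 + "TTCGT"
promoter = "A" * 10 + box + "A" * 10
plus_genome = promoter + "ATG"
minus_genome = "CCC" + reverse_complement(promoter)
```

Because the upstream region is always returned in the orientation of the gene, a single pattern is enough to enforce the sense-orientation criterion on both strands.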

Gene Ontology enrichment studies
Enrichment of specific Gene Ontology terms in gene lists was tested using the topGO R package (v1.14.0) [69].

Extracellular protease assay
Extracellular protease activity of Xanthomonas strains was tested by spotting 10 μl of an overnight culture (OD600nm = 0.4) on MOKA plates supplemented with 1% skimmed milk. Plates were incubated at 28°C and imaged three days post inoculation.
analysed the results. MFJ performed the statistical analyses. ML performed the protease assays. MAJ, OP, MA and RK conceived the study and applied for funding. LG, MAJ, LDN and RK designed the experiments. LDN supervised the study, analysed the results and drafted the manuscript. BR, SB, EG, ND, ML, MFJ, MFLS, PP, MAJ, LG, EL, MA, SC, RK and LDN revised the manuscript. All authors read and approved the final manuscript.