Use of suppression subtractive hybridisation to extend our knowledge of genome diversity in Campylobacter jejuni

Background Previous studies have sought to identify a link between the distribution of variable genes amongst isolates of Campylobacter jejuni and particular host preferences. The genomic sequence data currently available were obtained using only isolates from human or chicken hosts. In order to identify variable genes present in isolates from alternative host species, five subtractions between C. jejuni isolates from different sources (rabbit, cattle, wild bird) were carried out, designed to assess genomic variability within and between common multilocus sequence type (MLST) clonal complexes (ST-21, ST-42, ST-45 and ST-61). Results The vast majority (97%) of the 195 subtracted sequences identified had a best BLASTX match with a Campylobacter protein. However, there was considerable variation within and between the four clonal complexes included in the subtractions. The distributions of eight variable sequences, including four with putative roles in the use of alternative terminal electron acceptors, were determined amongst a panel of C. jejuni isolates representing diverse sources and STs. Conclusion There was a clear correlation between clonal complex and the distribution of the metabolic genes. In contrast, there was no evidence to support the hypothesis that the distribution of such genes may be related to host preference. The other variable genes studied were also generally distributed according to MLST type. Thus, we found little evidence for widespread horizontal gene transfer between clonal complexes involving these genes.


Background
Infection due to Campylobacter sp. is one of the major causes of diarrheal disease worldwide and is the most common source of bacterial gastroenteritis [1]. Although transmission of Campylobacter occurs mainly through the consumption of livestock, with poultry being the most common source, C. jejuni has been isolated from diverse animal, human and environmental sources. Several recent studies have sought to determine clonal prevalences amongst isolates from these diverse sources by applying multilocus sequence typing (MLST) [2][3][4][5][6][7]. Whilst some MLST clonal complexes, such as the ST-21 complex, are widespread, others, such as the ST-61 and ST-42 complexes, have a more restricted distribution amongst different host animals, including humans [4,7].
Complete genome sequences have been published for C. jejuni strains NCTC11168 [8], RM1221 [9] and 81-176 [10], including its pTet and pVir plasmids [11]. Genome sequences have also been published for strains of C. lari, C. coli and C. upsaliensis [9] and further genome sequence projects for several other Campylobacter strains are ongoing [12]. Although unpublished, the genome of C. fetus is also complete. In addition, strain-specific DNA sequences have been identified for C. jejuni ATCC43431 [13].
There has been considerable interest in characterising genetic variation between isolates of C. jejuni with a view to identifying those genes relevant to severity of disease or host colonisation. Inter-strain variations in loci such as those encoding lipooligosaccharide (LOS) [14], capsule [14] or restriction-modification (RM) systems [15] have already been characterised. Comparative genome analyses using microarrays, based largely upon the strain NCTC11168 genome sequence [16][17][18][19], indicate high levels of genome diversity but low levels of genome plasticity in C. jejuni [20]. Recently it has been suggested that this kind of approach can help to identify genetic markers predictive of the source of an infection [21].
Since these studies were mostly restricted to those genes present in a single strain, NCTC11168, further identification of such markers would be greatly facilitated by the construction of microarrays containing all potentially variable genes. Genes contributing to plasticity amongst C. jejuni populations have been identified from the genomes of strains such as RM1221, 81-176 and ATCC43431 [9,11,13,16]. However, it is not clear whether the widely accessible nucleotide and protein sequence databases are representative of the variable genes that occur in C. jejuni.
Suppression subtractive hybridisation (SSH) is a method designed to identify sequences present in one strain (the TESTER) but absent from a reference strain (the DRIVER) [22,23] and has been applied previously to identify genetic differences between two C. jejuni human isolates with different colonisation potentials [24]. In this study we used several rounds of SSH between and within C. jejuni MLST clonal complexes, using isolates from various sources, in order to gain a better understanding of the genomic variability that remains uncharacterised in populations of C. jejuni. Furthermore, we tested the hypothesis that the distribution of variable genes involved in the use of alternative terminal electron acceptors may be related to host preference.

SSH libraries
Our initial choice of restriction enzymes for the digestion of DNAs in the SSH was based on a previous study of C. jejuni [24]. A summary of all of the SSH sequences obtained is shown in Table 1. The SSH libraries obtained with AluI/DraI-digested DNAs were dominated by very short DNA sequences (92% were <400 bp). In order to obtain longer sequences we carried out two further subtractions using RsaI-digested DNAs. This reduced the proportion of sequences <400 bp in length. The majority of subtracted sequences from strains 670, 504 and 1967 matched sequences previously found in C. jejuni strains RM1221 or 81-176. Thus, in an attempt to enrich for sequences not previously reported, the final two subtractions were carried out with strain 1967 (ST-42 complex) as an additional driver to strain NCTC11168 (ST-21 complex). This proved successful in that the majority of SSH sequences no longer matched strains RM1221 or 81-176. However, the overall proportion of SSH sequences matching previously reported Campylobacter sp. was not reduced significantly. The combination of enrichment and the choice of an enzyme giving longer SSH fragments resulted in fewer SSH sequences overall (Table 1).
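The effect of enzyme choice on fragment length can be explored with a simple in silico digest. The following is a minimal sketch, not the procedure used in this study: it assumes the standard recognition sites of the three blunt-cutting enzymes (AluI, AG^CT; DraI, TTT^AAA; RsaI, GT^AC), and the function names and the randomly generated AT-rich test sequence are hypothetical stand-ins rather than real C. jejuni data.

```python
import random
import re

# Recognition sites of the blunt-cutting enzymes named in the text.
# Each value is (site, offset of the cut within the site).
ENZYMES = {
    "AluI": ("AGCT", 2),    # AG^CT
    "DraI": ("TTTAAA", 3),  # TTT^AAA
    "RsaI": ("GTAC", 2),    # GT^AC
}


def cut_positions(seq, enzymes):
    """Return sorted cut coordinates for the given enzymes on a linear sequence."""
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # A lookahead pattern finds overlapping occurrences of the site.
        # All three sites are palindromic, so one strand is sufficient.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq, enzymes):
    """Lengths of the fragments produced by a complete digest."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def fraction_shorter_than(lengths, limit=400):
    """Proportion of fragments shorter than `limit` bp."""
    return sum(1 for n in lengths if n < limit) / len(lengths)


if __name__ == "__main__":
    # Hypothetical stand-in sequence with roughly 30% GC, not real genome data.
    random.seed(0)
    genome = "".join(random.choices("ACGT", weights=[35, 15, 15, 35], k=50_000))
    for combo in (["AluI", "DraI"], ["RsaI"]):
        lengths = fragment_lengths(genome, combo)
        print(combo, len(lengths), round(fraction_shorter_than(lengths), 2))
```

Applied to a real genome sequence, the same functions would give a rough expectation of how many fragments fall below 400 bp for each digestion condition, although the lengths of cloned SSH products also depend on hybridisation and PCR steps that this sketch ignores.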
In total, 195 subtracted sequences were obtained. Details of these are given in Additional File 1. These SSH sequences were all confirmed as absent from the driver strain C. jejuni NCTC11168, even though occasionally the best BLASTX match in the database was against this strain. Furthermore, the second driver strain, strain 1967, was PCR-negative for nine of 11 sequences tested from the two dual driver subtractions. Of the 195 SSH sequences obtained only two (1%) had a best BLASTX match with proteins from outside the genus Campylobacter. A further three SSH sequences (2%) had no significant BLASTX match. Thus, the vast majority of SSH sequences matched within the genus Campylobacter. However, it should be noted that eight SSH sequences (4%), although having a best BLASTX match with a Campylobacter sp. protein, shared less than 80% identity with the matching protein.
Those SSH sequences matching outside the genus Campylobacter, matching Campylobacter sequences with <80% identity, or used in distribution analysis are shown in Table 2.
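The categories used above (no significant match, best match outside the genus, and best match within the genus at below or above 80% identity) amount to a simple decision rule. The sketch below illustrates that rule under stated assumptions: the `BestHit` record, the `classify` function and the example hits are hypothetical and do not reproduce the actual BLASTX output, whereas the counts passed to `percent` are those reported in the text.

```python
from collections import Counter
from typing import NamedTuple, Optional


class BestHit(NamedTuple):
    """Hypothetical record of the best BLASTX hit for one SSH sequence."""
    query: str
    genus: Optional[str]       # None when there is no significant match
    identity: Optional[float]  # percent protein identity of the best hit


def classify(hit, genus="Campylobacter", threshold=80.0):
    """Assign a best hit to one of the four categories described in the text."""
    if hit.genus is None:
        return "no significant match"
    if hit.genus != genus:
        return "outside genus"
    if hit.identity is not None and hit.identity < threshold:
        return "within genus, low identity"
    return "within genus, high identity"


def percent(count, total):
    """Whole-number percentage, as quoted in the text."""
    return round(100 * count / total)


if __name__ == "__main__":
    # Invented example hits, for illustration only.
    hits = [
        BestHit("query_1", "Campylobacter", 98.0),
        BestHit("query_2", "Campylobacter", 72.5),
        BestHit("query_3", "Rhodopseudomonas", 55.0),
        BestHit("query_4", None, None),
    ]
    print(Counter(classify(h) for h in hits))

    # Counts reported in the text, out of 195 SSH sequences.
    print(percent(2, 195), percent(3, 195), percent(8, 195))  # 1 2 4
    print(percent(195 - 2 - 3, 195))                          # 97
```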
Both the mean and the median % GC content for the 195 subtracted sequences were 29%, with a range of 18-42%. The mean % GC content varied little between the subtractions (28.9-30.8%). Thus, the average % GC content of the subtracted sequences was only slightly below the values reported for the genomes of the C. jejuni strains NCTC11168 (30.6%) [8] or RM1221 (30.3%) [9].
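Summary statistics of this kind are straightforward to compute from the sequences themselves. The following is a minimal sketch: the function names are hypothetical, ambiguous bases such as N are assumed to be excluded from the denominator, and the example sequences are invented rather than taken from the SSH libraries.

```python
from statistics import mean, median


def gc_percent(seq):
    """Percentage of G and C amongst the unambiguous bases of a sequence."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / acgt


def summarise_gc(seqs):
    """Mean, median and range of % GC content across a set of sequences."""
    values = [gc_percent(s) for s in seqs]
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Invented example sequences, for illustration only.
    print(summarise_gc(["AAAT", "GGCC", "ATGC"]))
```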
Of the three subtractions conducted using AluI/DraI-digested DNAs and a single driver strain (NCTC11168), two were intra-clonal complex and one was between clonal complexes. The numbers of subtracted sequences obtained did not vary significantly between these subtractions (Table 1). However, there were some variations in the putative functions of subtracted sequences. Plasmid and bacteriophage-associated sequences accounted for 73% of those sequences differing between the two ST-21 complex strains 670 and NCTC11168, and 16% of those sequences differing between the two ST-21 complex strains 504 and NCTC11168, but only 2% of those sequences differing between the ST-42 complex strain 1967 and strain NCTC11168. The largest group of subtracted sequences for the inter-clonal complex subtraction were those associated with metabolism/biosynthesis (29%). The SSH data suggest that there are both intra- and inter-clonal complex variations in genes associated with LOS, capsule, flagella/motility, membrane/transport and metabolism (Table 1).
It has been demonstrated that SSH is an effective method for analysing genetic differences between related strains. In a previous study using SSH, Agron et al. [22] were able to detect most of the 7% of genomic differences between two closely related, fully sequenced strains of Helicobacter pylori. Unlike in our study, the authors used four parallel subtractions with different restriction enzymes. However, they further demonstrated that as tester-specific sequences became limiting the proportion of repeat fragments increased [22]. Because of our use of single restriction digestion conditions for each subtraction, we cannot claim the extent of coverage achieved by Agron et al. [22], but the increased proportion of repeat sequences obtained with the dual driver subtractions is indicative of a reduction in the overall pool of tester-specific sequences in these subtractions. We believe that we have detected a significant proportion of the genetic variation between the strains and that our data indicate that there are very few genes present in C. jejuni strains with best BLASTX matches outside the genus rather than with already sequenced genes/proteins from within the genus.
[Table footnote: The number of isolates fitting into each category is presented for each of the five subtractions. ID, protein sequence identity.]

SSH sequences matching outside the genus Campylobacter
The two sequences matching outside the genus Campylobacter were a putative transport protein (SSH629-D8) and a putative phosphodiesterase (SSH629-D9) (Table 2). These SSH sequences have been submitted to GenBank (EF076761 and EF076762). The predicted protein sequence derived from SSH629-D8 matched a transport protein from Rhodopseudomonas palustris, predicted to be a cytoplasmic component of an ABC-type sugar transport system. The gene encoding the phosphodiesterase enzyme from Delftia acidovorans matching SSH629-D9 has been cloned and expressed in Escherichia coli. It shares sequence similarity to cyclic AMP (cAMP) phosphodiesterase and cyclic nucleotide phosphodiesterases and exhibited activity on cAMP in vivo [25]. Using dot-blot hybridization, we determined the distribution of these two SSH sequences amongst a panel of C. jejuni isolates representative of common MLST clonal complexes and various sources (Table 3). The sequences SSH629-D8 and SSH629-D9 were found only in the ST-45 complex (Tables 3 and 4). It should be noted that SSH629-D8 appears to be part of a pseudogene containing a frame-shift mutation. We used PCR to amplify this region from four of the C. jejuni isolates testing positive for SSH629-D8, including strain 629, and confirmed that this mutation is genuinely present and not an artefact of the SSH procedure.
Using dot-blot hybridization, we determined the distribution of the sequence 670-D7 as an indicator of the prevalence of the pTet/pCC31 plasmid amongst our strain panel (Table 1). The sequence was found in three isolates from the ST-21 complex and four isolates from the ST-61 complex.

Plasticity regions and variable genes
Most of the subtractions identified variations in loci previously reported as variable amongst C. jejuni strains. It has been reported that variable genes (absent or highly divergent) in the Campylobacter genome map to discrete areas, termed variously plasticity regions, hypervariable regions or plasticity zones [16,17,21,29]. Indeed the genotypes of C. jejuni strains have been shown to be inherently unstable, and recombination events occur between unrelated strains both in vitro and in vivo even in the absence of selective pressure [30]. Pearson et al. [16] identified seven hypervariable plasticity regions (PR1-7) in the genome of C. jejuni NCTC11168. PR6 contains the capsule biosynthesis locus, flanked by capsule transport genes. Twelve SSH sequences in all matched capsule-related proteins. PR7 contains numerous putative outer membrane proteins but also carries a divergent gene encoding a probable flagellar hook protein [16]. Other motility-related genes, including flagellin and flagellin glycosylation genes, are carried by PR5, along with LOS biosynthesis genes. SSH sequences matching flagellar hook proteins (FlgE), flagellins, flagellin glycosylation proteins and LOS biosynthesis proteins were all identified amongst the SSH sequences in our study. PR4 includes glycosyltransferases and galactosyltransferases of unknown functions as well as genes involved in the sialylation of LOS. SSH sequences matching proteins with similar putative functions were found with both the strain 961 and strain 1967 subtractions. In addition, the 629 subtracted library included one sequence matching both the NCTC11168 gmhA gene (from PR4) and a polysaccharide-related protein (SSH629-21).
RM systems protect bacteria from foreign DNA and may impact on the transfer of genes responsible for virulence or host colonisation. It has also been demonstrated previously that RM genes vary in Campylobacter spp. [15] and RM genes were amongst those identified in a previous study using SSH [24]. The first three subtractions in our study all identified RM-related SSH sequences. Ahmed et al. [24] also identified subtracted sequences relating to arsenite-metabolising genes as present in strain 81-116 but absent from strain NCTC11168, noting that phenylarsonic compounds have been used in poultry feed and may have contaminated agricultural lands on which poultry litter has been used as manure. We identified an SSH matching the arsenical resistance protein of RM1221 in the subtracted library of strain 629 (SSH629-23), a rabbit isolate. In addition, Ahmed et al. [24] identified a putative γ-glutamyl transpeptidase gene in their subtraction.
Equivalent genes have been implicated in colonization of the gastric mucosa by Helicobacter pylori [31,32]. We also identified SSH sequences matching a γ-glutamyl transpeptidase protein in strain 629. It has been reported that C. jejuni 81-176 carries a putative gene encoding a serine protease belonging to the autotransporter family [10]. The SSH indicates that this gene is also present in the cattle isolate 1967 (SSH1967-A5 and SSH1967-G6). C. jejuni 81-176 and isolate 1967 are both members of the ST-42 clonal complex. We determined the distribution of SSH1967-A5 amongst our strain panel. All ST-42 isolates were positive for SSH1967-A5. Of the other isolates, only one ST-45 cattle isolate contained SSH1967-A5 (Tables 3 and 4).
We also determined the distribution of the SSH sequence SSH961-A9, which matches an enzyme from C. jejuni subsp. doylei (Table 2). With the exception of three isolates, the distribution was also restricted to the clonal complex of the tester strain (ST-61 complex; Tables 3 and 4).

Metabolic genes
When oxygen levels are low C. jejuni has the capacity to utilise a wide range of electron acceptors, including fumarate, nitrate, nitrite, sulfite, trimethylamine-N-oxide (TMAO) and dimethyl sulfoxide (DMSO) [33][34][35]. In strain NCTC11168 PR1 contains genes encoding the transport apparatus for molybdenum, which has a putative role in the reduction of nitrate as an alternative terminal electron acceptor [33]. Just upstream of PR1 lies the gene (Cj0264c) encoding the reductase responsible for reduction of TMAO and DMSO under oxygen-limiting conditions [33]. Whilst lacking the Cj0264c gene or its close homologues, C. jejuni 81-176 carries an alternative DMSO reductase gene cluster (dmsABC-torD), and an additional cytochrome C biogenesis gene cluster (cytC locus; cju02-09) [10]. Cytochrome C may be an important link between the menaquinone pool and alternative terminal electron acceptors such as DMSO and TMAO [33]. It has been suggested that the capacity of Campylobacter isolates to utilise alternative electron acceptors may contribute to selective advantages in specific ecological niches [16], and that the presence of additional respiratory capabilities may contribute to the efficiency of colonisation of highly pathogenic strains such as C. jejuni 81-176 [10]. Thus, it is possible that the presence or absence of particular genes or islands contributing to growth in microaerophilic environments may influence host preference. Hofreuter et al. (2006) recently provided some evidence for this when demonstrating that a dmsA mutant of C. jejuni 81-176 colonised mice less well than its wildtype equivalent in a mixed infection model.
We identified several strain-variable SSH sequences matching putative enzymes with roles in electron transport using alternative terminal electron acceptors. The SSH library constructed from strain 629 (rabbit isolate) included SSH sequences with best BLASTX matches against genes in the cytochrome C biogenesis cluster of C. jejuni 81-176 (including SSH629-C10). The SSH library constructed from strain 1967 (cattle isolate) included numerous SSH sequences (including SSH1967-D9) with a best BLASTX match against the dmsABC genes of C. jejuni 81-176. The SSH libraries constructed from strains 1967, 670 and 504 each contained SSH sequences (SSH1967-C2, SSH670-B10 and SSH504-C10) matching a pyridine nucleotide disulfide oxidoreductase from C. jejuni 84-25 belonging to a family of enzymes that can play a role in electron transport. SSH sequences SSH670-B10 and SSH504-C10 were identical to each other.
In order to test the hypothesis that the distribution of such metabolic genes may be related to host preference, we chose to screen a panel of isolates varying in MLST type and source for the presence or absence of four representative sequences (Tables 3 and 4). SSH1967-H9 and SSH629-C10 were chosen to represent the alternative DMSO reductase gene cluster (dmsABC-torD) and the additional cytochrome C biogenesis gene cluster (cytC locus; cju02-09) of C. jejuni 81-176, respectively. SSH1967-C2 was chosen to represent the pyridine nucleotide disulfide oxidoreductase described in C. jejuni 84-25. In addition, the Cj0264c gene of strain NCTC11168, encoding a reductase, was included. There was a strong correlation between the distribution of the two reductase-related sequences (SSH1967-H9 and SSH1967-C2) or the cytochrome C biogenesis-related sequence (SSH629-C10) and the MLST clonal complex. It has been reported previously that the C. jejuni NCTC11168 ORF Cj0264c, which encodes the sole TMAO and DMSO reductase in this strain [33], was absent or highly divergent in 10 strains amongst a panel of 18 C. jejuni strains from diverse sources [16]. Our data indicated that this gene was widespread amongst all clonal complexes in our panel of strains, with the exception of the ST-257 complex (Tables 3 and 4).
In our study, there was no evidence for an association between the presence of a particular gene associated with metabolism using alternative electron acceptors and the source of the isolate, suggesting that the presence or absence of these alternative metabolic genes does not play a significant role in niche preferences. However, it should be noted that our strain panel was dominated by isolates from cattle and human sources. Thus we cannot completely rule out such associations with other host species. In a study of gene expression in different variants of C. jejuni NCTC11168 it was reported that many of the differences in gene expression were in respiration and metabolism genes [36]. The authors suggested that adaptation to different oxygen tensions may influence colonisation potential. The gene expression profiles compared were those of C. jejuni NCTC 11168-GS, the genome-sequenced isolate, and NCTC 11168-O, the original isolate from which NCTC 11168-GS was derived. Of the two, isolate NCTC 11168-O is a much better coloniser of chicks and invades tissue culture cells far more efficiently [36]. Under microaerobic and severely oxygen-limited conditions there were marked differences in the expression of genes associated with metabolism and respiration. Although ORF Cj0264c was not amongst those loci expressed differently between the two variants of NCTC 11168, such observations support the notion that rather than the presence or absence of metabolic and respiratory genes per se, variations in expression may be more relevant to niche preference.
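One way to put such comparisons on a quantitative footing would be a test of association on 2 x 2 presence/absence tables, once with isolates grouped by clonal complex and once grouped by source. The sketch below implements a two-sided Fisher's exact test for this purpose. It is an illustrative assumption rather than an analysis performed in this study, the function name is hypothetical, and the counts in the example are invented and are not taken from Tables 3 and 4.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, isolates inside and outside one clonal
    complex; columns are gene present and gene absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "present" isolates in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    tolerance = 1 + 1e-9
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    by_complex = (9, 1, 2, 28)  # in complex: present/absent; others: present/absent
    by_source = (6, 14, 5, 15)  # cattle: present/absent; human: present/absent
    print("clonal complex:", fisher_exact_two_sided(*by_complex))
    print("source:", fisher_exact_two_sided(*by_source))
```

With small panels dominated by one or two sources, as noted above, a non-significant result for source would not exclude an association with under-represented host species.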

Conclusion
In this study we have demonstrated that by broadening the range of clonal complexes and host sources of C. jejuni isolates submitted to genetic interrogation, we did not greatly increase the pool of identified strain-variable genes. This suggests that the current database already contains most of the diversity within this species. However, it should also be noted that genes currently associated with other closely related species, such as C. coli, can also make a contribution to the diversity within C. jejuni.
It is clear from our subtractions between ST-21 complex isolates that variation in gene content occurs within as well as between clonal complexes. However, our survey of strain-variable sequences, including four associated with genes involved in the use of alternative terminal electron acceptors, indicated a distribution according to clonal complex rather than host source. Thus, we found no evidence that the presence or absence of such genes plays a role in the host preferences of C. jejuni strains.

Bacterial strains
The bacterial strains used in this study are listed in Table 3 and were isolated in a previous study [7]. Strains for SSH were chosen to enable comparisons within (ST-21 complex) and between MLST clonal complexes. In addition, isolates from diverse animal host sources were chosen (bird, cattle and rabbit; Table 1). These differed from the sources of previously sequenced strains of C. jejuni (human or chicken). A panel of isolates representing common MLST clonal complexes (ST-21, ST-42, ST-45, ST-48, ST-257 and ST-61) and different sources was used to study the distribution of subtracted sequences (Table 3). The bacteria were cultured on blood agar at 37°C under microaerophilic conditions.

MLST
MLST alleles, STs and clonal complexes were assigned using the Campylobacter PubMLST database [37] with sequences submitted for allele designation as appropriate.

Construction and screening of subtraction libraries
Genomic DNA for SSH was isolated from C. jejuni strains 670, 504, 1967, 629, 961 and NCTC11168 as described previously [38]. SSH was carried out using the CLONTECH PCR-Select™ Bacterial Genome Subtraction Kit (Clontech) essentially as recommended by the supplier but with a hybridisation temperature of 58°C. In the first three hybridisations, DNAs from C. jejuni strains 670, 504 and 1967 respectively were used as tester and DNA from C. jejuni NCTC11168 was used as the driver. All DNAs were digested with AluI and DraI. A further two hybridisations, using RsaI-digested DNA from C. jejuni strains 629 and 961 respectively as tester, were carried out with dual RsaI-digested driver DNAs from strains 1967 and NCTC11168. PCR amplicons obtained following SSH were cloned into pGEM-T (Invitrogen). The subtraction libraries of AluI/DraI or RsaI fragments thus constructed were screened by sequencing of plasmid DNA extracted from individual clones using M13 forward and reverse vector primers (Lark Technologies). BLAST searches at the C. jejuni NCTC11168 genome project web site [39] were used to determine the presence or absence of sequences in the NCTC11168 genome. Sequences absent from the genome of C. jejuni NCTC11168 were further analysed using BLASTN and BLASTX searches of the general database using the NCBI website [40] (last accessed 13th October 2006).

PCR amplification and dot-blot screening of strains
Oligonucleotide primers (Sigma-Genosys) for PCR amplifications are listed in Table 5. Dot-blot hybridisation of genomic DNA was carried out as described previously [38]. Digoxigenin-11-2'-dUTP (DIG) (Roche)-labelled probes were made by carrying out PCR amplification in the presence of 60 µM DIG using vector or internal primers. Hybridisation and subsequent detection of DIG were carried out following the manufacturer's instructions (Roche).
All SSH sequence distributions were determined by dot-blot analysis, with the exception of 629-C10, for which a combination of PCR screening and Southern blots was used. This was because background hybridisation made the 629-C10 dot-blots difficult to interpret. A probe for flaA was used in dot-blots to confirm the presence of DNA for each strain tested.