Complete genome sequence of the Clostridium difficile laboratory strain 630Δerm reveals differences from strain 630, including translocation of the mobile element CTn5
BMC Genomics volume 16, Article number: 31 (2015)
Clostridium difficile strain 630Δerm is a spontaneous erythromycin sensitive derivative of the reference strain 630 obtained by serial passaging in antibiotic-free media. It is widely used as a defined and tractable C. difficile strain. Though largely similar to the ancestral strain, it demonstrates phenotypic differences that might be the result of underlying genetic changes. Here, we performed a de novo assembly based on single-molecule real-time sequencing and an analysis of major methylation patterns.
In addition to single nucleotide polymorphisms and various indels, we found that the mobile element CTn5 is present in the gene encoding the methyltransferase rumA rather than adhesin CD1844 where it is located in the reference strain.
Together, the genetic features identified in this study may help to explain at least part of the phenotypic differences. The annotated genome sequence of this lab strain, including the first analysis of major methylation patterns, will be a valuable resource for genetic research on C. difficile.
Clostridium difficile is a Gram-positive, anaerobic bacterium that can asymptomatically colonize the intestine of humans and other mammals. It was originally identified as part of the intestinal microbiota of healthy infants. However, when the normal flora is disturbed – for instance as a result of antibiotic treatment – C. difficile can overgrow and cause potentially fatal disease [2,3]. The main virulence factors are toxins A and B, which are encoded on a chromosomal region called the pathogenicity locus (PaLoc), but other factors are also likely to play a role. Recent years have seen an increase in the incidence and severity of C. difficile infections, for reasons that are only partially understood [6,7].
In 2006, the first genome sequence of a C. difficile strain was published. This multi-resistant strain, designated 630, was isolated from a patient with severe pseudomembranous colitis and caused an outbreak of diarrheal disease in a Swiss hospital. Analysis of the 630 genome sequence revealed that approximately 11% consists of mobile genetic elements. The majority of these elements are conjugative transposons of the Tn916 and Tn1549 families called CTns, which have the ability to excise from their genomic target sites and transpose intra- or intercellularly [8,10]. Exchange of mobile elements occurs frequently and contributes to the plasticity of the genome of C. difficile [8,11,12]. Functions encoded on conjugative transposons can contribute to environmental adaptation and antimicrobial resistance [10,13]. In C. difficile, transfer of the conjugative elements CTn1, CTn2, CTn4, CTn5 and CTn7 from strain 630 into a non-toxigenic strain has been shown. Transfer of CTn3 (Tn5397), harboring a tetracycline resistance gene, has been demonstrated between species [14,15]. CTn1, CTn3, CTn6 and CTn7 are related to Tn916, based on their conjugation module [8,13]. CTn2, CTn4 and CTn5 are all part of the Tn1549 family, based on DNA sequence homology, and their accessory modules code for uncharacterized ABC-transporters [8,10]. Recently it has been shown that these CTns may also be responsible for transfer of the PaLoc on large chromosomal fragments.
After the demonstration of conjugative transfer of DNA from Escherichia coli to C. difficile, genetic tools were developed for C. difficile. To facilitate genetic manipulation, an erythromycin sensitive variant was derived from strain 630 by serial passaging. This strain is particularly useful for the generation of insertional mutants using ClosTron, which employs a retrotransposition activated erythromycin resistance marker (ermRAM). Recently, allelic exchange methods have been developed for C. difficile [20,21]. The efficiency of both methods depends on the accuracy of the genome sequence for selection of target sites and recombination events. However, no comprehensive mapping of differences between the lab and reference strains has been published to date.
The most notable phenotypic difference between 630 and 630Δerm, erythromycin resistance, was found to be the result of a 2.4 kb deletion in the mobile genetic element Tn5398 that eliminates an ermB gene . This explains at least in part the different behavior of the two strains in a Golden Syrian hamster model of acute disease , as animals are generally sensitized to C. difficile with a clindamycin treatment (ermB is an rRNA adenine N-6-methyltransferase that also confers resistance to clindamycin). At a genetic level, another difference between the two strains reported to date is a duplication in the master regulator of sporulation, spo0A, that is apparently without phenotypic consequences .
In another Gram-positive bacterium, Bacillus subtilis, phenotypic differences between the ancestral strain NCIB3610 and widely used laboratory strains have been linked to specific genetic differences [24-26]. A detailed map of the genetic differences between the C. difficile strains 630 and 630Δerm could therefore not only facilitate genetic manipulation, but also form the basis for the investigation of phenotypic differences between these strains.
Results and discussion
Reference assembly of the 630Δerm genome reveals four breakpoints
We set out to investigate differences between the laboratory strain 630Δerm and reference strain 630 by performing short-read next generation sequencing on the Illumina HiSeq platform. Based on the report that the erythromycin sensitivity of strain 630Δerm is due to a 2.4 kb deletion in Tn5398, we examined this region of the reference alignment. The analysis revealed the absence of reads mapping to the CD2007A and CD2008 genes which are located in the expected deletion . Reads that mapped to CD2007 (erm2(B)/ermB1), the main erythromycin resistance determinant in strain 630  are likely due to the fact that this gene shares 100% nucleotide identity with CD2010 (erm1(B)/ermB), which is still present. This is supported by the observation that the coverage of both these genes is approximately 2-fold lower than the immediate surrounding regions (Figure 1A). Notably, the reference assembly failed to identify the previously identified duplication in spo0A  (data not shown).
A further analysis of the reference assembly against a linearized 630 genome revealed four breakpoints (regions with discordantly mapped read-pairs). The first breakpoint is consistent with a deletion of ~70 bp. The remaining breakpoints are consistent with a transposition event, in which the transposed sequence is re-inserted elsewhere in the genome and in the inverse orientation compared to the reference (Figure 1B).
De novo assembly of the 630Δerm genome using third generation sequencing
Based on the identification of a potential transposition event, and our previous finding that indels may have occurred that are difficult to detect using short reads, we decided to perform an unbiased, de novo, assembly of the 630Δerm genome using single-molecule real-time sequencing. The Pacific Biosciences RSII system is capable of generating large reads, and with sufficient coverage, can generate high quality single contigs for bacterial genome sequences. We sequenced a genomic library of strain 630Δerm on two SMRT cells, and validated the resulting single contig with a third SMRT cell. The resulting genome consists of 4,293,049 basepairs, with an average GC content of 29.08% and an estimated coverage of 158× (Figure 2A). We generated an annotated version of this genome by transferring the most recent version of the 630 annotation [EMBL:AM180355] , updating it with recent gene annotations from literature and incorporating qualifiers in the file to indicate specific features of 630Δerm. The annotated sequence has been deposited under accession number EMBL:LN614756.
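The summary statistics quoted above (GC content and coverage) can be illustrated with a minimal Python sketch. It assumes that coverage is simply total sequenced bases divided by genome length, which may differ from how the 158× estimate was actually derived; the function names are hypothetical and not part of any pipeline used in this study.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def mean_coverage(total_sequenced_bases: int, genome_length: int) -> float:
    """Assumed definition: average depth = sequenced bases / genome length."""
    return total_sequenced_bases / genome_length


if __name__ == "__main__":
    # Toy sequence; a real run would read the FASTA record of the assembly.
    print(f"GC content: {gc_content('ATGCATTTAA'):.2f}%")
    # Under the assumed definition, 158x over 4,293,049 bp corresponds to
    # roughly 0.68 Gb of sequence.
    print(f"Coverage: {mean_coverage(678_301_742, 4_293_049):.0f}x")
```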
Satisfyingly, our unbiased approach identified the 18-bp duplication in the spo0A gene, encoding the master regulator of sporulation, which we previously found  (Figure 2B). This demonstrates that the third generation sequencing approach is superior to Illumina in identifying this type of difference. In addition, we could confirm the expected 2.4 kb deletion in Tn5398 (Figure 2C). The sequence of Tn5398∆E which we determined shows 4 Single Nucleotide Polymorphisms (SNPs) compared to an in silico generated theoretical sequence of Tn5398∆E (based on Hussain et al.) . As a result of these differences, a progressiveMAUVE  alignment of the Tn5398ΔE element from our strain with Tn5398 of strain 630 demonstrates the deletion of CD2010 (ermB1/erm(1)B), CD2009A (ORF3), CD2009 (fragment of a putative topoisomerase), CD2008 (ORF298) and most of CD2007A. This effectively removes the region between the two copies of ermB. The most likely scenario by which this occurred is through recombination between the two ermB genes or their immediate surrounding region; the sequence information is unable to determine the exact site of recombination, as these regions are identical, and the copies of ermB and ORF3 in 630Δerm may therefore represent hybrids of CD2007/CD2010 or CD2006A/CD2009A, respectively. To reflect the results of the alignment as well as the mechanism described above, we have chosen to rename the ermB gene of strain 630Δerm CD2007B/ermB (locustag: CD630Derm_20072) and ORF3 as CD2006B (locustag: CD630Derm_20062). The resulting arrangement suggests that CD2007B is potentially expressed, as it is fused to the promoter region of CD2010/ermB1 at the exact same location, though the strain remains erythromycin sensitive. This discrepancy has been noted since the isolation of 630Δerm , and cannot be resolved using the sequence information from our study.
We also identified short tandem repeats (>90% nucleotide identity) up to 500 bp. Strikingly, the genome analysis revealed two regions of high repeat density (Figure 2A). The first region (approximately 0.6 Mb-0.9 Mb) includes the PaLoc that encodes toxins A and B. This region was found to be capable of transfer by a conjugation like mechanism  and it is tempting to speculate that the high repeat density may contribute to this phenomenon. The second region (approximately 3.6 Mb-3.75 Mb) contains many genes involved in sugar metabolism, but does not seem to be associated with annotated or characterized mobile elements. Large repeats (>95% identity and >500 bp in length) generally coincide with regions of high-GC content, and mainly reflect ribosomal gene clusters.
Analysis of m6A and m4C methylation patterns of C. difficile
In bacteria, post-replicative addition of a methyl group to a base by a DNA methyltransferase can result in the formation of N6-methyladenine (m6A), C5-methylcytosine (m5C) and N4-methylcytosine (m4C) [29,30]. These modified bases play a role in restriction/modification systems, or may regulate cellular processes (reviewed in [30-33]).
There is little information on methylation of chromosomal DNA in C. difficile. Five methylases have been identified in C. difficile 630 , but in vivo methylation patterns have not been characterized. We took advantage of the pulse profiles of the Pacific Biosciences RSII reads that hold information about base modifications [35,36] to generate the first comprehensive analysis of methylation patterns in C. difficile (Figure 3A).
m6A modifications can be identified with high confidence and the vast majority of these modifications (7288/7687 = 95%) were associated with the motif CAAAAA, in which the last adenine residue is modified (Figure 3B). Previous studies identified a single methylase, M.Cdi25 (corresponding to CD2758) with homology to adenine specific methylases, but failed to identify its target site in restriction protection experiments. We postulate that CD2758 recognizes and methylates the last adenine residue of the CAAAAA motif and that this is possibly the only adenine-methylase in C. difficile 630Δerm.
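The association between modified adenines and the CAAAAA motif can be sketched as follows. This is an illustrative reimplementation, not the SMRT Portal procedure: it assumes modified sites are available as (0-based forward-strand position, strand) pairs, and the helper names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_target_positions(genome: str, motif: str = "CAAAAA", offset: int = 5):
    """Return {(position, strand)} of the targeted base for every motif hit.

    `offset` is the 0-based index of the modified base within the motif
    (5 = last adenine of CAAAAA). Positions are forward-strand coordinates.
    """
    n = len(genome)
    targets = set()
    # A lookahead is used so that overlapping hits are also reported.
    for m in re.finditer(f"(?={motif})", genome):
        targets.add((m.start() + offset, "+"))
    for m in re.finditer(f"(?={motif})", revcomp(genome)):
        # Index p on the reverse complement maps to n - 1 - p on the forward strand.
        targets.add((n - 1 - (m.start() + offset), "-"))
    return targets


def fraction_in_motif(modified_sites, targets) -> float:
    """Fraction of modified sites that coincide with a motif target base."""
    modified_sites = list(modified_sites)
    if not modified_sites:
        return 0.0
    return sum(site in targets for site in modified_sites) / len(modified_sites)


if __name__ == "__main__":
    targets = motif_target_positions("GGCAAAAAGG")
    print(targets)
    print(fraction_in_motif([(7, "+"), (0, "+")], targets))
    # The proportion reported in the text: 7288 of 7687 m6A calls.
    print(f"{100 * 7288 / 7687:.1f}%")
```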
The pulse profiles of the Pacific Biosciences RSII reads also identify modified cytosines. Only a fraction of these are positively identified as m4C, in part due to the effect of modifications that are in close proximity to each other on the pulse profiles [36,37]. We did not further investigate m5C modifications, as they can only reliably be detected on the Pacific Biosciences platform after Tet1-treatment, by preparation of shorter library fragments that are not ideal for genome de novo assembly, and with much higher coverage than obtained in our experiment. Unspecified modifications may therefore represent m4C, and possibly m5C or other modifications.
The SMRT Portal identified the motif GCAGCAGC, in which the first cytosine residue is modified, as overrepresented in the methylcytosine dataset (Figure 3B). This motif is remarkably similar to the GCWGC motif identified for the M.Cdi1226 methylase (CD3147) . We could identify 146 instances of m4C methylation and 16 of those contained the motif (11%). When a DREME search was performed  using 41 bp sequences centered on m4C only, a highly similar motif (GCAGCR) was found in 33 instances. Moreover, none of the other motifs (see below) were specifically linked to m4C modifications, suggesting that many if not all of the m4C modifications are due to CD3147.
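Preparing the 41 bp input sequences for a motif search of this kind amounts to taking 20 bp on either side of each modified base. The sketch below assumes a circular chromosome held as a plain string and 0-based positions; `extract_window` is a hypothetical helper, not part of DREME or the SMRT Portal.

```python
def extract_window(genome: str, position: int, flank: int = 20) -> str:
    """Return the (2 * flank + 1)-bp window centred on `position`.

    Indices wrap around the ends, treating the sequence as circular.
    """
    n = len(genome)
    return "".join(genome[(position + i) % n] for i in range(-flank, flank + 1))


if __name__ == "__main__":
    toy_genome = "ACGT" * 20  # 80 bp stand-in for a chromosome
    window = extract_window(toy_genome, position=0)
    print(len(window), window[20])  # the centre base is the modified position
```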
m4C and m6A methylations that were not associated with the overrepresented motifs seemed to correspond to regions of high GC-content, including the mobile elements CTn1, CTn2 and CTn4 (Figure 3).
We also evaluated motifs previously identified as putative target sites for the other three cytosine specific methylases of C. difficile, M.Cdi633 (CD0935), M.Cdi587 (CD0927) and M.Cdi824 (CD1109). CD0935 conferred partial protection against digestion with BalI (target site: TGGCCA). Our data did not show any modifications on cytosine or adenine residues of this motif anywhere in the genome (n = 396). Considering that we cannot reliably detect m5C modifications in our setup, it is possible that M.Cdi633 is an m5C specific methylase. CD0927 could confer protection against Sau96I (target site: GGNCC) in E. coli, but C. difficile chromosomal DNA is only partially resistant to Sau96I digestion. We found only very low levels (~0.1%) of modified cytosines for this motif (n = 3824) in 630Δerm, which together with the earlier observations suggests that CD0927 is either a minor m4C or an m5C methylase. CD1109 conferred protection against SmaI (which recognizes CCCGGG). We found that 6/60 (10%) of the motifs contained a modified cytosine at the third position. These modifications are likely m4Cs that cannot be positively identified as m4C due to adjacent modified bases.
C. difficile chromosomal DNA is wholly resistant to TseI (target site: GCWGC) and SmaI (target site: CCCGGG), though we only detected modifications for ~10% of the occurrences of these motifs. This may be due to only a fraction of the methylcytosine modifications being called by the Pacific Biosciences SMRT platform in our analyses.
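Counting the occurrences (n) of restriction sites that contain degenerate bases, such as GGNCC or GCWGC, can be done by translating IUPAC codes into a regular expression. The following is a minimal sketch with hypothetical helper names. Because the four sites discussed here are palindromic, scanning the forward strand alone is sufficient; non-palindromic motifs would also need a scan of the reverse complement.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif (e.g. 'GCWGC') into a regex pattern."""
    return "".join(IUPAC[base] for base in motif.upper())


def count_sites(seq: str, motif: str) -> int:
    """Count motif occurrences on the forward strand, including overlaps."""
    pattern = re.compile(f"(?={iupac_to_regex(motif)})")
    return sum(1 for _ in pattern.finditer(seq.upper()))


if __name__ == "__main__":
    toy = "GCAGCTGCTGCCCGGGTGGCCA"
    for site in ("TGGCCA", "GGNCC", "GCWGC", "CCCGGG"):
        print(site, count_sites(toy, site))
```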
The function of the methylases of C. difficile is unknown. None seem associated with an endonuclease, indicating they are not likely to be part of a restriction-modification system. Consistent with this, no effect on conjugation efficiency was observed . CD0927 and CD0935 are part of prophage 1, and CD1109 is present on the CTn4 element, suggesting they may play a role in the biology of mobile elements.
Comparison of the complete genome of 630Δerm with strain 630 reveals SNPs, indels and rearrangements
It is likely that more than the two previously identified differences (Δerm deletion and 18 bp duplication in spo0A) exist between strain 630 and strain 630Δerm. We therefore compared our de novo assembled genome to the reference sequence.
We identified 71 differences between the two strains. These encompass 8 deletions (including the Δerm mutation), 10 insertions (including the duplication in spo0A), 2 insertion-deletions, 50 substitutions and 1 region of complex structural variation (Additional file 1). Of these, 23 were located intergenically. This includes a 102 bp deletion which likely corresponds to the breakpoint at 0.68 Mb identified in the short read next generation sequencing (Figure 1B). A complete list of identified structural variants is available as Additional file 1.
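The categories used in this tally can be made explicit with a small classifier over reference/alternative allele pairs. This is a sketch under stated assumptions: alleles are given without padding bases, equal-length replacements are counted as substitutions, and the complex region is handled outside the classifier. The counts in `REPORTED` are the ones given in the text; the function name is hypothetical.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its unpadded reference and alternative alleles."""
    if ref and alt:
        # Assumption: equal-length replacements count as substitutions.
        return "substitution" if len(ref) == len(alt) else "insertion-deletion"
    if alt:
        return "insertion"
    if ref:
        return "deletion"
    raise ValueError("reference and alternative alleles are both empty")


# Counts as reported in the text (the complex region is curated manually).
REPORTED = {
    "deletion": 8,
    "insertion": 10,
    "insertion-deletion": 2,
    "substitution": 50,
    "complex": 1,
}

if __name__ == "__main__":
    toy_calls = [("A", "G"), ("", "GCT"), ("TTAC", ""), ("AT", "GCC")]
    print(Counter(classify_variant(r, a) for r, a in toy_calls))
    print("total reported:", sum(REPORTED.values()))
```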
23 of the identified differences are associated with rRNA sequences. We found that strain 630Δerm has acquired an extra ~5 kb rRNA/tRNA cluster that is inserted between CD0011 and CD0012 compared to strain 630 (Table 1, Figure 4). Copy number variations in rRNA operons have previously been noted for C. difficile  and may reflect an adaptation to favorable growth conditions in the laboratory. Similar to rRNA operon 6, this operon contains tRNALeu and tRNAMet genes downstream of the 23S rRNA gene, but the intergenic spacer region (ISR) between the 16S and 23S rRNA genes does not contain a tRNAAla. A detailed comparison of the ISRs of the different rRNA operons is provided as Additional file 2. A striking number of differences were found in rRNA operon 11 (Figure 4). As observed previously , the sequence variations cluster in the 3’ region of the 16S rRNA and 5’ of the 23S rRNA genes.
We focused our further analysis on the 26 variants that are associated with annotated pseudogenes or open reading frames (Table 1). A 24 bp deletion in CD0632, a conserved protein of unknown function, shortens the arginine-alanine repeat in this protein by 8 amino acids. In two cases, a single basepair insertion restores a pseudogene (CD1388 and CD3156A). This was confirmed by assembling the short read Illumina sequences against both the 630 reference genome and the de novo assembled 630Δerm genome, as a variant was identified in the former but not the latter. CD1388 encodes a putative regulatory protein with a helix-turn-helix motif and CD3156A a conserved protein of unknown function. Interestingly, both proteins encoded by these genes were previously identified in a proteomic analysis , indicating that they are expressed in strain 630Δerm. Two in-frame insertions were identified (an extra alanine residue in CD0514 and the published duplication in spo0A/CD1214). Out of 18 identified nucleotide substitutions, 9 were synonymous. These include SNPs in the gene encoding elongation factor Tu (tuf1/CD0058), ribosomal protein L50 (rplC/CD0073) and the putative aminotransferase CD2532. Strikingly, the CD0514 gene, encoding the cell wall protein cwpV [41,42], contains an unusually high density of mutations. In addition to the insertion and 5 synonymous mutations, it contains 2 non-synonymous but conservative mutations.
Other non-synonymous mutations are located in the putative ferric uptake regulator CD0826, the putative acyl-CoA N-acyltransferase CD1190, predicted glyceraldehyde-phosphate dehydrogenase CD1767 (gapB), ethanolamine utilization protein CD1907 (eutG), the hypothetical protein CD2627, the phosphotransferase system protein CD2667 (ptsG-BC) and the transcriptional regulator CD3565. In all these cases, the de novo assembly of the 630Δerm genome was clearly supported by the short read Illumina data.
CTn5 is present in the rumA gene in both 630Δerm (LUMC) and 630Δerm(UCL)
In an attempt to visualize the proposed transposition event (Figure 1B), we generated a dotplot of the genome sequence of our strain versus the reference (Figure 5A). It is immediately evident that the CTn5 element seems to have excised from its original location in CD1844 (encoding a putative cell wall adhesin) and has inserted in an inverted manner in rumA (CD3393) in our isolate of 630Δerm, for clarity hereafter referred to as 630Δerm(LUMC).
To exclude that the finding represents a misassembly in the original 630 genome sequence, and confirm the presence of CTn5 in rumA in 630Δerm (LUMC), we performed various control PCRs (Figure 5B). In strain 630, we found CTn5 inserted in CD1844 and confirmed an intact rumA gene. In contrast, in 630Δerm (LUMC), we detected no product for the left and right junctions of CTn5 in CD1844/CD1878A, indicating that the element is not present at this location. We readily amplified fragments corresponding to the left and right junction of CTn5 when inserted in rumA in C. difficile 630Δerm(LUMC), but not 630, chromosomal DNA. Interestingly, we observed a faint band corresponding to intact rumA even in strain 630Δerm (LUMC). This indicates that a subpopulation of cells does not contain CTn5 at this location, either because it has not inserted yet, or retains the ability to excise spontaneously as previously observed for 630 .
The CTn5 insertion site identified here is located immediately downstream of CTn7. A similar tandem arrangement has previously been observed in two clinical PCR ribotype 001 isolates [10,43]. In another clinical isolate (RT027), which lacked a CTn7-like element, a CTn5-like element was found to be integrated at a site homologous to the target site of CTn7 in 630 .
The annotation of CD3393 as rumA in C. difficile is based on homology of the predicted protein to E. coli RumA (also known as RlmD). This enzyme methylates a uracil nucleotide of the ribosomal RNA [44-46]. E. coli rumA mutants perform similarly compared with the wild type strain, in terms of cell growth, antibiotic resistance, and fidelity of translation. However, ΔrumA cells are outcompeted by wild type cells in growth competition assays, which may imply that ribosome function is moderately affected .
The translocation of CTn5 to rumA has two major consequences. First, the CD1844 gene, encoding a putative adhesin, is restored. Second, the rumA open reading frame is fused to the CD1844A open reading frame, resulting in a hybrid protein (CD3393A). CD1844A shows very high similarity (e-value 1e-62, 97% identity) to the C-terminus of an Enterococcus faecalis rumA homolog [EMBL:EOK00135.1]. However, the homology of C. difficile rumA to this gene is limited to the N-terminal TrmA-like domain (COG2265) (Figure 5B). Thus, a link between these open reading frames is also found in organisms other than C. difficile. Further experiments are required to determine the phenotypic consequences of the transposition of CTn5.
To further our understanding of the origin of the transposition event, we compared the location of CTn5 by PCR in different related strains; a non-passaged isolate of the original 630Δerm , hereafter referred to as 630Δerm (UCL), and another erythromycin sensitive derivative of 630, 630E/JIR8094 . We found that in strain 630E the element is present in CD1844/CD1878A, identical to the reference strain, suggesting that the transposition event is not linked to the loss of erythromycin resistance. The 630Δerm (UCL) strain shows prominent bands corresponding to CTn5 at its CD1844/CD1878A location, but also a weak signal for CTn5 at rumA (Figure 5C). Therefore, this isolate likely contains a subpopulation of cells with the transposition identified in this study. It is possible that CTn5 is stable at either location and the stock of the 630Δerm (UCL) is non-clonal, or that CTn5 in 630Δerm (UCL) is highly mobile. During redistribution of the strain, isolates with either insertion could have been selected.
In summary, our data show that integration of CTn5 can occur in at least two different sites in the C. difficile 630Δerm genome, and that the element can switch between these locations during repeated passaging.
The work presented here provides the first reference genome for the widely used C. difficile laboratory strain 630Δerm, including the first analysis of major methylation patterns for any C. difficile strain. Our work reveals that in addition to insertions, deletions and SNPs, the CTn5 element has moved from its original location within CD1844 to the rumA gene in our isolate. The observation of such a dramatic rearrangement has important implications for the redistribution of strains with highly mobile genomes and argues for complete resequencing of common lab strains in each laboratory.
Bacterial strains and growth conditions
Our isolate of strain 630Δerm was initially obtained from the Minton lab (University of Nottingham, Nottingham, UK), which in turn received it from the Mullany lab in which it was generated. For the purpose of resequencing, the strain was cultured on prereduced CLO plates (Biomerieux), after which it was transferred to BHI medium (Oxoid) supplemented with 0.5% yeast extract (Fluka).
Strain 630 was originally obtained from the Mastrantonio lab (Istituto Superiore di Sanità, Rome, Italy) and its use in our lab has been described before. The 630Δerm strain from the Mullany lab (UCL Eastman Dental Institute, London, UK), 630Δerm (UCL), was transported as a glycerol stock on dry ice. Strain 630E was a kind gift of Robert Britton (Michigan State University, East Lansing, MI, USA). All strains were cultured as described for our isolate of strain 630Δerm, which is referred to as 630Δerm (LUMC) where appropriate.
Isolation of chromosomal DNA
For PCR analysis, chromosomal DNA was isolated using the QiaAmp Blood&Tissue kit (Qiagen) according to the manufacturer’s instructions from growth obtained after streaking out the strain directly from the glycerol stock onto CLO plates (Biomerieux). For SMRT sequencing, high molecular weight DNA was isolated from 30 mL of an overnight culture, using the Qiagen GenomicTip 500/G, according to the manufacturer’s instructions. The quality of the DNA was checked on a Nanodrop ND-200 machine (ThermoFisher), the integrity by agarose gel electrophoresis, and the DNA was quantified on a Qubit instrument (Invitrogen).
Illumina sequencing and analysis
For Illumina sequencing, chromosomal DNA was isolated by Baseclear (Leiden, The Netherlands) from a pellet of bacterial cells derived from 50 mL culture. Data from 50-cycle, 500 Mb paired-end reads were delivered by Baseclear as 2 fastq files. Sequence reads have been deposited in the ENA Sequence Read Archive (EMBL:ERS550098). A preliminary analysis of the data was performed by aligning the paired-end reads to the reference genome of C. difficile strain 630 [GenBank:AM180355] using Geneious R7 (Biomatters, http://www.geneious.com). A more detailed analysis was performed using Stampy and BWA. In a routine quality control (QC) procedure to verify the alignment, QC metrics including insert sizes, mapped reads, unmapped reads and reads that align with a deviated pattern (DP; discordant read alignments) were examined. A significant number of reads that cannot be aligned to the reference genome would indicate an undefined sequence region in strain 630Δerm or a contamination of the library. In our case, a few regions with discordantly mapped read pairs (DP > 9) were identified (Additional file 3) and validated automatically (Additional file 4). Of the validated breakpoints, the first has matches with the end of the reference assembly and is therefore an artefact of assembling the reads against a linearized genome. This was confirmed by artificially breaking the circular chromosome at a different position and repeating the procedure. Visual inspection of the alignment track (BAM file) in the Integrative Genomics Viewer was used to determine the nature of the structural variations.
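The idea of flagging regions with more than nine discordantly mapped read pairs can be sketched in a few lines. This is not the Stampy/BWA-based procedure used here: it assumes a standard forward/reverse paired-end library, read pairs given as (position, strand, mate position, mate strand) tuples, and hypothetical values for the maximum insert size and bin width.

```python
from collections import Counter


def classify_pair(pos1, strand1, pos2, strand2, max_insert=1000):
    """Classify one read pair under the assumption of a forward/reverse library."""
    if strand1 == strand2:
        return "same_orientation"  # consistent with an inversion
    (left_pos, left_strand), (right_pos, _) = sorted(
        [(pos1, strand1), (pos2, strand2)]
    )
    if left_strand == "-":
        return "everted"  # reads point away from each other
    if right_pos - left_pos > max_insert:
        return "large_insert"  # consistent with a deletion or a transposition
    return "concordant"


def discordant_bins(pairs, bin_size=1000, min_dp=10, max_insert=1000):
    """Return {bin index: count} for bins with at least `min_dp` discordant pairs.

    min_dp=10 corresponds to the DP > 9 criterion mentioned in the text.
    """
    counts = Counter()
    for pos1, strand1, pos2, strand2 in pairs:
        if classify_pair(pos1, strand1, pos2, strand2, max_insert) != "concordant":
            counts[min(pos1, pos2) // bin_size] += 1
    return {b: c for b, c in counts.items() if c >= min_dp}


if __name__ == "__main__":
    toy_pairs = [(100, "+", 5000, "-")] * 10 + [(2100, "+", 2400, "-")] * 20
    print(discordant_bins(toy_pairs))
```

A candidate region flagged this way would still need the kind of validation and visual inspection described above, since read pairs spanning the ends of a linearized circular chromosome also appear discordant.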
Pacific Biosciences RSII sequencing and de novo assembly
For single molecule real-time sequencing, a SMRTbell DNA template library with an insert size of ~20 kb was prepared according to the manufacturer’s specification. To this end, chromosomal DNA was fragmented with G-tubes (Covaris). Subsequently, fragmented DNA was end-repaired and ligated to hairpin adapters. SMRT sequencing was carried out on the Pacific Biosciences RSII machine according to standard protocols (Magbead loading, 1×180 min). Sequence reads have been deposited in the ENA Sequence Read Archive (EMBL:ERS550016). Sequencing reads were corrected using the HGAP pipeline . Assembly was performed using Celera Assembler 8.1. We observed unbalanced coverage of two regions of approximately 18.5 kb of the reference genome. These regions were found to be nearly identical phages , and the unbalanced coverage therefore likely represents an artefact of the unsupervised assembly procedure using the default settings. To correct for this, the assembly was artificially broken into three contigs at these regions and was rejoined using the gap closure software PBJelly . The edited assembly was then validated using reads from a third SMRT cell and polished using Quiver, a consensus algorithm that is part of the SMRT Portal. Subsequently, the consensus sequence was circularized based on the reference sequence of the ancestral 630 strain. We noted that the Pacific Biosciences consensus caller struggles with homopolymeric stretches of adenines and thymines. Therefore a correction was carried out by performing a reference assembly of the short reads from the Illumina sequencing against the reclosed genome, yielding the final genome sequence. This sequence is available from EMBL (EMBL: LN614756).
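Setting the start of a circular contig to match a reference can be illustrated by rotating the sequence so that it begins at a chosen anchor. The sketch below is a simplification and not the procedure used here: it assumes the contig is already in the same orientation as the reference and that the anchor (for example, the first bases of the reference) occurs exactly once; `rotate_to_anchor` is a hypothetical helper.

```python
def rotate_to_anchor(contig: str, anchor: str) -> str:
    """Rotate a circular contig so that it starts with `anchor`.

    The contig is extended by len(anchor) - 1 bases from its own start so
    that an anchor spanning the artificial break point is still found.
    """
    if not anchor:
        raise ValueError("anchor must not be empty")
    extended = contig + contig[: len(anchor) - 1]
    index = extended.find(anchor)
    if index == -1:
        raise ValueError("anchor not found; check orientation or sequence errors")
    return contig[index:] + contig[:index]


if __name__ == "__main__":
    print(rotate_to_anchor("GGTTAACC", "AACC"))
    print(rotate_to_anchor("TTAACCGG", "GGTT"))  # anchor spans the break point
```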
In silico analysis of the 630Δerm genome sequence
To annotate the de novo assembled genome sequence, we first updated the most recent version of the C. difficile 630 genome sequence [EMBL:AM180355.1]  in Artemis [54,55]. Next, we imported the flat genome sequence of strain 630Δerm into Geneious R7 (Biomatters, http://www.geneious.com) and transferred the annotation using the “Live Annotate and Predict” function. The annotation track was manually curated to remove duplicate or missed annotations. The resulting file was saved as a GenBank file, further polished in a text editor and Artemis and submitted to the ENA archive. Genome wheel representations were prepared using Circos . Indels and single nucleotide polymorphisms were identified using the Pacific Biosciences variant caller using the genome of C. difficile strain 630  as a reference and further validated by MUMmer 3.0  and progressiveMAUVE . Subsequently a list of detected structural variants was manually curated (consensus between the alignment of Illumina and PacBio reads to the reference strain and the variants identified by MUMmer and progressiveMAUVE) as concordant description of differences in complex genomic regions could not be achieved by different methods. In addition, for all large structural variants dotplots were generated using Gepard 1.30  using FASTA formatted genome sequences of strains 630 and 630Δerm.
To identify modified bases, kinetic signals were processed for all genomic positions after aligning sequencing reads to the final single chromosome sequence of strain 630Δerm. In order to accurately identify the methylated bases, a threshold of 45 for log-transformed P values was used after optimizing according to its distribution and minimizing the false positive rate. Genomic positions and identity of the modifications were exported as a GFF file, and imported as a separate track in the genome sequence in Geneious R7. Subsequently, the identification of sequence motifs was performed using the SMRT Portal and sequence logos were prepared using Weblogo (http://weblogo.berkeley.edu/)  with 20 bp sequence flanking the modified base.
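Applying a score threshold to exported modification calls can be sketched as below. The sketch assumes that the log-transformed P value is a Phred-like score, -10·log10(P), stored in the sixth (score) column of a standard nine-column GFF file; both points are assumptions about the export format rather than details confirmed in the text, and the function names and the toy records are hypothetical.

```python
import math


def phred_score(p_value: float) -> float:
    """Assumed transformation: Phred-like score = -10 * log10(P)."""
    return -10.0 * math.log10(p_value)


def filter_modifications(gff_lines, min_score=45.0):
    """Keep GFF records whose score column is at least `min_score`.

    Returns tuples of (seqid, 1-based position, strand, feature type, score).
    """
    kept = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[5] == ".":
            continue  # malformed record or missing score
        score = float(fields[5])
        if score >= min_score:
            kept.append((fields[0], int(fields[3]), fields[6], fields[2], score))
    return kept


if __name__ == "__main__":
    toy_gff = [
        "##gff-version 3",
        "chr\tkinModCall\tm6A\t1200\t1200\t87\t+\t.\tcoverage=150",
        "chr\tkinModCall\tm4C\t3400\t3400\t31\t-\t.\tcoverage=140",
    ]
    print(filter_modifications(toy_gff))
    print(f"P = 1e-5 corresponds to a score of {phred_score(1e-5):.0f}")
```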
Analysis of CTn5 translocation
Translocation of CTn5 was confirmed by PCR using primers (Table 2) designed to amplify the left and right junctions of CTn5 as present in the C. difficile strain 630, as well as the rumA gene (Table 1) using Q5 polymerase (New England Biolabs). Cycling conditions were: initial denaturation 98°C 30 sec, 25 cycles 98°C 10 sec/60°C 30 sec/72°C 1 min 30 sec, and a final extension 72°C for 2 mins. Products were purified (GeneJet PCR purification kit, ThermoScientific) and run on a 0.5×TAE/1.2% agarose gel with a 1 kb + ladder (Fermentas). After staining with ethidium bromide, the DNA bands were visualized on a Geldoc system (Biorad).
Hall IC, O’Toole E. Intestinal flora in new-born infants: with a description of a new pathogenic anaerobe, Bacillus difficilis. Am J Dis Children. 1935;49:390–402.
Rupnik M, Wilcox MH, Gerding DN. Clostridium difficile infection: new developments in epidemiology and pathogenesis. Nat Rev Microbiol. 2009;7:526–36.
Viswanathan VK, Mallozzi MJ, Vedantam G. Clostridium difficile infection: an overview of the disease and its pathogenesis, epidemiology and interventions. Gut Microbes. 2010;1:234–42.
Shen A. Clostridium difficile toxins: mediators of inflammation. J Innate Immun. 2012;4:149–58.
Vedantam G, Clark A, Chu M, McQuade R, Mallozzi M, Viswanathan VK. Clostridium difficile infection: toxins and non-toxin virulence factors, and their contributions to disease establishment and host response. Gut Microbes. 2012;3:121–34.
He M, Miyajima F, Roberts P, Ellison L, Pickard DJ, Martin MJ, et al. Emergence and global spread of epidemic healthcare-associated Clostridium difficile. Nat Genet. 2013;45:109–13.
Smits WK. Hype or hypervirulence: a reflection on problematic C. difficile strains. Virulence. 2013;4:592–6.
Sebaihia M, Wren BW, Mullany P, Fairweather NF, Minton N, Stabler R, et al. The multidrug-resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome. Nat Genet. 2006;38:779–86.
Wust J, Sullivan NM, Hardegger U, Wilkins TD. Investigation of an outbreak of antibiotic-associated colitis by various typing methods. J Clin Microbiol. 1982;16:1096–101.
Brouwer MS, Warburton PJ, Roberts AP, Mullany P, Allan E. Genetic organisation, mobility and predicted functions of genes on integrated, mobile genetic elements in sequenced strains of Clostridium difficile. PLoS One. 2011;6:e23014.
Stabler RA, Gerding DN, Songer JG, Drudy D, Brazier JS, Trinh HT, et al. Comparative phylogenomics of Clostridium difficile reveals clade specificity and microevolution of hypervirulent strains. J Bacteriol. 2006;188:7297–305.
He M, Sebaihia M, Lawley TD, Stabler RA, Dawson LF, Martin MJ, et al. Evolutionary dynamics of Clostridium difficile over short and long time scales. Proc Natl Acad Sci U S A. 2010;107:7527–32.
Roberts AP, Mullany P. Tn916-like genetic elements: a diverse group of modular mobile elements conferring antibiotic resistance. FEMS Microbiol Rev. 2011;35:856–71.
Mullany P, Wilks M, Lamb I, Clayton C, Wren B, Tabaqchali S. Genetic analysis of a tetracycline resistance element from Clostridium difficile and its conjugal transfer to and from Bacillus subtilis. J Gen Microbiol. 1990;136:1343–9.
Jasni AS, Mullany P, Hussain H, Roberts AP. Demonstration of conjugative transposon (Tn5397)-mediated horizontal gene transfer between Clostridium difficile and Enterococcus faecalis. Antimicrob Agents Chemother. 2010;54:4924–6.
Brouwer MS, Roberts AP, Hussain H, Williams RJ, Allan E, Mullany P. Horizontal gene transfer converts non-toxigenic Clostridium difficile strains into toxin producers. Nat Commun. 2013;4:2601.
Purdy D, O’Keeffe TA, Elmore M, Herbert M, McLeod A, Bokori-Brown M, et al. Conjugative transfer of clostridial shuttle vectors from Escherichia coli to Clostridium difficile through circumvention of the restriction barrier. Mol Microbiol. 2002;46:439–52.
Hussain HA, Roberts AP, Mullany P. Generation of an erythromycin-sensitive derivative of Clostridium difficile strain 630 (630Δerm) and demonstration that the conjugative transposon Tn916ΔE enters the genome of this strain at multiple sites. J Med Microbiol. 2005;54:137–41.
Heap JT, Pennington OJ, Cartman ST, Carter GP, Minton NP. The ClosTron: a universal gene knock-out system for the genus Clostridium. J Microbiol Methods. 2007;70:452–64.
Ng YK, Ehsaan M, Philip S, Collery MM, Janoir C, Collignon A, et al. Expanding the repertoire of gene tools for precise manipulation of the Clostridium difficile genome: allelic exchange using pyrE alleles. PLoS One. 2013;8:e56051.
Cartman ST, Kelly ML, Heeg D, Heap JT, Minton NP. Precise manipulation of the Clostridium difficile chromosome reveals a lack of association between the tcdC genotype and toxin production. Appl Environ Microbiol. 2012;78:4683–90.
Bakker D, Buckley AM, de Jong A, van Winden VJ, Verhoeks JP, Kuipers OP, et al. The HtrA-like protease CD3284 modulates virulence of Clostridium difficile. Infect Immun. 2014;82:4222–32.
Rosenbusch KE, Bakker D, Kuijper EJ, Smits WK. C. difficile 630Δerm Spo0A regulates sporulation, but does not contribute to toxin production, by direct high-affinity binding to target DNA. PLoS One. 2012;7:e48608.
Zeigler DR, Pragai Z, Rodriguez S, Chevreux B, Muffler A, Albert T, et al. The origins of 168, W23, and other Bacillus subtilis legacy strains. J Bacteriol. 2008;190:6983–95.
Srivatsan A, Han Y, Peng J, Tehranchi AK, Gibbs R, Wang JD, et al. High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies. PLoS Genet. 2008;4:e1000139.
McLoon AL, Kolodkin-Gal I, Rubinstein SM, Kolter R, Losick R. Spatial regulation of histidine kinases governing biofilm formation in Bacillus subtilis. J Bacteriol. 2011;193:679–85.
Pettit LJ, Browne HP, Yu L, Smits WK, Fagan RP, Barquist L, et al. Functional genomics reveals that Clostridium difficile Spo0A coordinates sporulation, virulence and metabolism. BMC Genomics. 2014;15:160.
Darling AE, Mau B, Perna NT. ProgressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One. 2010;5:e11147.
Marinus MG, Casadesus J. Roles of DNA adenine methylation in host-pathogen interactions: mismatch repair, transcriptional regulation, and more. FEMS Microbiol Rev. 2009;33:488–503.
Collier J. Epigenetic regulation of the bacterial cell cycle. Curr Opin Microbiol. 2009;12:722–9.
Ratel D, Ravanat JL, Berger F, Wion D. N6-methyladenine: the other methylated base of DNA. Bioessays. 2006;28:309–15.
Wion D, Casadesus J. N6-methyl-adenine: an epigenetic signal for DNA-protein interactions. Nat Rev Microbiol. 2006;4:183–92.
Lobner-Olesen A, Skovgaard O, Marinus MG. Dam methylation: coordinating cellular processes. Curr Opin Microbiol. 2005;8:154–60.
Herbert M, O’Keeffe TA, Purdy D, Elmore M, Minton NP. Gene transfer into Clostridium difficile CD630 and characterisation of its methylase genes. FEMS Microbiol Lett. 2003;229:103–10.
Flusberg BA, Webster DR, Lee JH, Travers KJ, Olivares EC, Clark TA, et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods. 2010;7:461–5.
Detecting DNA base modifications: SMRT analysis of microbial methylomes. [http://www.pacb.com/pdf/TN_Detecting_DNA_Base_Modifications.pdf]
Detecting DNA base modifications using single molecule, real-time sequencing. [http://www.pacificbiosciences.com/pdf/WP_Detecting_DNA_Base_Modifications_Using_SMRT_Sequencing.pdf]
Bailey TL. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics. 2011;27:1653–9.
Sadeghifard N, Gurtler V, Beer M, Seviour RJ. The mosaic nature of intergenic 16S-23S rRNA spacer regions suggests rRNA operon copy number variation in Clostridium difficile strains. Appl Environ Microbiol. 2006;72:7311–23.
Gurtler V, Grando D. New opportunities for improved ribotyping of C. difficile clinical isolates by exploring their genomes. J Microbiol Methods. 2013;93:257–72.
Reynolds CB, Emerson JE, de la Riva L, Fagan RP, Fairweather NF. The Clostridium difficile cell wall protein CwpV is antigenically variable between strains, but exhibits conserved aggregation-promoting function. PLoS Pathog. 2011;7:e1002024.
Emerson JE, Reynolds CB, Fagan RP, Shaw HA, Goulding D, Fairweather NF. A novel genetic switch controls phase variable expression of CwpV, a Clostridium difficile cell wall protein. Mol Microbiol. 2009;74:541–56.
Brouwer MS, Roberts AP, Mullany P, Allan E. In silico analysis of sequenced strains of Clostridium difficile reveals a related set of conjugative transposons carrying a variety of accessory genes. Mob Genet Elements. 2012;2:8–12.
Agarwalla S, Kealey JT, Santi DV, Stroud RM. Characterization of the 23S ribosomal RNA m5U1939 methyltransferase from Escherichia coli. J Biol Chem. 2002;277:8835–40.
Madsen CT, Mengel-Jorgensen J, Kirpekar F, Douthwaite S. Identifying the methyltransferases for m(5)U747 and m(5)U1939 in 23S rRNA using MALDI mass spectrometry. Nucleic Acids Res. 2003;31:4738–46.
Persaud C, Lu Y, Vila-Sanjurjo A, Campbell JL, Finley J, O’Connor M. Mutagenesis of the modified bases, m(5)U1939 and psi2504, in Escherichia coli 23S rRNA. Biochem Biophys Res Commun. 2010;392:223–7.
O’Connor JR, Lyras D, Farrow KA, Adams V, Powell DR, Hinds J, et al. Construction and analysis of chromosomal Clostridium difficile mutants. Mol Microbiol. 2006;61:1335–51.
van den Berg RJ, Schaap I, Templeton KE, Klaassen CH, Kuijper EJ. Typing and subtyping of Clostridium difficile isolates by using multiple-locus variable-number tandem-repeat analysis. J Clin Microbiol. 2007;45:1024–8.
Lunter G, Goodson M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 2011;21:936–9.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.
Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14:178–92.
Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 2013;10:563–9.
English AC, Richards S, Han Y, Wang M, Vee V, Qu J, et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One. 2012;7:e47768.
Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, et al. Artemis: sequence visualization and annotation. Bioinformatics. 2000;16:944–5.
Carver T, Berriman M, Tivey A, Patel C, Bohme U, Barrell BG, et al. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics. 2008;24:2672–6.
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–45.
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12.
Krumsiek J, Arnold R, Rattei T. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics. 2007;23:1026–8.
Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–90.
We thank Nigel Minton, Robert Britton and Paolo Mastrantonio for strains and Leon Mei of the Sequence Analysis Support Core (LUMC) for facilitating the initial Illumina analysis. Furthermore, we want to thank the Geneious team for helpful discussions. This work was supported, in part, by a Veni and a Vidi fellowship from the Netherlands Organization for Scientific Research and a Gisela Thier Fellowship from the Leiden University Medical Center to WKS.
The authors declare that they have no competing interests.
EVE, JF, AMS and WKS performed experiments. SYA and APR contributed reagents and tools. EVE, SYA, HPB, JF, AMS, APR, WYL and WKS analyzed data. EVE, SYA and WKS wrote the manuscript. All authors read and approved the final manuscript.
Erika van Eijk and Seyed Yahya Anvar contributed equally to this work.
Table summarizing structural variants identified between strain 630 and strain 630Δerm (LUMC).
ClustalW alignment of the 16S-23S regions in the 630Δerm (LUMC) genome.
Table summarizing discordantly mapped read-pairs in the Illumina HiSeq reference alignment of C. difficile 630Δerm (LUMC) versus 630.
Table summarizing validated discordantly mapped read-pairs in the Illumina HiSeq reference alignment of C. difficile 630Δerm (LUMC) versus 630.