Analysis of genomic differences among Clostridium botulinum type A1 strains
© Fang et al. 2010
Received: 24 May 2010
Accepted: 23 December 2010
Published: 23 December 2010
Type A1 Clostridium botulinum strains are a group of Gram-positive, spore-forming anaerobic bacteria that produce a genetically, biochemically, and biophysically indistinguishable 150 kD protein that causes botulism. The genomes of three type A1 C. botulinum strains have been sequenced and show a high degree of synteny. The purpose of this study was to characterize differences among these genomes and compare these differentiating features with two additional unsequenced strains used in previous studies.
Several strategies were deployed in this report. First, the neurotoxin gene of the University of Massachusetts Dartmouth laboratory Hall strain (UMASS strain) was amplified by PCR and sequenced; its sequence was aligned with the published ATCC 3502 Sanger Institute Hall strain and Allergan Hall strain neurotoxin gene regions. Sequence alignment showed that there was a synonymous single nucleotide polymorphism (SNP) in the region encoding the heavy chain between the Allergan strain and the ATCC 3502 and UMASS strains. Second, comparative genomic hybridization (CGH) demonstrated that the UMASS strain and a strain expected to be derived from ATCC 3502 in the Centers for Disease Control and Prevention (CDC) laboratory (ATCC 3502*) differed in gene content compared to the ATCC 3502 genome sequence published by the Sanger Institute. Third, alignment of the three sequenced C. botulinum type A1 strain genomes revealed the presence of four comparable blocks. Strains ATCC 3502 and ATCC 19397 share the same genome organization, while the organization of the blocks in strain Hall was switched. Lastly, PCR assays were designed to identify the genome organizations of the UMASS and ATCC 3502* strains. The PCR results indicated that the UMASS strain belonged to the Hall type and that the ATCC 3502* strain was identical to the ATCC 3502 (Sanger Institute) type.
Taken together, C. botulinum type A1 strains including Sanger Institute ATCC 3502, ATCC 3502*, ATCC 19397, Hall, Allergan, and UMASS strains demonstrate differences at the level of the neurotoxin gene sequence, in gene content, and in genome arrangement.
Clostridium botulinum is a Gram-positive, spore-forming anaerobic bacterium that causes the severe neuroparalytic illness in humans and animals known as botulism. There are seven serologically distinct types of botulinum neurotoxin: types A, B, C, D, E, F, and G. Comparison of 16S rRNA sequences showed that C. botulinum strains form four distinct clusters that correspond to four physiological groups (I-IV), which supported the historical classification scheme based upon biochemical and biophysical parameters. Group I (proteolytic C. botulinum) strains produce one or sometimes two toxins of type A, B or F; Group II (non-proteolytic C. botulinum) strains produce toxins of type B, E, or F; Group III strains produce toxins of type C or D; and Group IV strains produce toxin of type G [2, 3]. Furthermore, the toxinotypes are divided into many subtypes, which have been defined as toxin sequences differing by at least 2.6% at the amino acid level. Botulinum type A neurotoxins are divided into five subtypes termed A1, A2, A3, A4, and, more recently, A5; botulinum type B neurotoxins are divided into five subtypes termed B1, B2, B3, bivalent B, and non-proteolytic botulinum B neurotoxin; botulinum type E neurotoxins are classified into six subtypes: E1, E2, E3, E4, E5, and E6; and botulinum type F neurotoxins are separated into F1 through F7 subtypes. There are no known subtypes of types C, D, and G.
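The 2.6% subtype criterion mentioned above can be illustrated with a minimal sketch. It assumes two toxin amino acid sequences that have already been aligned to the same length, and it counts a gap character as a mismatch; the function names are hypothetical, and the way the cited definition handles gaps or "at least" versus "more than" is an assumption here.

```python
SUBTYPE_THRESHOLD_PERCENT = 2.6  # threshold quoted in the text


def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


def is_distinct_subtype(seq_a: str, seq_b: str,
                        threshold: float = SUBTYPE_THRESHOLD_PERCENT) -> bool:
    """True if the sequences differ by at least `threshold` percent (assumed rule)."""
    return percent_difference(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    # Toy sequences for illustration only; not real toxin sequences.
    ref = "M" * 1000
    alt = "M" * 970 + "K" * 30   # 30 of 1000 positions differ
    print(percent_difference(ref, alt))
    print(is_distinct_subtype(ref, alt))
```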
The strains of C. botulinum used for the production of type A therapeutic toxin (recently referred to as botulinum neuromedicine or BoNEM; Singh, 2009) are likely to originate from those isolated and preserved by Ivan C. Hall in the early 1900s. These strains, which include several type A strains from botulism cases in the western United States and both type A and B strains from isolated wounds, were distributed to colleges and universities throughout the world and deposited in various culture collections [11, 12]. As a result, many subcultures were performed, and the strains designated as "Hall" strains may differ from the original isolates as a result of long-term passage.
In this communication, the genetic diversity of C. botulinum was further explored by comparing the genomic differences among several C. botulinum strains including Sanger Institute ATCC 3502 [Hall 174, GenBank: AM412317], CDC ATCC 3502 (ATCC 3502*), ATCC 19397 [GenBank: CP000726], Hall [GenBank: CP000727], Allergan, and University of Massachusetts Dartmouth laboratory Hall strain (UMASS strain), all belonging to subtype A1. The results indicated that genetic diversity existed among these subtype A1 strains including those designated simply as "Hall".
The C. botulinum type A1 neurotoxin complex genomic cluster spans 11719 bp in ATCC 3502 [GenBank: AM412317, positions 901881 through 913599]. The identical genomic cluster was also found in four other genome sequences, whose GenBank accession numbers are CP000727 (C. botulinum A strain Hall), CP000726 (C. botulinum A strain ATCC 19397), DQ409059 (Hall A BoNT/A cluster), and AF461540 (Hall A- hyper BoNT/A cluster and its flanking regions).
The botulinum type A1 neurotoxin complex consists of six genes, namely, ha70, ha17, ha33, botR, ntnh, and bont/A, whose coding regions in aggregate account for 11215 bp of the 11719 bp botulinum A neurotoxin genomic cluster (Sanger Institute ATCC 3502). Sequence alignment of this cluster with each individual gene sequence [GenBank: AF488745-AF488750] from the Allergan Hall strain revealed that only two base pairs were different: one was in the region encoding the heavy chain of botulinum neurotoxin type A1, and the other was in the botR region (position 9 in AF488750, data not shown). Both are synonymous single nucleotide polymorphisms (SNPs), which are not predicted to result in an amino acid change.
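The cluster length and coding proportion quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the GenBank positions are 1-based and inclusive; the helper name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive coordinate range."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates of the neurotoxin cluster in AM412317, as given in the text.
cluster_bp = span_length(901881, 913599)
coding_bp = 11215  # aggregate coding length of the six genes, from the text

coding_fraction = coding_bp / cluster_bp

if __name__ == "__main__":
    print(cluster_bp)                 # 11719
    print(f"{coding_fraction:.1%}")   # share of the cluster that is coding
```

Under the inclusive-coordinate assumption the computed span matches the 11719 bp stated in the text, which leaves 504 bp of the cluster outside the six coding regions.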
The botulinum neurotoxin type A1 gene (bont/A) from the UMASS strain was also sequenced and compared to that of the ATCC 3502 and Allergan Hall strains. The UMASS sequence was identical to that of ATCC 3502 but differed from that of the Allergan Hall strain by one base pair (position 3591 of the neurotoxin gene, data not shown). We did not sequence the botR region in the UMASS strain; therefore, it is unclear whether the SNP in the botR region is present in this strain.
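As an illustration of what "synonymous" means for the SNP at position 3591, the sketch below maps a nucleotide position to its codon and tests whether two codons encode the same amino acid under the standard genetic code. It assumes that position 3591 is counted from the first base of the start codon (1-based). The example codons are hypothetical, since the actual bases are not given in the text.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_location(position: int) -> tuple[int, int]:
    """Return (codon number, base within codon), both 1-based, for a 1-based CDS position."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    index = position - 1
    return index // 3 + 1, index % 3 + 1


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True if both codons translate to the same amino acid."""
    return CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]


if __name__ == "__main__":
    print(codon_location(3591))        # (1197, 3): third base of codon 1197
    # Hypothetical codons, for illustration only.
    print(is_synonymous("GGT", "GGA"))  # both glycine
    print(is_synonymous("GAT", "GAA"))  # aspartate vs glutamate
```

Under that numbering assumption, position 3591 falls on the third base of codon 1197, where many substitutions leave the encoded amino acid unchanged.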
The three genomes were divided into two organizational patterns. ATCC 3502 and ATCC 19397 share the same pattern, while the positions of block 2 and block 3 were translocated in Hall, suggesting that a genomic rearrangement event may have occurred among these strains. Moreover, within the same pattern and between the genomes of ATCC 3502 and ATCC 19397, many regions inside the comparable blocks were different, as shown in areas that are completely white in Figure 3. Interestingly, two such regions (positions 1822680 through 1864850 and positions 2466354 through 2523055) in the ATCC 3502 genome are prophages that are absent in the two other fully sequenced C. botulinum A1 strains, Hall and ATCC 19397 (data retrieved through the NCBI BLAST server). These observations are in agreement with previous reports [3, 5] and were also confirmed by our CGH findings (data not shown).
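The two organizational patterns can be summarized as block orders, which makes the distinguishing junctions explicit. The sketch below is a simplification: it treats each genome as a linear list of the four blocks, ignores block orientation, and reads the Hall arrangement as the order 1, 3, 2, 4. These are assumptions made for illustration and are not taken from the alignment output itself.

```python
ATCC_PATTERN = [1, 2, 3, 4]   # ATCC 3502 and ATCC 19397
HALL_PATTERN = [1, 3, 2, 4]   # assumed order after the swap of blocks 2 and 3


def junctions(order: list[int]) -> set[tuple[int, int]]:
    """Set of adjacent block pairs (left block, right block) in a linear order."""
    return set(zip(order, order[1:]))


def downstream_neighbor(order: list[int], block: int):
    """Block that immediately follows `block`, or None if it is last."""
    i = order.index(block)
    return order[i + 1] if i + 1 < len(order) else None


if __name__ == "__main__":
    # Junctions present in one pattern but not in the other.
    print(sorted(junctions(ATCC_PATTERN) - junctions(HALL_PATTERN)))
    print(sorted(junctions(HALL_PATTERN) - junctions(ATCC_PATTERN)))
    # The block found downstream of block 3 differs between patterns.
    print(downstream_neighbor(ATCC_PATTERN, 3))  # 4
    print(downstream_neighbor(HALL_PATTERN, 3))  # 2
```

In this simplified view, the block lying downstream of block 3 is block 4 in one pattern and block 2 in the other, so a PCR product spanning that junction can distinguish the two arrangements.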
In the other set of PCR reactions, using UMASS strain genomic DNA as template, none of the four primer combinations specific for the ATCC 3502 and ATCC 19397 pattern generated a product (data not shown); however, each of the four primer combinations specific for the Hall pattern amplified a PCR product of the expected size (Figure 5 Panel B, lanes 5 to 8). The largest PCR product, obtained with the upstream primer (Puo) and downstream primer (P2do) combination and UMASS strain genomic DNA as template, was cloned and sequenced. The sequence of this product was 100% identical to the corresponding region in the Hall strain genome, confirming that the genomic organization of the UMASS strain belonged to the Hall type.
Genetic diversity has been described in other pathogenic bacterial species. In one study, 73 C. difficile strains isolated from different sources were analyzed by CGH with microarrays containing coding sequences from C. difficile strains 630 and QCD-32g58. Startlingly, only about 16% of the genes in strain 630 were highly conserved among all strains. In another study, comparison of the laboratory strain Escherichia coli K12 to both uropathogenic and enterohemorrhagic strains revealed that less than 40% of the total number of genes present were shared by these three strains. Quite recently, CGH was performed on a relatively large scale to compare 61 strains of proteolytic C. botulinum and C. sporogenes using ATCC 3502 as the reference strain. Approximately 63% of the coding sequences (CDSs) present in reference strain ATCC 3502 were common to all 61 strains. Even within the toxin gene cluster, a typically conserved region, the gene arrangement could be different between different serotypes or subtypes of the same serotype [18, 19]. The differences in the genome organization of the ATCC 3502* strain and ATCC 3502 (Sanger Institute), as shown in this report, further substantiate the dynamic nature of botulinum strain genomes.
Lateral (or horizontal) gene transfer, through transformation, transduction, and conjugation, is a major mechanism for the generation of genetic diversity in pathogenic bacteria [20, 21]. In C. botulinum, the neurotoxin cluster has been shown to be present within plasmids or on the chromosome in strains of the same or different serotypes, which is consistent with horizontal gene transfer. None of the subtype A1 strains whose genomes were sequenced harbors the toxin gene on a plasmid. One plasmid, pBOT3502, existing in the ATCC 3502 Sanger Institute strain, was not found in the ATCC 19397 and Hall strains [3, 5] and, even more strikingly, not in the ATCC 3502* strain genome sequences. Further work is required to determine whether, and if so at what rate, this plasmid is lost during laboratory passage.
The genetic diversity of subtype A1 strains was also evidenced by the different location of genome block 3 when strains ATCC 3502 and ATCC 19397 are compared with strain Hall. Although this 20728 bp block contained two inverted repeat sequence fragments, we were unable to find direct repeat sequences or any gene that encodes a transposase. Therefore, we are unable to ascribe the genomic block switch observed in this study to a transposon-related mobile element mechanism [22–24]. Whether such differences in genomic arrangement among the subtype A1 strains examined have an effect on botulinum neurotoxin production remains to be elucidated.
In this report, the botulinum type A1 neurotoxin complex gene sequences of several strains were compared. There are at least five neurotoxin complex clusters from C. botulinum type A1 strains which have been fully sequenced and deposited into public databases. Sequence analysis showed that these five fully sequenced neurotoxin complex clusters were identical, and that their gene coding regions differed from the toxin gene complex of the Allergan Hall strain by two synonymous single nucleotide polymorphisms: one in the region encoding the toxin heavy chain, the other in botR. These findings are quite different from those in an earlier report, which showed that there were 93%, 94%, and 97% identities in the genes ntnh, botR, and ha70 at the amino acid level, respectively. The apparent discrepancy between these findings is likely due to the different versions of the genomic sequence that were used: version 16-Apr-2002 (GenBank accession number is unclear) of the ATCC 3502 Hall strain was used in Allergan's report, while version 21-Nov-2006 (AM412317, which is one of the live versions) of the ATCC 3502 Hall strain was used in this report.
In summary, genetic diversity exists among the botulinum subtype A1 strains examined in this study. The neurotoxin gene of the UMASS strain exhibited the same nucleotide sequence as that of other published subtype A1 strains, except for the Allergan Hall strain. At the whole genome level, the UMASS, ATCC 3502*, Sanger Institute ATCC 3502, ATCC 19397, and Hall strains demonstrated differences in both gene content and genome arrangement.
C. botulinum strains were grown anaerobically at 37°C in Trypticase-peptone-glucose-yeast extract (TPGY) medium. Stock cultures were stored in bovine brain medium at 4°C.
PCR primers were designed to amplify the C. botulinum type A1 neurotoxin gene from the UMASS strain. The PCR product was cloned and sequenced. The UMASS botulinum A1 neurotoxin nucleotide sequence and its counterpart regions in the Allergan Hall strain and the ATCC 3502 strain [GenBank: AM412317] were aligned by using EBI ClustalW2 (http://www.ebi.ac.uk/Tools/clustalw2/index.html).
Genomic DNA extraction was performed as described previously. A custom C. botulinum type A1 strain ATCC 3502 comparative genomic hybridization array was used as described previously. Genomic DNA from the UMASS test strain was labeled with Cy3 random primers and the reference strain, ATCC 3502*, was labeled with Cy5. The data were visualized with SignalMap version 1.9 (Nimblegen, Madison, WI) and are presented as normalized log2 ratios of the fluorescence intensity of the reference strain/test strain. The CGH microarray data were deposited in the NCBI Gene Expression Omnibus (GEO) [Accession: GSE21241].
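The log2 ratio described above can be sketched as follows. The normalization procedure actually used is not specified in the text, so the median centering shown here is an assumption chosen for illustration, and the function names and intensity values are hypothetical.

```python
import math
from statistics import median


def log2_ratios(reference: list[float], test: list[float]) -> list[float]:
    """Per-probe log2(reference intensity / test intensity)."""
    if len(reference) != len(test):
        raise ValueError("intensity lists must have the same length")
    if any(v <= 0 for v in reference) or any(v <= 0 for v in test):
        raise ValueError("intensities must be positive")
    return [math.log2(r / t) for r, t in zip(reference, test)]


def median_center(values: list[float]) -> list[float]:
    """Assumed normalization: subtract the median so typical probes sit near 0."""
    m = median(values)
    return [v - m for v in values]


if __name__ == "__main__":
    # Hypothetical Cy5 (reference) and Cy3 (test) intensities for five probes.
    cy5_reference = [800.0, 820.0, 790.0, 810.0, 805.0]
    cy3_test = [800.0, 820.0, 790.0, 50.0, 805.0]
    ratios = median_center(log2_ratios(cy5_reference, cy3_test))
    for probe, ratio in enumerate(ratios, start=1):
        print(probe, round(ratio, 2))
```

With the ratio oriented as reference/test, a strongly positive value for a probe would be consistent with that sequence being absent or divergent in the test strain, while values near zero suggest similar content in both strains.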
Multiple genome alignments were performed by using Mauve. Specifically, we analyzed the genome sequences of C. botulinum type A str. ATCC 3502 complete genome [GenBank: AM412317]; C. botulinum type A str. ATCC 19397 complete genome [GenBank: CP000726]; and C. botulinum type A str. Hall complete genome [GenBank: CP000727].
Based on the genome organizational patterns observed by multiple genome alignment, PCR primers were designed so that the expected PCR fragments would span the boundaries between the rearranged block 3 (ATCC 3502 block number) and its surrounding blocks 2 and 4 for both patterns. The common upstream primers, which are inside the rearranged block 3, are 5'-GAA GGC CTC CGG TGG CGA TAT C-3' (outside primer, Puo) and 5'-GTG TAG AGA ATC GAA ACA AAA TCA TCC ACA TC-3' (inside primer, Pui). The downstream primers inside block 4 of ATCC 3502 and ATCC 19397 are 5'-CTT GAA TGG CTT GGC ATA TTA AGT GGG-3' (inside primer, P4di) and 5'-AGT TGG CTT TAT AAT CCC TTG GAT TTC AGG-3' (outside primer, P4do). The downstream primers inside block 2 of Hall are 5'-CAG AAT TAG CAG ACA GAC TAC TTT CTA CC-3' (inside primer, P2di) and 5'-ATA GCC TTA TTT GGA GGC GGT CAG G-3' (outside primer, P2do). Eight PCR reactions containing different upstream and downstream primer combinations were set up using genomic DNA isolated from either the ATCC 3502* strain or the UMASS strain. The PCR product amplified with primers Puo and P2do from the UMASS strain was cloned into pCR4-TOPO (Invitrogen, Carlsbad, CA) and sequenced, and the sequencing results were used to search the GenBank database.
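The eight primer combinations can be enumerated programmatically. The sketch below pairs each upstream primer with each downstream primer for the two patterns, using the primer sequences given above; the data layout, pattern labels, and the GC-content helper are illustrative assumptions and not part of the original protocol.

```python
from itertools import product

# Primer sequences (5' to 3') as listed in the text, with spaces removed.
PRIMERS = {
    "Puo": "GAAGGCCTCCGGTGGCGATATC",
    "Pui": "GTGTAGAGAATCGAAACAAAATCATCCACATC",
    "P4di": "CTTGAATGGCTTGGCATATTAAGTGGG",
    "P4do": "AGTTGGCTTTATAATCCCTTGGATTTCAGG",
    "P2di": "CAGAATTAGCAGACAGACTACTTTCTACC",
    "P2do": "ATAGCCTTATTTGGAGGCGGTCAGG",
}

UPSTREAM = ["Puo", "Pui"]  # both inside block 3
DOWNSTREAM = {
    "ATCC 3502/ATCC 19397 pattern": ["P4di", "P4do"],  # inside block 4
    "Hall pattern": ["P2di", "P2do"],                  # inside block 2
}


def primer_combinations() -> list[tuple[str, str, str]]:
    """All (pattern, upstream primer, downstream primer) combinations."""
    return [
        (pattern, up, down)
        for pattern, downs in DOWNSTREAM.items()
        for up, down in product(UPSTREAM, downs)
    ]


def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = sequence.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    for pattern, up, down in primer_combinations():
        print(f"{pattern}: {up} + {down}")
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_percent(seq), 1))
```

Two upstream primers paired with two downstream primers for each of the two patterns gives eight combinations per DNA template, four diagnostic for each genome organization.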
The authors thank Dr. Haihong Wang from UMASS for her expertise in gene cloning experiments and Lavin Joseph from CDC for his expertise in CGH experiments. This work was partially supported by NIH grants 5R21AI070787-02 and 1U01A1078070-01. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.