C. botulinum A1 neurotoxin gene complex
The C. botulinum type A1 neurotoxin complex genomic cluster spans 11719 bp in ATCC 3502 [GenBank: AM412317, positions 901881 through 913599]. An identical cluster is present in four other GenBank sequences: CP000727 (C. botulinum A strain Hall), CP000726 (C. botulinum A strain ATCC 19397), DQ409059 (Hall A BoNT/A cluster), and AF461540 (Hall A-hyper BoNT/A cluster and its flanking regions).
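As a quick consistency check on the coordinates quoted above, the cluster length can be recomputed from its start and end positions. This is a minimal sketch that assumes the GenBank positions are 1-based and inclusive; the helper name `span_length` is illustrative and not part of any tool used in the study.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate interval (assumed convention)."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Coordinates of the neurotoxin cluster in AM412317, as quoted in the text.
cluster_bp = span_length(901881, 913599)
print(cluster_bp)  # 11719
```

Under that convention the interval gives 11719 bp, which matches the reported cluster size.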
The botulinum type A1 neurotoxin complex consists of six genes, namely ha70, ha17, ha33, botR, ntnh, and bont/A, whose coding regions together account for 11215 bp of the 11719 bp botulinum A neurotoxin genomic cluster (Sanger Institute ATCC 3502). Alignment of this cluster with the individual gene sequences [GenBank: AF488745-AF488750] from the Allergan Hall strain [13] revealed only two single-base differences: one in the region encoding the heavy chain of botulinum neurotoxin type A1 and the other in the botR region (position 9 in AF488750; data not shown). Both are synonymous single nucleotide polymorphisms (SNPs) and are therefore not predicted to result in an amino acid change.
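To illustrate what "synonymous" means operationally, the sketch below checks whether a single-base substitution in a coding sequence changes the encoded amino acid. It is not the analysis used in the study: the function name `is_synonymous` and the toy sequence are hypothetical, positions are assumed to be 1-based from the first base of the start codon, and the standard codon assignments are assumed (they are shared by the bacterial translation table for sense codons).

```python
# Standard codon assignments, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_synonymous(cds: str, pos: int, alt: str) -> bool:
    """True if replacing the base at 1-based position `pos` of `cds`
    with `alt` leaves the encoded amino acid unchanged."""
    cds = cds.upper()
    idx = pos - 1
    codon_start = idx - idx % 3
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete codon")
    offset = idx - codon_start
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


# Toy coding sequence (hypothetical, not taken from any GenBank record).
toy_cds = "ATGAAAGGT"
print(is_synonymous(toy_cds, 9, "C"))  # GGT -> GGC, both Gly: True
print(is_synonymous(toy_cds, 4, "G"))  # AAA -> GAA, Lys -> Glu: False
```

Third-codon-position changes, as in the first toy call, are the most common source of synonymous substitutions.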
The botulinum neurotoxin type A1 gene (bont/A) from the UMASS strain was also sequenced and compared with those of the ATCC 3502 and Allergan Hall strains. The UMASS sequence was identical to that of ATCC 3502 but differed from that of the Allergan Hall strain by one base pair (position 3591 of the neurotoxin gene, data not shown). Because we did not sequence the botR region in the UMASS strain, it is unclear whether the SNP in the botR region is present in this strain.
Comparative genomic hybridization of UMASS strain
The comparative genomic hybridization (CGH) microarray carried overlapping probes covering the entire genome sequence of C. botulinum A1 strain ATCC 3502 [GenBank: AM412317]. The ATCC 3502* strain was used as the reference and the UMASS strain as the test strain. The hybridization results revealed several regions that differed between the UMASS strain and the Sanger Institute ATCC 3502 strain (Figure 1 and Figure 2) and, in some cases, between ATCC 3502* and the Sanger Institute ATCC 3502 strain (Figure 2). The nature of the deleted sequence (27409 bp) in Figure 1 is unclear. A search through the NCBI BLAST server found the same sequence block in the ATCC 19397 genome but not in the Hall strain genome.
Genome organizations of C. botulinum A1 strains
As mentioned above, three fully sequenced C. botulinum A1 strain genomes are deposited in GenBank: ATCC 3502, ATCC 19397, and Hall. Mauve software [14] was used to compare and analyze the organization of these genomes. At the gross level, based on the ATCC 3502 genome organization, all three genomes were divided into four blocks: blocks 1, 2, 3, and 4, sequentially (Figure 3).
The three genomes fell into two organizational patterns. ATCC 3502 and ATCC 19397 share the same pattern, whereas the positions of block 2 and block 3 are exchanged in Hall, suggesting that a genomic rearrangement event may have occurred among these strains. Moreover, even between the ATCC 3502 and ATCC 19397 genomes, which share the same pattern, many regions inside the comparable blocks differed, as shown by the completely white areas in Figure 3. Interestingly, two such regions (positions 1822680 through 1864850 and positions 2466354 through 2523055) in the ATCC 3502 genome are prophages that are absent from the two other fully sequenced C. botulinum A1 strains, Hall and ATCC 19397 (data retrieved through the NCBI BLAST server). These observations agree with previous reports [3, 5] and were also confirmed by our CGH findings (data not shown).
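The difference between the two organizational patterns can be made concrete by listing the ordered block junctions in each genome. The sketch below is only an illustration: it treats each genome as a linear list of block numbers, assumes that every block keeps its orientation, and uses a hypothetical helper named `junctions`; it is not a substitute for the Mauve alignment.

```python
def junctions(block_order):
    """Ordered pairs of neighbouring blocks, read left to right."""
    return set(zip(block_order, block_order[1:]))


# Block orders as described in the text (ATCC 3502 numbering).
atcc_pattern = [1, 2, 3, 4]  # ATCC 3502 and ATCC 19397
hall_pattern = [1, 3, 2, 4]  # Hall: blocks 2 and 3 exchanged

only_atcc = junctions(atcc_pattern) - junctions(hall_pattern)
only_hall = junctions(hall_pattern) - junctions(atcc_pattern)

print(sorted(only_atcc))  # [(1, 2), (2, 3), (3, 4)]
print(sorted(only_hall))  # [(1, 3), (2, 4), (3, 2)]
```

Under these assumptions the junction from block 3 into block 4 occurs only in the ATCC 3502 and ATCC 19397 pattern, and the junction from block 3 into block 2 occurs only in the Hall pattern, so either junction can in principle serve as a diagnostic marker.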
Characterization of the ATCC 3502* and UMASS Hall strain genome organizational patterns
To determine whether the genome organization of ATCC 3502* and of the UMASS Hall strain fits either of the two patterns described above, a PCR strategy was used. Primers were designed to span the boundary between block 3 and block 4 for the ATCC 3502 and ATCC 19397 pattern, or between block 3 and block 2 for the Hall pattern (Figure 4). In one set of PCR reactions, with ATCC 3502* genomic DNA as template, the expected product was obtained from every combination of upstream and downstream primers for the ATCC 3502 and ATCC 19397 pattern (Figure 5 Panel A, lanes 1 to 4) but from none of the primer combinations for the Hall pattern (data not shown). These results demonstrated that the genomic organization of ATCC 3502* is identical to that of the Sanger Institute ATCC 3502 and ATCC 19397 genomes.
In the other set of PCR reactions, with UMASS strain genomic DNA as template, none of the four primer combinations for the ATCC 3502 and ATCC 19397 pattern generated a product (data not shown), whereas each of the four primer combinations for the Hall pattern amplified a product of the expected size (Figure 5 Panel B, lanes 5 to 8). The largest product, obtained from UMASS strain genomic DNA with the upstream primer (Puo) and the downstream primer (P2do), was cloned and sequenced. Its sequence was 100% identical to the corresponding region of the Hall strain genome, confirming that the genomic organization of the UMASS strain is of the Hall type.
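The logic of this junction-spanning PCR can be sketched as a small in silico PCR: a product is expected only when the upstream primer site and the downstream primer site lie close together, in the right order, on the template. Everything in the sketch is hypothetical, including the toy block sequences, the primer sites, the function name `in_silico_pcr`, and the `max_product` cutoff that stands in for the practical size limit of amplification; the real primer sequences are not reproduced here.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template, fwd, rev, max_product):
    """Product length for exact primer matches on the forward strand,
    or None if a site is missing or the product exceeds max_product."""
    template = template.upper()
    start = template.find(fwd.upper())
    if start < 0:
        return None
    rev_site = revcomp(rev)  # where the reverse primer anneals, forward strand
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    size = end + len(rev_site) - start
    return size if size <= max_product else None


# Toy blocks (hypothetical sequences, 12 bp each).
b1, b2, b3, b4 = "ATATGCGCAATT", "AAGGCCTTAGCA", "TTCACGGATCCA", "GGTACCGATTGC"
atcc_like = b1 + b2 + b3 + b4  # block order 1-2-3-4
hall_like = b1 + b3 + b2 + b4  # block order 1-3-2-4

upstream = "CACGGA"              # anneals near the 3' end of block 3
down_block4 = revcomp("ACCGAT")  # downstream primer placed in block 4
down_block2 = revcomp("GCCTTA")  # downstream primer placed in block 2

for name, genome in (("ATCC-like", atcc_like), ("Hall-like", hall_like)):
    print(name,
          in_silico_pcr(genome, upstream, down_block4, max_product=20),
          in_silico_pcr(genome, upstream, down_block2, max_product=20))
```

In the toy Hall-like genome the block 4 primer site still lies downstream of block 3, but the intervening block 2 pushes the product past the size cutoff. This mirrors why, in a real genome with a large block 2, no product would be expected from the primers for the ATCC 3502 and ATCC 19397 pattern.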
Analysis of the region containing the sequenced PCR product showed that it can be divided into an F1 fragment (167 bp), located at the 3' end of block 3 of the Hall strain genome, and an F2 fragment (587 bp), located at the 5' end of block 2 of the Hall strain genome. As shown in Figure 6, the F1 and F2 fragments, which form a continuous region in the Hall strain genome, are split into two separate fragments in the ATCC 3502 and ATCC 19397 genomes, although each remains within its own rearranged block.
Further analysis of block 3 (20728 bp in strain ATCC 3502) revealed that virtually identical sequences are found in strain ATCC 19397 (20726 identities out of 20728 bp) and strain Hall (20710 identities out of 20714 bp). The GC content of block 3 in strain ATCC 3502 (27.3%) is close to the whole-genome GC content (28.2%) of each sequenced subtype A1 strain. Within block 3, we identified two 314 bp inverted repeat sequences (93% identity, Figure 7), located before the first gene in the block (CBO0526) and after the last gene in the block (CBO0542). Notably, neither a gene encoding a transposase nor a direct repeat sequence (both characteristic of transposon mobile elements) was found in the region. In addition, the downstream inverted repeat does not overlap the F1 fragment mentioned above.
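The two quantities used in this paragraph, GC content and the identity between inverted repeats, can be illustrated with a short sketch. It assumes an ungapped, equal-length comparison, which is simpler than the alignment that would normally be used for real repeats; the function names and the 10 bp toy sequences are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def inverted_repeat_identity(left: str, right: str) -> float:
    """Percent identity between `left` and the reverse complement of
    `right`, compared position by position without gaps."""
    other = revcomp(right)
    if len(left) != len(other):
        raise ValueError("ungapped comparison needs equal lengths")
    matches = sum(a == b for a, b in zip(left.upper(), other))
    return 100 * matches / len(left)


# Toy repeats (hypothetical): `right` is the reverse complement of
# `left` except for one changed base.
left = "ATTGCAATGC"
right = "GCATTGCAAA"
print(gc_content(left))                       # 40.0
print(inverted_repeat_identity(left, right))  # 90.0
```

A perfect inverted repeat would score 100% in this comparison; the 93% reported for the 314 bp repeats indicates that the two copies have diverged slightly.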