Genomic and proteomic analyses of Mycobacterium bovis BCG Mexico 1931 reveal a diverse immunogenic repertoire against tuberculosis infection

Background Studies of Mycobacterium bovis BCG strains used in different countries and vaccination programs show clear variations in the genomes and immune protective properties of BCG strains. The aim of this study was to characterise the genomic and immune proteomic profile of the BCG 1931 strain used in Mexico. Results BCG Mexico 1931 has a circular chromosome of 4,350,386 bp with a G+C content and numbers of genes and pseudogenes similar to those of BCG Tokyo and BCG Pasteur. BCG Mexico 1931 lacks Region of Difference 1 (RD1), RD2 and N-RD18 and one copy of IS6110, indicating that BCG Mexico 1931 belongs to DU2 group IV within the BCG vaccine genealogy. In addition, this strain contains three new RDs, which are 53 (RDMex01), 655 (RDMex02) and 2,847 bp (RDMex03) long, and 55 single-nucleotide polymorphisms representing non-synonymous mutations compared to BCG Pasteur and BCG Tokyo. In a comparative proteomic analysis, the BCG Mexico 1931, Danish, Phipps and Tokyo strains showed 812, 794, 791 and 701 protein spots, respectively. The same analysis showed that BCG Mexico 1931 shares 62% of its protein spots with the BCG Danish strain, 61% with the BCG Phipps strain and only 48% with the BCG Tokyo strain. Thirty-nine reactive spots were detected in BCG Mexico 1931 using sera from subjects with active tuberculosis infections and positive tuberculin skin tests. Conclusions BCG Mexico 1931 has a smaller genome than the BCG Pasteur and BCG Tokyo strains. Two specific deletions in BCG Mexico 1931 are described (RDMex02 and RDMex03). The loss of RDMex02 (fadD23) is associated with enhanced macrophage binding and RDMex03 contains genes that may be involved in regulatory pathways. We also describe new antigenic proteins for the first time.


Background
Tuberculosis (TB) remains a major health problem worldwide; the World Health Organisation (WHO) estimates that there were 9.4 million new cases and 1.7 million deaths from TB in 2009 [1]. Bacillus Calmette-Guérin (BCG) is currently the only available vaccine against tuberculosis. This vaccine protects against the most severe forms of the disease, miliary and meningeal tuberculosis; however, its ability to protect against pulmonary tuberculosis is highly variable (0-80%). There are several reasons for this variability, including differences between BCG substrains, exposure to nontuberculous mycobacteria (NTMs), the nutritional or genetic background of the population, differences in trial methods and variations between different clinical Mycobacterium tuberculosis strains [2][3][4][5][6].
Use of BCG in the early 1920s proved effective in protecting against TB, leading to distribution of the vaccine in many countries. This distribution process and subsequent preservation resulted in the generation of numerous BCG substrains with different morphological, biochemical and immunological features [7,8]. Several studies on BCG substrains have demonstrated changes at the genetic level, and comparative analyses of M. tuberculosis, M. bovis and M. bovis BCG have identified region of difference (RD) and tandem duplication (DU) markers in these strains [9][10][11][12].
Regions of difference are DNA regions that are deleted in the M. bovis and M. bovis BCG genomes compared to M. tuberculosis. The RD1 region is involved in BCG attenuation [7,13]. It has been shown that deletion of this region in M. tuberculosis H37Rv leads to attenuation of the strain [14]; however, complementation of BCG Pasteur with RD1 does not fully restore virulence to wild-type levels [15]. BCG strains can be sub-classified according to the presence or absence of RD2 in early and late strains, respectively. Recently, Kozak et al. reported that BCG Pasteur, a strain that lacks RD2, exhibits decreased immunogenicity compared to BCG Russia, a strain that has retained RD2 [16]. Importantly, these two strains show no difference in their level of protection against pulmonary tuberculosis. Additionally, Castillo-Rodal et al. have shown that the RDs described to date do not correlate with the protective efficacy of BCG substrains in a murine model [17]. The differences observed among BCG strains suggest that additional attenuating mutations may be involved in the attenuation of individual BCG strains.
Analysis of the BCG Pasteur 1173P2 genome sequence has made it possible to construct a detailed genealogy of BCG vaccines. BCG substrains are classified into four groups (I-IV) based on RD and DU2 markers [9]. Furthermore, single-nucleotide polymorphisms (SNPs) that are unique to particular BCG substrains or shared among substrains have been identified. Some of these SNPs have functional implications for the affected genes. For example, a SNP in mma3 (BCG0692c) is responsible for the lack of methoxymycolate production in late BCG substrains [18].
The evidence presented above supports further characterisation of BCG substrains, both to improve our understanding of the mechanisms and impact of attenuation and to inform the rational design of new vaccines and therapeutics for tuberculosis [2,19].
Even though it was one of the most widely used substrains for vaccination in Mexico, BCG Mexico 1931 has not been included in any previous comparative proteomic or genomic study of BCG strains. Characterisation of BCG Mexico 1931 will allow it to be used again for BCG vaccine production in Mexico. This BCG strain will also be used to develop a new recombinant BCG vaccine. Recently, Hayashi et al. described the biochemical characteristics of 14 BCG strains (including a BCG Mexico substrain), as well as M. bovis, M. tuberculosis, M. avium and M. smegmatis strains. Interestingly, BCG Mexico presented a biochemical profile more similar to that of M. bovis than any other BCG strain [20].
Historical records show that the Pasteur Institute sent several shipments of BCG strains to Mexico between 1926 and 1927 (Pasteur Institute records, personal communication). In 1928, small-scale production of BCG vaccine began in Mexico. In 1949, a BCG vaccine production laboratory was opened, and the vaccine was distributed throughout Mexico and Latin America [21][22][23]. Since 1931, the BCG Mexico substrain has been maintained by the Laboratorios de Biológicos y Reactivos de Mexico, a state-owned company that produces biological agents in Mexico. The BCG Mexico substrain was used as the vaccine seed for many years [24]. In 1970, this strain was replaced with the BCG Danish 1331 strain for vaccine production [25]. In 1998, BCG vaccine production ended in Mexico; since then, the country has depended on imported vaccine. These changes in vaccine production have caused confusion regarding the identity of BCG Mexico. For this reason, we characterised three representative strains used for BCG vaccine production in Mexico, which are designated BCG Mexico 1931, BCG Mexico 1988 and BCG Mexico 1997 according to the production period in which they were used. In this report, we describe the genomic and proteomic features of BCG Mexico 1931.

RD and DU Profile of BCG Mexico strains
Our RD and DU profile analysis of BCG Mexico 1931 demonstrated the presence of the RD8, RD14, RD16 and RD Danish/Glaxo regions and the absence of the RD1, RD2 and N-RD18 regions and of one copy of the insertion sequence IS6110. These properties are similar to those observed for BCG Phipps and BCG Tice (Table 1). In contrast, BCG Mexico 1988 and BCG Mexico 1997 exhibited identical RD and DU profiles, with the RD1, RD2 and RD Danish/Glaxo regions and one copy of IS6110 missing (Table 1). This profile is identical to that of BCG Danish. The loss of the RD Danish/Glaxo region in BCG Mexico 1988 and 1997, a deletion specific to BCG Danish, confirms this result and is consistent with historical records indicating that BCG vaccine production in Mexico utilised the BCG Danish 1331 strain beginning in 1970.
The amplification pattern of DU regions in BCG Mexico 1931 indicated duplication of DU2-IV, in contrast to those of BCG Mexico 1988 and BCG Mexico 1997, which showed duplication of DU2-III (Table 1). These differences in RDs and DUs confirm that BCG Mexico 1931 is a different strain from BCG Mexico 1988 and 1997, which are related to BCG Danish.
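The comparison of marker profiles described above can be illustrated with a minimal Python sketch. The marker states are transcribed only from the text of this section (Table 1 itself is not reproduced here), and the dictionary layout and the function name `differing_markers` are illustrative assumptions rather than part of the original analysis.

```python
# Marker states transcribed from the text: "+" = region present, "-" = region absent.
# Only markers stated explicitly for a strain are listed; this is not the full Table 1.
PROFILES = {
    "BCG Mexico 1931": {
        "RD1": "-", "RD2": "-", "N-RD18": "-",
        "RD8": "+", "RD14": "+", "RD16": "+",
        "RD Danish/Glaxo": "+", "DU2": "IV",
    },
    "BCG Mexico 1988": {
        "RD1": "-", "RD2": "-", "RD Danish/Glaxo": "-", "DU2": "III",
    },
    "BCG Mexico 1997": {
        "RD1": "-", "RD2": "-", "RD Danish/Glaxo": "-", "DU2": "III",
    },
}


def differing_markers(strain_a, strain_b, profiles=PROFILES):
    """Return the markers recorded for both strains whose states differ."""
    a, b = profiles[strain_a], profiles[strain_b]
    shared = sorted(set(a) & set(b))
    return {m: (a[m], b[m]) for m in shared if a[m] != b[m]}


# BCG Mexico 1931 differs from the 1988 strain in the DU2 type and RD Danish/Glaxo,
# whereas the 1988 and 1997 strains are indistinguishable on the recorded markers.
print(differing_markers("BCG Mexico 1931", "BCG Mexico 1988"))
print(differing_markers("BCG Mexico 1988", "BCG Mexico 1997"))
```

Markers recorded for only one of the two strains are ignored, so the comparison never treats a missing entry as a difference.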
The above results and subsequent sequencing of the BCG Mexico 1931 genome place this strain in DU2 group IV within the genealogy of BCG strains (Figure 1). These results differ from findings of previous studies
Table 1 legend. The symbol (-) indicates the loss of a region in the genome of a given strain, whereas the symbol (+) indicates the presence of the region. RD = region of difference; IS6110 = insertion sequence IS6110 (presence of a single copy); DU = tandem duplication.
Figure 1 Genealogy of BCG vaccines, adapted from Brosch et al. [9]. The BCG Mexico 1931 strain was included in this study.
With respect to BCG Tokyo 172, the difference in genome size can be explained by the loss of RD2, N-RD18 and one copy of insertion sequence IS6110 as well as by differences in the size of DU2 (Figure 1). The genome of BCG Mexico 1931 contains fewer genes (3,904) than those of BCG Pasteur (3,954) and BCG Tokyo (4,033) [9,27]. This variation is due to the presence of different RDs in each strain and to differences in the criteria used for annotation of hypothetical proteins not previously described in other BCG strains (Table 2). Interestingly, a PCR screen of these regions in nine BCG strains (Birkhaug, Connaught, Danish, Frappier, Moreau, Phipps, Tice, Tokyo and Sweden) showed that the RDMex02 and RDMex03 regions have been lost only in BCG Mexico 1931 and can therefore be used as molecular markers for this strain.
RDMex01 is an intergenic deletion located between the BCG0767 (rpsN1) and BCG0768 (rpsH) genes, which encode two proteins of the 30S ribosomal subunit. The biological effect of this deletion is unknown.
RDMex02 is associated with deletion of 218 aa from BCG3889 (fadD23), affecting a conserved region of the protein that includes two transmembrane domains. This gene encodes a probable fatty-acyl CoA ligase involved in lipid degradation. Lynett et al. have reported that this protein is involved in sulpholipid production and that disruption of the gene results in increased association between bacteria and macrophages [28]. Molina et al. found that BCG Mexico 1931 associates more strongly with macrophages (THP-1) than do BCG Danish, BCG Moreau, BCG Phipps and BCG Tokyo 172 [29].
Finally, RDMex03 was the largest deletion found in the BCG Mexico 1931 genome. It affected four genes: three genes encoding hypothetical proteins (BCG3923, BCG3924 and BCG3926) and another gene encoding a putative transcriptional regulator [BCG3925c (whiB6)] belonging to the WhiB protein family (WhiB1-7). This family has been proposed to form part of a new redox system in M. tuberculosis [30]. Interestingly, this deletion is situated in the extended RD1 region.
The new RDs described in BCG Mexico 1931 may contribute to understanding of the phenotypic differences between BCG Mexico 1931 and other BCG strains.
Our SNP analysis indicated the presence of 33 SNPs in BCG Mexico 1931 compared to BCG Pasteur and 77 SNPs in BCG Mexico 1931 compared to BCG Tokyo. Among these SNPs, at least 23 have been reported in two previous studies [27,31]. Additionally, in agreement with the SNP-based phylogeny constructed by García Pelayo et al., BCG Mexico 1931 was grouped with BCG Tice in our analysis [31].
We found SNPs within BCG0510c (pcaA), BCG0532 (regX3), BCG0692c (mma3), BCG0484c (sigK) and BCG3734 (Table 3). The SNPs in the last three genes have been described in previous studies [7,31]. The SNP found in BCG0692c (mma3) causes an amino acid change with a concomitant loss of methoxymycolates in BCG strains obtained from the Pasteur Institute after 1927 [18]. This result is consistent with the findings of Hayashi et al., who described the absence of these acids in BCG Mexico 1931 [26]. An SNP in the start codon of BCG0484c (sigK) is responsible for low expression of MPB70 and MPB83 in late BCG strains, including BCG Mexico 1931 [32]. Moreover, mutations in BCG3734, a CRP homologue global regulator, have been described as specific to BCG and are responsible for increased binding of CRP to its target DNA [33]. Mutations in Rv0491 (regX3) and Rv0470c (pcaA) have been implicated in the virulence of M. tuberculosis. The pcaA gene encodes a mycolic acid cyclopropane synthetase and is important for growth, persistence in macrophages and proinflammatory activity [34,35]. Additionally, regX3 is part of a two-component system regulated by Pi (SenX3-RegX3) that is involved in the virulence of M. tuberculosis [36,37]. Interestingly, a specific nsSNP from BCG Mexico 1931 causes an amino acid change in BCG3741 (ponA2). Mutations in this gene have been associated with increased sensitivity to heat shock (24 h at 45°C) and exposure to H2O2 compared to wild-type M. tuberculosis. Additionally, a ponA2 mutant was found to exhibit lower survival in mice compared to wild-type M. tuberculosis [38].
We also identified six SNPs in PE_PGRS4, PPE22, PE_PGRS41, PPE50 and PE_PGRS43b (Tables 3 and 4). These genes encode PE/PPE family proteins, which may play a role in the evasion of host immune responses, possibly via antigenic variation of mycobacteria [39]. Previous studies have shown that the PPE22 protein elicits B cell responses, while PPE50 is required for mycobacterial growth in vitro [39,40]. Furthermore, we determined that PE_PGRS54 (6,285 bp) and PE_PGRS55 (5,433 bp) correspond to longer products in BCG Mexico 1931, but only when compared with the homologous sequences of BCG Tokyo (6,153 and 5,088 bp, respectively). These results are consistent with previously described data [27]. The functional implications of these size variations remain unknown.
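The distinction between synonymous and non-synonymous SNPs (nsSNPs) used above can be sketched in a few lines of Python. This is an illustrative example only: it assumes the standard amino acid assignments of the genetic code, uses made-up codons rather than any SNP from Table 3, and the function name `classify_snp` is hypothetical, not part of the pipeline used in this study.

```python
from itertools import product

BASES = "TCAG"
# Amino acid assignments of the standard genetic code, listed in TCAG codon order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, alt_codon):
    """Classify a single-base change between two codons.

    Returns a tuple (kind, reference amino acid, alternative amino acid),
    where kind is "synonymous" or "non-synonymous".
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    n_diffs = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if n_diffs != 1:
        raise ValueError("expected codons differing at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


# Hypothetical codons chosen for illustration only.
print(classify_snp("GGT", "GGC"))  # third-position change, glycine retained
print(classify_snp("GAA", "GTA"))  # second-position change, glutamate to valine
```

A change affecting a start codon, such as the sigK SNP discussed above, would need separate handling, because its effect is on translation initiation rather than on the encoded residue.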

Comparison of BCG proteomes
The protein contents of the cell fractions from four BCG substrains were analysed by two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) using bacteria in mid-logarithmic phase. A total of 812, 794, 791 and 701 spots were visualised for BCG Mexico 1931, BCG Danish, BCG Phipps and BCG Tokyo, respectively (Figure 4). Previous studies have shown that BCG strains (Connaught, Tice, Danish and Phipps) differ in their protein profiles [41][42][43]. Here, we observed that late strains (BCG Mexico 1931, Danish and Phipps) had a greater number of proteins in common compared to the early strain we studied (BCG Tokyo). This difference can be explained by mutations in transcriptional regulators such as BCG3734 (crp) and BCG0484c (sigK) in late BCGs. Furthermore, the proteins unique to BCG Mexico 1931 may be useful for characterising this strain and explaining the causes of the observed phenotypic differences compared to other BCG strains.
Table 3. Numbers in parentheses indicate the number of genes affected in each functional category.
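To give a sense of scale for the overlap between proteomes, the shared-spot percentages reported in the abstract (62%, 61% and 48% of the BCG Mexico 1931 spots shared with BCG Danish, BCG Phipps and BCG Tokyo, respectively) can be converted into approximate spot counts. The sketch below is simple arithmetic on those published figures; the resulting counts are rounded estimates, not values reported in this study, and the names used are illustrative.

```python
# Spot total for BCG Mexico 1931 and shared fractions as stated in the abstract.
MEXICO_SPOTS = 812
SHARED_FRACTION = {"BCG Danish": 0.62, "BCG Phipps": 0.61, "BCG Tokyo": 0.48}


def approx_shared_spots(total_spots, fraction):
    """Estimate the number of shared spots from a total and a shared fraction."""
    return round(total_spots * fraction)


# Rounded estimates only: the published percentages are themselves rounded.
APPROX_SHARED = {
    strain: approx_shared_spots(MEXICO_SPOTS, fraction)
    for strain, fraction in SHARED_FRACTION.items()
}

for strain, count in APPROX_SHARED.items():
    print(f"{strain}: about {count} of {MEXICO_SPOTS} BCG Mexico 1931 spots shared")
```

Because the percentages are rounded to whole numbers, each estimate carries an uncertainty of several spots.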

Characterisation of the immune response by immune blotting
To identify immunogenic proteins in BCG Mexico 1931, we performed an immune blotting analysis. We detected 39 reactive spots in the immune proteome (Figure 5A and Additional file 1: Table S1). The largest numbers of reactive spots were obtained when using serum from subjects with active TB (16) or positive tuberculin skin test results (PPD+) (14); 12 of these spots were unique to each serum type (Figure 5B). This result indicates high variability in the proteins recognised by each type of serum. We identified 37 proteins by sequencing (Additional file 1: Table S1), the majority of which (17; 47%) corresponded to intermediary metabolism and respiration proteins (Figure 5C). Among the identified proteins, some have been previously described as virulence proteins in different strains of M. tuberculosis: phosphoenolpyruvate carboxykinase (pckA), isocitrate lyase (icl), 3-oxoacyl synthase II (kasA), groEL, TB27.3, the 85A and 85C antigens, alkyl-hydroperoxide reductase (ahpC) and heat shock protein HspX [44][45][46][47]. AhpC is over-expressed in BCG Phipps [42]. To our knowledge, phosphoenolpyruvate carboxykinase, isocitrate lyase, 3-oxoacyl synthase II and AhpC are described as antigenic proteins for the first time in this report.

Conclusions
This study represents the first genomic and proteomic characterisation of BCG Mexico 1931. This substrain was used for BCG vaccine production in Mexico until 1970 and can now be used again for vaccine production and as a vector for the design of new second-generation vaccines against tuberculosis.
Initially, we determined the RD profiles of three BCG substrains representing different stages of vaccine production in Mexico. The RD profiles show that BCG Mexico 1931 is different from BCG Mexico 1988 and 1997, which have the same profile as BCG Danish. These dates are consistent with historical records, which indicate that BCG vaccine was produced in Mexico from the BCG Danish strain after 1970.
Based on these results, the BCG Mexico 1931 substrain was used for genomic and proteomic characterisation. According to the RD profiles and genome sequence of BCG Mexico 1931, this substrain belongs to DU2 group IV within the genealogy of BCG vaccines.
Genetic studies of BCG substrains have provided new knowledge about the genes involved in the phenotypic differences observed for these strains (for example mma3, fadD26, ppsA, phoP and whiB3), making it possible to elucidate the basis of phenotypic variation between them. The results of this investigation and previously published studies show that the genes with the largest numbers of mutations are related to two-component systems and transport (esX, senX3-regX3, phoP-phoR), regulatory proteins (whiB family, crp, trcR) and lipid metabolism (fadD, ppsA). Interestingly, most of these mutations may be involved in the phenotypic differences observed between BCG substrains. For example, the lack of production of some membrane lipids (PDIMs) in BCG Moreau is caused by deletions in BCG2952 (fadD26) and BCG2953 (ppsA) [48]. In this study, we identified specific regions in BCG Mexico 1931 that may be directly involved in the phenotypic characteristics of this strain and designated them RDMex02 and RDMex03. These regions affect the proteins FadD23 and WhiB6, which are related to the interaction between the bacteria and macrophages. In addition, we have identified SNPs in genes previously determined to be involved in the virulence of Mycobacterium strains. Further studies are needed to establish the contributions of these mutations and to assess the roles of the newly identified antigenic proteins in the BCG Mexico 1931 phenotype.
Figure 5 Immunogenic proteins identified in BCG Mexico. A total of 37 immunogenic proteins were identified. Spots circled in red, green, dark blue and light blue represent proteins reactive to sera from subjects with active pulmonary tuberculosis, NTM mycobacterioses, PPD+ and PPD-, respectively, while the red, orange and green squares represent proteins shared between TB-NTM, TB-PPD+ and TB-PPD- sera, respectively.

Bacterial strains and DNA isolation
Mycobacterium bovis BCG substrains were grown in Sauton medium for 15 days at 37°C, harvested by centrifugation and stored at -70°C until use. The Birkhaug, Connaught, Danish 1331, Frappier, Moreau, Phipps, Tice and Sweden BCG strains were kindly provided by M. Behr (McGill University Health Centre, Canada) and BCG Tokyo 172 was provided by M. Macías (Instituto Nacional de Pediatría, México). The Mexican Instituto Nacional de Higiene provided the BCG Mexico 1931, 1988 and 1997 substrains. Genomic DNA was extracted by the phenol/chloroform method following established protocols [49].

BCG Mexico 1931 genome sequencing
The genome of M. bovis BCG Mexico 1931 was sequenced using 454 pyrosequencing, which was performed by the Sequencing Unit of CINVESTAV, Irapuato, Mexico. Additionally, a fosmid library with inserts of approximately 40 kb was constructed using the CopyControl™ pCCFOS™ system (Epicentre Technologies, USA), and the fosmid-end sequences of 250 clones were determined using the Sanger method (3730xl DNA Analyzer, Applied Biosystems, USA). Draft assemblies were based on 623,000 reads (36× coverage), and the Phred/Phrap/Consed software package was employed for sequence assembly and quality assessment [51] using the BCG Pasteur 1173P2 sequence as a reference [GenBank: AM408590]. To close gaps and resolve duplicated regions, the complete sequences of three fosmids (approximately 40 kb) and 110 PCR end reads were obtained. Annotation was performed using the RAST Server [52,53], Artemis [54,55] and BCGList [56].
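The reported sequencing depth can be related to the read count through the usual approximation that mean coverage equals the number of reads multiplied by the mean read length and divided by the genome size. The sketch below applies this relation to the figures given above; the mean read length is not stated in the text, so the value it derives is an inference under the assumption that all 623,000 reads contributed to the 36× figure and that the final genome size of 4,350,386 bp is the appropriate denominator.

```python
def mean_coverage(n_reads, mean_read_length_bp, genome_size_bp):
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * mean_read_length_bp / genome_size_bp


def implied_read_length(n_reads, coverage, genome_size_bp):
    """Mean read length (bp) implied by a read count, a coverage and a genome size."""
    return coverage * genome_size_bp / n_reads


GENOME_SIZE_BP = 4_350_386   # chromosome size reported for BCG Mexico 1931
N_READS = 623_000            # reads used for the draft assemblies
REPORTED_COVERAGE = 36       # reported depth of coverage

# Inferred, not reported: the mean read length consistent with the figures above.
read_length = implied_read_length(N_READS, REPORTED_COVERAGE, GENOME_SIZE_BP)
print(f"Implied mean read length: {read_length:.0f} bp")

# Round trip: plugging the implied read length back in recovers the reported coverage.
print(f"Coverage check: {mean_coverage(N_READS, read_length, GENOME_SIZE_BP):.1f}x")
```

The implied mean read length is roughly 250 bp, which serves as a plausibility check on the reported figures rather than as a measured value.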

BCG Mexico 1931 genome sequence analysis
The BCG Pasteur 1173P2, BCG Tokyo 172 [GenBank: AP010918] and BCG Mexico sequences were compared using Consed software [51] and the Basic Local Alignment Search Tool (BLAST) [57].
2) for eight days at 37°C with shaking, harvested by centrifugation, washed and suspended in sterile water for lysis. Cellular proteins were obtained by sonication of mycobacteria (Ultrasonic Processor, Cole Parmer Corporation, USA) in the presence of a protease inhibitor (PMSF, 20 mM) at 4°C. The extracted proteins were quantified using a Bradford assay. For 2D-PAGE, approximately 100 mg of protein was solubilised, denatured and reduced in sample buffer [4% CHAPS, 2 M urea, 70 mM l-dithiothreitol (DTT), 0.001% bromophenol blue, and 0.1% 3-10 ampholyte] and used to rehydrate 11-cm pH 4-7 IPG strips (ReadyStrip™ IPG strips, Bio-Rad). Isoelectric focusing (IEF) was performed on a Multiphor II system (Amersham Biosciences, England) until reaching 52,000 VH at 17°C. The strips were equilibrated twice in a solution containing 4% urea, 30% glycerol (v/v), 50 mM Tris-HCl (pH 8.8), 2% SDS (w/v), and 0.002% bromophenol blue, supplemented with 90 mM DTT for the first incubation (15 minutes) and 250 mM iodoacetamide (IAA) for the second incubation (15 minutes). Second-dimension electrophoresis was performed on a 12% polyacrylamide gel (Hoefer SE-600, Amersham Biosciences, England) for approximately five hours with a voltage gradient of 50-200 V. Once fixed, the proteins were silver-stained, and gel images were captured in a digital format (Molecular Imager GS-800™ Calibrated Densitometer, Bio-Rad, USA). Gel analysis was performed using the program 2-D PDQuest Advance V.8.0 (Bio-Rad, USA). Duplicate gels with proteins obtained from independent cultures were included in the analysis. A master image gel was created using three replicates of each experiment and was used for comparison [42].

Immune blotting
To identify antigenic proteins in BCG Mexico 1931, we conducted an immune blot analysis of the 2D-PAGE gels. Proteins were transferred onto Hybond P polyvinylidene difluoride (PVDF) membranes (GE Healthcare, England) using a Trans-Blot SD Semi-Dry Transfer system (Bio-Rad, USA) for 1 h at 10 V. The membranes were blocked with Tris-buffered saline (TBS) containing 0.05% Tween-20 and 5% skim milk at 4°C overnight. The membranes were then incubated for 1 h at room temperature with sera from PPD+ or PPD- subjects or from subjects with pulmonary TB or mycobacterioses caused by NTMs; for each group, the sera with the highest IgG2 titres were selected (data not shown). The membranes were subsequently incubated with an alkaline phosphatase-conjugated mouse anti-human IgG2 antibody (1:5000; Zymed, Invitrogen, USA) for 1 h at room temperature. Immune detection was accomplished using the Immobilon Western System (Millipore Co, USA). Chemiluminescence signals were measured using the Genius Plus system (TECAN, Switzerland).

Protein sequencing
Reactive proteins were sequenced using a 3200 QTRAP hybrid tandem mass spectrometer (Applied Biosystems, USA) equipped with a nano-electrospray ion source (NanoSpray II) and a MicroIonSpray II head. Proteins were identified based on their MS/MS spectra datasets using the MASCOT search algorithm (Version 1.6b9, Matrix Science, London, UK). A BLAST search was conducted using the M. tuberculosis complex subset of the National Center for Biotechnology Information (NCBI) non-redundant database (NCBI nr20070623).

Nucleotide and protein sequence accession numbers
The complete genome sequence of Mycobacterium bovis BCG Mexico 1931 has been deposited in the NCBI GenBank database under accession number CP002095. The GenBank accession numbers for the identified protein

Additional material
Additional file 1: Table S1. Detection data for antigenic proteins in BCG Mexico 1931.

Authors' contributions
PO participated in the design and performance of the experiments, and writing of the paper; IGH and AAH performed experiments; GMH performed protein identification by sequencing; MAC, SPL and YLV participated in the study design and writing of the paper. All authors read and approved the final manuscript.