Genomics of glycopeptidolipid biosynthesis in Mycobacterium abscessus and M. chelonae

Background The outermost layer of the bacterial surface is of crucial importance because it is in constant interaction with the host. Glycopeptidolipids (GPLs) are major surface glycolipids present in various mycobacterial species. In the fast-growing model organism Mycobacterium smegmatis, GPL biosynthesis involves approximately 30 genes, all mapping to a single 65-kb region. Results We recently sequenced the complete genomes of two fast-growing species that cause human infections, Mycobacterium abscessus (CIP 104536T) and M. chelonae (CIP 104535T). We show here that these two species contain genes corresponding to all those of the M. smegmatis "GPL locus", with extensive conservation of the predicted protein sequences, consistent with the production of GPL molecules that are indistinguishable by biochemical analysis. However, the GPL locus appears to be split into several parts in M. chelonae and M. abscessus. One large cluster (19 genes) comprises all genes involved in the synthesis of the tripeptide-aminoalcohol moiety, the glycosylation of the lipopeptide and the methylation/acetylation modifications. We provide evidence that a duplicated acetyltransferase gene (atf1 and atf2) in M. abscessus and M. chelonae has evolved through specialization, each copy transferring a single acetyl group in a sequential manner. A second, smaller cluster of six genes, located at a distance from the first (900 kb in M. chelonae, 3 Mb in M. abscessus), is involved in the synthesis of the fatty acyl moiety and its attachment to the tripeptide-aminoalcohol moiety. The other genes are scattered throughout the genome, including two genes encoding putative regulatory proteins. Conclusion Although these three species produce identical GPL molecules, the organization of their GPL genes differs, thus constituting species-specific signatures. One hypothesis is that the compact organization of the GPL locus in M. smegmatis represents the ancestral form and that evolution has scattered various pieces throughout the genome in M. abscessus and M. chelonae.


Background
Mycobacterium abscessus and M. chelonae are both species of rapidly growing mycobacteria (RGM) that have emerged as significant human pathogens during the last ten years: both are major causes of skin and soft tissue infections following medical or surgical procedures [1], and M. abscessus also causes pulmonary infections and is increasingly recovered from patients with cystic fibrosis [1]. M. chelonae and M. abscessus are among the most antibiotic-resistant RGM species [1], which has serious consequences for therapy [2].
Glycopeptidolipids (GPLs) are surface lipids found in a number of mycobacterial species, including both RGM (e.g., M. smegmatis, M. chelonae, M. abscessus) and slow-growing mycobacteria (e.g., M. avium subsp. avium). These molecules can make up more than 70% of the lipids exposed at the bacterial surface (for reviews see [3-5]). They have a glycosylated lipopeptide core that is variably modified by O-methylation and O-acetylation (Fig. 1). More precisely, the GPL structure is based on a tripeptide-aminoalcohol (D-Phe-D-allo-Thr-D-Ala-L-alaninol) N-linked to a long-chain fatty acyl residue. This lipopeptide core is substituted by a 6-deoxytalosyl (dTal) unit linked to the allo-Thr residue and by an O-methylated rhamnosyl unit linked to the terminal alaninol residue. The dTal residue can be O-acetylated at positions 3 and 4 and, in M. avium, further glycosylated at position 2. The fatty acyl moiety and the rhamnosyl residue can carry one and three methyl groups, respectively (the latter at positions 2, 3 or 4), and in some cases GPLs are further glycosylated by the addition of an extra rhamnosyl unit [3,6,11] (Fig. 1).
GPLs are required for sliding motility, biofilm formation and the maintenance of cell wall integrity [7,8]. They also influence bacterial aggregation [7,9], induce the release of prostaglandin E2 and interfere with the interaction between mycobacteria and human monocytes/macrophages [10,11]. Moreover, several recent studies have shown that natural variants of M. abscessus that produce only small amounts of GPL are more invasive than high-level producers [12-14]. Thus, GPLs appear to play an important role in both the physiology and the pathogenicity of mycobacteria.
The complete genomes of M. chelonae (CIP 104535T) and M. abscessus (CIP 104536T) have recently been sequenced to help elucidate the molecular mechanisms of their pathogenicity and antibiotic resistance. Using available data on the genetic basis of the GPL biosynthetic pathway in the RGM model organism M. smegmatis, we identified and analysed the genomic regions encoding enzymes involved in GPL biosynthesis in M. chelonae and M. abscessus.

Biochemical analysis of the glycopeptidolipid produced by M. abscessus and M. chelonae
Since GPL molecules may differ structurally, we first assessed the GPL status of the strains used in this study for comparative genomic analysis: M. smegmatis mc²155, M. chelonae CIP 104535T and M. abscessus CIP 104536T. Strains were grown to early exponential and late stationary phases (Fig. 2A) and lipids were extracted from the cells [11]. GPLs were identified by thin-layer chromatography (TLC), which separates molecules according to their hydrophobicity, and by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry [11]. Because GPLs contain sugar moieties, GPL-like compounds were revealed by anthrone staining [7]. Whatever the growth phase, the three strains produced similar diglycosylated GPLs in comparable quantities; however, triglycosylated GPLs were clearly more abundant in M. chelonae and M. abscessus than in M. smegmatis under both conditions (Fig. 2A). The precise nature of the GPL-like compounds was resolved by MALDI-TOF mass spectrometry, a very accurate and sensitive method for detecting glycolipids. The molecular masses of all the GPL-associated pseudomolecular ion peaks detected in the mass spectra of all strains matched the previously calculated molecular weights of the GPLs in these strains. Most of the GPLs produced by M. smegmatis mc²155 were diglycosylated forms, confirming earlier reports [15]. M. abscessus and M. chelonae produced the same diglycosylated GPLs ([M+K]+ at m/z 1273) and, as expected from the TLC analysis, a higher amount of triglycosylated GPLs ([M+K]+ at m/z 1419) than M. smegmatis (Fig. 2B).
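As a quick consistency check on the MALDI-TOF data, the difference between the two potassium adducts can be compared with the mass of one extra rhamnosyl (6-deoxyhexose) residue; because both ions are [M+K]+, the potassium contribution cancels in the difference. The sketch below is a back-of-the-envelope calculation, not part of the original analysis. It uses standard monoisotopic atomic masses and assumes that each glycosidic linkage releases one water molecule, so that an added rhamnosyl residue contributes C6H10O4.

```python
# Back-of-the-envelope check: does m/z 1419 - m/z 1273 match one extra
# rhamnosyl (6-deoxyhexose) residue? Both ions are [M+K]+ adducts, so the
# potassium contribution cancels in the difference.

# Standard monoisotopic atomic masses (Da)
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def monoisotopic_mass(formula: dict[str, int]) -> float:
    """Sum atomic masses for an elemental formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Free rhamnose is C6H12O5; forming a glycosidic bond releases H2O,
# so an added residue contributes C6H10O4.
rhamnosyl_residue = monoisotopic_mass({"C": 6, "H": 10, "O": 4})

di_glycosylated_mz = 1273   # [M+K]+ reported for diglycosylated GPLs
tri_glycosylated_mz = 1419  # [M+K]+ reported for triglycosylated GPLs
observed_shift = tri_glycosylated_mz - di_glycosylated_mz

print(f"rhamnosyl residue: {rhamnosyl_residue:.3f} Da")
print(f"observed shift:    {observed_shift} (nominal m/z units)")
print("consistent with one extra rhamnose:", round(rhamnosyl_residue) == observed_shift)
```

The residue mass (about 146.058 Da) rounds to the nominal 146-unit shift between the two ions, in line with the assignment of the m/z 1419 species as triglycosylated GPLs.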

Comparative genomics of the GPL biosynthetic pathway
Comparative genomic analysis was performed by pairwise alignment of the proteins encoded by the M. smegmatis GPL locus against each of the complete proteomes of M. abscessus and M. chelonae [16,17]. All the genes necessary for GPL biosynthesis are clustered in a single region of the M. smegmatis genome, thus constituting a true "GPL locus" (Figure 3). In this species, most of the genes of the locus (Fig. 1) have been identified experimentally and/or by in silico prediction (Table 1). The tripeptide-aminoalcohol moiety is assembled non-ribosomally by the products of the mps1 and mps2 genes [18,19], and lipid synthesis and attachment to the tripeptide-aminoalcohol moiety probably require the concerted action of several gene products (pks, fadD23, papA3) [19,20]. The genes involved in the glycosylation of the lipopeptide core (gtf1, gtf2) were recently characterized by selectively inactivating both genes and analysing the resulting mutants biochemically [21]. Triglycosylated GPLs, described as being produced mostly during stationary phase [22], result from the addition of an extra rhamnosyl residue (gtf3) [9,21]. The genes involved in the O-methylation of the various hydroxyl groups of the rhamnosyl unit (rmt2, rmt3 and rmt4) and of the lipid moiety (fmt) have been characterized using similar methods [23-25]. The atf gene product is responsible for acetylating two specific hydroxyl groups of the dTal [26], and FadE5 probably introduces the double bond into the fatty acid. The rmlA and rmlB genes are associated with the activation and epimerisation of the sugars [27]. Two members of the mmpL family, a group of genes encoding large membrane proteins, are also required for GPL biosynthesis [19,26]. The exact functions of these proteins, MmpL4a and MmpL4b, are not known, although a recent study performed in M. tuberculosis suggests that MmpL proteins (MmpL7) may channel polyketide products during their synthesis by the polyketide synthase, thereby coupling synthesis and export [28]. Finally, the transport of GPLs to the surface of the bacilli involves the integral membrane protein Gap [19]. The locus also contains eight other genes that have not yet been characterized experimentally: two encode membrane proteins (mmpS4 and mmpL10) that may also interact with the polyketide synthase of the cluster, two encode a sigma factor and a sigma-associated protein (ecf and sap, respectively) that are believed to contribute to the regulation of GPL production, and four have no known function (Table 1 and Fig. 1) [18,19].
All the genes of the M. smegmatis GPL locus have close orthologs in both M. chelonae and M. abscessus (Table 1). The M. chelonae and M. abscessus orthologs share more than 80% identity with each other, and most are more than 90% identical (data not shown). Identity between the M. chelonae/M. abscessus orthologs and their M. smegmatis counterparts ranges from 30 to 89%, with two-thirds of the M. chelonae/M. abscessus orthologs being ≥70% identical to their M. smegmatis counterparts. Identity is less than 50% for only four orthologs: sap, ecf, Rv0926 and Rv1174c. The functions of Rv0926 and Rv1174c are unknown; sap and ecf may play roles in regulating GPL biosynthesis and, if so, their low identity would suggest that the regulatory circuits of these species have diverged.
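Percent identity values like these depend on how the statistic is defined. The sketch below assumes identity is computed over aligned columns in which neither sequence has a gap; other conventions (dividing by the full alignment length or by the length of the shorter protein) give different numbers. The function names and toy data are hypothetical illustrations, not the study's pipeline, which relied on LASSAP [16,17].

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two gapped sequences of equal length.

    Assumption: columns containing a gap ('-') are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def summarize(identities: dict[str, float], high: float = 70.0, low: float = 50.0) -> dict:
    """Range of identities, fraction of orthologs >= `high`, and genes below `low`."""
    values = list(identities.values())
    return {
        "range": (min(values), max(values)),
        "fraction_at_or_above_high": sum(v >= high for v in values) / len(values),
        "below_low": sorted(gene for gene, v in identities.items() if v < low),
    }


# Toy example with hypothetical sequences and values
print(percent_identity("MKV-LT", "MRVALT"))  # 4 identical of 5 ungapped columns
print(summarize({"gene_a": 89.0, "gene_b": 75.0, "gene_c": 30.0}))
```

Applied to the real identity table, `summarize` would report the 30 to 89% range, the two-thirds fraction at or above 70% and the four orthologs below 50% described above.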
Unlike in M. smegmatis, the M. chelonae/M. abscessus GPL orthologs are not gathered in a single region (Figure 3). In both M. chelonae and M. abscessus, there is a large region containing 19 genes (mmpS4 to gap). This region contains all genes involved in the synthesis of the tripeptide-aminoalcohol moiety, the glycosylation of the lipopeptide and the O-methylation and O-acetylation modifications (see also Fig. 1). It is very similar to the corresponding region of the M. smegmatis GPL locus, with two differences. First, there is no mobile element, either upstream of mbtH as in M. smegmatis mc²155 or at any other location. Second, both M. chelonae and M. abscessus carry two atf orthologs (which we named atf1 and atf2): atf1 is at the same location as atf in M. smegmatis, whereas atf2 is inserted between rmlB and rmt2. In M. chelonae, atf1 and atf2 are 58% identical (71% similarity) to each other and are 76% and 60% identical, respectively, to M. smegmatis atf (88% and 75% similarity, respectively); in M. abscessus, atf1 and atf2 are 57% identical (72% similarity) to each other and are 72% and 59% identical, respectively, to atf (83% and 74% similarity, respectively). A smaller block of six genes lies 900 kb from this first region in M. chelonae and 3 Mb away in M. abscessus. These six genes (pks to gap-like) are probably involved in lipid synthesis and attachment to the tripeptide-aminoalcohol moiety (e.g., pks, fadD23, papA3), but pks is the only one studied experimentally so far [19]. This block is part of a large region that is inverted between M. chelonae and M. abscessus. It is very similar to the corresponding part of the M. smegmatis GPL locus, except that the order of the pe and fadD23 genes differs.

To test whether this locus organization was specific to the sequenced strains (CIP 104535T and CIP 104536T), five clinical isolates of each species were analyzed by PCR using two primer pairs (Additional file 1).
All the M. abscessus and M. chelonae isolates had the same PCR pattern. This experiment shows that the genetic organization of the GPL locus depicted in Figure 3 is not strain-dependent and is probably conserved throughout each species.
Figure 1. Schematic representation of the structure of the GPLs from M. smegmatis.

… but have evolved specificity, each being able to transfer a single acetyl group, in a sequential manner.

Discussion
This study is the first to address the genetics of GPL biosynthesis in two clinically significant RGM species, M. chelonae and M. abscessus. The major observation is that, although all three species produce structurally identical GPL molecules, the genes necessary for GPL biosynthesis are organized very differently in M. chelonae and M. abscessus than in M. smegmatis. In M. smegmatis, the GPL locus comprises almost 30 genes in a region of ~65 kb and therefore departs from the usual prokaryotic gene density of about one gene per kb. This is because GPL biosynthesis involves very large multi-modular proteins, for example the non-ribosomal peptide synthetases (Mps1 and Mps2) and the polyketide synthase (Pks), which are encoded by very long genes. Several genes appear to be organized into operons, one of which has been formally identified and contains mbtH, mps1, mps2, gap, sap and ecf [19]. Interestingly, a mobile element, IS1096, is located just upstream of mbtH in strain mc²155. This upstream region corresponds to the promoter of the mbtH operon, so the insertion may interfere with the expression of this operon, as mobile elements do in other biological systems [29,30]. Surprisingly, M. chelonae and M. abscessus produced clearly more triglycosylated GPL than M. smegmatis, which suggests differences in gtf3 expression among the three species.
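The gene-density argument can be made explicit with the approximate figures given above (about 30 genes in about 65 kb):

```latex
\[
\frac{30\ \text{genes}}{65\ \text{kb}} \approx 0.46\ \text{gene/kb},
\qquad
\frac{65\ \text{kb}}{30\ \text{genes}} \approx 2.2\ \text{kb/gene}
\]
```

That is, the locus has roughly half the density expected from the one-gene-per-kb rule of thumb, or an average of about 2 kb per gene, which is consistent with the presence of very long genes such as mps1, mps2 and pks.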
All the genes are clustered in M. smegmatis but are scattered in several blocks in M. chelonae and M. abscessus. The various genomic pieces correspond to functional blocks: one block corresponds to the synthesis of the tripeptide-aminoalcohol moiety, the glycosylation of the lipopeptide and the O-methylation/acetylation modifications, and another to lipid biosynthesis and its attachment to the tripeptide-aminoalcohol moiety. In addition, these species differ by one inversion and one duplication. An attractive hypothesis is that the compact organization of the GPL locus in M. smegmatis represents the ancestral form and that evolution has scattered various pieces throughout the genome in M. abscessus and M. chelonae. However, the opposite hypothesis, in which genes involved in a metabolic pathway tend to gather during evolution, cannot be excluded. The fact that both M. chelonae and M. abscessus have two non-redundant O-acetyltransferases suggests that atf2 may have arisen from the duplication of atf1. Interestingly, atf1 is less similar to atf2 than to M. smegmatis atf, which also points to functional divergence between atf1 and atf2. In M. smegmatis, atf mediates the O-acetylation of the dTal at both positions 3 and 4. We showed that, in M. abscessus, atf1 and atf2 are each specifically responsible for one of these two reactions and that they probably act sequentially.
We showed that the GPL biosynthetic pathway is highly conserved between M. chelonae and M. abscessus, consistent with the close relatedness of these two species [31]. However, owing to genomic rearrangements between the two species, the two blocks are located at different coordinates, and the block of six genes is inverted in these two species with respect to that in M. smegmatis. These genomic rearrangements are consistent with the separation of the two species, which were formerly part of a single complex [31]. We showed, using a panel of clinical isolates, that these differences are species-specific and may thus be used as a discriminative marker. The genomic findings agree with the biochemical data showing that the two species produce structurally identical GPL molecules [3]. However, differences in regulation cannot be excluded, and it is not known whether additional genes are needed for GPL biosynthesis, export and regulation in these three species.
M. chelonae and M. abscessus, like other mycobacterial species [32,33], can naturally switch from a rough (R) to a smooth (S) morphotype and from S to R [12,13]. R strains are associated with low GPL production, high invasiveness and higher virulence in the mouse model [13,14]. However, despite several attempts, the genetic basis of this natural S/R switching remains obscure. Several studies in M. avium and M. smegmatis describe various genes involved in the transition between S and R morphotypes, most of which are implicated in the GPL biosynthetic pathway [18,19,34]. Identifying the genes required for the synthesis and export of these metabolites should help us understand the natural variation in morphology and virulence of these species.

Conclusion
We showed that M. abscessus and M. chelonae contain genes corresponding to all those of the M. smegmatis "GPL locus", with extensive conservation of the predicted protein sequences. This finding is consistent with the production of GPL molecules that are indistinguishable by either thin-layer chromatography or mass spectrometry. Despite this genomic and structural homology, the GPL locus appears to be split into several parts in M. chelonae and M. abscessus. One large cluster (19 genes) comprises all genes involved in the synthesis of the tripeptide-aminoalcohol moiety, the glycosylation of the lipopeptide and the O-methylation/acetylation modifications. A second, smaller cluster of six genes, located at a distance from the first (900 kb in M. chelonae, 3 Mb in M. abscessus), is involved in the synthesis of the fatty acyl moiety and its attachment to the tripeptide-aminoalcohol moiety. The other genes are scattered throughout the genome, including two genes encoding putative regulatory proteins. Although these three species produce identical GPL molecules, the organization of their GPL genes differs, thus constituting species-specific signatures. An attractive hypothesis is that the compact organization of the GPL locus in M. smegmatis represents the ancestral form and that evolution has scattered various pieces throughout the genome in M. abscessus and M. chelonae, although the opposite scenario cannot be excluded.

Lipid analysis
Lipids were extracted from cells with a mixture of chloroform and methanol and further partitioned by methanol precipitation as previously described [11]. GPLs (250 μg of lipid per deposit) were identified by TLC on precoated silica gel Durasil 25 plates (Macherey-Nagel) developed in chloroform/methanol (90:10 [vol/vol]) and by MALDI-TOF mass spectrometry [11]. Sugar-containing compounds were revealed by spraying the plates with 0.2% anthrone in concentrated sulfuric acid, followed by heating at 110°C [7].

Computer methods
… [38]. The comparative genomic analysis was performed by pairwise alignment between the proteins of the GPL locus of M. smegmatis and each of the complete proteomes mentioned above. These comparisons were performed using the LASSAP software, and Z-values were calculated as described [16,17]. Orthologous links were identified from the results of the pairwise comparisons as follows:
For each gene of the M. smegmatis GPL locus, five bidirectional best hits (BBH) were identified, and the BBH with the best Z-value was selected. When several BBH had similar Z-values (some genes of the GPL locus, such as fadD and fadE, are affected by a high degree of paralogy), the gene preserving the syntenic context was selected. Identity below 25% was not considered significant. All the selected orthologs have a Z-value greater than 14 (except sap and its orthologs).
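The ortholog-selection rules above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the authors' pipeline (which used LASSAP Z-values). The `Hit` record, the reciprocal-hit definition (a candidate counts as a BBH if it is among the query's five best hits and its own best reverse hit is the query), the Z-value tolerance used to call two hits "similar" and the neighbour-counting synteny score are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    z: float         # Z-value of the pairwise alignment
    identity: float  # percent identity


def top_hits(hits: list[Hit], k: int) -> dict[str, list[Hit]]:
    """For each query, keep its k hits with the highest Z-values."""
    by_query: dict[str, list[Hit]] = {}
    for hit in hits:
        by_query.setdefault(hit.query, []).append(hit)
    return {q: sorted(hs, key=lambda h: h.z, reverse=True)[:k] for q, hs in by_query.items()}


def synteny_score(gene, candidate, neighbours_a, neighbours_b, best_fwd) -> int:
    """Count neighbours of `gene` whose best forward hit is a neighbour of `candidate`."""
    return sum(
        1 for n in neighbours_a.get(gene, ())
        if best_fwd.get(n) in neighbours_b.get(candidate, ())
    )


def select_orthologs(fwd, rev, neighbours_a, neighbours_b,
                     k=5, z_tolerance=1.0, min_identity=25.0) -> dict[str, str]:
    """fwd: locus genes (genome A) vs proteome B; rev: proteome B vs genome A."""
    fwd_top = top_hits(fwd, k)
    best_fwd = {q: hs[0].subject for q, hs in fwd_top.items()}
    best_rev = {q: hs[0].subject for q, hs in top_hits(rev, 1).items()}
    orthologs = {}
    for gene, candidates in fwd_top.items():
        # Bidirectional: the candidate's best hit back in genome A must be this gene
        bbh = [h for h in candidates if best_rev.get(h.subject) == gene]
        if not bbh:
            continue
        best_z = max(h.z for h in bbh)
        # Candidates with similar Z-values (e.g. paralogues) are resolved by synteny
        tied = [h for h in bbh if best_z - h.z <= z_tolerance]
        chosen = max(tied, key=lambda h: (
            synteny_score(gene, h.subject, neighbours_a, neighbours_b, best_fwd), h.z))
        if chosen.identity >= min_identity:  # identity below 25% is not significant
            orthologs[gene] = chosen.subject
    return orthologs


# Toy data: B1, B2, Bpks and orfX_B are hypothetical proteins of genome B
fwd = [Hit("fadD", "B1", 20.0, 60.0), Hit("fadD", "B2", 20.5, 58.0),
       Hit("pks", "Bpks", 30.0, 70.0), Hit("orfX", "orfX_B", 8.0, 20.0)]
rev = [Hit(h.subject, h.query, h.z, h.identity) for h in fwd]
neighbours_a = {"fadD": {"pks"}, "pks": {"fadD"}}
neighbours_b = {"B1": {"Bpks"}, "Bpks": {"B1"}}
# fadD -> B1 (synteny breaks the near-tie with B2), pks -> Bpks; orfX is dropped (< 25%)
print(select_orthologs(fwd, rev, neighbours_a, neighbours_b))
```

The synteny tie-break only matters for near-ties, which is how the text describes its use for paralogous families such as fadD and fadE.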

Analysis of the M. abscessus and M. chelonae clinical isolates using PCR
The chromosomal DNA was prepared using the bead-beater/phenol extraction method. The bacterial pellet (from a 50 ml culture) was resuspended in 5 ml of solution I (25% sucrose; 50 mM Tris-HCl pH 8; 50 mM thiourea; 10 mg/ml lysozyme). Thiourea inhibits the Tris-dependent DNase present in some strains [39]. Solution II (25% sucrose; 50 mM Tris-HCl pH 8; 50 mM EDTA pH 8) was added and the bacterial cells were lysed as described by Howard et al. [40]. Proteinase K was added to the lysate at 100 μg/ml and the mixture was incubated overnight at 55°C. The DNA was extracted using phenol/chloroform/isoamyl alcohol (25 …

… [41] at the unique XbaI site to generate pNIPatfsmeg. One clone with the atfsmeg gene inserted in the orientation opposite to the hygromycin resistance gene was selected and sequenced. Using M. abscessus ATCC 19977T genomic DNA as template, a similar strategy was applied to clone the atf1 and atf2 genes (AM231618) into pNIP40b [41], using primers atf1abs.5/atf1abs.3 and atf2abs.5/atf2abs.3 (Additional file 3), yielding pNIPatf1absc and pNIPatf2absc, respectively. To clone the M. abscessus atf1 and atf2 genes in frame, the atf1absc PCR product was digested with ClaI, and the atf2 gene was amplified using new primers (atf2ClaI.5 and atf2abs.3) and digested with ClaI. The two PCR products were digested with XbaI, purified and ligated with the dephosphorylated expression vector pNIP40b at its unique XbaI site to generate pNIPatf1_2absc. These plasmids were electroporated into the M. smegmatis mc²155 atf mutant [26], and transformants were selected on plates containing kanamycin and hygromycin. The resulting strains are named atf-/atfsMs, atf-/atf1Ma, atf-/atf2Ma and atf-/atf1_2Ma.
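Before ligating the ClaI-digested atf1 and atf2 products, one can check in silico that the fusion stays in frame and that the vector site used for cloning is unique. The sketch below is hypothetical: the toy sequences are invented, and it assumes the fusion consists of the upstream coding sequence without its stop codon, the ClaI site (AT^CGAT) as the junction, and the downstream coding sequence ending in a stop codon. The XbaI (T^CTAGA) and ClaI sites are palindromic, so scanning one strand is enough.

```python
XBAI = "TCTAGA"  # XbaI recognition site (cuts T^CTAGA)
CLAI = "ATCGAT"  # ClaI recognition site (cuts AT^CGAT)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of a palindromic recognition site (one strand suffices)."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site)


def fusion_in_frame(upstream_cds: str, junction: str, downstream_cds: str) -> bool:
    """True if the fused sequence reads as one ORF with a single terminal stop codon.

    Assumes `upstream_cds` lacks its stop codon and `downstream_cds` ends with one.
    """
    # The downstream CDS is in frame only if everything before it is a multiple of 3
    if (len(upstream_cds) + len(junction)) % 3 or len(downstream_cds) % 3:
        return False
    fused = (upstream_cds + junction + downstream_cds).upper()
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    # No premature stop codon; the last codon must be a stop
    return all(c not in STOP_CODONS for c in codons[:-1]) and codons[-1] in STOP_CODONS


# Toy sequences (hypothetical, not the real atf genes or pNIP40b)
vector = "GGATCCTCTAGAGTCGAC"
print(count_sites(vector, XBAI) == 1)                   # unique XbaI site
print(fusion_in_frame("ATGAAACCC", CLAI, "GGGTTTTAA"))  # in frame
print(fusion_in_frame("ATGAAACC", CLAI, "GGGTTTTAA"))   # frameshift
```

Checking the real construct would simply mean passing the actual atf1 and atf2 coding sequences and the vector sequence to these functions.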