A phylogenomic analysis of the Actinomycetales mce operons

Background The genome of Mycobacterium tuberculosis harbors four copies of a cluster of genes termed mce operons. Despite extensive research that has demonstrated the importance of these operons for infection outcome, their physiological function remains obscure. Expanding databases of complete microbial genome sequences facilitate a comparative genomic approach that can provide valuable insight into the role of uncharacterized proteins. Results The M. tuberculosis mce loci each include two yrbE and six mce genes, which have homology to ABC transporter permeases and substrate-binding proteins, respectively. Operons with an identical structure were identified in all Mycobacterium species examined, as well as in five other Actinomycetales genera. Some of the Actinomycetales mce operons include an mkl gene, which encodes an ATPase resembling those of ABC uptake transporters. The phylogenetic profile of Mkl orthologs exactly matched that of the Mce and YrbE proteins. Through topology and motif analyses of YrbE homologs, we identified a region within the penultimate cytoplasmic loop that may serve as the site of interaction with the putative cognate Mkl ATPase. Homologs of the exported proteins encoded adjacent to the M. tuberculosis mce operons were detected in a conserved chromosomal location downstream of the majority of Actinomycetales operons. Operons containing linked mkl, yrbE and mce genes, resembling the classic organization of an ABC importer, were found to be common in Gram-negative bacteria and appear to be associated with changes in properties of the cell surface. Conclusion Evidence presented suggests that the mce operons of Actinomycetales species and related operons in Gram-negative bacteria encode a subfamily of ABC uptake transporters with a possible role in remodeling the cell envelope.


Background
A putative Mycobacterium tuberculosis virulence gene, named mce1A, was originally identified because its expression in Escherichia coli enabled this noninvasive bacterium to enter mammalian epithelial cells [1]. Sequencing of the M. tuberculosis genome revealed that mce1A (Rv0169) was part of an operon that encoded eight putative membrane-associated proteins: YrbEA-B and MceA-F [2,3]. This operon is present four times in the M. tuberculosis genome (mce1-4). Homologs of the genes adjacent to the mce1 locus, Rv0175-Rv0178, are located downstream of the mce3 and mce4 gene clusters (Figure 1) [3].
Continued interest in the function of the M. tuberculosis mce operons stems from reports that disruption of these operons profoundly affects the growth and virulence of the mutant strains in mice. Shimono et al. [4] showed that an mce1 mutant was hypervirulent when inoculated intravenously into BALB/c mice. In the first few weeks of infection, the mutant strain multiplied more rapidly than wild-type in the lungs, spleen and liver of the mice. Surprisingly, Gioffre et al. [5] found that a yrbE1B mutant grew faster than wild-type in the lungs and spleens of BALB/c mice inoculated via the peritoneum, but more slowly in mice infected through the tracheal route. Sassetti and Rubin [6] reported that in competitive mixed infections mce1 mutants exhibited a growth defect in the spleens of intravenously infected C57BL/6J mice after one week of infection. Although the exact cause of these apparently disparate phenotypes remains to be established, the observations suggest that the fate of mce1 mutants in vivo is determined by the prevailing immunological environment experienced during the first few weeks of infection.
Both mce2 and mce3 mutants replicated more slowly than wild-type in BALB/c mice infected via either the trachea or the peritoneum [5]; however, neither mutant demonstrated a significant growth defect in competitive mixed infections [6]. In co-infected C57BL/6J mice, an mce4 mutant was attenuated relative to wild-type after two to four weeks of infection, whilst an mce1-mce4 double mutant exhibited further attenuation, indicating that the mce operons perform non-redundant roles during infection [7].
The similarity of the YrbE and Mce proteins to ATP-binding cassette (ABC) transporter permeases and substrate-binding proteins, respectively, has been noted previously [8,9]. ABC transporters couple the energy released by ATP hydrolysis to the translocation of a substrate across a membrane. Members of the ABC transporter family are ubiquitous in living organisms and comprise one of the largest superfamilies known [10].
A functional ABC transporter system minimally contains two cytoplasmic nucleotide-binding ATPase domains and two transmembrane channel-forming permease domains. These components can be homo- or heterodimers and may be encoded on separate or fused polypeptides. Both eukaryotes and prokaryotes contain ABC exporters, whereas importers have been identified only in prokaryotes. Importers additionally require substrate-binding proteins (SBPs) that provide specificity and high affinity. Typically, SBPs are periplasmic in Gram-negative bacteria and lipoproteins in Gram-positive bacteria [11]. SBPs share a two-lobed tertiary structure with a central cleft that undergoes a large conformational change upon ligand binding, promoting close interaction with the cognate permease. This interaction triggers ATP hydrolysis, which energizes translocation of the substrate [12]. In Gram-negative bacteria, SBP-dependent importers also usually require porins or specific receptors to facilitate transport across the outer membrane [11].
The genes encoding the ATPase, permease and SBP components of an ABC transporter are often contiguous in the genome and comprise an operon. Phylogenetic clustering of the individual transporter components is almost always concordant, indicating that the operons have arisen from a common ancestral transporter with minimal shuffling of constituents. In addition, sequence similarity shows good correlation with substrate specificity [13][14][15].

Figure 1. Schematic representation of the M. tuberculosis H37Rv mce loci. Proximal transcription regulators are colored in purple, yrbE genes in blue, mce genes in green, and genes encoding 'conserved mce-associated proteins' in yellow [44].
The ATPase is the most conserved component of the system, and transporter function is frequently predicted solely on the basis of ATPase orthology [10,15]. These proteins share a homologous region of about 200 amino acids with several characteristic motifs: the Walker A and B motifs of the nucleotide-binding fold [16], as well as a signature motif found only in ABC transporter-associated, or 'traffic', ATPases [17].
The permease components and SBPs have limited primary sequence similarity, and thus their identification is not straightforward. They are typically identified in genome sequences by their proximity to ATPase genes and, for permeases, by the presence of predicted transmembrane regions [18][19][20]. The inference of function through sequence comparison has traditionally relied upon similarity to close homologs of known function. The advent of the genomic age has provided invaluable new methods for elucidating the roles of proteins of unknown function. Non-homology-based methods of genome comparison use patterns of domain fusion [21], conserved chromosomal location [22] and phylogenetic profiles [23] to predict functional interactions between proteins. In addition, the availability of hundreds of complete genome sequences permits the reliable identification of orthologs, operationally defined as reciprocal best hits [24], enabling more precise functional prediction than sequence similarity alone. These methods are non-redundant and their combined application can facilitate deduction of specific function [25]. Here we endeavor to further understand the function of the M. tuberculosis mce operons, and to assess the likelihood that they encode ABC transporters, through sequence and genome comparisons, database mining and the application of bioinformatic methods.
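The operational definition of orthologs as reciprocal best hits [24] can be made concrete with a short sketch. It assumes BLASTP results in the BLAST+ tabular format (-outfmt 6, default 12 columns, bit score last) and ranks hits by bit score; the gene identifiers are hypothetical, and real pipelines usually add E-value or coverage cut-offs that are omitted here.

```python
def parse_outfmt6(lines):
    """Yield (query, subject, bitscore) from BLAST+ '-outfmt 6' lines.

    Assumes the default 12-column layout, in which the bit score is column 12.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[1], float(cols[11])


def best_hits(rows):
    """Map each query to its highest-scoring subject (first seen wins ties)."""
    best = {}
    for query, subject, bitscore in rows:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits between genome A and genome B: (query, subject, bitscore)
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0)]
b_vs_a = [("b1", "a1", 195.0), ("b2", "a1", 140.0), ("b2", "a2", 60.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2 finds b2 as its best hit, but b2 finds a1, so the a2/b2 pair is not counted as orthologous; this asymmetry is what makes the reciprocal criterion stricter than a one-way best hit.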

Distribution of mce operons in Actinomycetales
Perusal of databases of conserved domains, such as InterPro [26], Pfam [27] and TIGRFAM [28], is a simple method for identifying homologous proteins. The M. tuberculosis H37Rv genome encodes 24 Mce proteins, each of which contains a conserved domain of 304 amino acids defined by the TIGRFAM family TIGR00996 (IPR005693). Members of this family are confined to the Order Actinomycetales. The corresponding Pfam family, PF02470 (IPR003399), describes a 98-amino-acid subregion of the Mce domain that is more widely distributed (see below). The mce genes in M. tuberculosis are clustered in groups of six; each cluster is preceded by two copies of a gene termed yrbE (Figure 1). Databases of conserved domains group the YrbE proteins into a family called DUF140 (domain of unknown function). Pfam defines the family by a region approximately 150 amino acids long (PF02405; IPR003453). The corresponding TIGRFAM family (TIGR00056) describes a subfamily of DUF140 but excludes the mycobacterial homologs on the stated grounds of extreme divergence at the N-terminus. For the sake of clarity, we refer to a cluster of genes encoding two YrbE and six Mce proteins as an 'mce operon'.
To assess the distribution of mce operons in completed and draft genome assemblies of members of the Order Actinomycetales, we surveyed the annotation of predicted proteins for members of Pfam families PF02470 and PF02405 […]. Examination of the genomic location of the Mce and DUF140 homologs revealed that the mce genes were almost always clustered in groups of six, located downstream of a pair of DUF140 genes (Figure 2). Genes with related functions are frequently encoded within operons and are thus found clustered in prokaryotic genomes [22]. We therefore investigated the gene neighborhoods of selected mce-like genes with the aim of obtaining clues to the biological role of proteins of this family (Figure 3). In Gram-negative bacteria, the Mce-like proteins were frequently encoded in a cluster with a DUF140 family protein and an ATPase homolog (IPR003439), an arrangement typical of an ABC transporter system [11]. The three components were encoded in any order, and in some instances either the […] [29]; the presence of SBPs indicates that these operons encode substrate uptake transporters. Aravind and Koonin suggested that the nucleotide-binding activity of STAS domains, found in sulfate transporters, could regulate uptake in response to intracellular ATP or GTP concentrations [30]. Several DUF140 proteins that are N-terminally fused to STAS domains have been identified [31], implying a functional linkage between these two proteins in the mce operons [21]. The Mce transporter clusters were also frequently associated with homologs of the surface-exposed lipoprotein VacJ (IPR007428) and the morphoprotein BolA (IPR002634). […] [32].
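The conserved arrangement described above, a pair of DUF140 genes followed by six mce genes, can be searched for in an ordered gene list. The sketch below is a deliberate simplification that we assume for illustration: each gene already carries a hypothetical family tag ("DUF140", "Mce" or "other") and a strand, the eight genes must be contiguous, and on the minus strand the transcriptional order appears reversed in chromosomal coordinate order.

```python
def find_mce_like_clusters(genes, n_yrbe=2, n_mce=6):
    """Return start indices of 'n_yrbe DUF140 then n_mce Mce' clusters.

    `genes` is a list of (family, strand) tuples in chromosomal coordinate
    order. A cluster must lie on one strand; on the minus strand the
    transcriptional order appears reversed in coordinate order.
    """
    pattern = ["DUF140"] * n_yrbe + ["Mce"] * n_mce
    width = len(pattern)
    hits = []
    for start in range(len(genes) - width + 1):
        window = genes[start:start + width]
        strands = {strand for _, strand in window}
        if len(strands) != 1:
            continue  # genes on mixed strands cannot form one operon
        strand = strands.pop()
        expected = pattern if strand == "+" else pattern[::-1]
        if [family for family, _ in window] == expected:
            hits.append(start)
    return hits


# Hypothetical contig: one plus-strand cluster and one minus-strand cluster
contig = ([("other", "+")] + [("DUF140", "+")] * 2 + [("Mce", "+")] * 6
          + [("Mce", "-")] * 6 + [("DUF140", "-")] * 2)
print(find_mce_like_clusters(contig))  # [1, 9]
```

Requiring strict contiguity is the main design shortcut; clusters with inserted or misannotated ORFs, such as the unannotated yrbEB genes mentioned for some genomes, would need a tolerance for gaps.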

Mce-associated ATPases
Since ABC transporters absolutely require an ATPase to energize substrate translocation, the genes neighboring the Actinomycetales mce operons were inspected for ATPase homologs (IPR003439). Although none of the mycobacterial mce operons neighbors an ATPase, a candidate gene was identified immediately upstream of a single mce operon in the genome of every non-mycobacterial Actinomycetales species that possesses mce genes (Table 2). BLASTP analyses demonstrated that the corresponding protein sequences were reciprocal best hits with the mce-linked ATPases in Gram-negative bacteria, indicating orthology [24]. A phylogenetic analysis of ABC transporter ATPases reported by Dassa and Bouige groups these Actinomycetales and Gram-negative bacterial ATPases into a family termed Mkl [8].
The sequences of the N. farcinica and Streptomyces mce-linked ATPases (nfa51100, SAV5902 and SCO2422) were used as BLASTP queries in order to identify additional Mkl-like ATPases. The best hits from each of the completed Actinomycetales genomes (Table 1) were retrieved for further evaluation. Phylogenetic analysis of the protein sequences revealed that each Mycobacterium species contained a single ATPase that clustered with the Mkl family, providing strong evidence of orthology (Figure 4, Table 2). In addition, a paralog was identified in the N. farcinica genome (nfa20200); this ORF is annotated in the database of The Institute for Genomic Research (TIGR) as MetN, a D-methionine ABC transporter ATPase, but it does not cluster with other putative MetN orthologs (Figure 4).
Comparison of the most closely related ORFs in other Actinomycetales revealed that only those genomes that contained mce operons possessed an orthologous ATPase (Figure 4). Congruence of the phylogenetic profiles of the Mkl ATPases with those of the YrbE and Mce proteins provides further evidence of functional association [23].
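Phylogenetic profiling [23] compares the presence or absence of protein families across a set of genomes. A minimal sketch, assuming each family is summarized as the set of genomes in which it was detected; the genome labels are hypothetical placeholders, not the genomes of Table 1. A Hamming distance of zero corresponds to the exact congruence described above for Mkl, YrbE and Mce.

```python
def profile(detected_in, genomes):
    """Presence (1) / absence (0) vector over an ordered list of genomes."""
    return tuple(int(genome in detected_in) for genome in genomes)


def profile_distance(p, q):
    """Hamming distance: number of genomes where two profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


genomes = ["G1", "G2", "G3", "G4"]            # hypothetical genome labels
mce = profile({"G1", "G2", "G4"}, genomes)    # (1, 1, 0, 1)
mkl = profile({"G1", "G2", "G4"}, genomes)
other = profile({"G1", "G3"}, genomes)        # (1, 0, 1, 0)
print(profile_distance(mce, mkl), profile_distance(mce, other))  # 0 3
```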
Each of the mce-linked ATPases and mycobacterial orthologs contains the conserved Walker A and B motifs required for ATP binding, as well as the ABC transporter family signature (LSGGQ) with no more than one mismatch [16,33]. In a published analysis of M. tuberculosis ABC transporters, the putative Mce ATPase, Rv0655, segregated with importers but did not fall into any of the previously described families with known substrates [20]. Similarly, in a more expansive study, the Mkl family ATPases fell into the SBP-dependent importer clade but clustered separately from those with established specificity [8].
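The motif checks described above can be sketched in a few lines. The code translates simple PROSITE-style patterns into Python regular expressions and uses the P-loop pattern [AG]-x(4)-G-K-[ST] (PROSITE PS00017) for Walker A. The Walker B test, four residues from an assumed hydrophobic set [ILVMF] followed by D, is our own simplified heuristic, and the signature is matched against LSGGQ allowing at most one substitution, as in the text. The toy sequence is invented and is not a real Mkl ATPase.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regex.

    Supports x, [..] (allowed), {..} (forbidden), (n) and (n,m) repeats,
    and the < / > terminal anchors; other PROSITE features are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            core = "."
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"
        else:
            core = residue  # a single residue or [..] class is valid regex as-is
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def near_matches(seq, motif, max_mismatches=1):
    """0-based starts of windows differing from `motif` by <= max_mismatches."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1)
            if sum(a != b for a, b in zip(seq[i:i + k], motif)) <= max_mismatches]


WALKER_A = prosite_to_regex("[AG]-x(4)-G-K-[ST]")   # "[AG].{4}GK[ST]"
WALKER_B = r"[ILVMF]{4}D"                           # assumed "hhhhD" heuristic

toy = "MKGPSGSGKSTLLNAEELSGGEKQRVAIARALILLLDEPT"  # invented sequence
print([m.start() for m in re.finditer(WALKER_A, toy)])  # [2]
print(near_matches(toy, "LSGGQ"))                        # [17]
print([m.start() for m in re.finditer(WALKER_B, toy)])  # [32]
```

In the toy sequence the signature window LSGGE differs from LSGGQ at one position and is therefore accepted, mirroring the "no more than one mismatch" criterion.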
The mycobacterial Mkl ATPases and nfa20200 are not genomically located near any other ABC transporter components and appear to be transcriptionally isolated. The M. leprae ortholog is located adjacent to RNA polymerase rpo genes, which led to speculation that this ATPase is involved in ribonucleotide uptake [34]. Consequently, Mkl ATPases are sometimes annotated as ribonucleotide uptake systems.

The Mce proteins
Comparison of the amino acid sequences of the Mce proteins encoded in the genomes of Mycobacterium bovis and the M. tuberculosis strains H37Rv, CDC1551 and 210 revealed that each of the M. tuberculosis genomes contained 24 Mce ORFs, whereas, as noted previously, the mce3 operon is deleted in M. bovis [35]. A number of genes were found to contain frameshift mutations: mce1F in strain 210; mce2B in strains H37Rv and CDC1551; mce2C in strain CDC1551; and mce2D and mce2E in M. bovis. The truncated ORFs thus conspicuously clustered within the mce2 operon. […] additional putative Mce homologs that were included in this analysis (Table 3).

Multiple alignment and phylogenetic analysis of the Mce homologs revealed six distinct branches, which corresponded exactly to the encoding genes in the respective operons (that is, mceA-F; Figure 5). Within each of the six major branches, the clustering of sequences was essentially the same. This pattern indicates that each mce gene cluster arose by duplication of an ancestral operon that contained six mce genes and that no shuffling between or within operons has occurred.
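The statement that the six branches correspond exactly to mceA-F amounts to checking that two partitions of the same proteins coincide: one by tree clade and one by position within the operon. A minimal sketch, assuming clade membership has already been read off the tree; the protein identifiers and clade names are hypothetical.

```python
from collections import defaultdict


def partitions_coincide(clade_of, position_of):
    """True if every clade holds one operon position and vice versa.

    clade_of: protein -> clade label read from the tree
    position_of: protein -> operon position ('A'..'F')
    """
    positions_per_clade = defaultdict(set)
    clades_per_position = defaultdict(set)
    for protein, clade in clade_of.items():
        position = position_of[protein]
        positions_per_clade[clade].add(position)
        clades_per_position[position].add(clade)
    return (all(len(s) == 1 for s in positions_per_clade.values())
            and all(len(s) == 1 for s in clades_per_position.values()))


clades = {"p1": "c1", "p2": "c1", "p3": "c2"}        # hypothetical
positions = {"p1": "A", "p2": "A", "p3": "B"}
print(partitions_coincide(clades, positions))                  # True
print(partitions_coincide(clades, {**positions, "p3": "A"}))   # False
```

Shuffling of a gene between operon positions would show up as a clade containing two positions, or one position split across two clades, and the check would return False.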
We have classified the operons as mce1-8 according to the clustering observed (Table 3). The mce1 and mce2 operons are the most closely related, and their duplication may have occurred after divergence of the fast- and slow-growing mycobacteria, since M. smegmatis contains a single copy. Although the orthology of the M. smegmatis operon cannot be deduced from the phylogenetic tree, we infer from synteny that it is orthologous to the M. tuberculosis mce1 operon. Thus, mce1 is the sole operon that is found in all, and only in, the Mycobacterium species examined. The Streptomyces operons fall into a cluster, termed mce6, that does not contain any mycobacterial orthologs but is found in N. farcinica. The Mkl-like ATPase is located upstream of yrbEA6 in all three of these operons. In several cases operon orthology could not be deduced from the branching pattern observed, presumably due to recent duplication events. Thus, it appears that M. paratuberculosis and M. smegmatis possess two copies of the mce5 operon; M. paratuberculosis and N. farcinica have two copies of the mce7 operon; and N. farcinica has two copies of the mce8 operon. The M. paratuberculosis Mce5E protein (MAP2193) seems to have diverged significantly from its paralog (MAP0764); examination of the encoding sequences revealed that this is a consequence of a 40 bp deletion, which results in a frameshift of the N-terminal 120 amino acids.
One and two extra copies of Mce1A were found in M. paratuberculosis (MAP3289) and M. smegmatis (MSMEG5783, MSMEG6500), respectively, whilst N. farcinica contained a second copy of Mce4A (nfa25900). Each of the encoding genes appeared to be transcriptionally isolated, with the exception of MSMEG5783, which is located within a four-gene operon that includes pyridoxamine 5'-phosphate oxidase and a putative lipoprotein.
Secondary structure predictions obtained through the JPred server indicated that the conserved Pfam region folds into five β-strands, while the central region of Actinomycetales Mce proteins, included in the conserved TIGRFAM region, contains eight α-helices. The C-terminal region varies in length from 10 to 250 amino acids, has low predicted complexity and is rich in proline residues (…) [36,37]. Each of the Mce proteins contained a hydrophobic stretch at the N-terminus that is likely to be a transmembrane helix. Using a neural network trained on Gram-positive bacteria, the program SignalP predicted a signal peptide cleavage site for 98 of the 161 proteins [38]. There was no correlation between predicted secretion and Mce type (A-F) or bacterial species. Although the Mce anchor regions frequently contained a pair of arginine residues, characteristic of twin-arginine translocation (Tat) motifs, few (12 of 161) are recognized as Tat substrates [39]. A lipoprotein attachment site (PS00013) was present in 22 of 27 MceE proteins. The highly conserved operon structure containing six mce genes suggests that the Mce proteins associate to form a heteromeric complex [22,40], which is therefore likely to remain tethered to the cell membrane even if some of its proteins are cleaved. Indeed, Mce1A-1F have been shown to localize to the cell envelope of M. tuberculosis [4].

Figure 4. Phylogenetic tree showing the relationship between mce-linked ATPases and mycobacterial orthologs. ATPases encoded within mce operons in Actinomycetales species are colored blue; those in Gram-negative bacterial mce operons are colored green. The sequences most similar to nfa51100, SAV5902 and SCO2422 (indicated in bold) in the Actinomycetales genomes listed in Table 1 were identified by BLASTP searches and included in the tree. All of the best hits from mycobacterial species cluster within the Mkl family and are colored red. For comparison, sequences of all M. tuberculosis H37Rv ATPases of ABC uptake transporters were included [20]. All of the top hits from Actinomycetales that do not possess mce operons are rooted among these non-mce-linked ATPases, as are all of the second hits from mycobacterial species. ORFs are designated by (UniProt gene name | protein name).

Figure 5. Phylogenetic tree of Actinomycetales Mce proteins.

[…] N. farcinica, S. coelicolor and S. avermitilis were selected for further analysis. In several cases the ORF downstream of yrbEA was either not annotated or annotated in the reverse direction; however, translation of the genomic sequence revealed a YrbEB homolog encoded in the expected direction (Table 3).

The YrbE proteins
Phylogenetic analysis showed deep branching between the YrbEA and YrbEB sequences (Figure 7). Within each clade the clustering of sequences was almost identical, demonstrating that the yrbEA-yrbEB genes have evolved as a pair. The clustering was comparable to that seen in the Mce protein tree, with members of the mce1/2 and mce3 to mce8 operons easily distinguishable. Thus, it appears that all of the operons examined evolved from a common ancestral eight-gene cluster without shuffling of genes within or between operons.
ABC permeases typically contain six transmembrane segments, with the C-terminus located on the cytoplasmic side of the membrane [11]. The consensus TMHMM-predicted structure of Actinomycetales YrbE homologs found in mce operons suggests the presence of five or six transmembrane helices, with the C-terminus outside (Figure 8a). The presence of the N-terminal transmembrane helix was equivocal; therefore, the N-terminus may be either cytoplasmic or outside. Further topology predictions using the programs HMMTOP and TopPred confirmed this model but could neither verify nor refute the existence of the N-terminal transmembrane segment.
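The reasoning that an equivocal N-terminal helix leaves the N-terminus either cytoplasmic or outside follows from the fact that each transmembrane helix crosses the membrane once and so switches sides. A minimal sketch of that parity argument; "in" denotes the cytoplasmic side, and the function name is our own.

```python
def n_terminus_side(n_tm_helices, c_terminus_side):
    """Side ('in' = cytoplasmic, 'out') of the N-terminus, given the helix
    count and the C-terminal side; each helix crosses the membrane once."""
    if c_terminus_side not in ("in", "out"):
        raise ValueError("c_terminus_side must be 'in' or 'out'")
    if n_tm_helices % 2 == 0:
        return c_terminus_side           # even number of crossings: same side
    return "in" if c_terminus_side == "out" else "out"


# YrbE model with the C-terminus outside:
print(n_terminus_side(6, "out"))  # out (N-terminal helix present)
print(n_terminus_side(5, "out"))  # in  (N-terminal helix absent)
```

The single-helix case (one crossing, C-terminus outside, N-terminus inside) corresponds to the arrangement proposed for the Mas proteins.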
Dassa and colleagues [41,42] have described a highly conserved sequence, the EAA motif, in the final cytoplasmic loop of some SBP-dependent ABC permeases that is proposed to interact with the cognate ATPase [43]. Examination of the multiple alignment of YrbE proteins revealed a conserved sequence motif located in the penultimate cytoplasmic loop. The consensus deduced from 50 Actinomycetales YrbEA and YrbEB sequences is shown in Figure 8b. Alignment of Gram-negative bacterial DUF140 proteins revealed that this region was highly conserved in all family members. The consensus sequence we have deduced does not appear to be homologous to the published motifs, but it does contain the common invariant glycine residue and is predicted to adopt the typical α-helical structure [42]. The 47-amino-acid consensus YrbE sequence, which we have termed the EExDA motif, was able to specifically retrieve Actinomycetales and Gram-negative DUF140 proteins from the National Center for Biotechnology Information (NCBI) microbial proteomes database.
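A consensus such as the one shown in Figure 8b is usually derived column by column from a multiple alignment. The sketch below uses a simple majority rule with an adjustable threshold and writes 'x' for columns without a dominant residue. The threshold, the gap handling and the four toy sequences (chosen so that the output reads EExDA) are assumptions for illustration, not the procedure or data used for the published motif.

```python
from collections import Counter


def column_consensus(aligned, threshold=0.5, gap="-", wildcard="x"):
    """Majority-rule consensus of equal-length aligned sequences.

    A column reports its most common residue if that residue is not a gap
    and occupies at least `threshold` of the sequences; otherwise `wildcard`.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        keep = residue != gap and count / len(column) >= threshold
        consensus.append(residue if keep else wildcard)
    return "".join(consensus)


toy_alignment = ["EEADA", "EEVDA", "EEIDA", "EKLDA"]   # invented sequences
print(column_consensus(toy_alignment, threshold=0.75))  # EExDA
```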
In one case (Rhodopirellula baltica, RB3287) a DUF140 domain is fused to an ABC ATPase domain providing evidence that the function of DUF140 proteins requires ATP hydrolysis [21].

The Mas proteins
The four genes downstream of the M. tuberculosis mce1 operon, as well as two each downstream of the mce3 and mce4 operons, are annotated in TubercuList [44] as 'conserved mce-associated proteins' (herein termed Mas). The mce1 operon transcript has been shown empirically to include the associated mas genes (Rv0175-78) [45]. Examination of a multiple alignment of the protein sequences revealed that they were not conserved along their entire length but shared a similar C-terminal region of approximately 160 amino acids. Pairwise sequence identity scores for the conserved region, generated by ClustalX, ranged from 12 to 25%.

Illustration of conserved regions and predicted secondary structure of Actinomycetales Mce proteins. […] Table 3 were submitted to JPred and the consensus secondary structure prediction was estimated manually. White boxes represent α-helices and grey arrows β-strands. The C-terminal proline-rich region had low complexity and varied in length from 10 to 250 amino acids. Signal sequences were identified by SignalP and lipid attachment sites matched the ProSite motif PS00013.
To determine whether homologous domains were present in other genomes, we used each of the eight Mas C-terminal sequences as a PSI-BLAST query against the NCBI nonredundant database. A total of 137 sequences were retrieved; of these, 124 were hit by all eight query sequences, and all 137 were hit by more than two queries. The proteins identified belonged to six genera: Amycolatopsis, Janibacter, Mycobacterium, Nocardia, Nocardioides and Streptomyces. Thus, the phylogenetic profile of the putative Mas homologs in Actinomycetales genera exactly matches that of the Mce, DUF140 and Mkl proteins. Mas homologs in the M. smegmatis genome, which was not covered by the NCBI database, were identified by exhaustive BLAST querying of the TIGR proteome. Nineteen putative Mas homologs were thus identified (P < 0.00001). […] while the corresponding S. coelicolor operon carried four. In M. paratuberculosis, a pair of mas homologs was located both upstream and downstream of the mce5 operon, but transcribed from the opposite strand (MAP0750-51c, MAP0767-68c). The 23 non-mce operon-associated Mas homologs were generally located in pairs in isolated operons. An exception was Rv2390c, which TIGR predicts is part of a three-gene operon including a resuscitation-promoting factor (rpfD, Rv2389c) and an Fe-S enzyme involved in porphyrin biosynthesis (hemN, Rv2388c).
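The tally of how many of the eight Mas queries retrieved each sequence can be reproduced from per-query hit lists. A minimal sketch, assuming each PSI-BLAST run has already been reduced to a set of subject identifiers; the query and subject names are hypothetical, and "more than two queries" is expressed as a minimum of three.

```python
from collections import Counter


def hit_summary(hits_by_query, min_queries=3):
    """Summarize how many queries retrieved each subject.

    Returns (total distinct subjects, subjects hit by every query,
    subjects hit by at least `min_queries` queries, per-subject counts).
    """
    counts = Counter()
    for subjects in hits_by_query.values():
        counts.update(set(subjects))  # count each subject once per query
    n_queries = len(hits_by_query)
    hit_by_all = sum(1 for c in counts.values() if c == n_queries)
    hit_by_many = sum(1 for c in counts.values() if c >= min_queries)
    return len(counts), hit_by_all, hit_by_many, counts


hits = {                                   # hypothetical PSI-BLAST results
    "q1": {"s1", "s2", "s3"},
    "q2": {"s1", "s2"},
    "q3": {"s1", "s2", "s4"},
}
total, by_all, by_many, counts = hit_summary(hits)
print(total, by_all, by_many)  # 4 2 2
```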

The Mas region is not currently recognized as a conserved domain in the databases. However, within this region, InterPro recognized a lipocalin family motif (IPR002345) in Rv3492c, and a partial C2 domain signature (IPR000008) in Rv0199 and ML2614. Notably, the corresponding Pfam families (PF00061 and PF00168) did not include these sequences as members. Nonetheless, it is worth noting that the lipocalin and C2 domains share a lipid-binding function, as well as an eight-stranded antiparallel β-sandwich structure [46,47].
The majority of pairwise identity scores for the 66 Mas domains were 10-20%. This low level of sequence similarity resulted in multiple sequence alignments that were extremely sensitive to input parameters. Exclusion of the 13 non-mycobacterial sequences produced a much more robust alignment. A phylogenetic tree generated from this alignment is shown in Figure 9. Examination of the tree revealed that the Mas proteins encoded by the first and second genes in each pair formed phylogenetically distinct […]. The mce7-associated Mas proteins also cluster together. Several pairs of non-mce-associated Mas homologs were conserved between mycobacterial species (Figure 9; Cluster I and Cluster II).
The mycobacterial mce-associated Mas orthologs have greater than 50% pairwise identity. In contrast, the Nocardia and Streptomyces mce6-associated Mas proteins are highly divergent (15-20% identity). This suggests that, unlike the mce and yrbE genes, the mas genes have either diverged more rapidly or been independently recruited to the operons.
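Pairwise identity values such as those quoted above can be computed from an alignment in several ways, and programs differ in the denominator they use. The sketch below assumes one common convention, identical residues divided by the columns in which neither sequence has a gap; ClustalX's exact definition may differ, so values from this function need not reproduce its scores. The aligned sequences are invented.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identical residues over columns where neither sequence is gapped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_table(named_alignment):
    """Percent identity for every unordered pair of named aligned sequences."""
    return {(n1, n2): percent_identity(s1, s2)
            for (n1, s1), (n2, s2) in combinations(named_alignment.items(), 2)}


toy = {"m1": "ACDE-G", "m2": "ACDFHG", "m3": "SCDFHG"}   # invented sequences
print(percent_identity(toy["m1"], toy["m2"]))  # 80.0
```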
Comparison of JPred secondary structure predictions for orthologous clusters revealed that the consensus structure of the conserved domain was α₁α₂α₃α₄β₁β₂β₃β₄. Prediction of transmembrane helices indicated that all 66 protein sequences harbored a transmembrane segment located about 140-180 amino acids from the C-terminus and corresponding to α₁. The topology prediction programs TMHMM, HMMTOP and TopPred suggested that the C-terminus was extracellular for 41, 56 and 42 of the 66 submitted sequences, respectively. In no case did all three programs predict an extracellular N-terminus for the same protein. Thus, it seems likely that all N-termini are intracellular, while the C-terminal Mas domains are located on the external side of the cytoplasmic membrane.
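Combining the outputs of TMHMM, HMMTOP and TopPred amounts to tallying per-program calls and checking agreement between programs. A minimal sketch, assuming each prediction has been reduced to 'in' or 'out' for a given terminus; the protein names and calls are hypothetical, and the simple majority vote is our choice of combination rule, not necessarily the one used in this study.

```python
def per_program_counts(calls, side):
    """How many proteins each program (by column index) assigned to `side`."""
    columns = zip(*calls.values())
    return [sum(call == side for call in column) for column in columns]


def unanimous(calls, side):
    """Proteins for which every program assigned the terminus to `side`."""
    return sorted(p for p, preds in calls.items() if all(c == side for c in preds))


def majority(preds):
    """Majority call among the programs, or None on a tie."""
    n_out = sum(c == "out" for c in preds)
    n_in = sum(c == "in" for c in preds)
    if n_out == n_in:
        return None
    return "out" if n_out > n_in else "in"


# Hypothetical C-terminus calls: (TMHMM, HMMTOP, TopPred)
c_term = {"p1": ("out", "out", "in"), "p2": ("in", "out", "in"), "p3": ("out", "out", "out")}
print(per_program_counts(c_term, "out"))            # [2, 3, 1]
print(unanimous(c_term, "out"))                     # ['p3']
print({p: majority(v) for p, v in c_term.items()})  # {'p1': 'out', 'p2': 'in', 'p3': 'out'}
```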
The length of the N-terminal region preceding the Mas domain ranged from 7 to 325 amino acids. In the majority of proteins in which the N-terminal segment was shorter than 30 amino acids (11 of 16), α₁ was predicted by SignalP to be a signal peptide (Figure 10). Consensus topology predictions indicated that the four Mas1B orthologs and three Cluster IIB proteins contained two N-terminal transmembrane helices (oriented in-out, out-in). In the Mas1B orthologs, the two N-terminal transmembrane segments correspond to an RDD domain (IPR010432). Examination of a multiple alignment revealed that although M. smegmatis Mas1B lacks the N-terminal signature RD residues, the Cluster IIB proteins possess them. It has been proposed that the RDD domain is involved in transport [31]; however, to date, no empirical evidence has been published to support this claim. In MSMEG0879 the 325-amino-acid N-terminal region contains a protein kinase domain (IPR000719) with the Ser/Thr kinase active site motif (PS00108). Coiled-coils, which are known to mediate protein-protein interactions [48], were identified by the Lupas COILS algorithm in the N-terminal region of each Cluster IA sequence.

Discussion
In this study we sought to gain insight into the function of the M. tuberculosis mce operons using genome comparisons and bioinformatic methods.
The YrbE and Mce proteins, encoded by the M. tuberculosis mce operons, have homology to the permease and SBP components of ABC transporters, respectively [29]. However, sequence similarity within these protein families is notoriously low, and confirmation that the mce operons encode ABC importers has required identification of the necessary cognate ATPase. Dassa and Bouige [8] have proposed that Rv0655, an ATPase named Mkl, might supply this function and here we provide substantial evidence that this is indeed the case.
Firstly, Mkl orthologs are encoded immediately upstream of the mycobacterial-like mce operons in species of Nocardia, Janibacter, Nocardioides, Amycolatopsis and Streptomyces. Secondly, orthologs of Mkl are found in all, and only in, those Actinomycetales species that also contain Mce and DUF140 homologs. The presence of an intact mkl gene in the M. leprae genome, which has undergone extensive reductive evolution [49], is significant in this respect. Thirdly, in Gram-negative bacteria, operons containing DUF140 and mce homologs invariably include the orthologous mkl gene. Recently, Joshi et al. [7] observed that in competitive mouse infections an Rv0655 mutant was attenuated relative to wild-type M. tuberculosis, whereas an Rv0655-mce1 double mutant showed no attenuation relative to the mce1 mutant, providing evidence that Rv0655 and the Mce1 proteins are functionally linked. It is notable that in the Mycobacterium species examined, the mkl gene is located within the genomic region that encodes the majority of ribosomal proteins; this is generally the most conserved region in prokaryotic genomes and could facilitate high-level expression of mkl [40].
It is widely accepted that the direction of substrate transport of ABC transporters can be predicted on the basis of ATPase homology [10]. In phylogenetic analyses, Mkl ATPases fall into the importer clade [8,20]; this prediction is consistent with the proposed role of Mce proteins as SBPs, which are found exclusively in substrate import systems.
The results of topology prediction indicated that the YrbE proteins contain five or six transmembrane segments, with the C-terminal five being the most conserved, and the C-terminus outside. In support of this model, the periplasmic location of the C-terminus of E. coli YrbE has been demonstrated empirically [50]. In general, ABC permeases show the highest level of sequence similarity over the C-terminal five transmembrane regions, and this is considered to be the minimal functional unit [11]. In compiled alignments of ABC permease sequences, the most conserved region localizes to the final cytoplasmic loop [42]. This motif, termed the EAA loop, likely interacts with the cognate ATPase [43]. A highly conserved motif, predicted to localize to the penultimate cytoplasmic loop, was identified in YrbE proteins from both Actinomycetales and Gram-negative bacteria. We propose that this motif, named the EExDA loop, serves as the site of interaction with the putative cognate Mkl ATPase, in a manner analogous to the EAA loop.

Figure 9. Phylogenetic tree of mycobacterial Mas domain sequences. The conserved Mas domains of mycobacterial proteins listed in Table 4 were aligned and an unrooted neighbor-joining tree was computed by MEGA. Coloring corresponds to the classification scheme specified in Table 3.
Conservation of the 'two yrbE plus six mce' operon structure suggests that these components comprise the functional unit of the canonical Actinomycetales Mce transporter [22,40]. We have found that mutation of the yrbE1A, mce1A or mce1E gene of M. tuberculosis results in undetectable levels of all the Mce1 proteins, implying that these proteins are part of a hetero-octameric complex whose formation is necessary for the stability of the Mce proteins [4] (L. Morici, personal communication). It is interesting that many Proteobacteria contain membrane proteins with multiple Mce domains (PqiB proteins) that could potentially interact to form a quaternary structure analogous to the putative Actinomycetales Mce complex. The permease components of ABC transporters, which form a channel across the cytoplasmic membrane, are frequently heterodimers; however, although present in stoichiometric excess, SBPs are generally encoded by one or two genes [11]. The presence of six SBPs is, thus far, a unique characteristic of the Actinomycetales Mce transporters. Using computational methods, Pajon et al. [51] found that the β-sheet region of eight of the M. tuberculosis Mce proteins contained patterns typical of transmembrane β-strands and suggested that this region could promote penetration of the outer lipid layer. Thus, it is tempting to speculate that the Mce proteins form a channel that crosses this lipid bilayer. Chitale et al. [52] have previously shown that Mce1A is indeed exposed on the surface of M. tuberculosis.
Proteins encoded downstream of three of the four M. tuberculosis mce operons exhibit significant sequence homology. Similarity is confined to the 160 amino acid C-terminal ...

Figure 10 Representative architectures of Mas domain-containing proteins. Membrane topology predictions for the 66 Mas proteins listed in Table 4 indicated that the conserved domain was located on the extracellular side of the cytoplasmic membrane. The Mas domain was predicted to remain anchored in the majority of proteins (A), but to be cleaved in eight (B). Three transmembrane segments were identified in seven proteins, and four of these were classified as RDD domains (C, D). Five proteins contained an N-terminal coiled-coil region (E), and one contained a serine-threonine protein kinase domain (STPK; F).

Several studies have shown that the γ-proteobacterial mce loci play a role in determining structural properties of the cell envelope, which in pathogenic species affect invasive activity. In Pseudomonas putida, a transposon insertion within the DUF140-Mce-associated ATPase gene ttg2A (PP0958) renders the cells sensitive to toluene [53]. In addition to toluene degradation and efflux, toluene tolerance is known to be mediated by increased cell membrane rigidity resulting from changes in fatty acid and phospholipid composition [54]. In Shigella flexneri, mutations in the vpsABC locus (S_3453-51), encoding an ABC transporter with the ATPase-DUF140-Mce configuration, result in a defect in intercellular spread through epithelial cell monolayers, altered colony morphology, increased sensitivity to detergent lysis and hypersecretion of both Sec-dependent and type III-dependent virulence proteins [55]. Carvalho et al. have reported that in Campylobacter isolates, the presence of iamA, the ATPase gene of the mce operon (Cj1646-48), correlated with an invasive phenotype [56], although this association remains controversial [57][58][59].
In Neisseria meningitidis, the mce-like operon gltT (NMB1966-64) belongs to the GdhR regulon, which is expressed at higher levels in invasive than in commensal isolates and is particularly elevated in hypervirulent lineages [60].

... that possess a somewhat analogous outer lipid bilayer raises the possibility that the mce operons are involved in maintenance of outer membrane integrity. However, their presence in other Actinomycetales with typical Gram-positive-type cell envelopes appears to preclude this hypothesis. In addition, the absence of mce operons in Corynebacterium species indicates that their function is not essential for maintenance of an outer lipid bilayer.

Based on a stated similarity of the ATPase component to GluA of Corynebacterium glutamicum, Meidanis et al. [62] proposed that the Xylella fastidiosa mce-like operon (XF0421-19) encoded a glutamate importer. It was subsequently shown that a mutation within the homologous N. meningitidis gltT operon resulted in impaired glutamate-specific uptake at low sodium concentrations [63]. Glutamate is a prominent constituent of peptidoglycan; thus, disruption of its uptake in the proteobacterial mce operon mutants could perhaps account for the observed effect on cell envelope properties. Also relevant in this respect is the conserved location of the peptidoglycan biosynthetic gene murA downstream of the Mce transporter genes in γ-Proteobacteria.
Homologs of the Mkl, Mce and DUF140 proteins have also been identified in plants [64]. The Arabidopsis homologs of DUF140 (TGD1, At1g19800) and Mce (TGD2, At3g20320) both localize to the inner plastid envelope membrane, with the Mce domain located in the intermembrane space. Lipid binding studies demonstrated that TGD2 specifically bound 1,2-diacyl-sn-glycerol 3-phosphate (phosphatidic acid). TGD1 and TGD2 mutants exhibited identical phenotypes consistent with disruption of transport of ER-derived phosphatidic acid into chloroplasts, suggesting that the TGD proteins form part of a lipid translocator [65][66][67].
Orthologous ABC transporters are expected to be functionally equivalent [13][14][15]; thus, the proposal of both phosphatidic acid and glutamate as possible substrates of the Mce transporters is puzzling. It is noteworthy that in sequence analyses, by us and others, the Mkl-like ATPases are not closely related to GluA [8]. If the bacterial Mce homologs have a phospholipid binding function equivalent to that of TGD2, this might enable interaction with host cell membranes and explain the invasive phenotype associated with the mce loci. It is generally accepted that host-derived lipids are the primary source of carbon utilized by M. tuberculosis in vivo [68]; however, no mechanism of lipid import has been identified. Thus, it is enticing to hypothesize that the Mce transporters might perform this role. The inclusion of the fatty-acyl CoA synthetase gene fadD5 in the mce1 operon and the repression of the operon by a FadR-like regulator lend some support to this conjecture [45].
The canonical eight-gene mce operon has undergone extensive proliferation and deletion events within certain Actinomycetales lineages, most notably in Mycobacterium and Nocardia species. The simplest explanation for the presence of multiple mce operons is that they facilitate elevated expression. However, evidence from transcriptional analyses of M. tuberculosis suggests that, at least in this organism, the operons are not co-regulated [69][70][71][72]; in addition, three of the four operons are associated with transcriptional regulators [45,73]. In competitive mouse infections, Sassetti and Rubin [6] found that an mce1 mutant exhibited a growth defect during the first 1-2 weeks of infection, whilst an mce4 mutant showed attenuation 3-4 weeks after inoculation. These observations support the proposition that the operons function at different stages of infection. Differential expression of the individual Mce transporters may reflect optimization for substrate uptake under differing conditions, such as in the low-sodium intracellular environment; alternatively, the transporters might have differing substrate specificities.
The number of mce operons in individual species appears to reflect the variety of environmental niches inhabited. Thus, the fast-growing, typically soil-dwelling Mycobacterium species possess the greatest number, with polycyclic aromatic hydrocarbon-degrading species isolated from bioremediation sites containing the most [74]. In contrast, the host-specialized, slow-growing pathogenic species possess fewer operons, and the obligate intracellular pathogen M. leprae encodes a single complete mce operon. A high degree of sequence similarity indicates that the mce1 operon duplicated to create mce2 relatively recently. In M. tuberculosis complex strains, mce frameshift mutations are concentrated in these two operons: of the five described in this paper, four are in mce2 and the fifth is in mce1. This pattern may reflect functional divergence of the mce1 and mce2 operons.
With the exception of mycolic acids, the distribution of morphological and chemotaxonomic traits within the Actinomycetales is polyphyletic [75]. Given the incongruent taxonomic distribution of the mce operons and their proposed role in cell envelope integrity, it is pertinent to note that the presence of mce operons does not correlate with the type of peptidoglycan, menaquinones, phospholipids or fatty acids in the cell envelope [75,76]. In addition, there is no correlation with oxygen requirement, habitat or pathogenicity.

Conclusion
The available evidence suggests that the mce operons encode a novel subfamily of ABC transporter uptake systems composed of DUF140 permease components, Mce-like substrate-binding proteins and Mkl-type ATPase domains. Disruption of mce operons, in both Actinomycetales and Gram-negative bacteria, affects properties of the cell envelope and associated virulence phenotypes of pathogenic species. Empirical studies have implicated both glutamate and phosphatidic acid as substrates of mce-like transporters; thus, although the precise substrate specificity of the M. tuberculosis Mce transporters remains uncertain, we conclude that their substrate is likely to be an organic acid precursor of cell envelope biogenesis.

BLAST analyses
Sequence similarity searches were performed by BLASTP against complete microbial genome sequences deposited in the TIGR-CMR and NCBI Microbial Genome Project databases [79,81,86]. To determine whether the EExDA motif identified in YrbE proteins was uniquely characteristic of the DUF140 family, we performed a BLASTP search of the NCBI Microbial Genome Project database with the Actinomycetales YrbE consensus motif (PLVTGLALAGAGGAAITADLGARRIREEIDALEVMGID-PISRLVVPR) using the default parameters, except with no filter and an expect threshold of 100. To identify homologs of the M. tuberculosis Mas domain, each of the eight sequences was used in a PSI-BLAST query against the NCBI non-redundant database [87]. We used an inclusion threshold of P < 10⁻⁵, and scores were adjusted with composition-based statistics; these parameters resulted in convergence after 6-8 iterations.
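The EExDA name can be read as a sequence pattern: two glutamates, any residue, then aspartate and alanine. The sketch below is an illustrative check only, not the BLASTP search described above; the regular-expression reading of the motif and the helper name `find_eexda` are our assumptions. It strips alignment gaps and reports every match with its 1-based start position, which for the consensus motif quoted above is the EEIDA stretch.

```python
import re

# Actinomycetales YrbE consensus motif as quoted in the text (hyphen kept).
CONSENSUS = "PLVTGLALAGAGGAAITADLGARRIREEIDALEVMGID-PISRLVVPR"

# Assumed reading of "EExDA": Glu, Glu, any residue, Asp, Ala.
EEXDA = re.compile(r"EE[A-Z]DA")


def find_eexda(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each EExDA hit.

    Alignment gaps ('-') are removed first, so positions refer to the
    ungapped sequence. Matches are non-overlapping (re.finditer).
    """
    ungapped = seq.replace("-", "").upper()
    return [(m.start() + 1, m.group()) for m in EEXDA.finditer(ungapped)]


if __name__ == "__main__":
    print(find_eexda(CONSENSUS))  # [(27, 'EEIDA')]
```

A regular expression only finds exact pattern matches; the BLASTP search instead scores the whole consensus with a substitution matrix, so it can also detect diverged variants of the loop.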

Multiple alignment and phylogenetic analyses
Phylogenetic analyses were conducted using the MEGA version 3.1 suite of programs [88]. Multiple alignments were constructed by CLUSTAL-W using the Gonnet weight matrix and default gap penalties [89]. Unrooted trees were computed by the neighbor-joining method. The consensus tree, after 500 bootstrap replicates, was displayed graphically with Tree Explorer. In addition, CLUSTAL-W alignments were converted to PHYLIP format and trees computed by the maximum likelihood method implemented by PROML using default parameters [90]. In all cases this resulted in a tree with topology that was essentially the same as the neighbor-joining tree generated by MEGA. Percentage pairwise similarity scores were calculated by CLUSTAL-X [91].
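To make the neighbor-joining step concrete, the following is a minimal sketch assuming uncorrected p-distances and the standard Q-criterion formulation of the algorithm. It is not MEGA's implementation: it applies no distance correction, omits the 500 bootstrap replicates, and the function names `p_distance` and `neighbor_joining` are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no shared ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Unrooted neighbor-joining tree, returned as a Newick string.

    names: taxon labels (at least three).
    dist:  mapping from a pair of labels, e.g. ("a", "b"), to their distance.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = {frozenset(pair): float(v) for pair, v in dist.items()}

    def D(x, y):
        return d[frozenset((x, y))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(D(x, y) for y in nodes if y != x) for x in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        dij = D(i, j)
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))  # branch i -> new node
        lj = dij - li                                   # branch j -> new node
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (D(i, k) + D(j, k) - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [new]

    # Three nodes left: join them at one internal node (three-point formula).
    a, b, c = nodes
    la = (D(a, b) + D(a, c) - D(b, c)) / 2
    lb = (D(a, b) + D(b, c) - D(a, c)) / 2
    lc = (D(a, c) + D(b, c) - D(a, b)) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    rows = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    matrix = {(x, y): rows[p][q] for p, x in enumerate(taxa)
              for q, y in enumerate(taxa) if p < q}
    print(neighbor_joining(taxa, matrix))
    # (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

On this toy additive matrix the sketch recovers the tree that generated the distances. For real protein alignments, p-distances are often corrected for multiple substitutions before tree building, and bootstrap support would come from resampling alignment columns and rebuilding the tree.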

Identification of conserved motifs
The MEME server was used to discover highly conserved sequence motifs within groups of homologous proteins [92,93]. Motifs were displayed graphically using WebLogo [94,95].
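WebLogo scales each column of a logo by its information content, which for proteins is at most log2(20) ≈ 4.32 bits. The sketch below computes that quantity as log2(20) − H, where H is the Shannon entropy of the residues observed in the column. Unlike WebLogo, it applies no small-sample correction, and the function names `column_information` and `logo_heights` are hypothetical. Within a stack, each residue's letter height would be its frequency multiplied by the column height.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column: log2(N) - H.

    Gaps ('-') are ignored. No small-sample correction is applied.
    """
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(aligned: list[str]) -> list[float]:
    """Total stack height for each column of an alignment."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [column_information("".join(s[i] for s in aligned))
            for i in range(length)]
```

A fully conserved column reaches the maximum of log2(20) bits, while a column split evenly between two residues loses exactly one bit.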

Secondary structure and topology prediction
Groups of aligned orthologs were submitted to JPred [96], a consensus secondary structure prediction server that provides improved accuracy over single-sequence prediction methods [97]. Comparison of predictions between orthologous clusters by visual inspection allowed estimation of the consensus structure for a homologous family. Coiled coils were predicted using the Lupas COILS algorithm through the JPred server [98].
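Because the consensus structure was estimated by visual inspection, the following is only a hypothetical automation of that comparison: a per-position majority vote over aligned secondary-structure strings using 'H' (helix), 'E' (strand) and '-' (coil), with '.' assumed to mark alignment gaps. The function name `consensus_ss`, the gap symbol and the more-than-half threshold are all assumptions.

```python
from collections import Counter


def consensus_ss(predictions: list[str], min_fraction: float = 0.5) -> str:
    """Column-wise majority vote over aligned secondary-structure strings.

    A state is reported only if it is predicted in more than `min_fraction`
    of the non-gap entries of a column; otherwise the column becomes '?'.
    Columns that are gaps ('.') in every string stay '.'.
    """
    if not predictions:
        raise ValueError("no predictions given")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("predictions must be aligned to equal length")
    out = []
    for column in zip(*predictions):
        states = [s for s in column if s != "."]
        if not states:
            out.append(".")
            continue
        state, count = Counter(states).most_common(1)[0]
        out.append(state if count / len(states) > min_fraction else "?")
    return "".join(out)
```

Marking ambiguous columns with '?' rather than forcing a call keeps the disagreements visible, which is where manual inspection would still be needed.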
Protein sequences were analyzed by SignalP and TatP to identify Sec- and Tat-dependent signal sequences [38,39,99]. The reliability of predictions of transmembrane helices and protein topology increases when different methods are combined [100]. Hence, we submitted sequences to TMHMM [101,102], HMMTOP [103,104] and TopPred [105,106], and determined the consensus prediction by manual comparison.
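The consensus transmembrane prediction was determined by manual comparison. As a hypothetical sketch of one way to formalize that step, the function below assumes each method's output has already been parsed into 1-based, inclusive (start, end) segments and keeps the runs of residues called membrane-spanning by at least two of the three methods. Parsing of the native TMHMM, HMMTOP and TopPred output formats is not shown, the function name `consensus_segments` is our own, and the segment boundaries in the usage example are invented for illustration.

```python
def consensus_segments(predictions, length, min_votes=2):
    """Per-residue majority vote over transmembrane predictions.

    predictions: one list of (start, end) segments per method, 1-based and
                 inclusive (hypothetical pre-parsed predictor output).
    length:      protein length in residues.
    Returns the runs of residues supported by at least `min_votes` methods.
    """
    votes = [0] * (length + 1)  # index 0 unused; residues are numbered from 1
    for segments in predictions:
        covered = set()  # a method votes at most once per residue
        for start, end in segments:
            if not 1 <= start <= end <= length:
                raise ValueError(f"segment {(start, end)} outside 1..{length}")
            covered.update(range(start, end + 1))
        for pos in covered:
            votes[pos] += 1

    consensus, run_start = [], None
    for pos in range(1, length + 1):
        supported = votes[pos] >= min_votes
        if supported and run_start is None:
            run_start = pos
        elif not supported and run_start is not None:
            consensus.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:
        consensus.append((run_start, length))
    return consensus


if __name__ == "__main__":
    # Invented segment calls for a 120-residue protein (illustration only).
    tmhmm = [(10, 30), (50, 70)]
    hmmtop = [(12, 32), (80, 100)]
    toppred = [(11, 29), (52, 72)]
    print(consensus_segments([tmhmm, hmmtop, toppred], 120))
    # [(11, 30), (52, 70)]
```

A segment called by only one method (here residues 80-100) is dropped, which mirrors the rationale that agreement between independent methods increases prediction reliability.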