Is the C-terminal insertional signal in Gram-negative bacterial outer membrane proteins species-specific or not?
© Paramasivam et al.; licensee BioMed Central Ltd. 2012
Received: 6 July 2012
Accepted: 25 September 2012
Published: 26 September 2012
In Gram-negative bacteria, the outer membrane is composed of an asymmetric lipid bilayer of phospholipids and lipopolysaccharides, and the transmembrane proteins that reside in this membrane are almost exclusively β-barrel proteins. These proteins are inserted into the membrane by a highly conserved and essential machinery, the BAM complex. It recognizes its substrates, unfolded outer membrane proteins (OMPs), through a C-terminal motif that has been speculated to be species-specific. This speculation rests on theoretical and experimental results from only two species, Escherichia coli and Neisseria meningitidis, where it was shown on the basis of individual sequences and motifs that OMPs from the one cannot easily be overexpressed in the other unless the C-terminal motif is adapted. In order to determine whether this species specificity is a general phenomenon, we undertook a large-scale bioinformatics study on all predicted OMPs from 437 fully sequenced proteobacterial strains.
We were able to verify the incompatibility reported between Escherichia coli and Neisseria meningitidis, using clustering techniques based on the pairwise Hellinger distance between sequence spaces for the C-terminal motifs of individual organisms. We noticed that the amino acid position reported to be responsible for this incompatibility between Escherichia coli and Neisseria meningitidis does not play a major role in determining species specificity of OMP recognition by the BAM complex. Instead, we found that the signal is more diffuse, and that for most organism pairs, the difference between the signals is hard to detect. Notable exceptions are the Neisseriales and Helicobacter spp. For both of these organism groups, we describe the specific sequence requirements that are at the basis of the observed difference.
Based on the finding that the differences between the recognition motifs of almost all organisms are small, we assume that heterologous overexpression of almost all OMPs should be feasible in E. coli and other Gram-negative bacterial model organisms. This is relevant especially for biotechnology applications, where recombinant OMPs are used e.g. for the development of vaccines. For the species in which the motif is significantly different, we identify the residues mainly responsible for this difference that can now be changed in heterologous expression experiments to yield functional proteins.
Keywords: Outer membrane β-barrel protein biogenesis; Clustering; Hellinger distance; CLANS; Species specificity; Short linear motifs; GLAM2; C-terminal β-strand; BamA; β-barrel assembly machinery; Gram-negative bacteria; Outer membrane; Principal component analysis; Frequency plots
In Gram-negative bacteria, the cytoplasm is surrounded by an inner membrane (IM) and an outer membrane (OM), which are separated by an inter-membrane space called the periplasm. Most of the newly synthesized proteome remains in the cytoplasm, but in addition, different machineries are involved in the translocation of non-cytoplasmic proteins to different subcellular localizations, including the inner or outer membrane, the periplasmic space, or the extracellular space. Some of these machineries recognize their substrate proteins by an N-terminal signal peptide (SP) for the translocation process, while other machineries are SP-independent. The IM, which is a phospholipid bilayer, is mostly occupied by transmembrane α-helical proteins, by inner membrane lipoproteins on its periplasmic side, and by other membrane-associated proteins on both sides of the membrane. In contrast, the asymmetric OM, which consists of phospholipids only in the inner leaflet of the membrane and lipopolysaccharides in the outer leaflet, is mostly occupied by transmembrane (outer membrane) β-barrel proteins, and by outer membrane lipoproteins on its periplasmic side.
The biogenesis of an outer membrane β-barrel protein (OMP) begins with the translocation of the newly synthesized, unfolded protein across the IM into the periplasm via the Sec translocation machinery, which requires a cleavable general SP. Once the unfolded OMP reaches the periplasm, it uses the SurA or Skp-DegP pathway to reach the OM. SurA, Skp and DegP are periplasmic chaperones that interact with unfolded OMPs, protect them from aggregation, and thus help them to reach the OM[2, 3]. It has been shown that the SurA pathway and the Skp-DegP pathway can work in parallel, but that the SurA pathway plays the major role when the cell is under normal growth conditions, while under stress conditions, the Skp-DegP pathway plays the major role[4, 5].
Once periplasmic chaperones deliver the OMPs to the OM, the folding and insertion of the protein into the membrane is mediated by the β-barrel assembly machinery (BAM), without an external energy source such as ATP or ion gradients. This machinery involves an essential multi-domain protein, BamA (Omp85), which consists of a 16-stranded transmembrane β-barrel domain and of a large periplasmic part that consists of five POTRA (polypeptide transport-associated) domains. BamA is highly conserved in Gram-negative bacteria and also has homologues in mitochondria (Sam50) and chloroplasts (Toc75-V). In addition, the BAM complex, at least in E. coli, consists of four lipoproteins, BamB, BamC, BamD and BamE, among which only BamD is essential and conserved in most Gram-negative bacteria. Recent HMM-based sequence analysis by Anwari et al. showed that BamB and BamE are mainly present in α-, β- and γ-proteobacteria, while BamC is present only in β- and γ-proteobacteria. They also found a new lipoprotein subunit in the BAM complex, named BamF, which is present exclusively in α-proteobacteria. The BAM complex recognizes OMPs as its substrates via binding to an amphipathic C-terminal β-strand of the unfolded β-barrel, but the exact binding mode is still not clear. It was suggested that the C-terminal β-strand binds to BamD once the unfolded OMPs are delivered to the BAM complex by periplasmic chaperones. However, a recent crystal structure of a BamC and BamD subcomplex shows that the unstructured N-terminus of BamC binds to the proposed substrate binding site of BamD. The C-terminal β-strand of an OMP β-barrel domain typically contains an aromatic residue at its C-terminus. It has been reported that deletion or substitution of this C-terminal residue negatively affects the biogenesis of OMPs[10, 11]. Also, in vitro studies showed that the E. coli OM porin PhoE, when lacking its C-terminal Phe residue, fails to open the Omp85/BamA channel.
In both studies, overexpression of the mutant OMP was lethal to the cells. At lower concentration, the mutant protein was tolerated and was inserted into the membrane. This leads to the suggestion that a weak insertion signal other than the C-terminal residue or β-strand is present.
Robert et al. observed that the N. meningitidis OM porin PorA or its C-terminal β-strand did not open the E. coli Omp85/BamA channel, and the comparison of the C-terminal β-strands from N. meningitidis and E. coli OMPs showed a high preference for positively charged amino acids at the penultimate (+2) position in neisserial OMPs. When they mutated E. coli PhoE or its C-terminal β-strand, changing Gln for Lys at the +2 position, it did not open the channel any more; in contrast, a Neisseria PorA peptide with Gln instead of Lys increased the channel activity considerably. These studies, and the fact that high concentrations of neisserial OMPs were lethal in E. coli cells, led to the conclusion that the C-terminal insertion signal is species-specific and that the residues at the +2 position are important for this phenomenon. The number of peptides/proteins used in the comparison in that study was very low compared to the total number of OMPs present in the E. coli or N. meningitidis genomes; moreover, the phenomenon was only compared between two organisms, one β- and one γ-proteobacterial species. Since neisserial OMPs could be expressed in E. coli at low expression rates, either the neisserial C-terminal insertion signal is weakly recognized by the E. coli BAM complex, or other β-strands in the full-length protein might act as a weak insertion signal.
Thus, there seems to be at least some overlap in the peptide recognition. The intention of this study was to use computational methods to quantify this overlap, and to find out whether the observed (partial) species specificity of the insertion signal is exhibited by all Gram-negative bacterial organisms.
Results and discussion
[Table: Dataset classified based on OMP class. Columns: OMP class; # of β-strands; total # of peptides; # of organisms in the different proteobacterial classes in which the OMP class was found. Rows include: integral membrane enzymes; long chain fatty acid transporter; substrate specific porins.]
Clustering of organisms based on C-terminal β-strands
The pairwise comparison of the overlap between sequence spaces should help us to predict the similarity between the C-terminal insertion signal peptides, and how high the probability is that the protein of one organism can be recognized by the insertion machinery of another organism. When there is a complete overlap of sequence space between two organisms, we assume that all C-terminal insertion signals from one organism will be recognized and functionally expressed by another organism’s BAM complex and vice-versa. When there is only little overlap between the sequence spaces of two organisms, we assume that only a small number of C-terminal insertion signals from one organism will be recognized by another organism’s BAM complex. When there is no overlap, we assume that there is a general incompatibility.
Control experiments for clustering: randomly shuffled peptide sequences lose the signal for clustering
We noticed that the organisms seen at the periphery of the cluster map had a lower overall number of peptides, while organisms with more peptides are typically seen at the center of the circle. The cluster map in Figure 1B is colored based on the number of extracted peptides per organism. In Figure 1B, there are 99 organisms which have ≤ 30 peptides (colored in pink), 77 organisms with 31 to 40 peptides (colored in blue), 136 organisms with 41 to 60 peptides (colored in green), 66 organisms with 61 to 80 peptides (colored in red), and 59 organisms with more than 80 peptides (colored in brown). Even though H. pylori strains have a comparably high number of peptides (43 to 51 peptides), they still form a separate cluster in the periphery of the cluster map; therefore there must be an underlying organism-specific signal from the contributing peptides, at least in this case.
High preference of positively charged residues at the +2 position in Neisseria species
High preference of Histidine at the +3 position in porins (16-stranded OMPs) from β-proteobacteria
High preference of Tyrosine at the +5 position in Helicobacter species
OMP class-specific and taxonomy class-specific signals
In our study, we were able to reproduce the difference between E. coli and Neisseria C-terminal β-strands as found by Robert et al., which suggests a species-specific insertion signal for OMPs. But in contrast to the earlier report, we show that positively charged amino acids at the +2 position cannot be the reason for the experimentally observed species specificity between these organisms, as Escherichia also contains C-terminal β-strands with positively charged amino acids at the +2 position. Moreover, there is experimental evidence which shows the functional expression of a heterologous OMP, YadA of Yersinia enterocolitica, with a positively charged amino acid at the +2 position, in E. coli. The neisserial PorA protein and the neisserial C-terminal β-strands used by Robert et al. contained His at the +3 position, which is common for many OMP.16 proteins from β-proteobacteria and is not found in Escherichia OMPs; this might be the true difference in the recognition of C-terminal β-strands by the Escherichia BAM complex. Furthermore, we found that Helicobacter strains form a distinct cluster in the cluster map, which is due to their very different composition of C-terminal β-strands. There is experimental evidence showing that expression of H. pylori OMPs in E. coli is lethal, and that this lethality can be suppressed by removing the C-terminal strand. When we looked at the frequency motifs from Helicobacter strains, we did not notice a strong preference for any amino acid at the +2 or the +3 position; however, we observed a strong preference for Tyr at the +5 position, which is not common in Escherichia or other Proteobacteria. We assume that this position may play an important role in the rejection of these C-terminal β-strands by the E. coli BAM complex. The examples of Neisseria and Helicobacter show that different positions in the C-terminal recognition motif can be relevant for heterologous expression of OMPs.
We predict that in certain groups of species, the highly preferred residues at certain positions of the C-terminal insertion signals are responsible for the inadequate recognition of these signals by the E. coli BAM complex. In the future, mutation studies will have to be performed to prove the importance of these residues in the recognition step of OMP biogenesis.
As a result of our study, we have shown that there is a large overlap between the signals from C-terminal insertion peptides of different organisms, which suggests that in most cases, heterologous expression should be possible. OMPs can fold in vitro even without the help of any other proteins. The BAM complex is an enzyme that makes the folding of OMPs into the outer membrane more efficient by increasing the reaction rate of a natural process. Enzymes modify reaction rates by changing the reaction route to lower the activation energy, and binding/recognition is part of this changed route. Thus, it is also important to consider expression rates: poor recognition might still lead to properly folded OMPs in the outer membrane of a heterologous host at low expression rates. But under overexpression conditions, the BAM machinery can probably not cope with poorly recognized signals that would lead to lower overall folding rates (considering that recognition is the first and probably in some cases rate-limiting step of the folding process). Different classes of OMPs have different folding rates, where small OMPs fold faster and more efficiently (again in vitro) than larger ones, which might explain why large OMPs seem to depend more heavily on an intact BAM machinery than small ones[26, 27].
Since there are two different signals that contribute to the observed average motifs, from OMP class and from taxonomy, it is problematic to use averaged motifs or sequence logos to determine the compatibility of a given protein-organism pair. The main problem here is the overrepresentation of certain OMP classes in some organism groups; this overrepresentation shifts the average signals. It is more useful to determine, for an individual C-terminal motif from a protein to be expressed, whether it is also present in any of the OMPs of the host organism.
The taxonomy-based specificity we observed here based on sequence space depends upon the entire peptide sequence, but at the functional level, these peptides are recognized based on the interacting residue positions in the C-terminal insertion signal peptide. The PDZ domain of the bacterial periplasmic stress sensor, DegS, also recognizes the C-terminal YxF motif in the last β-strand of misfolded OMPs. This leads to the activation of the proteolytic pathway and the expression of DegP, which degrades misfolded OMPs[28, 29]. Since the C-terminal β-strand is recognized by both the PDZ domain of the DegS protein and by the BAM complex, studying the co-evolution of interacting residues in both cases would help in understanding the divergence of the C-terminal β-strands between different Gram-negative bacterial organisms. Unfortunately, co-crystal structures of the BAM complex with its substrates are not available yet. With more experimental evidence about the substrate recognition sites for the C-terminal insertion signal peptide in the BAM complex, the co-evolution of the interacting amino acids can hopefully be studied in the future, which may shed more light on the evolution of the BAM machinery in different Proteobacteria, and on its ability to recognize heterologous substrates for biotechnology applications.
Predicting outer membrane β-barrel proteins
In a previous study to annotate the subcellular localizations (SCLs) for the proteomes of 607 Gram-negative bacteria, we developed the program/database ClubSub-P, in which we used programs like CELLO, PSORTb and HHomp to annotate OMPs. CELLO and PSORTb use support vector classifiers to annotate different SCLs of query sequences and are much faster than HHomp, which uses HMM-HMM-based search algorithms to predict and classify OMPs. Thus we used CELLO and PSORTb to scan all the sequences in the clusters of the ClubSub-P database. A random protein was selected from each cluster where CELLO or PSORTb had a positive hit for an outer membrane protein, and the sequence was analyzed with HHomp. When HHomp predicted a protein with more than 90% probability to be an OMP, we considered all the proteins in the cluster to be OMPs. In addition, we selected all singleton sequences with a positive prediction from CELLO or PSORTb and analyzed them with HHomp.
Finding the C-terminal β-strands
HHomp annotates/classifies OMPs based on the number of β-strands present in them. HHomp calculates/predicts this from homologous structures of OMPs. We transferred this annotation from the best hit in HHomp runs to the query sequences. HHomp also annotates secondary structure and β-barrel strand predictions using PSIPRED and ProfTMB, which were used to extract the C-terminal (last) β-strand/motif for each OMP. The last β-strand predicted by ProfTMB was extracted as the C-terminal motif from representative sequences and singletons, and further filters were applied to reduce the false positive rate: 1) 70% of the amino acids in the motif should have a β-strand prediction from PSIPRED. 2) If the C-terminus of the protein is more than 4 residues away from the C-terminus of the motif, we extended the predicted motif by up to 4 amino acids to find an aromatic hydrophobic residue [F,Y,W]; otherwise, we extended the C-terminus of the motif to the end of the protein itself. 3) Additionally, if the motif length was less than 10 residues, we extended the motif towards its N-terminus. 4) Furthermore, with the regular expression [^C][YFWKLHVITMADGRE][^C][YFWKLHVITMADGRE][^C][YFWKLHVITMADGRE][^C].[^C][YFWHILM] (an updated version of the BOMP C-terminal pattern), we searched for the existence of the alternating hydrophobic pattern in the motif, which is typical for transmembrane β-strands.
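The regular expression filter can be tried out with a few lines of Python. This is a minimal sketch, not the authors' script: the pattern is copied from the text, while the function name, the choice to search anywhere in the motif (rather than anchoring at its end), and the example peptides are assumptions made for illustration.

```python
import re

# Pattern quoted from the text (updated BOMP C-terminal pattern).
C_TERMINAL_PATTERN = re.compile(
    r"[^C][YFWKLHVITMADGRE][^C][YFWKLHVITMADGRE][^C][YFWKLHVITMADGRE]"
    r"[^C].[^C][YFWHILM]"
)


def has_alternating_pattern(motif: str) -> bool:
    """Return True if the 10-residue pattern occurs anywhere in the motif.

    Assumption: the pattern may match at any offset; the text only says
    that its existence "in the motif" was checked.
    """
    return C_TERMINAL_PATTERN.search(motif.upper()) is not None


# Hypothetical example peptides, not taken from the dataset.
print(has_alternating_pattern("GVNVGVNYRF"))  # matches the pattern
print(has_alternating_pattern("GSNSGSNYRF"))  # Ser is not allowed at position 2
```

Every second position accepts only residues from the listed set, cysteine is excluded at the remaining fixed positions, and the final residue must be one of Y, F, W, H, I, L or M.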
Using the information from this representative C-terminal motif, we extracted C-terminal motifs from the rest of the sequences in the clusters. We used MAFFT to align the sequences from each cluster, and used the start and end coordinates of the C-terminal motif discovered above in the representative sequences randomly selected from the clusters. Motifs were extended on both sides in cases where we encountered gaps in the alignment. The gaps were removed, and the resulting motifs were then subjected to alternating hydrophobic pattern matching.
The peptides we collected vary in length from 10 to 21 residues (only six of the peptides were longer than 21). We then applied GLAM2, a gapped motif discovery algorithm, to find the strongest motif with a length of 10 from this dataset. We found 24,626 motif instances in 25,454 sequences, and only 232 motifs in this alignment had gaps. The gapped motifs were removed before further analysis. 20,135 of the motif instances were C-terminal to the protein itself (which means there were no additional domains at the C-terminal end of the β-barrel proteins). 437 organisms had more than 20 unique C-terminal β-strands, ranging from 21 to 171 peptides in different organisms. In total, the 437 organisms yielded 22,447 peptides, of which 12,949 are unique peptides.
Sequence based clustering
Since all of the peptides are 10 amino acids in length by default, we used the PAM30 substitution matrix for an all-against-all BLAST, with an E-value cut-off of 1000 and used the pairwise P-values to cluster the sequences in CLANS.
PSSM profile-based hierarchical clustering
The relative frequencies of the 20 amino acids were calculated for all 10 positions in the peptides from an organism. To obtain odds scores, the relative frequencies were divided by each residue's background frequency, which was calculated by shuffling the amino acid sequence in all the peptides from all organisms, and log base 2 was applied to obtain a PSSM matrix. The 20 x 10 PSSM matrices obtained for each organism were stored in a single 437 x 200 PSSM matrix. Correlation distances were then calculated between all pairs of organisms, and agglomerative hierarchical clustering (average method) was performed via pvclust, which calculates two types of p-values, the AU (Approximately Unbiased) p-value and the BP (Bootstrap Probability) value, to indicate the likelihood of cluster formation.
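The log-odds calculation and the correlation distance can be sketched in plain Python. Several details here are assumptions rather than the published procedure: the background is taken as the pooled amino acid composition of all peptides (which is what position-wise frequencies of shuffled peptides converge to), a pseudocount of 1 is added to avoid taking the logarithm of zero, and only the 20 standard residues are expected. The bootstrap clustering with pvclust is not reproduced.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def background_frequencies(peptides, pseudocount=1.0):
    """Pooled amino acid composition over all peptides (assumed background)."""
    counts = Counter(aa for pep in peptides for aa in pep)
    total = sum(counts[aa] + pseudocount for aa in AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def pssm(peptides, background, pseudocount=1.0):
    """Per-position log2(frequency / background) for equal-length peptides."""
    length = len(peptides[0])
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(pep[pos] for pep in peptides)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return matrix


def flatten(matrix):
    """Flatten a 20 x L PSSM into one profile vector of 20 * L values."""
    return [column[aa] for column in matrix for aa in AMINO_ACIDS]


def correlation_distance(u, v):
    """1 - Pearson correlation between two profile vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)


# Hypothetical toy peptides for two organisms.
organism_a = ["GVNVGVNYRF", "AVNLGANYKF"]
organism_b = ["GINVGYNYHF", "AVSLGVNYHF"]
background = background_frequencies(organism_a + organism_b)
profile_a = flatten(pssm(organism_a, background))
profile_b = flatten(pssm(organism_b, background))
print(len(profile_a), correlation_distance(profile_a, profile_b))
```

With 10-residue peptides, each organism yields a 200-value profile, so stacking all organisms gives the 437 x 200 matrix described above.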
Peptide sequence space-based clustering
To generate a peptide sequence space, each amino acid in the peptide sequences was represented by five chemical descriptors that are the first five principal components derived from 26 physicochemical descriptor variables using dimensionality reduction techniques. The initial 26 physicochemical descriptor variables include the molecular weight, experimentally determined retention values from seven thin-layer chromatography runs, van der Waals volume of the side chain, three nuclear magnetic resonance shift variables, log P, six variables for semiempirical molecular orbitals, three variables for total, polar and nonpolar surface area, two variables for side chain charge and two variables for hydrogen bond donor and acceptor. The five principal components derived from these 26 variables contain the maximal variations in the data set and they can be interpreted as the size, polarizability, and the lipophilic, steric, and electronic properties of all the amino acids. The amino acid descriptors were originally derived for use as design variables in peptide design, and in the construction of combinatorial libraries to effectively search chemical property space. Here we used them to describe the space occupied by the C-terminal β-strands and to measure how strongly peptide sequences of different organisms overlap. Using the chemical descriptors, each amino acid in the peptide was converted into a 5-dimensional vector; thereby, each 10-aa peptide was represented as a 50-dimensional vector. Thus, the whole set of 22,447 peptides was converted to a 22,447 x 50 matrix.
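The encoding step amounts to a table lookup followed by concatenation. The sketch below illustrates only the bookkeeping: the descriptor values are random placeholders generated for this example, not the published descriptor values, and the function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_DESCRIPTORS = 5

# PLACEHOLDER descriptor table: random numbers standing in for the five
# published chemical descriptors per amino acid. Replace with real values.
_rng = random.Random(0)
DESCRIPTORS = {
    aa: [_rng.uniform(-1.0, 1.0) for _ in range(N_DESCRIPTORS)]
    for aa in AMINO_ACIDS
}


def encode_peptide(peptide, descriptors=DESCRIPTORS):
    """Concatenate the per-residue descriptor vectors of one peptide."""
    vector = []
    for aa in peptide:
        vector.extend(descriptors[aa])
    return vector


def encode_all(peptides, descriptors=DESCRIPTORS):
    """Encode a list of peptides as a list of rows (one row per peptide)."""
    return [encode_peptide(pep, descriptors) for pep in peptides]


# Hypothetical peptides: each 10-residue peptide becomes a 50-value row.
rows = encode_all(["GVNVGVNYRF", "AVNLGANYKF"])
print(len(rows), len(rows[0]))
```

Applied to all 22,447 peptides, the same lookup yields the 22,447 x 50 matrix described above.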
Principal component analysis
Since the dimensionality of the data set (50) is larger than the sample size (minimum 21 peptides per organism), the dimensionality of the peptide vectors had to be reduced below the sample size (i.e., below 21 in our dataset) for further statistical analysis. Principal component analysis (PCA) is a mathematical technique to reduce the dimensionality of data sets, while retaining most of the variation in the data set. This is achieved by projecting the original data vectors along the directions of maximal variation, called principal components (PCs). The first PC captures the maximum variation; the variation associated with consecutive PCs decreases rapidly. Thus, the original data set can be mapped into a lower dimensional space by projecting the original data on those PCs representing most of the variation[36, 37]. We used PCA to reduce the dimensionality of our peptide sequences (22,447 x 50 matrix) by projecting the 50 dimensional chemical descriptor vectors onto the first 12 principal components, which represent 69.05% of the total variation in the data. We thereby obtained a 22,447 x 12 matrix that did not suffer from any problems in sample size.
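The projection onto the leading principal components can be sketched with NumPy. This is an illustration under stated assumptions: the data are only mean-centered (the text does not say whether the columns were also scaled), the function name is hypothetical, and random numbers stand in for the real 22,447 x 50 descriptor matrix.

```python
import numpy as np


def pca_project(data, n_components):
    """Project the rows of `data` onto the first principal components.

    Returns the projected data and the fraction of total variance that the
    retained components explain.
    """
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal directions,
    # and the squared singular values are proportional to the variances.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    explained = variance[:n_components].sum() / variance.sum()
    scores = centered @ vt[:n_components].T
    return scores, explained


# Stand-in data: 200 random "peptides" with 50 descriptor values each.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 50))
scores, explained = pca_project(data, n_components=12)
print(scores.shape, explained)
```

For the real dataset, the text reports that the first 12 components retain 69.05% of the total variation; the random stand-in data will give a different fraction.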
Multivariate Gaussian fitting and Hellinger distance
Next, we fit a multivariate Gaussian distribution for each individual organism by calculating a 12-dimensional mean vector and covariance matrix, (e.g., for E. coli 536 which has 66 unique peptides, the Gaussian will be fitted based on a 66 x 12 matrix).
Next, the Hellinger distance was used to define a dissimilarity matrix for all pairs of organisms. The dissimilarity matrix was converted to P-values, which were then used as input in CLANS to compute a cluster map showing all organisms. CLANS is a graph-based clustering method that represents sequences as nodes. All nodes are connected by weighted edges where the pairwise similarity between the sequences determines the strength of the weight. In our study, individual organisms were considered as nodes and the weight of the edges connecting the nodes was based on the pairwise Hellinger distance (pairwise overlap of sequence space) between the organisms. Hence stronger connections represent a larger overlap/similarity between the peptide sequence spaces, while organisms with high divergence in their C-terminal motifs are only weakly connected or completely disconnected in the cluster map. Initially the nodes are randomly placed in a 2D space and experience attraction forces according to how strongly they are connected with the other nodes. In an iterative refinement scheme, nodes move towards similar nodes with an attractive force proportional to the similarity between them. A small, overall repulsive force is applied to all pairs of nodes to keep them from collapsing into a single node. Since CLANS uses non-deterministic dynamics, each run performed with the same dataset will result in a similar but not necessarily identical clustering. Thus, multiple clustering runs were performed to check the reproducibility of the final clustering. Because initial tests showed that with the default attraction and repulsion values nodes (organisms) were collapsing, we used very small attraction values (up to 0.1) and high repulsion values (up to 500) to avoid collapse of nodes and to obtain visually better clusters.
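The Gaussian fit and the pairwise Hellinger distance can be sketched with NumPy. The code uses the standard closed-form expression for the Hellinger distance between two multivariate Gaussians, H² = 1 − [det(Σ₁)^¼ det(Σ₂)^¼ / det(Σ̄)^½] · exp(−⅛ Δᵀ Σ̄⁻¹ Δ), with Σ̄ = (Σ₁ + Σ₂)/2 and Δ = μ₁ − μ₂. Whether the original analysis used exactly this form is an assumption, the function names are hypothetical, and the conversion of distances to P-values for CLANS is not shown because the text does not specify it.

```python
import numpy as np


def fit_gaussian(points):
    """Mean vector and covariance matrix of the rows of `points`."""
    return points.mean(axis=0), np.cov(points, rowvar=False)


def hellinger_distance(mean1, cov1, mean2, cov2):
    """Closed-form Hellinger distance between two multivariate Gaussians."""
    avg_cov = 0.5 * (cov1 + cov2)
    diff = mean1 - mean2
    # Work with log-determinants to avoid overflow or underflow.
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    _, logdet_avg = np.linalg.slogdet(avg_cov)
    mahalanobis = diff @ np.linalg.solve(avg_cov, diff)
    log_bc = 0.25 * logdet1 + 0.25 * logdet2 - 0.5 * logdet_avg - 0.125 * mahalanobis
    # exp(log_bc) is the Bhattacharyya coefficient; clip rounding noise at 0.
    return float(np.sqrt(max(0.0, 1.0 - np.exp(log_bc))))


# Stand-in data: two "organisms" with 66 and 40 peptides in 12 dimensions.
rng = np.random.default_rng(0)
organism_a = rng.normal(loc=0.0, size=(66, 12))
organism_b = rng.normal(loc=0.5, size=(40, 12))
mean_a, cov_a = fit_gaussian(organism_a)
mean_b, cov_b = fit_gaussian(organism_b)
print(hellinger_distance(mean_a, cov_a, mean_b, cov_b))
```

The distance is 0 for identical distributions and approaches 1 when the two sequence spaces do not overlap, which matches the interpretation of overlap used for the cluster map. Note that the covariance estimate needs more peptides than dimensions to be invertible, which is why the dimensionality was reduced below the minimum sample size first.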
The WebLogo online tool was used to create the frequency plots, using custom colors. Only unique peptide sequences were used to generate all the frequency plots. The amino acid percentage plots were created using R version 2.13.1.
We are grateful for helpful discussions with Vikram Alva, Iwan Grin, Jack Leo and other department members; continuing support by the Max Planck Society, and specifically by Andrei Lupas, is gratefully acknowledged.
- Silhavy TJ, Kahne D, Walker S: The bacterial cell envelope. Cold Spring Harb Perspect Biol. 2010, 2: a000414-10.1101/cshperspect.a000414.PubMed CentralView ArticlePubMed
- Knowles TJ, Scott-Tucker A, Overduin M, Henderson IR: Membrane protein architects: the role of the BAM complex in outer membrane protein assembly. Nat Rev Microbiol. 2009, 7: 206-214. 10.1038/nrmicro2069.View ArticlePubMed
- Bos MP, Robert V, Tommassen J: Biogenesis of the gram-negative bacterial outer membrane. Annu Rev Microbiol. 2007, 61: 191-214. 10.1146/annurev.micro.61.080706.093245.View ArticlePubMed
- Kim KH, Aulakh S, Paetzel M: The bacterial outer membrane β-barrel assembly machinery. Protein Sci. 2012, 21: 751-768. 10.1002/pro.2069.PubMed CentralView ArticlePubMed
- Sklar JG, Wu T, Kahne D, Silhavy TJ: Defining the roles of the periplasmic chaperones SurA, Skp, and DegP in Escherichia coli. Genes Dev. 2007, 21: 2473-10.1101/gad.1581007.PubMed CentralView ArticlePubMed
- Hagan CL, Kim S, Kahne D: Reconstitution of outer membrane protein assembly from purified components. Science (New York, NY). 2010, 328: 890-892. 10.1126/science.1188919.View Article
- Anwari K, Webb CT, Poggio S, Perry AJ, Belousoff M, Celik N, Ramm G, Lovering A, Sockett RE, Smit J, Jacobs-Wagner C, Lithgow T: The evolution of new lipoprotein subunits of the bacterial outer membrane BAM complex. Mol Microbiol. 2012, 84: 832-844. 10.1111/j.1365-2958.2012.08059.x.PubMed CentralView ArticlePubMed
- Robert V, Volokhina EB, Senf F, Bos MP, Van Gelder P, Tommassen J: Assembly factor Omp85 recognizes its outer membrane protein substrates by a species-specific C-terminal motif. PLoS Biol. 2006, 4: e377-10.1371/journal.pbio.0040377.PubMed CentralView ArticlePubMed
- Sandoval CM, Baker SL, Jansen K, Metzner SI, Sousa MC: Crystal Structure of BamD: An Essential Component of the β-Barrel Assembly Machinery of Gram-Negative Bacteria. J Mol Biol. 2011, 409: 348-357. 10.1016/j.jmb.2011.03.035.PubMed CentralView ArticlePubMed
- Struyvé M, Moons M, Tommassen J: Carboxy-terminal phenylalanine is essential for the correct assembly of a bacterial outer membrane protein. J Mol Biol. 1991, 218: 141-148. 10.1016/0022-2836(91)90880-F.View ArticlePubMed
- Hendrixson DR, De La Morena ML, Stathopoulos C, St Geme Iii JW: Structural determinants of processing and secretion of the Haemophilus influenzae Hap protein. Mol Microbiol. 1997, 26: 505-518. 10.1046/j.1365-2958.1997.5921965.x.View ArticlePubMed
- Yu NY, Wagner JR, Laird MR, Melli G, Rey S, Lo R, Dao P, Sahinalp SC, Ester M, Foster LJ, Brinkman FSL: PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics. 2010, 26: 1608-1615. 10.1093/bioinformatics/btq249.PubMed CentralView ArticlePubMed
- Yu C-S, Chen Y-C, Lu C-H, Hwang J-K: Prediction of protein subcellular localization. Proteins. 2006, 64: 643-651. 10.1002/prot.21018.View ArticlePubMed
- Remmert M, Linke D, Lupas AN, Söding J: HHomp–prediction and classification of outer membrane proteins. Nucleic Acids Res. 2009, 37: W446-W451. 10.1093/nar/gkp325.PubMed CentralView ArticlePubMed
- Koebnik R, Locher KP, Van Gelder P: Structure and function of bacterial outer membrane proteins: barrels in a nutshell. Mol Microbiol. 2000, 37: 239-253. 10.1046/j.1365-2958.2000.01983.x.
- Hritonenko V, Stathopoulos C: Omptin proteins: an expanding family of outer membrane proteases in Gram-negative Enterobacteriaceae (Review). Mol Membr Biol. 2007, 24: 395-406. 10.1080/09687680701443822.
- van den Berg B, Black PN, Clemons WM, Rapoport TA: Crystal Structure of the Long-Chain Fatty Acid Transporter FadL. Science. 2004, 304: 1506-1509. 10.1126/science.1097524.
- Bigelow HR, Petrey DS, Liu J, Przybylski D, Rost B: Predicting transmembrane beta-barrels in proteomes. Nucleic Acids Res. 2004, 32: 2566-2577. 10.1093/nar/gkh580.
- Jones DT: Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999, 292: 195-202. 10.1006/jmbi.1999.3091.
- Frickey T, Lupas A: CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics (Oxford, England). 2004, 20: 3702-3704. 10.1093/bioinformatics/bth444.
- Remmert M, Biegert A, Linke D, Lupas AN, Söding J: Evolution of outer membrane beta-barrels from an ancestral beta beta hairpin. Mol Biol Evol. 2010, 27: 1348-1358. 10.1093/molbev/msq017.
- Lehr U, Schütz M, Oberhettinger P, Ruiz-Perez F, Donald JW, Palmer T, Linke D, Henderson IR, Autenrieth IB: C-terminal amino acid residues of the trimeric autotransporter adhesin YadA of Yersinia enterocolitica are decisive for its recognition and assembly by BamA. Mol Microbiol. 2010, 78: 932-946. 10.1111/j.1365-2958.2010.07377.x.
- Fischer W, Schwan D, Gerland E, Erlenfeld GE, Odenbreit S, Haas R: A plasmid-based vector system for the cloning and expression of Helicobacter pylori genes encoding outer membrane proteins. Mol Gen Genet. 1999, 262: 501-507. 10.1007/s004380051111.
- Hirai Y, Haque M, Yoshida T, Yokota K, Yasuda T, Oguma K: Unique cholesteryl glucosides in Helicobacter pylori: composition and structural analysis. J Bacteriol. 1995, 177: 5327-5333.
- Kleinschmidt J: Membrane protein folding on the example of outer membrane protein A of Escherichia coli. Cell Mol Life Sci. 2003, 60: 1547-1558. 10.1007/s00018-003-3170-0.
- Bos MP, Robert V, Tommassen J: Functioning of outer membrane protein assembly factor Omp85 requires a single POTRA domain. EMBO Rep. 2007, 8: 1149-1154. 10.1038/sj.embor.7401092.
- Kim S, Malinverni JC, Sliz P, Silhavy TJ, Harrison SC, Kahne D: Structure and function of an essential component of the outer membrane protein assembly machine. Science. 2007, 317: 961-964. 10.1126/science.1143993.
- Walsh NP, Alba BM, Bose B, Gross CA, Sauer RT: OMP Peptide Signals Initiate the Envelope-Stress Response by Activating DegS Protease via Relief of Inhibition Mediated by Its PDZ Domain. Cell. 2003, 113: 61-71. 10.1016/S0092-8674(03)00203-4.
- Meltzer M, Hasenbein S, Mamant N, Merdanovic M, Poepsel S, Hauske P, Kaiser M, Huber R, Krojer T, Clausen T, Ehrmann M: Structure, function and regulation of the conserved serine proteases DegP and DegS of Escherichia coli. Res Microbiol. 2009, 160: 660-666. 10.1016/j.resmic.2009.07.012.
- Paramasivam N, Linke D: ClubSub-P: Cluster-based subcellular localization prediction for Gram-negative bacteria and Archaea. Front Microbiol. 2011, 2: 218.
- Berven FS, Flikka K, Jensen HB, Eidhammer I: BOMP: a program to predict integral β-barrel outer membrane proteins encoded within genomes of Gram-negative bacteria. Nucleic Acids Res. 2004, 32: W394-W399. 10.1093/nar/gkh351.
- Katoh K, Toh H: Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 2008, 9: 286-298. 10.1093/bib/bbn013.
- Frith MC, Saunders NFW, Kobe B, Bailey TL: Discovering Sequence Motifs with Arbitrary Insertions and Deletions. PLoS Comput Biol. 2008, 4: e1000071. 10.1371/journal.pcbi.1000071.
- Suzuki R, Shimodaira H: Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics. 2006, 22: 1540-1542. 10.1093/bioinformatics/btl117.
- Sandberg M, Eriksson L, Jonsson J, Sjöström M, Wold S: New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids. J Med Chem. 1998, 41: 2481-2491. 10.1021/jm9700575.
- Ma S, Dai Y: Principal component analysis based methods in bioinformatics studies. Brief Bioinform. 2011, 12: 714-722. 10.1093/bib/bbq090.
- Ringnér M: What is principal component analysis? Nat Biotechnol. 2008, 26: 303-304. 10.1038/nbt0308-303.
- Vajda I: Theory of statistical inference and information. 1989, Dordrecht, The Netherlands: Kluwer.
- Shutin D, Zlobinskaya O: Application of information-theoretic measures to quantitative analysis of immunofluorescent microscope imaging. Comput Methods Programs Biomed. 2010, 97: 114-129. 10.1016/j.cmpb.2009.05.009.
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14: 1188-1190. 10.1101/gr.849004.
- Ihaka R, Gentleman R: R: A language for data analysis and graphics. J Comput Graph Stat. 1996, 5: 299-314.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.