  • Research article
  • Open access

Genome-wide subcellular localization of putative outer membrane and extracellular proteins in Leptospira interrogans serovar Lai genome using bioinformatics approaches



In bacterial pathogens, both cell surface-exposed outer membrane proteins and proteins secreted into the extracellular environment play crucial roles in host-pathogen interaction and pathogenesis. Considerable efforts have been made to identify outer membrane (OM) and extracellular (EX) proteins produced by Leptospira interrogans, which may be used as novel targets for the development of infection markers and leptospirosis vaccines.


In this study we used a novel computational framework based on combined prediction methods with deduction concept to identify putative OM and EX proteins encoded by the Leptospira interrogans genome. The framework consists of the following steps: (1) identifying proteins homologous to known proteins in subcellular localization databases derived from the "consensus vote" of computational predictions, (2) incorporating homology based search and structural information to enhance gene annotation and functional identification to infer the specific structural characters and localizations, and (3) developing a specific classifier for cytoplasmic proteins (CP) and cytoplasmic membrane proteins (CM) using Linear discriminant analysis (LDA). We have identified 114 putative EX and 63 putative OM proteins, of which 41% are conserved or hypothetical proteins containing sequence and/or protein folding structures similar to those of known EX and OM proteins.


Overall results derived from the combined computational analysis correlate with the available experimental evidence. This is the most extensive in silico protein subcellular localization identification to date for Leptospira interrogans serovar Lai genome that may be useful in protein annotation, discovery of novel genes and understanding the biology of Leptospira.


Leptospirosis is a globally widespread zoonosis caused by the animal spirochete pathogen Leptospira interrogans [1]. The severe form of the disease, known as Weil's syndrome, is associated with multiple system complications, including acute renal failure, meningitis, and pulmonary haemorrhage. Although early treatment for leptospirosis is important for ensuring a favorable clinical outcome, this is often difficult to achieve, as symptoms during the early stages of infection resemble those of several other systemic diseases.

One potential method for controlling the spread of leptospirosis is through the development of vaccines. Candidates for vaccine production include outer membrane (OM) and extracellular (EX) proteins, several of which have been implicated in chemotaxis, adherence and other pathogenic steps. Attempts to identify such proteins have been made previously by experimental [2–14] and computational methods [15–20]. Complete genome sequences of two serovars of L. interrogans, Lai and Copenhageni, have been reported [15–17]. Hundreds of putative membrane proteins and lipoproteins were predicted, although in many cases gene annotation may be too incomplete or inaccurate to reliably identify putative vaccine candidates.

Previous studies have tried to identify potential vaccine candidates using experimental methods and in silico predictions. Proteomic analysis of purified outer membrane vesicles (OMVs) of L. interrogans serovar Copenhageni was performed by Nally et al. and revealed 33 intact OM proteins [13]. The study by Gamberini et al. [18] identified 16 predicted surface exposed lipoproteins of L. interrogans serovar Copenhageni via whole genome analysis, only four of which are conserved among 8 pathogenic serovars. Since leptospiral lipoproteins are usually (but not exclusively) surface exposed proteins, and many are vaccine candidates, Setubal et al. [19] focused on lipoprotein prediction using the spirochaetal lipoprotein (SpLip) program and identified 146 predicted lipoproteins (but not their localizations) for L. interrogans serovar Lai. The search for new potential vaccine candidates was continued by Yang et al. [20], who used a filtering approach combining in silico analysis, comparative genome hybridization, and microarray methods to identify 226 leptospiral surface exposed proteins. All of the previous studies summarized above focused on identification of vaccine candidates.

However, both computational and experimental methods have their own drawbacks [21, 22]. Computational methods, for instance, depend on the presence of type I signal peptides [23, 24], transmembrane helices [24–26], or other particular features specifically found in previously identified membrane proteins, which may not be highly specific or sensitive. Experimental methods, on the other hand, yield results that may be complicated by cross-compartment contamination occurring during the preparation of samples, which can also result in the inclusion of false positive results in data sets [21, 22]. Hence, results obtained from the two kinds of method can occasionally lead to conflicting conclusions. We believe that such a focused approach, without an attempt to accurately identify periplasmic proteins (PP) and cytoplasmic membrane (CM) proteins, can lead to erroneous identification of PP and CM proteins as OM or EX by both in silico and experimental approaches. A holistic prediction of all membrane protein localizations will lead to better accuracy in genome annotation of membrane proteins, including vaccine candidates.

In this study we utilized a combination of three computational prediction tools, PSORTb [27, 28], Proteome Analyst (PA) [29], and ProtCompB [30], to perform whole genome analysis of protein subcellular localization, and to identify novel putative L. interrogans serovar Lai OM and EX vaccine candidates. We combined the results derived from these three prediction algorithms into a consensus vote, resulting in a more accurate protein subcellular localization prediction. Furthermore, we incorporated homology searching against the DBSubloc database [31] and structural information from the GTD prediction [32] to enhance genome annotation, and to infer OM, EX and PP localized proteins. We also developed a specific classifier based on Linear Discriminant Analysis (LDA) for identification of leptospiral cytoplasmic proteins (CP) and cytoplasmic membrane proteins (CM), using a training set obtained from the consensus vote. We were able to assign subcellular localizations to several previously uncharacterized hypothetical proteins, thus improving L. interrogans genome annotation.


We performed the subcellular localization prediction of L. interrogans serovar Lai using the pipeline described in the Materials and Methods section (shown in Figure 1), following the steps of training set verification, consensus vote, homology and structural prediction, and finally LDA-based classification.

Figure 1

Flow chart of the method used for subcellular localization of the Leptospira interrogans serovar Lai genome. Protein sequences of the Leptospira interrogans serovar Lai genome (4,727 ORFs) were analyzed for subcellular localization using PSORTb, ProtCompB, and Proteome Analyst (PA) predictions. (a) The consensus vote was obtained from a majority vote type procedure to obtain results with high prediction accuracy. If all three methods agreed on a localization, it was assigned as a consensus vote. The remaining results (predicted by 1 or 2 out of 3 methods) were assigned as non-consensus votes. The consensus vote of CP and CM was used as a training set for the development of an LDA-based classifier for CP and CM in the next step. (b) The non-consensus vote results of OM, PP, and EX were further analyzed for sequence and structure homology by DBSubloc and GTD prediction. The non-consensus votes of EX, OM, and PP with significant homology and/or structure information were identified by DBSubloc and GTD prediction. (c) Non-consensus votes of CP and CM, and the queries not predicted by DBSubloc and GTD, were further analyzed for subcellular localization using the LDA-based classifier for CP and CM. Significantly predicted results were proteins classified with more than 0.90 probability as CP or CM proteins. The remaining queries that could not be identified in this step were classified as "unknown" results.

Training set verification: Localization predictions of a set of experimentally verified proteins with known localization

To evaluate the robustness and versatility of our protein localization procedure, we used as a test set a group of well-characterized Gram-negative bacterial proteins with experimentally verified localizations taken from the work by Gardy and Brinkman [22]. The data set comprising 299 proteins was first analyzed by using PSORTb, PA, and ProtCompB. We found that, individually, PSORTb, PA, and ProtCompB assigned 73%, 71% and 79% of the verified protein localizations, respectively (recall rate in Table 1). The overall precision rates were 97%, 95% and 83%, respectively. As expected, the overall recall rate was highest for ProtCompB, while its precision rate was the lowest. The recall rate based on "consensus vote" (see Materials and Methods) results derived from all three methods was 48% without any false positives. Relaxing the criteria by considering predicted results of any two methods, or the "majority vote", resulted in an overall recall rate of 77% with a single false positive.
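The recall and precision figures quoted above can be computed along the following lines. This is a minimal sketch that assumes the conventional definitions (recall: correct assignments over all test proteins; precision: correct assignments over all assignments actually made, with "unknown" outputs counted against recall only). The protein identifiers and the `recall_precision` helper are hypothetical and are not taken from the study.

```python
def recall_precision(truth, predicted):
    """Return (recall, precision) for one predictor.

    truth:     dict mapping protein id -> verified localization
    predicted: dict mapping protein id -> predicted localization,
               or None when the tool reports "unknown"
    """
    correct = sum(1 for pid, loc in truth.items() if predicted.get(pid) == loc)
    made = sum(1 for pid in truth if predicted.get(pid) is not None)
    recall = correct / len(truth)
    precision = correct / made if made else 0.0
    return recall, precision


# Toy test set (hypothetical identifiers and calls).
truth = {"p1": "CP", "p2": "CM", "p3": "OM", "p4": "EX", "p5": "PP"}
predicted = {"p1": "CP", "p2": "CM", "p3": "OM", "p4": "CP", "p5": None}

recall, precision = recall_precision(truth, predicted)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case three of five proteins are assigned correctly (recall 0.60), and three of the four assignments made are correct (precision 0.75), which illustrates how a tool that abstains more often can trade recall for precision.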

Table 1 Localization predictions of a set of 299 experimentally verified proteins with known localization

Since the number of outputs for EX and OM proteins agreed by all three predictions was low (low recall rate), we used structure-based homology information from GTD and/or homology search results from DBSubloc as additional information for inferring protein localization. Using this information, we assessed the likelihood that the "non-consensus vote" outputs (see Materials and Methods) were EX or OM proteins. When the information from DBSubloc and GTD predictions was also used, the overall recall rates for EX, OM and PP increased to 67%, 89% and 86%, respectively, as shown in Table 1. The method resulted in 96% precision. This performance was much better than that of any of the three individual methods, or of any of the above combinations. We have therefore shown that the combination of prediction tools, DBSubloc homology search and GTD structure-based prediction markedly improved the accuracy and recall of EX, OM and PP protein localization prediction. Our prediction pipeline is thus applicable to subcellular localization prediction of hypothetical or unknown proteins.

Subcellular localization predictions of L. interrogans: Step 1 Consensus votes

After demonstrating the accuracy of our prediction pipeline with the test set of verified proteins, we analyzed the whole predicted proteome of L. interrogans serovar Lai using three computational predictions for protein subcellular localization: PSORTb, ProtCompB, and Proteome Analyst (PA). The results obtained from each prediction program are shown in Table 2. ProtCompB assigned subcellular localizations to all protein queries, whereas approximately 50% of protein queries were assigned an unknown localization by PSORTb and PA.

Table 2 Predicted protein subcellular localizations of L. interrogans by PSORTb, PA, ProtCompB and consensus vote predictions.

After inspection of the prediction results derived from the three prediction algorithms, it was found that 797 out of 4,727 ORFs of the L. interrogans serovar Lai genome had the following consensus vote predicted localizations: 418 cytoplasmic proteins (CP), 332 cytoplasmic membrane proteins (CM), 17 periplasmic proteins (PP), 15 outer membrane proteins (OM), and 15 extracellular/secreted proteins (EX) (Tables 2, 3, 4; Additional files 1, 2, 3). The biological functions of most of the localized proteins are already annotated. Only about 9% (68 of 797 ORFs) were proteins annotated as conserved hypothetical or unknown proteins. This shows that the consensus vote approach has a high accuracy of subcellular localization prediction for L. interrogans. However, the recall of the consensus vote is unacceptably low, since the localization of the majority of proteins remains unknown (3930 out of 4727 proteins).

Table 3 Putative extracellular proteins (EX) predicted by the consensus vote
Table 4 Putative outer membrane proteins (OM) predicted by the consensus vote

When comparing the concordance or prediction agreement rates between the three prediction methods (excluding proteins with unknown localization by one or two programs), the rates for PSORTb and PA, PSORTb and ProtCompB, and PA and ProtCompB were 70.3%, 80%, and 59.5%, respectively. PSORTb was found to have a strong propensity to assign protein queries to CP and OM proteins, while PA was found to assign preferentially to CM, PP and EX proteins (p < 0.001, chi-square tests).
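The pairwise concordance rates above can be reproduced with a calculation of the following shape. This is a sketch under the assumption that concordance is the fraction of identical calls among proteins for which both programs returned a localization; the `concordance` helper and the toy predictions are hypothetical.

```python
def concordance(pred_a, pred_b):
    """Fraction of identical calls among proteins localized by both tools.

    Proteins reported as unknown (None) by either tool are excluded,
    mirroring the exclusion described in the text.
    """
    shared = [
        pid for pid, loc in pred_a.items()
        if loc is not None and pred_b.get(pid) is not None
    ]
    if not shared:
        return None  # no overlap, concordance undefined
    agree = sum(1 for pid in shared if pred_a[pid] == pred_b[pid])
    return agree / len(shared)


# Toy predictions from two tools (hypothetical).
tool_a = {"p1": "CP", "p2": "CM", "p3": None, "p4": "EX"}
tool_b = {"p1": "CP", "p2": "OM", "p3": "PP", "p4": "EX"}

rate = concordance(tool_a, tool_b)
print(f"{rate:.3f}")
```

Here p3 is skipped because one tool reported it as unknown, and two of the three remaining proteins agree.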

Step 2: Homology-based and protein folding recognition predictions for non-consensus vote localizations

The non-consensus vote OM, EX, and PP proteins were further analyzed for localization using DBSubloc and GTD. As presented in Tables 5 and 6, 99 more proteins (43 out of 83 proteins predicted by two of the three methods and 56 out of 617 proteins predicted by one method) were additionally identified as putative EX, while 48 proteins (23 out of 59 proteins predicted by two methods, and 25 out of 980 proteins predicted by one method) were additionally identified as putative OM proteins, as shown in Tables 7 and 8. Moreover, 58 proteins (20 out of 20 proteins predicted by two methods and 38 out of 504 proteins predicted by one method) were additionally predicted as PP proteins (Additional file 1). It is of interest that several protein loci currently annotated as hypothetical proteins without localization information were predicted in the EX, OM and PP compartments by the combination method (Tables 3, 4, 5, 6, 7, 8, 9 and Additional file 1). The homology search and structural information from DBSubloc and GTD thus allowed further identification of EX, OM, and PP proteins from the non-consensus vote set; however, the localizations of 3725 proteins remained unknown.

Table 5 43 Putative extracellular proteins (EX) derived from the 2 out of 3 predictions with significant DBSubloc or/and GTD prediction
Table 6 56 Putative extracellular proteins (EX) derived from the 1 out of 3 predictions with significant DBSubloc or/and GTD prediction
Table 7 23 Putative outer membrane proteins (OM) derived from the 2 out of 3 predictions with significant DBSubloc or/and GTD prediction
Table 8 25 Putative outer membrane proteins (OM) derived from the 1 out of 3 predictions with significant DBSubloc and/or GTD prediction
Table 9 Protein subcellular localizations of L. interrogans predicted by PSORTb, PA, ProtCompB and the combination prediction

Step 3: Cytoplasmic (CP) and cytoplasmic membrane proteins (CM) identified by Linear Discriminant Analysis (LDA)

The remaining 3725 proteins with unknown localization after step 2 were further analyzed using an LDA-based classifier that we developed to identify CP and CM proteins, using the set of CP and CM consensus outputs (418 CP proteins and 332 CM proteins) predicted by all three prediction programs (Additional files 2, 3) as a training set (see Materials and Methods). An additional 2272 CP and 481 CM proteins were identified from the 3725-protein "unknown set" by this approach (Additional files 4, 5). We also found that 66% (1501 out of 2272) of the LDA-based predicted CP and 54% (260 out of 481) of the LDA-based predicted CM proteins are hypothetical or unknown proteins. In other words, overall 56.3% (1516 out of 2690) of hypothetical and/or unknown proteins in the whole genome were assigned as CP and 38% as CM or helix transmembrane proteins.

After the final step in the prediction method, we were able to confidently predict the localization of 3755 (79.4%) leptospiral proteins. Our combination method thus has a considerably improved recall over the PSORTb and PA methods, approaching that of ProtCompB (Table 1). To test the final prediction accuracy of our combination method, estimated as % agreement and % coverage, we then performed the localization prediction of 28 experimentally verified proteins from several studies of leptospiral outer membrane and extracellular, or cell surface, proteins.
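The running totals reported across the three steps can be checked with a short tally. The sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
TOTAL_ORFS = 4727

# Step 1: proteins on which all three tools agreed (consensus vote).
step1 = {"CP": 418, "CM": 332, "PP": 17, "OM": 15, "EX": 15}
# Step 2: non-consensus proteins supported by DBSubloc and/or GTD evidence.
step2 = {"EX": 99, "OM": 48, "PP": 58}
# Step 3: proteins assigned by the LDA-based classifier.
step3 = {"CP": 2272, "CM": 481}

# Cumulative number of proteins per compartment over all steps.
totals = {}
for step in (step1, step2, step3):
    for loc, n in step.items():
        totals[loc] = totals.get(loc, 0) + n

# Number of proteins still unassigned after each step.
unknown_after = []
remaining = TOTAL_ORFS
for step in (step1, step2, step3):
    remaining -= sum(step.values())
    unknown_after.append(remaining)

assigned = sum(totals.values())
coverage = 100 * assigned / TOTAL_ORFS

print(totals)
print(unknown_after)
print(assigned, round(coverage, 1))
```

The tally gives 3930 and 3725 unassigned proteins after steps 1 and 2, 3755 assigned proteins overall (79.4%), and per-compartment totals of 63 OM, 114 EX, 75 PP and 813 CM, all consistent with the figures quoted in the text. By subtraction, 972 proteins remain without a predicted localization.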

Protein subcellular localization prediction on the experimentally verified leptospiral outer membrane and extracellular proteins

As shown in Additional file 6, the three prediction programs PSORTb, PA and ProtCompB gave markedly different predictions from one another for the 28 experimentally verified OM and EX proteins. Each of the three prediction programs had weaknesses, either poor agreement (ProtCompB) or low coverage (PSORTb and PA). Our combination approach was much better in this respect and showed good agreement and coverage.


Computational prediction of protein subcellular localization is a key step for genome annotation and for the development of drug and vaccine targets. In this study, we used a combination method to putatively assign CP, CM, PP, OM, and EX proteins. We combined the results from three different algorithms, namely PSORTb, PA and ProtCompB, into a consensus vote to obtain higher prediction accuracy. The combination approach has previously been used to significantly reduce, or exclude, false positive predictions for membrane topology prediction [33] and outer membrane prediction [34]. In our case, the accuracy of the consensus vote is very high, since well characterized OM and EX proteins were predicted, including lactonizing lipase [35], microbial collagenase [36], O-sialoglycoprotein endopeptidase [37], Rhs family protein [38], CsgA or C factor [39], thermolysin [40], leucine rich repeat proteins (LRR) [41–43], Ton-B dependent outer membrane receptor proteins, OmpA, porin, heavy metal efflux pump, TolC, and general secretory pathway protein D (Table 4).

On the other hand, the recall, or sensitivity, of the consensus vote prediction is low, especially for EX and OM, because the PSORTb and PA programs are known to have limitations for some proteins. PSORTb requires a training set from a limited number of experimentally-determined proteins, while PA has a disadvantage in that query proteins have to share similarity to known proteins in the Swiss-Prot database [44]. Among high-throughput computational predictions for protein subcellular localization, PSORTb has been reported as the prediction tool that achieves the highest overall accuracy, followed closely by PA [22].

To overcome the limitations in PSORTb, PA and ProtCompB, the predictions for proteins predicted by only one or two out of the three prediction methods (the non consensus vote) were refined by homology-based search using the DBSubloc database and structural annotation in GTD. This allowed us to identify protein localizations with greater confidence. The advantage of GTD is that protein folding recognition or threading methods can determine pairs of proteins that have no obvious similarities in sequence, but have similar folds. It was previously suggested this approach should be carried out to increase prediction sensitivity for specific protein localization [22, 45, 46]. To our knowledge, this study is the first to employ GTD information to infer leptospiral protein localizations.

Structure-based information from GTD prediction revealed that the majority of the 99 EX predictions were proteins that may be secreted by the type III or the type V (autotransport) system. These proteins are shown in Tables 5 and 6 with their corresponding PDB codes. Many of the putative EX proteins that are annotated as leucine rich repeat (LRR) containing proteins share sequence similarity to PopC protein (Q9RBS2), which is secreted through the hrp-secretion apparatus or the type III secretion pathway of Ralstonia solanacearum [41]. Structurally related well-characterized extracellular LRR proteins in other species include YopM (PDB code 1jl5), a Yersinia pestis cytotoxin [43], internalin B [47], a virulence factor of Listeria monocytogenes (PDB code 1d0b) and polygalacturonase inhibiting protein (PDB code 1ogq), a secreted protein involved in plant defense [48].

It is of interest to note that several L. interrogans proteins are contained within the LRR and TPR (Tetratricopeptide repeat) protein families, but predicted sub-cellular localization is not necessarily conserved among all members within each family (Table 3, 5, 6, 7, 8, 9 and Table in additional file 4). The majority of LRR proteins were predicted to be EX localized, while TPR proteins were predicted in all compartments except PP. This finding is consistent with the multiple functions of TPR homologues from more distantly related species in different sub-cellular milieux, including signal transduction, chaperone activity, cell-cycle, transcription, and protein transport [49, 50].

Of the 48 non-consensus vote proteins predicted as OM, 24 were proteins annotated as outer membrane or putative outer membrane proteins, while the remainder were proteins annotated as conserved hypothetical proteins. The structural information derived from the GTD prediction for the conserved or hypothetical proteins that were predicted as putative OM was the same as that for the annotated outer membrane proteins. As shown in Tables 7 and 8, 24 hypothetical proteins can now be annotated as putative OM.

Although it is clear that the consensus vote combined with DBSubloc and GTD prediction can give robust predictions for EX, OM and PP, there are many proteins with either CP or CM localization remaining. Using our combination approach, we found that about 17% of genes in the L. interrogans serovar Lai genome encode putative CM proteins, a proportion similar to the 20% – 30% of CM proteins in other bacterial species [25, 51]. From our subcellular localization prediction we identified 63 OM and 114 EX proteins as potential vaccine candidates. On the other hand, it is possible to exclude 813 CM and 75 PP predicted proteins as vaccine candidates on the basis of their localization.

We compared our predictions with previously published work. We found that 10 of 16 membrane proteins predicted by Gamberini et al. 2006, including four also demonstrated to be immunogenic among 8 pathogenic serovars in that study, were also predicted by our method as membrane proteins (2 EX, 1 OM, 1 PP and 6 CM) [18]. We examined the localizations of the 145 putative lipoproteins reported by Setubal et al. [19], and found 29 EX, 2 OM, 7 PP and 26 CM proteins among 125 probable lipoproteins, and 1 PP and 3 CM among 21 possible lipoproteins. The localizations of 63 putative lipoproteins could not be identified, which included proteins containing signal peptidase II recognition sites and proteins lacking sequence and/or structural homology to known membrane proteins (see Additional file 7). Spirochaetal lipoproteins are found in four subcellular compartments: the periplasmic leaflet of the cytoplasmic membrane, the periplasmic or outer leaflet of the outer membrane, or beyond the outer membrane into the environment as extracellular proteins [52]. Therefore, the 15 of the 145 putative lipoproteins identified as CP by our method are unlikely to be lipoproteins because of their localization. These false positive lipoproteins include UDP-glucose 6-dehydrogenase, cell-division protein, regulator of chromosome condensation RCC1 family, and 3-oxoacyl-[acyl-carrier protein] reductase. The frequency of falsely-identified lipoproteins just exceeds the reported 1% false positive rate for the SpLip program [52]. Our results can be considered as complementary to those reported by Setubal et al. [52], and increase the accuracy of lipoprotein prediction.

We also compared our predictions with the 226 leptospiral surface exposed protein predictions (extracellular, outer membrane, periplasmic, and inner (cytoplasmic) membrane by their localization definition) reported by Yang et al. [20] and found a concordance of 38.5% (87/226) (see Additional file 8). We think the discrepancies arise from false assignments generated by the prediction algorithms used, which can be identified by comparison with proteins for which there are reliable experimental localization data (see Additional file 6) [2–14, 53–57]. Our predictions have a higher coverage of, and agreement with, the experimentally tested L. interrogans protein set than the study by Yang et al. [20], suggesting that our prediction method may be of greater overall utility for genome annotation of membrane proteins. After manual inspection of predicted localizations, we found further examples of possible false assignments. The greatest discrepancy was found for 42 proteins that were identified as CM by our method, but as OM by Yang et al. Some proteins in this group have homologues in other species for which there is experimental evidence of CM location, including methyl-accepting chemotaxis protein mcpB [58], aerotaxis sensor receptor [59], and penicillin-binding protein [60].

It was found that several loci without localization annotation were assigned by the combination prediction method. Therefore, we propose that the annotations with respect to subcellular localization for these loci can be tentatively revised. Among this group of proteins, we noted additional similarities to known protein families. One prominent group, with the SBBP domain (seven beta blade propeller proteins, Pfam PF06739), contains 9 hypothetical proteins: LA0283 (LIC10239), LA0423 (LIC10371), LA0426 (LIC10373), LA1567 (LIC12209), LA1568 (LIC12209), LA1569 (LIC12208), LA1691 (LIC12099), LA3276 (LIC10868), and LA3834 (LIC13066). Three loci annotated as hypothetical proteins or lipoproteins, namely LA0996 (LIC12668), LA0962 (LIC12690), and LIC13296 (LA4135), were predicted as EX localized (shown in Tables 5 and 6), and may belong to the Len (leptospiral endostatin-like lipoproteins) family, based on conservation of the DUF1554 domain (Pfam PF07588) and structural similarity to mammalian endostatin-like protein (PDB 1koe). These proteins act as adhesion proteins and bind to host extracellular matrix (ECM) [53, 57] or human factor H [56] (Tables 5 and 6, and the table in Additional file 6). Furthermore, the loci LIC11207 (LA2823), LIC10821 (LA3340), LIC10774 (LA3394) and LIC10365 (LA0416), previously described to have similarity with the leptospiral effector protein [54], were identified as putative EX proteins, in agreement with their proposed immunomodulator function.

Our combination prediction method has high agreement and coverage of experimentally verified OM and EX proteins (see Additional file 6). On the other hand, experimental localization studies are limited by insufficient sensitivity to detect low abundance proteins and cross contamination of cellular compartments during sample purification, as discussed previously by Rey et al. [21]. It is of note that several predicted PP proteins in this work e.g. FlaB1 periplasmic flagellin (LA2017/LIC11890) have previously been identified as possible PP contaminants in experimental studies of OMV proteins [13, 20]; hence our prediction method may help in correct interpretation of future experimental verification studies, thus leading to better predictions in uncharacterized genomes. However, it should be emphasized that no automatic prediction can be accurate without experimental verification.


In this study, we have demonstrated that the specificity and sensitivity of protein subcellular localization prediction can be improved by incorporation of multiple predictive methods and structural information. By this approach, localizations can be assigned to previously hypothetical L. interrogans proteins. We think this approach is applicable to subcellular localization predictions in other prokaryote proteomes, with the caveat that some predictions are more robust than others, i.e. CP and CM better than OM, EX or PP.

Materials and Methods

Data sets

Amino acid sequence queries were the 4,727 proteins of the Leptospira interrogans serovar Lai genome (chromosome I: NC_004342, chromosome II: NC_004343) [15] and the 3,728 protein ORFs of Leptospira interrogans serovar Copenhageni strain Fiocruz L1-130 (accession numbers AE016823 (chromosome I) and AE016824 (chromosome II)) [17], obtained from GenBank. Two datasets of proteins with known subcellular localization were used. One was an experimentally confirmed data set containing 278 CP and 309 CM proteins of Gram-negative bacteria, described by Gardy et al. 2003 [28] and used for validation of the LDA-based classifier's performance. The other was a 299-protein data set containing 145 CP, 69 CM, 29 PP, 38 OM and 18 EX proteins, which was the testing data previously used to evaluate various protein localization predictions in Gardy and Brinkman [22].

Computational prediction tools for in silico protein localization

Several publicly available programs were used in combination. Protein subcellular localization prediction for Gram-negative bacteria was carried out using PSORTb [27, 28], Proteome Analyst (PA) [29], and ProtCompB [30]. Signal peptide sequences and α-helical transmembrane proteins were predicted using SignalP [23] and TMHMM [24, 25], respectively.

Homology based searching and structural annotation

Homology search for subcellular localization information was carried out using BLAST search against DBSubloc, a localization specific protein database [31]. A protein folding recognition method for structural information used to predict the fold of protein sequence with distant homology to known structure was performed using homology search against GTD (the Genomic Threading Database) [32].

Prediction strategy (as shown in Figure 1)

Step 1. Consensus votes prediction

We reasoned that more accurate protein subcellular localization predictions can be gained from the consensus of methods. All leptospiral protein queries were analyzed using three subcellular localization prediction tools for Gram-negative bacteria, namely PSORTb, Proteome Analyst (PA), and ProtCompB, for cytoplasmic (CP), cytoplasmic membrane (CM), periplasmic (PP), outer membrane (OM) and extracellular (EX) proteins. Note that in the version of ProtCompB used, CM and OM are not distinguished, so both are predicted as membrane proteins. The consensus prediction for each sequence was calculated using a simple majority vote type procedure. If all 3 methods agreed on a localization, it was assigned as a "consensus vote". The remaining results (1 or 2 out of 3 predicted) were assigned as "non-consensus vote". The CP and CM proteins assigned in this step were used as a training set for the development of the LDA-based classifier for CP and CM in the next step.
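The voting rule described in this step can be sketched as follows. The text does not spell out how the undifferentiated ProtCompB membrane call enters the vote, so this sketch assumes that a `"MEMBRANE"` call counts as support for either CM or OM; that rule, the label strings and the function names are all illustrative assumptions rather than the authors' implementation.

```python
COMPARTMENTS = ("CP", "CM", "PP", "OM", "EX")


def agrees(call, target):
    """True if a tool's call supports the target compartment.

    Assumption: a generic "MEMBRANE" call (as from ProtCompB) is
    treated as compatible with both CM and OM.
    """
    if call == "MEMBRANE":
        return target in ("CM", "OM")
    return call == target


def support(psortb, pa, protcompb):
    """Count how many tools support each compartment (None = unknown)."""
    calls = [c for c in (psortb, pa, protcompb) if c is not None]
    counts = {}
    for loc in COMPARTMENTS:
        n = sum(1 for c in calls if agrees(c, loc))
        if n:
            counts[loc] = n
    return counts


def consensus(psortb, pa, protcompb):
    """Return the compartment backed by all three tools, else None."""
    for loc, n in support(psortb, pa, protcompb).items():
        if n == 3:
            return loc
    return None


print(consensus("CM", "CM", "MEMBRANE"))  # all three tools agree
print(consensus("EX", "EX", "CP"))        # only two of three agree
print(support("EX", "EX", "CP"))          # per-compartment support
```

Proteins for which `consensus` returns `None` correspond to the "non-consensus vote" set; the per-compartment counts from `support` indicate whether a candidate localization was backed by one or two tools before the homology and structure checks of step 2.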

Step 2. Homology-based and protein folding recognition prediction

Homology-based and structural information can also be used to infer the potential localization site of query proteins [22, 45, 46]. Therefore, the remaining query proteins assigned as non-consensus vote results of PP, OM and EX were further analyzed for sequence and structure homology. Since subcellular localization is an evolutionarily conserved trait, a predicted localization was assigned when the query protein was homologous to a known protein with the same localization. The protein query sequences were compared to proteins in the DBSubloc database at E-value ≤ 10⁻³ using BLAST search. Structure annotation of these queries was also performed using GTD prediction. The query protein sequences were assigned to structures (shown as PDB codes) when the prediction had a high level of probability (certain or high). In this study, structure annotations in the confidence ranges certain (0 ≤ p < 0.01%) and high (0.01% ≤ p < 0.1%), which are based on the p-value measuring the reliability of the annotation, were considered statistically significant.
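The thresholds in this step can be expressed as a small filter. The sketch below assumes that BLAST hits against DBSubloc and GTD fold assignments have already been parsed into simple tuples, and that each GTD template has been given a localization by some external lookup; those data structures and the function names are hypothetical, and only the cut-offs (E-value ≤ 10⁻³, p < 0.1%) come from the text.

```python
E_VALUE_CUTOFF = 1e-3  # DBSubloc BLAST threshold stated in the text


def gtd_confidence(p_percent):
    """Map a GTD p-value (in percent) to the confidence classes used here."""
    if 0 <= p_percent < 0.01:
        return "certain"
    if 0.01 <= p_percent < 0.1:
        return "high"
    return None  # below the confidence required in this study


def supported(candidate, blast_hits, gtd_hits):
    """Decide whether a non-consensus candidate localization is retained.

    candidate:  localization proposed by one or two tools ("EX", "OM", "PP")
    blast_hits: list of (e_value, localization_of_hit) from DBSubloc
    gtd_hits:   list of (p_percent, localization_of_template); the template
                localization is assumed to come from a user-supplied lookup
    """
    by_homology = any(
        e <= E_VALUE_CUTOFF and loc == candidate for e, loc in blast_hits
    )
    by_structure = any(
        gtd_confidence(p) is not None and loc == candidate
        for p, loc in gtd_hits
    )
    return by_homology or by_structure


# Hypothetical query proposed as EX by one tool.
print(supported("EX", [(2e-5, "EX")], []))            # homology evidence
print(supported("EX", [(0.5, "EX")], [(0.05, "EX")])) # structure evidence only
print(supported("EX", [(0.5, "EX")], [(0.5, "EX")]))  # no significant evidence
```

Either kind of evidence is sufficient in this sketch, reflecting the "and/or" wording used for the DBSubloc and GTD criteria.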

Step 3. Identification of putative CP and CM using the LDA based classifier

The putative CP and CM proteins identified as non-consensus vote results were further analyzed by SignalP and TMHMM. The feature attributes derived from the SignalP and TMHMM predictions were then integrated and analyzed using the LDA-based classifier. Proteins classified as CP or CM with a probability ≥ 0.9 were taken as significant. The remaining queries that could not be identified in this step were classified as "unknown".
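The decision rule of this step can be written as a single threshold check on the classifier's output. The sketch below is a minimal illustration: the function name, the argument names and the example proteins are hypothetical, and it assumes the classifier returns one posterior probability per class.

```python
def assign_cp_or_cm(p_cp, p_cm, threshold=0.9):
    """Assign CP or CM when the class probability reaches the threshold.

    p_cp and p_cm are the posterior probabilities reported by the
    classifier; anything below the threshold is left as "unknown".
    """
    if p_cp >= threshold:
        return "CP"
    if p_cm >= threshold:
        return "CM"
    return "unknown"


# Hypothetical posteriors for three query proteins.
posteriors = {
    "protein_A": (0.97, 0.03),
    "protein_B": (0.06, 0.94),
    "protein_C": (0.62, 0.38),
}
calls = {name: assign_cp_or_cm(*p) for name, p in posteriors.items()}
```

With a two-class model the two probabilities sum to 1, so at a threshold of 0.9 at most one of the two conditions can hold.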

LDA based Classifier for CP and CM

To increase the accuracy of prediction, we developed a specific classifier using a training set derived from the consensus vote predictions of leptospiral CP and CM proteins. The classifier was built on an LDA algorithm that analyzes multiple feature vectors taken from the SignalP-NN, SignalP-HMM and TMHMM prediction results for the training sequences. The accuracy of the LDA-based classifier was investigated using leave-one-out cross-validation. As a test dataset for validating the classifier's performance, we used experimentally determined or known CP and CM proteins of Gram-negative bacteria that had previously been used in the evaluation of PSORTb [27]. Overall, the LDA-based classifier achieved an accuracy of 94.96%.
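For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of a two-class LDA with leave-one-out cross-validation. It is not the classifier built in this study: the two features (a signal peptide probability and a count of predicted transmembrane helices), the toy values and all function names are hypothetical, and the sketch does not reproduce the reported accuracy.

```python
import math


def fit_lda(X, y):
    """Fit a two-class LDA on rows of two features; labels are 0 (CP) or 1 (CM)."""
    groups = {k: [x for x, t in zip(X, y) if t == k] for k in (0, 1)}
    means = {k: [sum(col) / len(rows) for col in zip(*rows)]
             for k, rows in groups.items()}
    # Pooled within-class covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for k, rows in groups.items():
        mx, my = means[k]
        for a, b in rows:
            sxx += (a - mx) ** 2
            sxy += (a - mx) * (b - my)
            syy += (b - my) ** 2
    dof = len(X) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    dx = means[1][0] - means[0][0]
    dy = means[1][1] - means[0][1]
    # w = S^-1 (mu1 - mu0), using the closed-form inverse of a 2x2 matrix.
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    mid = [(means[0][i] + means[1][i]) / 2 for i in range(2)]
    prior = math.log(len(groups[1]) / len(groups[0]))
    bias = -(w[0] * mid[0] + w[1] * mid[1]) + prior
    return w, bias


def posterior_cm(model, x):
    """Posterior probability of class 1 (CM) for one feature row."""
    w, bias = model
    z = w[0] * x[0] + w[1] * x[1] + bias
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loocv_accuracy(X, y):
    """Hold out each row in turn, refit on the rest, and score the held-out row."""
    correct = 0
    for i in range(len(X)):
        model = fit_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predicted = 1 if posterior_cm(model, X[i]) >= 0.5 else 0
        correct += predicted == y[i]
    return correct / len(X)


# Toy training data (hypothetical): (signal peptide probability, TM helix count).
X = [(0.05, 0), (0.10, 0), (0.02, 1), (0.08, 0), (0.20, 0),
     (0.60, 4), (0.30, 6), (0.75, 2), (0.40, 8), (0.55, 5)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
accuracy = loocv_accuracy(X, y)
```

Because both classes share one pooled covariance matrix, the decision boundary is linear in the features, and the posterior from `posterior_cm` is the quantity to which a probability cutoff such as 0.9 would be applied.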


  1. Bharti AR, Nally JE, Ricaldi JN, Matthias MA, Diaz MM, Lovett MA, Levett PN, Gilman RH, Willig MR, Gotuzzo E, Vinetz JM: Leptospirosis: a zoonotic disease of global importance. Lancet Infect Dis. 2003, 3 (12): 757-771. 10.1016/S1473-3099(03)00830-2.

  2. Haake DA, Champion CI, Martinich C, Shang ES, Blanco DR, Miller JN, Lovett MA: Molecular cloning and sequence analysis of the gene encoding OmpL1, a transmembrane outer membrane protein of pathogenic Leptospira spp. J Bacteriol. 1993, 175 (13): 4225-4234.

  3. Shang ES, Summers TA, Haake DA: Molecular cloning and sequence analysis of the gene encoding LipL41, a surface-exposed lipoprotein of pathogenic Leptospira species. Infect Immun. 1996, 64 (6): 2322-2330.

  4. Haake DA, Martinich C, Summers TA, Shang ES, Pruetz JD, McCoy AM, Mazel MK, Bolin CA: Characterization of leptospiral outer membrane lipoprotein LipL36: downregulation associated with late-log-phase growth and mammalian infection. Infect Immun. 1998, 66 (4): 1579-1587.

  5. Haake DA, Chao G, Zuerner RL, Barnett JK, Barnett D, Mazel M, Matsunaga J, Levett PN, Bolin CA: The leptospiral major outer membrane protein LipL32 is a lipoprotein expressed during mammalian infection. Infect Immun. 2000, 68 (4): 2276-2285. 10.1128/IAI.68.4.2276-2285.2000.

  6. Lee SH, Kim KA, Park YG, Seong IW, Kim MJ, Lee YJ: Identification and partial characterization of a novel hemolysin from Leptospira interrogans serovar lai. Gene. 2000, 254 (1-2): 19-28. 10.1016/S0378-1119(00)00293-6.

  7. Cullen PA, Cordwell SJ, Bulach DM, Haake DA, Adler B: Global analysis of outer membrane proteins from Leptospira interrogans serovar Lai. Infect Immun. 2002, 70 (5): 2311-2318. 10.1128/IAI.70.5.2311-2318.2002.

  8. Haake DA, Matsunaga J: Characterization of the leptospiral outer membrane and description of three novel leptospiral membrane proteins. Infect Immun. 2002, 70 (9): 4936-4945. 10.1128/IAI.70.9.4936-4945.2002.

  9. Cullen PA, Haake DA, Bulach DM, Zuerner RL, Adler B: LipL21 is a novel surface-exposed lipoprotein of pathogenic Leptospira species. Infect Immun. 2003, 71 (5): 2414-2421. 10.1128/IAI.71.5.2414-2421.2003.

  10. Koizumi N, Watanabe H: Molecular cloning and characterization of a novel leptospiral lipoprotein with OmpA domain. FEMS Microbiol Lett. 2003, 226 (2): 215-219. 10.1016/S0378-1097(03)00619-0.

  11. Matsunaga J, Barocchi MA, Croda J, Young TA, Sanchez Y, Siqueira I, Bolin CA, Reis MG, Riley LW, Haake DA, Ko AI: Pathogenic Leptospira species express surface-exposed proteins belonging to the bacterial immunoglobulin superfamily. Mol Microbiol. 2003, 49 (4): 929-945. 10.1046/j.1365-2958.2003.03619.x.

  12. Zhang YX, Geng Y, Bi B, He JY, Wu CF, Guo XK, Zhao GP: Identification and classification of all potential hemolysin encoding genes and their products from Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai. Acta Pharmacol Sin. 2005, 26 (4): 453-461. 10.1111/j.1745-7254.2005.00075.x.

  13. Nally JE, Whitelegge JP, Aguilera R, Pereira MM, Blanco DR, Lovett MA: Purification and proteomic analysis of outer membrane vesicles from a clinical isolate of Leptospira interrogans serovar Copenhageni. Proteomics. 2005, 5 (1): 144-152. 10.1002/pmic.200400880.

  14. Asuthkar S, Velineni S, Stadlmann J, Altmann F, Sritharan M: Expression and characterization of an iron-regulated hemin-binding protein, HbpA, from Leptospira interrogans serovar Lai. Infect Immun. 2007, 75 (9): 4582-4591. 10.1128/IAI.00324-07.

  15. Ren SX, Fu G, Jiang XG, Zeng R, Miao YG, Xu H, Zhang YX, Xiong H, Lu G, Lu LF, Jiang HQ, Jia J, Tu YF, Jiang JX, Gu WY, Zhang YQ, Cai Z, Sheng HH, Yin HF, Zhang Y, Zhu GF, Wan M, Huang HL, Qian Z, Wang SY, Ma W, Yao ZJ, Shen Y, Qiang BQ, Xia QC, Guo XK, Danchin A, Saint Girons I, Somerville RL, Wen YM, Shi MH, Chen Z, Xu JG, Zhao GP: Unique physiological and pathogenic features of Leptospira interrogans revealed by whole-genome sequencing. Nature. 2003, 422 (6934): 888-893. 10.1038/nature01597.

  16. Nascimento AL, Ko AI, Martins EA, Monteiro-Vitorello CB, Ho PL, Haake DA, Verjovski-Almeida S, Hartskeerl RA, Marques MV, Oliveira MC, Menck CF, Leite LC, Carrer H, Coutinho LL, Degrave WM, Dellagostin OA, El-Dorry H, Ferro ES, Ferro MI, Furlan LR, Gamberini M, Giglioti EA, Goes-Neto A, Goldman GH, Goldman MH, Harakava R, Jeronimo SM, Junqueira-de-Azevedo IL, Kimura ET, Kuramae EE, Lemos EG, Lemos MV, Marino CL, Nunes LR, de Oliveira RC, Pereira GG, Reis MS, Schriefer A, Siqueira WJ, Sommer P, Tsai SM, Simpson AJ, Ferro JA, Camargo LE, Kitajima JP, Setubal JC, Van Sluys MA: Comparative genomics of two Leptospira interrogans serovars reveals novel insights into physiology and pathogenesis. J Bacteriol. 2004, 186 (7): 2164-2172. 10.1128/JB.186.7.2164-2172.2004.

  17. Nascimento AL, Verjovski-Almeida S, Van Sluys MA, Monteiro-Vitorello CB, Camargo LE, Digiampietri LA, Harstkeerl RA, Ho PL, Marques MV, Oliveira MC, Setubal JC, Haake DA, Martins EA: Genome features of Leptospira interrogans serovar Copenhageni. Braz J Med Biol Res. 2004, 37 (4): 459-477. 10.1590/S0100-879X2004000400003.

  18. Gamberini M, Gomez RM, Atzingen MV, Martins EA, Vasconcellos SA, Romero EC, Leite LC, Ho PL, Nascimento AL: Whole-genome analysis of Leptospira interrogans to identify potential vaccine candidates against leptospirosis. FEMS Microbiol Lett. 2005, 244 (2): 305-313. 10.1016/j.femsle.2005.02.004.

  19. Setubal JC, Reis M, Matsunaga J, Haake DA: Lipoprotein computational prediction in spirochaetal genomes. Microbiology. 2006, 152 (Pt 1): 113-121. 10.1099/mic.0.28317-0.

  20. Yang HL, Zhu YZ, Qin JH, He P, Jiang XC, Zhao GP, Guo XK: In silico and microarray-based genomic approaches to identifying potential vaccine candidates against Leptospira interrogans. BMC Genomics. 2006, 7: 293. 10.1186/1471-2164-7-293.

  21. Rey S, Gardy JL, Brinkman FS: Assessing the precision of high-throughput computational and laboratory approaches for the genome-wide identification of protein subcellular localization in bacteria. BMC Genomics. 2005, 6: 162. 10.1186/1471-2164-6-162.

  22. Gardy JL, Brinkman FS: Methods for predicting bacterial protein subcellular localization. Nat Rev Microbiol. 2006, 4 (10): 741-751. 10.1038/nrmicro1494.

  23. Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Mol Biol. 2004, 340 (4): 783-795. 10.1016/j.jmb.2004.05.028.

  24. Emanuelsson O, Brunak S, von Heijne G, Nielsen H: Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc. 2007, 2 (4): 953-971. 10.1038/nprot.2007.131.

  25. Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001, 305 (3): 567-580. 10.1006/jmbi.2000.4315.

  26. Kall L, Krogh A, Sonnhammer EL: A combined transmembrane topology and signal peptide prediction method. J Mol Biol. 2004, 338 (5): 1027-1036. 10.1016/j.jmb.2004.03.016.

  27. Gardy JL, Spencer C, Wang K, Ester M, Tusnady GE, Simon I, Hua S, deFays K, Lambert C, Nakai K, Brinkman FS: PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Res. 2003, 31 (13): 3613-3617. 10.1093/nar/gkg602.

  28. Gardy JL, Laird MR, Chen F, Rey S, Walsh CJ, Ester M, Brinkman FS: PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics. 2005, 21 (5): 617-623. 10.1093/bioinformatics/bti057.

  29. Lu Z, Szafron D, Greiner R, Lu P, Wishart DS, Poulin B, Anvik J, Macdonell C, Eisner R: Predicting subcellular localization of proteins using machine-learned classifiers. Bioinformatics. 2004, 20 (4): 547-556. 10.1093/bioinformatics/btg447.

  30. ProtCompB - Prediction sub-cellular protein localization.

  31. Guo T, Hua S, Ji X, Sun Z: DBSubLoc: database of protein subcellular localization. Nucleic Acids Res. 2004, 32 (Database issue): D122-4. 10.1093/nar/gkh109.

  32. McGuffin LJ, Street SA, Bryson K, Sorensen SA, Jones DT: The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms. Nucleic Acids Res. 2004, 32 (Database issue): D196-9. 10.1093/nar/gkh043.

  33. Moller S, Croning MD, Apweiler R: Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics. 2001, 17 (7): 646-653. 10.1093/bioinformatics/17.7.646.

  34. Bagos PG, Liakopoulos TD, Hamodrakas SJ: Evaluation of methods for predicting the topology of beta-barrel outer membrane proteins and a consensus prediction method. BMC Bioinformatics. 2005, 6: 7. 10.1186/1471-2105-6-7.

  35. Ihara F, Kageyama Y, Hirata M, Nihira T, Yamada Y: Purification, characterization, and molecular cloning of lactonizing lipase from Pseudomonas species. J Biol Chem. 1991, 266 (27): 18135-18140.

  36. Matsushita O, Yoshihara K, Katayama S, Minami J, Okabe A: Purification and characterization of Clostridium perfringens 120-kilodalton collagenase and nucleotide sequence of the corresponding gene. J Bacteriol. 1994, 176 (1): 149-156.

  37. Abdullah KM, Lo RY, Mellors A: Cloning, nucleotide sequence, and expression of the Pasteurella haemolytica A1 glycoprotease gene. J Bacteriol. 1991, 173 (18): 5597-5603.

  38. Hill CW, Sandt CH, Vlazny DA: Rhs elements of Escherichia coli: a family of genetic composites each encoding a large mosaic protein. Mol Microbiol. 1994, 12 (6): 865-871. 10.1111/j.1365-2958.1994.tb01074.x.

  39. Tukel C, Raffatellu M, Humphries AD, Wilson RP, Andrews-Polymenis HL, Gull T, Figueiredo JF, Wong MH, Michelsen KS, Akcelik M, Adams LG, Baumler AJ: CsgA is a pathogen-associated molecular pattern of Salmonella enterica serotype Typhimurium that is recognized by Toll-like receptor 2. Mol Microbiol. 2005, 58 (1): 289-304. 10.1111/j.1365-2958.2005.04825.x.

  40. Tran L, Wu XC, Wong SL: Cloning and expression of a novel protease gene encoding an extracellular neutral protease from Bacillus subtilis. J Bacteriol. 1991, 173 (20): 6364-6372.

  41. Gueneron M, Timmers AC, Boucher C, Arlat M: Two novel proteins, PopB, which has functional nuclear localization signals, and PopC, which has a large leucine-rich repeat domain, are secreted through the hrp-secretion apparatus of Ralstonia solanacearum. Mol Microbiol. 2000, 36 (2): 261-277. 10.1046/j.1365-2958.2000.01870.x.

  42. Ikegami A, Honma K, Sharma A, Kuramitsu HK: Multiple functions of the leucine-rich repeat protein LrrA of Treponema denticola. Infect Immun. 2004, 72 (8): 4619-4627. 10.1128/IAI.72.8.4619-4627.2004.

  43. Evdokimov AG, Anderson DE, Routzahn KM, Waugh DS: Unusual molecular architecture of the Yersinia pestis cytotoxin YopM: a leucine-rich repeat protein with the shortest repeating unit. J Mol Biol. 2001, 312 (4): 807-821. 10.1006/jmbi.2001.4973.

  44. Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003, 31 (1): 365-370. 10.1093/nar/gkg095.

  45. Nair R, Rost B: Sequence conserved for subcellular localization. Protein Sci. 2002, 11 (12): 2836-2847. 10.1110/ps.0207402.

  46. Nair R, Rost B: Better prediction of sub-cellular localization by combining evolutionary and structural information. Proteins. 2003, 53 (4): 917-930. 10.1002/prot.10507.

  47. Bierne H, Sabet C, Personnic N, Cossart P: Internalins: a complex family of leucine-rich repeat-containing proteins in Listeria monocytogenes. Microbes Infect. 2007, 9 (10): 1156-1166. 10.1016/j.micinf.2007.05.003.

  48. Di Matteo A, Federici L, Mattei B, Salvi G, Johnson KA, Savino C, De Lorenzo G, Tsernoglou D, Cervone F: The crystal structure of polygalacturonase-inhibiting protein (PGIP), a leucine-rich repeat protein involved in plant defense. Proc Natl Acad Sci U S A. 2003, 100 (17): 10124-10128. 10.1073/pnas.1733690100.

  49. D'Andrea LD, Regan L: TPR proteins: the versatile helix. Trends Biochem Sci. 2003, 28 (12): 655-662. 10.1016/j.tibs.2003.10.007.

  50. Blatch GL, Lassle M: The tetratricopeptide repeat: a structural motif mediating protein-protein interactions. Bioessays. 1999, 21 (11): 932-939. 10.1002/(SICI)1521-1878(199911)21:11<932::AID-BIES5>3.0.CO;2-N.

  51. Wallin E, von Heijne G: Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 1998, 7 (4): 1029-1038.

  52. Haake DA: Spirochaetal lipoproteins and pathogenesis. Microbiology. 2000, 146 (Pt 7): 1491-1504.

  53. Stevenson B, Choy HA, Pinne M, Rotondi ML, Miller MC, Demoll E, Kraiczy P, Cooley AE, Creamer TP, Suchard MA, Brissette CA, Verma A, Haake DA: Leptospira interrogans Endostatin-Like Outer Membrane Proteins Bind Host Fibronectin, Laminin and Regulators of Complement. PLoS ONE. 2007, 2 (11): e1188. 10.1371/journal.pone.0001188.

  54. Vieira ML, D'Atri LP, Schattner M, Habarta AM, Barbosa AS, de Morais ZM, Vasconcellos SA, Abreu PA, Gomez RM, Nascimento AL: A novel leptospiral protein increases ICAM-1 and E-selectin expression in human umbilical vein endothelial cells. FEMS Microbiol Lett. 2007, 276 (2): 172-180. 10.1111/j.1574-6968.2007.00924.x.

  55. Neves FO, Abreu PA, Vasconcellos SA, de Morais ZM, Romero EC, Nascimento AL: Identification of a novel potential antigen for early-phase serodiagnosis of leptospirosis. Arch Microbiol. 2007, 188 (5): 523-532. 10.1007/s00203-007-0273-2.

  56. Barbosa AS, Abreu PA, Neves FO, Atzingen MV, Watanabe MM, Vieira ML, Morais ZM, Vasconcellos SA, Nascimento AL: A newly identified leptospiral adhesin mediates attachment to laminin. Infect Immun. 2006, 74 (11): 6356-6364. 10.1128/IAI.00460-06.

  57. Verma A, Hellwage J, Artiushin S, Zipfel PF, Kraiczy P, Timoney JF, Stevenson B: LfhA, a novel factor H-binding protein of Leptospira interrogans. Infect Immun. 2006, 74 (5): 2659-2666. 10.1128/IAI.74.5.2659-2666.2006.

  58. Alexander RP, Zhulin IB: Evolutionary genomics reveals conserved structural determinants of signaling and adaptation in microbial chemoreceptors. Proc Natl Acad Sci U S A. 2007, 104 (8): 2885-2890. 10.1073/pnas.0609359104.

  59. Amin DN, Taylor BL, Johnson MS: Organization of the aerotaxis receptor aer in the membrane of Escherichia coli. J Bacteriol. 2007, 189 (20): 7206-7212. 10.1128/JB.00871-07.

  60. Scheffers DJ, Pinho MG: Bacterial cell wall synthesis: new insights from localization studies. Microbiol Mol Biol Rev. 2005, 69 (4): 585-607. 10.1128/MMBR.69.4.585-607.2005.



We thank Philip Shaw, Sastra Chaotheing and Duangdoa Wichadakul for their critical reading of the manuscript and their helpful comments. This work was supported by a grant from the National Center for Genetic Engineering and Biotechnology, Thailand.

Author information

Corresponding author

Correspondence to Wasna Viratyosin.

Additional information

Authors' contributions

WV and SI participated in designing the research project. SI and EP carried out the computational analysis and developed the LDA-based classifier. WV analyzed and interpreted the results, and drafted and produced the manuscript. PP provided further insights for refining the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: Putative PP proteins in L. interrogans serovar Lai genome. This table lists the Lai locus and protein annotation of (A) 17 predicted PP proteins derived from the consensus vote prediction, (B) 20 predicted PP proteins derived from 2 out of 3 predictions with significant DBSubLoc and/or GTD predictions, and (C) 38 predicted PP proteins derived from 1 out of 3 predictions with significant DBSubLoc and/or GTD predictions. (XLS 46 KB)


Additional file 2: Putative CP proteins predicted by the consensus vote prediction in L. interrogans serovar Lai genome. This table lists the Lai locus and protein annotation of 418 predicted CP proteins derived from consensus vote and used as the training set for the development of the LDA based classifier. (XLS 76 KB)


Additional file 3: Putative CM proteins predicted by the consensus vote prediction in L. interrogans serovar Lai genome. This table lists the Lai locus and protein annotation of 332 predicted CM proteins derived from consensus vote and used as the training set for the development of the LDA based classifier. (XLS 60 KB)


Additional file 4: Putative CP proteins predicted by the LDA-based classifier in L. interrogans serovar Lai genome. This table lists the Lai locus and protein annotation of 2272 CP proteins predicted by the LDA-based classifier. (XLS 222 KB)


Additional file 5: Putative CM proteins predicted by the LDA-based classifier in L. interrogans serovar Lai genome. This table lists the Lai locus and protein annotation of 481 CM proteins predicted by the LDA-based classifier. (XLS 66 KB)


Additional file 6: Subcellular localizations of 28 experimentally studied OM and EX proteins of L. interrogans serovar Lai. For each of the 28 experimentally studied OM and EX proteins, this table lists the protein name, the L. interrogans serovar Lai and Copenhageni loci, the experimental localization, the subcellular localization predicted by PSORTb, ProtCompB and PA, and the combination prediction. (XLS 44 KB)


Additional file 7: Subcellular localization of putative lipoproteins using the combination method. This table lists the Lai locus tag and protein annotation of 125 probable lipoproteins and 21 possible lipoproteins predicted by the SpLip program [19], and the subcellular localization of these lipoproteins predicted by the combination method. (XLS 48 KB)


Additional file 8: Subcellular localization of vaccine candidates using the combination method. This table lists the Lai locus tag and protein annotation of 226 vaccine candidates predicted by Yang et al. [20] and the subcellular localization of these vaccine candidates predicted by the combination method. (XLS 58 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Viratyosin, W., Ingsriswang, S., Pacharawongsakda, E. et al. Genome-wide subcellular localization of putative outer membrane and extracellular proteins in Leptospira interrogans serovar Lai genome using bioinformatics approaches. BMC Genomics 9, 181 (2008).
