Module (Reference) | Gene property | Evidence | Output | Scoring weight (points) | Example Protein (M. catarrhalis NAO366_1291) | Example Feature | Example Weight | Example Cumulative Score |
---|---|---|---|---|---|---|---|---|
PSORTb* [13] | Surface exposure^ | Sub-cellular localization | Surface localization prediction | + 1 if surface exposed | 9.52|OuterMembrane | Positive surface exposure | 1 | 1 |
−1 if cytoplasmic | ||||||||
LipoP [14] | Surface exposure^ | Lipoprotein motif | Presence or absence of a motif | 1 or 0 | SpI|18.809 | Positive for lipoprotein motif | 1 | 2 |
TMHMM [15] | Surface exposure^ | Transmembrane spans | Number of helices | If surface exposed, < 2 helices: + 0.5 | 1 | Presence of 1 TMH | 0.5 | 2.5 |
2 helices: 0 | ||||||||
3 helices: −0.2 | ||||||||
≥ 4 helices: −2 | ||||||||
If cytoplasmic: −2 | ||||||||
SignalP [16] | Surface exposure^ | Signal peptide | Signal peptide | + 1 for presence | MNKTSTQLGLLAVSVSLIMASLPAHA | Signal peptide present | 1 | 3.5 |
SPAAN [17] | Surface exposure^ | Adhesin protein | Adhesin protein score | + 0.5 if above cutoff score (default 0.75) | 0.907057 | Predicted Adhesin | 0.5 | 4 |
Surface HMMs [18]a | Surface exposure, function | HMM for motif or function | HMM title and score | 0.5 | None | No HMM alignment | 0 | 4 |
Antigenic [19] | Antigenicity | Antigenic epitopes | Peptides, scores, protein coverage | 0.5 | QLGLLAVSVSLIMASLPAHAVYLDR|1.193|10(169)|41.73 | Predicted antigenic region. | 0.5 | 4.9173 |
+ 0–1 proportional to coverage | 41.73% of the protein is antigenic | 0.4173 | ||||||
Bcell Pred [20] | Antigenicity | B cell epitopes, 6 prediction methods combined | Number of peptides, protein coverage | + 0–1 proportional to coverage | 6(59)|14|14.57 | Predicted B-cell epitopes | 0.1457 | 5.09809 |
+ 0–1 proportional to total number of peptides of a given length per protein | 14.57% predicted in 14 peptides of 7AA | 14/(405–7 + 1) = 0.03509 | ||||||
MHC class I [20] | Antigenicity | MHC-I epitopes | Number of peptides, protein coverage | + 0–1 proportional to coverage if 80–90% | 6(378)|124|73|93.33 | Predicted MHC binding | | 6.41039 |
+ 0–1 proportional to total number of peptides of a given length per protein | 93.33% predicted in 124 peptides of 9AA | 124/(405–9 + 1) = 0.3123 | ||||||
+ 1 if coverage is ≥ 90% | 1 | |||||||
NetCTLpan [20] | Antigenicity | MHC-I epitopes | Number of peptides, protein coverage | + 0–1 proportional to coverage if 80–90% | 12(334)|70|12|82.47 | Predicted MHC binding | 0.8247 | 7.41139 |
+ 0–1 proportional to total number of peptides of a given length per protein | 82.47% predicted in 70 peptides | 70/(405–9 + 1) = 0.1763 | ||||||
+ 1 if coverage is ≥ 90% | ||||||||
Immunogenicity (MHC-I) [20] | Antigenicity | MHC-I epitopes immunogenicity | Number of peptides, protein coverage | + 0–1 proportional to coverage | 7(76)|14|36|18.77 | Predicted immunogenic region | 0.1877 | 8.63435 |
+ 0–1 proportional to total number of peptides of a given length per protein | 14 peptides of 9AA | 14/(405–9 + 1) = 0.035264 | ||||||
+ 1 if coverage is ≥ 10% | 1 | |||||||
MHC class II [20] | Antigenicity | MHC-II epitopes | Number of peptides, protein coverage | + 0–1 proportional to coverage if 80–90% | 2(404)|315|61|99.75 | Predicted MHC-II binding | | 10.43995 |
+ 0–1 proportional to total number of peptides of a given length per protein | 99.75% predicted in 315 peptides of 15AA | 315/(405–15 + 1) = 0.8056 | ||||||
+ 1 if coverage is ≥ 90% | 1 | |||||||
BLAT (IEDBb database*) [20] | Antigenicity | Similarity to curated epitopes from IEDB | Protein coverage | + 0–1 proportional to coverage | None | No hits to epitope database | 0 | 10.43995 |
+ 1 if coverage is > 70% | ||||||||
Autoimmunity [5] | Autoimmunity | Similarity to human proteins | Protein coverage | + 1 if no autoimmunity | None | No hits to Human | 1 | 11.43995 |
−2 × (0 to 1), proportional to coverage | ||||||||
−2 if coverage is > 20% | ||||||||
Autoimmunity Commensals [5] | Autoimmunity | Similarity to user-defined commensal organisms’ proteins | Protein coverage | + 1 if no autoimmunity | 3(39)|9.63 | 9.63% similarity to commensal | (0.0963) × (−2) = −0.1926 | 11.24735 |
−2 × (0 to 1), proportional to coverage | (Negative feature) | |||||||
−2 if coverage is > 20% | ||||||||
SSRd Finder [4] | Variability of expression | Phase variation | Number of simple sequence repeats | + 1 if no SSR | None | No DNA SSR found | 1 | 12.24735 |
−0.5 for each SSR | ||||||||
−0.25 for each SSR in the promoter | ||||||||
−0.5 for each SSR with frameshift potential | ||||||||
−0.01 times the length of the SSR. | ||||||||
SSRd Finder Protein [4] | Variability of expression | Potential conformational shifts | Number of protein tandem repeats | −0.2 for each protein repeat, max penalty of 1. | None | No protein SSR found | 0 | 12.24735 |
IslandPath [21] | Potential for horizontal gene transfer | Genomic Islands | Presence in a GI | −1 for each protein in a GI | None | Not present in a GI | 0.5 | 12.74735 |
+ 0.5 for absence | ||||||||
Jaccard Clusters [22]† | Conservation | Orthologous clusters | Presence in an orthologous cluster | + 1 for each protein in a COG in ≥ 90% of genomes in at least one method | j_ortholog_cluster_3254|63 | Present in > 90% of the genomes | ||
−0.25 for each protein in a COG in < 90% of genomes | ||||||||
PanOCT [23]† | Conservation | Orthologous clusters | Presence in an orthologous cluster | + 1 for each protein in a COG in ≥ 90% of genomes in at least one method | PanOCT_cluster_108|63 | Present in > 90% of the genomes | ||
−0.25 for each protein in a COG in < 90% of genomes | 1 | 13.74735 | ||||||
OrthoMCL [24]† | Conservation | Orthologous clusters | Presence in an orthologous cluster | + 1 for each protein in a COG in ≥ 90% of genomes in at least one method | orthomcl_cluster1407|63 | Present in > 90% of the genomes | ||
−0.25 for each protein in a COG in < 90% of genomes | ||||||||
LS-BSR [25]† | Conservation | Orthologous clusters | Presence in an orthologous cluster | + 1 for each protein in a COG in ≥ 90% of genomes in at least one method | 63 | Present in > 90% of the genomes | ||
−0.25 for each protein in a COG in < 90% of genomes | ||||||||
Attributorc | Function | Annotation & GO Terms | Annotation & GO Terms | + 1 for each GO term in our surface exposed GO db | hypothetical_protein_domain_protein | No conclusive GO terms predicted | 0 | 13.74735 |
−1 for each GO term in our non-surface exposed GO db | ||||||||
+ 1 if surface exposure keywords are present and the protein is predicted periplasmic | ||||||||
a HMM: Hidden Markov Model. This component includes a collection of HMMs selected from the Pfam database for motifs associated with surface proteins. | ||||||||
b IEDB: Immune Epitope Database and Analysis Resource | ||||||||
c In-house Perl/Python script | ||||||||
d SSRs: simple sequence repeats | ||||||||
*If any three of the PSORTb, LipoP, SignalP and IEDB database matches are positive, the weight is incremented by 2. | True | 2 | 15.74735 | |||||
^If all surface exposure tools fail to give a conclusive prediction, the weight is decremented by 2. | False | 0 | 15.74735 | |||||
†Each protein is given an additional 0.1 for > 90% presence in each of the clustering algorithms, Jaccard Clusters, PanOCT, OrthoMCL and LS-BSR, and penalized 0.5 for < 90% presence or absence of a cluster for each tool. | True | 0.4 | ReVac Score = 16.14735 |
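
The arithmetic behind the example columns can be re-traced with a short script. The following is a minimal sketch, not the ReVac implementation: the function names are hypothetical, the protein length of 405 residues is read off the denominators shown in the table, and the treatment of MHC coverage below 80% (no coverage credit) is an assumption, since the table does not state it. The per-module contributions are copied from the Example Weight column.

```python
# Illustrative re-computation of the example column; all names are hypothetical.

def peptide_density(n_peptides: int, protein_len: int, peptide_len: int) -> float:
    """Predicted peptides divided by the number of possible windows of that length."""
    windows = protein_len - peptide_len + 1
    if windows <= 0:
        return 0.0
    return min(n_peptides / windows, 1.0)


def mhc_score(coverage_pct: float, n_peptides: int,
              protein_len: int, peptide_len: int) -> float:
    """MHC-style rows: +1 at >= 90% coverage, proportional credit for 80-90%,
    plus the peptide density term. Below 80% no coverage credit is given
    (an assumption; the table is silent on this case)."""
    score = peptide_density(n_peptides, protein_len, peptide_len)
    if coverage_pct >= 90.0:
        score += 1.0
    elif coverage_pct >= 80.0:
        score += coverage_pct / 100.0
    return score


def similarity_penalty(coverage_pct):
    """Autoimmunity rows: +1 for no hit, -2 above 20% coverage,
    otherwise -2 scaled by the covered fraction."""
    if coverage_pct is None:
        return 1.0
    if coverage_pct > 20.0:
        return -2.0
    return -2.0 * coverage_pct / 100.0


# Contributions as listed in the Example Weight column of the table.
contributions = {
    "PSORTb": 1, "LipoP": 1, "TMHMM": 0.5, "SignalP": 1, "SPAAN": 0.5,
    "Surface HMMs": 0,
    "Antigenic": 0.5 + 0.4173,
    "Bcell Pred": 0.1457 + 0.03509,
    "MHC class I": 1 + 0.3123,
    "NetCTLpan": 0.8247 + 0.1763,
    "Immunogenicity (MHC-I)": 0.1877 + 0.035264 + 1,
    "MHC class II": 1 + 0.8056,
    "BLAT (IEDB)": 0,
    "Autoimmunity": 1,
    "Autoimmunity Commensals": -0.1926,
    "SSR Finder": 1, "SSR Finder Protein": 0,
    "IslandPath": 0.5,
    "Conservation (orthologous clusters)": 1,
    "Attributor": 0,
    "Three-of-four surface bonus (*)": 2,
    "Inconclusive surface penalty (^)": 0,
    "Clustering bonus (4 x 0.1)": 0.4,
}

total = sum(contributions.values())
print(f"ReVac score for the example protein: {total:.5f}")
```

Summing the listed weights gives the final ReVac score in the last row of the table, and the helper functions return the individual density and penalty terms shown in the Example Weight column (for instance 124/(405 − 9 + 1) for MHC class I).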