- Research
- Open Access
NeEMO: a method using residue interaction networks to improve prediction of protein stability upon mutation
- Manuel Giollo†1, 2,
- Alberto JM Martin†1,
- Ian Walsh1,
- Carlo Ferrari2 and
- Silvio CE Tosatto1Email author
https://doi.org/10.1186/1471-2164-15-S4-S7
© Giollo et al; licensee BioMed Central Ltd. 2014
- Published: 20 May 2014
Abstract
Background
The rapid growth of un-annotated missense variants poses challenges requiring novel strategies for their interpretation. From the thermodynamic point of view, amino acid changes can lead to a change in the internal energy of a protein and induce structural rearrangements. This is of great relevance for the study of diseases and protein design, justifying the development of prediction methods for variant-induced stability changes.
Results
Here we propose NeEMO, a tool for the evaluation of stability changes using an effective representation of proteins based on residue interaction networks (RINs). RINs are used to extract useful features describing interactions of the mutant amino acid with its structural environment. Benchmarking shows NeEMO to be very effective, allowing reliable predictions in different parts of the protein such as β-strands and buried residues. Validation on a previously published independent dataset shows that NeEMO has a Pearson correlation coefficient of 0.77 and a standard error of 1 Kcal/mol, outperforming nine recent methods. The NeEMO web server can be freely accessed from URL: http://protein.bio.unipd.it/neemo/.
Conclusions
NeEMO offers an innovative and reliable tool for the annotation of amino acid changes. A key contribution are RINs, which can be used for modeling proteins and their interactions effectively. Interestingly, the approach is very general, and can motivate the development of a new family of RIN-based protein structure analyzers. NeEMO may suggest innovative strategies for bioinformatics tools beyond protein stability prediction.
Keywords
- Amino Acid Change
- Stability Change
- Stability Prediction
- Relative Solvent Accessibility
- Thermodynamic Energy
Introduction
The development of Next Generation Sequencing technologies has a tremendous impact on the discovery of missense variants. In humans, dbSNP [1] reports more than one million such variants, while only 1% of them have functional annotation or are referenced in the literature. This gap represents a problem for understanding disease development [2], as the proper characterization of variant effects may require expensive experiments. This is not only important for healthcare, but also for biotechnology, where alanine-scanning mutagenesis is commonly used to study the effect of amino acid variants on protein function and interactions [3]. Finally, designing mutants for protein design [4] and to evaluate their effects on function requires a deeper understanding of the mechanisms by which single variants affect stability. The Gibbs free energy (ΔG) defines the thermodynamic energy of folding compared to the denatured state. The difference between wild type and mutant polypeptide energy (ΔΔG = ΔGwt - ΔGmut) is a measure of how the amino acid change affects protein stability. Polypeptide chains are held together by non-covalent interactions between the residues forming them. The most relevant factors affecting protein folding and stability are hydrogen bonds, van der Waals, electrostatic and hydrophobic interactions, backbone angle preferences and protein chain entropy [5]. Interestingly, the assessment of stability changes has been shown to be critical for the interpretation of variants in key proteins like TP53 [6], which is known to have a strong connection with cancer development. In order to help understand the impact of amino acid changes, the ProTherm database [7] collects the free Gibbs energy for thousands wild type and mutant proteins. This source of information is critical for the development of new methods that try to fill the gap of unannotated variants. For the last 15 years, a number of computational tools have been developed for the prediction of stability changes in mutant proteins. Energy-based methods are based on two main approaches [8]. The first type is based on the use of molecular (or quantum mechanic) force fields that try to reflect the physical energy of molecules [9, 10]. The second type, also known as knowledge-based potential functions (KBPFs), are energy functions based on statistics computed on sets of experimental or artificially generated protein structures. Most KBPFs rely on a weighted combination of several statistical terms, as in Eris [11] or FoldX [12]. In particular, the latter considers nine different terms like van-der-Waals contributions, solvation energy, hydrogen bonds and the entropy cost. All terms are linearly combined after fitting to experimental data [12].
A completely different approach is adopted by machine learning algorithms (ML). Rather than trying to explicitly describe complex models of thermodynamic energy, they are trained by minimizing the classification error on a reference dataset. A number of ML tools have been proposed for stability prediction of variants, like AutoMute [13], I-Mutant [14, 15], MuPro [16] and PoPMuSiC 2.0 [17]. Most of these simulate the change by replacing the side chain of the mutated residue, disregarding possible structural rearrangements in the backbone. As an example, I-Mutant 2.0 [14] represents variants as a vector with 42 dimensions: two for pH and temperature, 20 for encoding the wild-type and mutant residues, and 20 to describe the residue frequency in the environment surrounding the amino acid. Similarly, two versions of MuPro [16] use vectors with 140 elements to encode the residue in a sliding window that considers 3 positions on the left and right of the mutant amino acid. Both methods trained a Support Vector Machine for classification and regression purposes with the radial basis function kernel [18]. This is a general trend of ML-based approaches for stability prediction: non-linear functions are preferred due to their increased ability to detect patterns in the data, leading to better performance. In addition, all methods try to encode explicitly information about the protein of interest using either structure or sequence information. Both information can be described effectively using residue-residue interaction networks (RINs), as suggested by RING [19]. RINs are a graph description of protein structures where nodes represent amino acids and edges represent different types of physico-chemical bonds (e.g. hydrogen bonds, salt bridges, hydrophobic contacts). Using RINs can be of interest for stability estimation due to their implicit detailed representation of different chemical interactions in proteins. These interactions play a central role for the internal folding energy, so they may introduce new discriminative variables for the analysis of mutants [20]. Using this insight, in our work we trained a non-linear neural network for the prediction of stability changes based on RINs. We will show that using this effective protein representation there is an improvement in the prediction of protein stability. We believe that NeEMO can contribute significantly for the characterization of un-annotated missense variants and for protein mutagenesis studies, increasing the knowledge in this challenging field.
Methods
Dataset
For machine learning methods, the construction of a dataset is a critical process requiring a meticulous selection and curation of the starting data. The ProTherm database [7] represents a reference dataset describing the effects of amino acid mutations in terms of thermodynamic energy changes, currently containing information on 647 different proteins. Roughly one third of the 22,713 entries represent the Gibbs free energy of the wild type protein, while the reminder report the ΔG of a mutant. It is clear that there is a remarkable redundancy of information that needs to be managed. Here, we decided to focus on the curated version of ProTherm used to train PoPMuSiC 2.0 [17]. In order to avoid bias, we evaluated sequence similarity on the 131 proteins of this training dataset. Using PANADA [21], clustering at 90% and 40% identical sequences produces 129 and 119 different clusters respectively. In particular, none of these clusters had more than three sequences in it. This high diversity is therefore a key factor for the machine learning procedure, as it is likely to provide an effective estimation of the data model.
03940394G distribution on the training set.
To perform additional tests, we created a second dataset (IM_631) from the training data used in MuPro [16] and I-Mutant [15], containing 631 new mutations in 30 different proteins, to be used as independent samples providing indication of overfitting. The dataset distribution is quite different from the PoPMuSiC data (Supplementary Figure S1), especially in the frequency of highly destabilizing variants (ΔΔG > 5 kcal/mol). The latter dataset explicitly removed strong mutants likely to yield significant changes to the protein structure, which may represent a threat during the learning process. On the other hand, the IM_631 dataset collects real variants with no prior filtering, so these mutations can be used to evaluate NeEMO without bias. Last but not least, the S350 dataset contains further mutations which are typically used to compare the performances of different methods [17]. This data will be considered to obtain a fair comparison of NeEMO performance with other stability prediction tools.
Relevant features
Our objective is to investigate how useful RINs are in the context of stability prediction. RINs are potentially interesting because they can be used to detect informative amino-acids in a target protein using standard graph algorithms like Dijkstra's shortest path or PageRank [22]. These networks have been generated by RING [19] with default parameters, i.e. closest atom networks where interactions are reported for residues that have atoms at less than 5 Å. There are four main features that we obtain with this tool, which will be briefly described in the following. For a more detailed description of the features see Supplementary Table S1.
Evolutionary information
The overall idea is that evolutionary information can discriminate key residues in the protein, either for stability or functional reasons. NeEMO considers conservation, Mutual Information and its correction Average Cluster Purity as a feature for stability prediction. These values are generated by RING, which generates a multiple sequence alignment using PSI-BLAST [23] on the UniRef90 sequence database and computes several measures reflecting evolutionary information of each residue.
Residue conformational propensities
The impact of variants strongly depends on the local environment of each residue in the structure. Classical tools for the evaluation of protein structures can highlight residues with high structural constraints that should not be mutated. In the current implementation, RING uses TAP [24], FRST [25], and QMEAN [26] to estimate the amino acid energy contribution. In particular, these tools evaluate statistical potentials such as all atom distance-dependent pairwise, torsion angle, and solvation potentials. All these numerical terms are included in NeEMO for an accurate description of the mutant context.
Amino acid information
The wild type, the mutant and its two adjacent residues in the sequence (left and right) are used to describe protein changes. One-hot encoding is used to represent the sequence information, as it was previously shown to be effective [27]. I.e. the 20 standard residues are translated into a 20-dimensional vector where the i-th element is 1, and the others are 0. In addition, secondary structure and relative solvent accessibility (RSA) defined by DSSP [28] are used to describe the local context.
Network topology
Using RING it is possible to distinguish between H-bond, inter-atomic contacts, π-cation, π-π stacks, salt bridges and the atoms involved in these interactions [19]. The standard node parameters described in NetworkAnalyzer [29] are computed on that information and used to describe the mutant and its sequence neighbor (left and right) for stability prediction. Centralities are computed by considering multiple sub-network that consider a single chemical bond at a time. In addition, the network size and frequency of each amino acid type in contact with the mutation position in the RIN were also counted. Neighboring residues are defined as those which have any atom at ≤ 5 Å to any of the atoms from the other residue. The overall idea is to comprehensively assess the network connections, and measure if the mutant is central in the protein graph topology. This information was critically discriminative in previous work [15], [20], so we expect it to be also effective in the context of stability prediction.
Last but not least, pH and temperature are considered during the prediction. All information is stored in 184 dimensional vectors for each mutation. Almost half of the features are needed to describe amino acid information, due to the one-hot encoding sparsity with 20 descriptors for every residue.
Training
Representation of the NeEMO pipeline. The 3D structure of a protein is transformed into a RIN. Node centralities are then computed and combined with other RIN features, such as secondary structure, relative solvent accessibility and sequence conservation. All numerical descriptors are then fed into a neural network that predicts the ΔΔG for the chosen mutation.
Performance measures
where ρ is identical to Pearson correlation applied to the rank of the predictions, while τ accounts for the number of prediction pairs having correct order (CP) or wrong order (DP) with respect to the real ΔΔG for the n dataset examples. Finally, the standard error σ is used to report the expected distance of the prediction from the real G of the mutation.
Termophile case study
Summary of the 10 pairs of mesophilic and thermophilic proteins used in the case study, their similarity and the environmental conditions (pH and Temperature) used to perform the test [31].
Mesophile | Extremophile | Alignment | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
Protein | PDB code | Species | pH | T (°C) | PDB code | Species | pH | T (°C) | Identity | Gaps |
Adenylate kinase | 1AK2A | B. taurus | 7 | 38 | 1ZIPA | G. stearothermophilus | 7 | 65 | 90/223 (40.4%) | 9 |
Phosphoglycerate Kinase | 3PGKA | S. cerevisiae | 6,6 | 30 | 1PHPA | G. stearothermophilus | 7 | 65 | 210/420 (50.0%) | 31 |
Reductase | 1LVLA | P. putida | 7 | 30 | 1EBDA | G. stearothermophilus | 7 | 65 | 192/466 (41.2%) | 19 |
Lactate Dehydrogenase | 1LDMA | S. acanthias | 7,9 | 11 | 1LDNA | G. stearothermophilus | 7 | 65 | 111/335 (33.1%) | 25 |
TATA box binding protein | 1VOKA | A. thaliana | 7 | 20 | 1PCZA | P. woesei | 7 | 98 | 75/198 (37.9%) | 22 |
Subtilisin | 1ST3A | B. lentus | 7 | 20 | 1THMA | T. vulgaris | 6 | 60 | 132/282 (46.8%) | 16 |
Carboxy Peptidase | 2CTCA | B. taurus | 7 | 38 | 1OBRA | T. vulgaris | 6 | 60 | 93/346 (26.9%) | 62 |
Glyceraldehyde-3-phosphate | 1GADO | E. coli | 7 | 37 | 1GD1O | G. stearothermophilus | 7 | 65 | 194/335 (57.9%) | 6 |
Neutral Protease | 1NPCA | B. cereus | 7 | 30 | 1THLA | B. thermoproteolyticus | 7 | 80 | 231/318 (72.6%) | 2 |
Phosphofructo Kinase | 2PFKD | E. coli | 7 | 37 | 3PFKA | G. stearothermophilus | 7 | 65 | 172/320 (53.8%) | 20 |
Results
We developed NeEMO, a machine learning method that uses RIN information, to evaluate the impact of amino acid changes in protein stability. Using a curated ProTherm dataset, 10-fold cross validation was used for training and performance evaluation. Finally, the tool is tested on two independent sets of protein variants, providing an unbiased evaluation of its reliability and a fair comparison with other methods.
Training and cross-validation
NeEMO was trained on a large dataset previously used by PoPMuSiC 2.0 [17]. The results of 10-fold cross validation on this dataset are shown together with a preliminary comparison to other methods in Table 2. Our goal was to assess if the features and the mathematical model of our method are able to fit effectively into the traning data. Several state-of-the-art methods were used, namely Auto-Mute, I-Mutant 2.0 and 3.0, MuPro and PoPMuSiC 2.0. The comparison was not straightforward, as most predictors were occasionally not able to make a prediction for some variants due to their inability to manage certain PDB files. We decided to compare NeEMO only on the mutations where all tools were executed successfully. In many cases the variants of this test set are part of the training dataset of other methods. For this reason, this performance comparison cannot be considered unbiased, and therefore it is just a mean to measure if the fitting procedure is as good as the one used on other methods. As shown in Table 2, NeEMO performs consistently well compared to other state-of-the-art tools. Auto-Mute is the only method providing comparable results, but seems very poor in the input and mutation management, as the method cannot make a reliable prediction for half of the examples (e.g. NMR solved proteins, or in case if atoms with repeated coordinate sets). In view of the good performance in the cross-validation, we expect that the fitting process was overall good.
Regression performance comparison of NeEMO with other methods on the ten-fold cross-validation test.
r | ρ | τ | |||||
---|---|---|---|---|---|---|---|
Method | Mutations | Method | NeEMO | Method | NeEMO | Method | NeEMO |
Auto-Mute | 1,144 | 0.691 | 0.640 | 0.686 | 0.635 | 0.509 | 0.456 |
I-Mutant 2.0 | 2,171 | 0.642 | 0.678 | 0.623 | 0.652 | 0.467 | 0.471 |
I-Mutant 3.0 | 2,112 | 0.620 | 0.679 | 0.623 | 0.658 | 0.434 | 0.477 |
MuPro | 2,398 | 0.606 | 0.665 | 0.571 | 0.643 | 0.416 | 0.465 |
PoPMuSiC 2.0 | 2,399 | 0.623 | 0.666 | 0.617 | 0.644 | 0.445 | 0.465 |
NeEMO in-depth analysis
Regression results of NeEMO versions on the training set.
Correlation measure performance of different NeEMO versions on the IM_631 dataset.
r | ρ | τ | ||
---|---|---|---|---|
All | NeEMO | 0.666 | 0.644 | 0.465 |
NeEMO_NOCC | 0.637 | 0.626 | 0.447 | |
NeEMO_NORING | 0.618 | 0.603 | 0.430 | |
Helix | NeEMO | 0.645 | 0.612 | 0.436 |
NeEMO_NOCC | 0.613 | 0.607 | 0.430 | |
NeEMO_NORING | 0.585 | 0.600 | 0.424 | |
Beta | NeEMO | 0.716 | 0.687 | 0.506 |
strand | NeEMO_NOCC | 0.694 | 0.672 | 0.490 |
NeEMO_NORING | 0.690 | 0.662 | 0.482 | |
Coil | NeEMO | 0.581 | 0.588 | 0.418 |
NeEMO_NOCC | 0.546 | 0.560 | 0.391 | |
NeEMO_NORING | 0.502 | 0.501 | 0.350 | |
Exposed | NeEMO | 0.603 | 0.551 | 0.391 |
NeEMO_NOCC | 0.553 | 0.516 | 0.360 | |
NeEMO_NORING | 0.522 | 0.498 | 0.350 | |
Buried | NeEMO | 0.638 | 0.614 | 0.441 |
NeEMO_NOCC | 0.612 | 0.593 | 0.422 | |
NeEMO_NORING | 0.591 | 0.559 | 0.397 |
Comparison with other methods
Performance of different methods on the independent S350 dataset.
All mutations | Common mutations | Common mutations -10% | |||||||
---|---|---|---|---|---|---|---|---|---|
n | r | σ | n | r | σ | n | r | σ | |
Automute | 315 | 0.46 | 1.42 | 299 | 0.44 | 1.45 | 264 | 0.60 | 1.06 |
CUPSAT | 346 | 0.37 | 1.46 | 299 | 0.37 | 1.50 | 264 | 0.50 | 1.10 |
Dmutant | 350 | 0.48 | 1.38 | 299 | 0.46 | 1.44 | 264 | 0.63 | 1.05 |
Eris | 334 | 0.35 | 1.49 | 299 | 0.35 | 1.52 | 264 | 0.55 | 1.07 |
I-Mutant 2.0 | 346 | 0.29 | 1.50 | 299 | 0.27 | 1.56 | 264 | 0.39 | 1.16 |
I-Mutant 3.0 | 338 | 0.53 | 1.35 | 299 | 0.53 | 1.37 | 264 | 0.71 | 1.00 |
MuPro | 350 | 0.41 | 1.43 | 299 | 0.41 | 1.48 | 264 | 0.49 | 1.12 |
PoPMuSiC 1.0 | 350 | 0.62 | 1.23 | 299 | 0.63 | 1.26 | 264 | 0.72 | 0.93 |
PoPMuSiC 2.0 | 350 | 0.67 | 1.16 | 299 | 0.67 | 1.21 | 264 | 0.80 | 0.86 |
NeEMO | 350 | 0.67 | 1.16 | 299 | 0.68 | 1.19 | 264 | 0.79 | 0.88 |
Termophile analysis
NeEMO predictions on the mesophilic and thermophilic mutations.
T → M | M → T | ||||||
---|---|---|---|---|---|---|---|
Mesophile | Thermophile | Increase | Decrease | Energy | Increase | Decrease | Energy |
1AK2A | 1ZIPA | 28 | 85 | 56.66 | 50 | 63 | 38.92 |
3PGKA | 1PHPA | 66 | 104 | 55.22 | 128 | 42 | -42.47 |
1LVLA | 1EBDA | 102 | 138 | 50.85 | 169 | 71 | -63.93 |
1LDMA | 1LDNA | 46 | 140 | 94.32 | 105 | 81 | -1.99 |
1VOKA | 1PCZA | 18 | 77 | 78.02 | 42 | 53 | 11.63 |
1ST3A | 1THMA | 35 | 90 | 75.91 | 48 | 77 | 34.62 |
2CTCA | 1OBRA | 73 | 108 | 51.41 | 100 | 81 | 9.46 |
1GADO | 1GD1O | 50 | 82 | 23.93 | 78 | 54 | -14.69 |
1NPCA | 1THLA | 20 | 61 | 33.95 | 49 | 32 | -2.37 |
2PFKD | 3PFKA | 60 | 69 | 20.48 | 50 | 79 | 41.91 |
Total | 498 | 954 | 819 | 633 |
3D structure of the 3PGKA, showing well predicted buried residues (blue) and mispredictions (red). For this protein, the mesophile and thermophile core amino acids share a similar structure.
Web server
The NeEMO web server is freely available to the scientific community from URL: http://protein.bio.unipd.it/neemo/. Once a PDB file is specified by the user, the service computes the RIN in a few minutes, and provides a user-friendly interface for variant prediction. Multiple amino acid changes can be tested at a time, including different pH and temperature parameters. The tool is also very fast. Once the multiple alignment is computed, the effect of a residue change on the protein structure can be predicted in few seconds, making it scalable for large-scale usage.
Conclusions
NeEMO represents a novel approach to predict ΔΔG changes after point mutations in protein structures. It takes advantage of RINs created by our previous work RING [19] to describe protein structures and interactions between the amino acids forming them. In RINs each residue is described by several features, including secondary structure, solvent accessibility, conservation and a number of residue-specific energy potentials. RING also provides detailed information about interactions found between different amino acids, including their occurrence and types. The interactions present in the RIN are used to compute node centralities that encode the relevance of each RIN node in a protein structure. Inclusion of RINs and information derived from them was shown to improve mutation stability prediction performance. Overall, NeEMO seems able to significantly outperform all other tested methods, and shows very good accuracy across different secondary structures and in classification. It also seems good in terms of reliability, as it can manage and produce a prediction for nearly all PDB files of the PoPMuSiC 2.0 dataset. For the near future, we are planning to extend NeEMO to map multiple chains directly into an integrated RIN.
Another advantage of our approach is that it does not rely on 3D models for the mutant proteins. Instead the RIN for the wild type protein is used to predict the stability change. Other methods have to model the mutant structure first, which may be computationally expensive and in some cases can introduce errors that our protocol avoids. In addition, RINs are very comprehensive data structures that help the management of heterogeneous information sources like evolutionary and topological data. In fact, experiments show that network data improves prediction quality without exception. Finally, it is interesting to note that the evaluation on unseen examples in IM_631 results in basically unchanged performance. This is a nice result, because the ΔΔG distribution of the training data was significantly different. The overall results also prove no overfitting was introduced in NeEMO, and confirm that it can be used effectively for the assessment of mutation impact. As the number of known variants and PDB structures in different organisms is rapidly increasing, we believe that the tool can be important for variant assessment. Finally, NeEMO can also play a role for pathogenicity prediction as shown in [37]. It is well known that stability loss in proteins like TP53 [6] is associated with disease development. The ability of RINs to describe proteins and their variants effectively can play a role for the detection of deleterious protein changes, and may also contribute to pathogenicity prediction.
Notes
Declarations
Acknowledgements
Dr. Majid Masso for providing AUTO-MUTE predictions and Dr. Yves Dehouck for PoPMuSiC 2.0 predictions in the original cross-validation experiment.
Declarations
The publication costs for this article were funded by a grant from FIRB Futuro in Ricerca [RBFR08ZSXY], AIRC [MFAG 12740] and CARIPLO [2011/0724].
This article has been published as part of BMC Genomics Volume 15 Supplement 4, 2014: SNP-SIG 2013: Identification and annotation of genetic variants in the context of structure, function, and disease. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S4
Authors’ Affiliations
References
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001, 29 (1): 308-311. 10.1093/nar/29.1.308. JanPubMedPubMed CentralView ArticleGoogle Scholar
- Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TFC, McCarroll SA, Visscher PM: Finding the missing heritability of complex diseases. Nature. 2009, 461 (7265): 747-753. 10.1038/nature08494. OctPubMedPubMed CentralView ArticleGoogle Scholar
- VanPetegem F, Duderstadt KE, Clark KA, Wang M, Minor DL: Alanine-scanning mutagenesis defines a conserved energetic hotspot in the CaValpha1 AID-CaVbeta interaction site that is critical for channel modulation. Struct Lond Engl 1993. 2008, 16 (2): 280-294. FebGoogle Scholar
- Bryson JW, Betz SF, Lu HS, Suich DJ, Zhou HX, O'Neil KT, DeGrado WF: Protein Design: A Hierarchic Approach. Science. 1995, 270 (5238): 935-941. 10.1126/science.270.5238.935. NovPubMedView ArticleGoogle Scholar
- Dill KA, MacCallum JL: The Protein-Folding Problem, 50 Years On. Science. 2012, 338 (6110): 1042-1046. 10.1126/science.1219021. NovPubMedView ArticleGoogle Scholar
- Bullock AN, Henckel J, DeDecker BS, Johnson CM, Nikolova PV, Proctor MR, Lane DP, Fersht AR: Thermodynamic stability of wild-type and mutant p53 core domain. Proc Natl Acad Sci USA. 1997, 94 (26): 14338-14342. 10.1073/pnas.94.26.14338. DecPubMedPubMed CentralView ArticleGoogle Scholar
- Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A: ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucleic Acids Res. 2004, 32 (Database): D120-121. JanPubMedPubMed CentralView ArticleGoogle Scholar
- Lazaridis T, Karplus M: Effective energy functions for protein structure prediction. Curr Opin Struct Biol. 2000, 10 (2): 139-145. 10.1016/S0959-440X(00)00063-4. AprPubMedView ArticleGoogle Scholar
- Benedix A, Becker CM, deGroot BL, Caflisch A, Böckmann RA: Predicting free energy changes using structural ensembles. Nat Methods. 2009, 6 (1): 3-4. 10.1038/nmeth0109-3. JanPubMedView ArticleGoogle Scholar
- Pokala N, Handel TM: Energy Functions for Protein Design: Adjustment with Protein-Protein Complex Affinities, Models for the Unfolded State, and Negative Design of Solubility and Specificity. J Mol Biol. 2005, 347 (1): 203-227. 10.1016/j.jmb.2004.12.019. MarPubMedView ArticleGoogle Scholar
- Yin S, Ding F, Dokholyan NV: Modeling backbone flexibility improves protein stability estimation. StructLond Engl 1993. 2007, 15 (12): 1567-1576. DecGoogle Scholar
- Guerois R, Nielsen JE, Serrano L: Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More Than 1000 Mutations. JMol Biol. 2002, 320 (2): 369-387. 10.1016/S0022-2836(02)00442-4. JulView ArticleGoogle Scholar
- Masso M, Vaisman II: AUTO-MUTE: web-based tools for predicting stability changes in proteins due to single amino acid replacements. Protein Eng Des Sel PEDS. 2010, 23 (8): 683-687. 10.1093/protein/gzq042. AugPubMedView ArticleGoogle Scholar
- Capriotti E, Fariselli P, Casadio R: I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005, 33 (suppl 2): W306-W310. JulPubMedPubMed CentralView ArticleGoogle Scholar
- Capriotti E, Fariselli P, Rossi I, Casadio R: A three-state prediction of single point mutations on protein stability changes. BMC Bioinformatics. 2008, 9 (Suppl 2): S6-10.1186/1471-2105-9-S2-S6. MarPubMedPubMed CentralView ArticleGoogle Scholar
- Cheng J, Randall A, Baldi P: Prediction of protein stability changes for single-site mutations using support vector machines. Proteins. 2006, 62 (4): 1125-1132. MarPubMedView ArticleGoogle Scholar
- Dehouck Y, Grosfils A, Folch B, Gilis D, Bogaerts P, Rooman M: Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0. Bioinforma Oxf Engl. 2009, 25 (19): 2537-2543. 10.1093/bioinformatics/btp445. OctView ArticleGoogle Scholar
- Musavi MT, Ahmed W, Chan KH, Faris KB, Hummels DM: On the training of radial basis function classifiers. Neural Netw. 1992, 5 (4): 595-603. 10.1016/S0893-6080(05)80038-3. JulView ArticleGoogle Scholar
- Martin AJM, Vidotto M, Boscariol F, Domenico Di, Walsh I, Tosatto SCE: RING: networking interacting residues, evolutionary information and energetics in protein structures. Bioinformatics. 2011, 27 (14): 2003-2005. 10.1093/bioinformatics/btr191. JulPubMedView ArticleGoogle Scholar
- Cheng TMK, Lu YE, Vendruscolo M, Lio P, Blundell TL: Prediction by Graph Theoretic Measures of Structural Effects in Proteins Arising from Non-Synonymous Single Nucleotide Polymorphisms. PLoS Comput Biol. 2008, 4 (7): e1000135-10.1371/journal.pcbi.1000135. JulPubMedPubMed CentralView ArticleGoogle Scholar
- Martin AJM, Walsh I, Domenico TD, Mi010Deti0107 I, Tosatto SCE: PANADA: protein association network annotation, determination and analysis. PloS One. 2013, 8 (11): e78383-10.1371/journal.pone.0078383.PubMedPubMed CentralView ArticleGoogle Scholar
- Page L, Brin S, Motwani R, Winograd T: The PageRank Citation Ranking: Bringing Order to the Web. 1999, 11 Nov 23 Oct 2013, [http://ilpubs.stanford.edu:8090/422/]Google Scholar
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389. SepPubMedPubMed CentralView ArticleGoogle Scholar
- Tosatto SCE, Battistutta R: TAP score: torsion angle propensity normalization applied to local protein structure evaluation. BMC Bioinformatics. 2007, 8: 155-10.1186/1471-2105-8-155.PubMedPubMed CentralView ArticleGoogle Scholar
- Tosatto SCE: The victor/FRST function for model quality estimation. J Comput Biol J Comput Mol Cell Biol. 2005, 12 (10): 1316-1327. 10.1089/cmb.2005.12.1316. DecView ArticleGoogle Scholar
- Benkert P, Tosatto SCE, Schomburg D: QMEAN: A comprehensive scoring function for model quality assessment. Proteins. 2008, 71 (1): 261-277. 10.1002/prot.21715. AprPubMedView ArticleGoogle Scholar
- Walsh I, Martin AJM, Domenico Di, Tosatto SCE: ESpritz: accurate and fast prediction of protein disorder. Bioinforma Oxf Engl. 2012, 28 (4): 503-509. 10.1093/bioinformatics/btr682. FebView ArticleGoogle Scholar
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983, 22 (12): 2577-2637. 10.1002/bip.360221211. DecPubMedView ArticleGoogle Scholar
- Assenov Y, Ramírez F, Schelhorn SE, Lengauer T, Albrecht M: Computing topological parameters of biological networks. Bioinformatics. 2008, 24 (2): 282-284. 10.1093/bioinformatics/btm554. JanPubMedView ArticleGoogle Scholar
- Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update. SIGKDD Explor Newsl. 2009, 11 (1): 10-18. 10.1145/1656274.1656278. NovView ArticleGoogle Scholar
- Brinda KV, Vishveshwara S: A Network Representation of Protein Structures: Implications for Protein Stability. Biophys J. 2005, 89 (6): 4159-4170. 10.1529/biophysj.105.064485. DecPubMedPubMed CentralView ArticleGoogle Scholar
- Rice P, Longden I, Bleasby A: EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2. JunPubMedView ArticleGoogle Scholar
- Gilis D, Rooman M: PoPMuSiC, an algorithm for predicting protein mutant stability changes: application to prion proteins. Protein Eng. 2000, 13 (12): 849-856. 10.1093/protein/13.12.849. DecPubMedView ArticleGoogle Scholar
- Yin S, Ding F, Dokholyan NV: Eris: an automated estimator of protein stability. Nat Methods. 2007, 4 (6): 466-467. 10.1038/nmeth0607-466. JunPubMedView ArticleGoogle Scholar
- Parthiban V, Gromiha MM, Schomburg D: CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Res. 2006, 34 (Web Server): W239-242. 10.1093/nar/gkl190. JulPubMedPubMed CentralView ArticleGoogle Scholar
- Zhou H, Zhou Y: Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci Publ Protein Soc. 2002, 11 (11): 2714-2726. NovView ArticleGoogle Scholar
- Olatubosun A, Väliaho J, Härkönen J, Thusberg J, Vihinen M: PON-P: integrated predictor for pathogenicity of missense variants. Hum Mutat. 2012, 33 (8): 1166-1174. 10.1002/humu.22102. AugPubMedView ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.