Volume 15 Supplement 1
Proposing a highly accurate protein structural class predictor using segmentation-based features
- Abdollah Dehzangi†1, 2,
- Kuldip Paliwal3,
- James Lyons3,
- Alok Sharma1, 4 and
- Abdul Sattar1, 2
© Dehzangi et al.; licensee BioMed Central Ltd. 2014
Published: 24 January 2014
Prediction of the structural classes of proteins can provide important information about their functionalities as well as their major tertiary structures. It is also considered an important step towards solving the protein structure prediction problem. Despite all the efforts made so far, finding a fast and accurate computational approach to protein structural class prediction remains a challenging problem in bioinformatics and computational biology.
In this study, we propose segmented distribution and segmented auto covariance feature extraction methods to capture local and global discriminatory information from the evolutionary profiles and predicted secondary structure of proteins. By applying SVM to our extracted features, we enhance, for the first time, the protein structural class prediction accuracy to over 90% and 85% for two popular low-homology benchmarks that have been widely used in the literature. We report 92.2% and 86.3% prediction accuracies for the 25PDB and 1189 benchmarks, which are respectively up to 7.9% and 2.8% better than previously reported results for these two benchmarks.
By proposing segmented distribution and segmented auto covariance feature extraction methods to capture local and global discriminatory information from evolutionary profiles and predicted secondary structure of the proteins, we are able to enhance the protein structural class prediction performance significantly.
Keywords: Protein structural class prediction problem; Structural features; Evolutionary features; Segmented auto covariance; Segmented distribution; Support Vector Machine (SVM)
The protein structural class prediction problem is defined as categorizing a given protein into one of four structural classes, namely all-α, all-β, α + β, and α/β. Knowledge of the structural class of a protein also provides important information about its functionality and overall folding type [2, 3]. Therefore, protein structural class prediction is considered an important step towards solving the protein structure prediction problem. Despite its importance, finding a fast and accurate computational approach to this problem when the sequence similarity rate is low remains unsolved in bioinformatics and computational biology.
During the past two decades, a wide range of studies using machine learning-based methods have been conducted to solve this problem [4, 5]. These studies can be categorized into two groups. The first group consists of studies that have tried to address this problem by proposing novel classification techniques [6, 7]. They proposed a wide range of classification techniques based on different learning algorithms, such as Bayesian-based learners, meta-classifiers [9–13], Support Vector Machines (SVM) [14–17], Artificial Neural Networks (ANN) [18–20], and ensemble classifiers [21–25]. Among these, the SVM classifier has attained the best results for this task [5, 22, 26, 27]. The second group consists of studies that have mainly focused on proposing novel features that capture local and global discriminatory information, such as sequence-based information [10, 28–30], pseudo amino acid composition [31–33], physicochemical-based information [15, 22, 28, 34–36], and structural-based information [5, 33, 37–40]. The most important enhancements in protein structural class prediction accuracy have come from these feature extraction techniques rather than from new classification techniques. The recent enhancements were mainly due to features extracted from Position Specific Scoring Matrix (PSSM) profiles as well as structural information extracted from the predicted secondary structure of proteins.
The most significant enhancement obtained by relying solely on the PSSM for feature extraction was achieved by [16, 26, 40]. They used PSSM profiles to extract sequence order information based on the concepts of dipeptide composition, auto covariance, and composition of the amino acids. They treated the entire protein sequence as a single entity when extracting these features. Hence, the auto covariance and dipeptide composition calculated along the entire protein sequence were used as its local descriptor. Further enhancement of the prediction accuracy has been achieved by including structural information extracted from the secondary structure of the proteins predicted using PSIPRED. Adding these features to those extracted from the PSSM significantly improved the protein structural class prediction accuracy, especially when the sequence similarity rate was low [27, 37, 43]. As with the features extracted from the PSSM, the whole protein was treated as a single entity when extracting these features. Despite all the recent efforts on extracting effective features to capture local and global discriminatory information from evolutionary and structural profiles, the protein structural class prediction accuracy has not improved significantly since the study of Mizianty and Kurgan in 2009 [5, 6].
In this study, we propose segmented auto covariance and segmented distribution feature extraction methods to capture more local sequence order information from evolutionary and structural profiles. We also employ the occurrence and composition feature groups to capture global sequence order information from the evolutionary and structural profiles. First, by relying solely on the PSSM profiles for feature extraction, we enhance the protein structural class prediction accuracy by over 15% and 5% for the 25PDB and 1189 benchmarks respectively, compared to similar studies. These enhancements highlight the potential discriminatory information embedded in the PSSM that has not been adequately explored in the literature. Then, by applying our proposed feature extraction techniques to structural information derived from the secondary structure predicted using SPINE-X, we achieve up to 92.2% and 86.3% prediction accuracies for the 25PDB and 1189 benchmarks respectively, which are up to 7.9% and 2.8% better than previously reported results found in the literature [5, 6, 27].
To evaluate the prediction performance of our proposed approaches, we employ two benchmarks, namely 25PDB and 1189. These two benchmarks have been widely used for the protein structural class prediction problem. The 25PDB benchmark, introduced in an earlier study, consists of 1673 proteins with less than 25% sequence similarity on average (the homology ranges between 22% and 45%). This benchmark was extracted from 25% PDBSELECTED, which includes high-resolution non-homologous proteins from the Protein Data Bank (PDB). Therefore, it is considered an appropriate representative of benchmarks consisting of proteins in the twilight zone (proteins with sequence similarities between 20% and 45%) for the protein structural class prediction problem. Hence, in this study, the 25PDB benchmark is used as the main source to investigate the effectiveness of our proposed model.
[Table: The properties of the 1189 and 25PDB benchmarks.]
Feature extraction methods
In this study, we use PSSM profiles to extract evolutionary-based information and the secondary structure predicted using SPINE-X to extract structural-based information. The PSSM is calculated by applying PSI-BLAST, with its cut-off value (E) set to 0.001, to our explored benchmarks (using NCBI's non-redundant (NR) protein database). Given a protein sequence, the PSSM gives the substitution probability of the amino acid at each position of the sequence with each of the 20 amino acids. The PSSM consists of two L × 20 matrices (L is the length of the protein and the columns represent the 20 amino acids). The first matrix, called PSSM_cons, gives the log-odds of the substitution probability. The second matrix, called PSSM_prob, gives the normalized substitution probability for each amino acid.
We also use the secondary structure predicted using SPINE-X, which was recently proposed and attained better results than PSIPRED on predicting protein secondary structure (especially for the coded area). Given a protein sequence, SPINE-X produces an L × 3 matrix (referred to as SPINE-M for the rest of this study) containing the normalized probability that the amino acid at each position of the protein sequence contributes to one of the three secondary structure elements, namely α-helix, β-strand, and coil. It also returns a transformed version of the protein sequence (also derived from the SPINE-M) in which each amino acid is replaced with H (helix), E (strand), or C (coil) according to its tendency to take part in one of these secondary structure elements. We refer to this sequence as the structural consensus sequence. The secondary structure predicted using SPINE-X is expected to provide structural information for the protein structural class prediction problem that is similar to or even better than that of PSIPRED, owing to its better performance.
Consensus sequence-based occurrence
where P_ij is the substitution probability of the amino acid at location i with the j-th amino acid in the PSSM_cons. In the second step, we replace the amino acid at the i-th location of the original protein sequence with the j-th amino acid to form the consensus sequence. Note that the PSSM_cons (normalized using the min-max method) is used for feature extraction in this study, as in the literature [26, 27].
After calculating the evolutionary consensus sequence, we count the occurrence of each of the 20 amino acids along this sequence to produce the corresponding feature group (AAO). Similarly, we count the occurrence of each of the three secondary structure elements in the structural consensus sequence to produce the corresponding feature group (SSEO). The occurrence feature group is used in this study as the global descriptor of the proteins, instead of the composition of the amino acids (occurrence of the amino acids divided by the length of the protein sequence), since it retains the length information that the composition feature group discards.
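The two steps described above (forming a consensus sequence, then counting occurrences) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the consensus symbol at each position is the column with the largest normalized value, and the names `consensus_sequence`, `occurrence`, and `AMINO_ACIDS` (with the column order shown) are hypothetical.

```python
from collections import Counter

# Assumed column order of the PSSM (hypothetical; check against the actual profile).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def consensus_sequence(matrix, alphabet=AMINO_ACIDS):
    """Build a consensus sequence from an L x len(alphabet) matrix.

    Assumption: at each row i, the symbol of the column with the largest
    value is chosen (ties go to the first such column).
    """
    return "".join(
        alphabet[max(range(len(row)), key=row.__getitem__)] for row in matrix
    )

def occurrence(sequence, alphabet=AMINO_ACIDS):
    """Count how often each symbol of the alphabet occurs in the sequence."""
    counts = Counter(sequence)
    return [counts.get(symbol, 0) for symbol in alphabet]

# Toy example with a three-symbol alphabet, as for the SPINE-M (H, E, C).
toy_matrix = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
toy_consensus = consensus_sequence(toy_matrix, alphabet="HEC")
toy_occurrence = occurrence(toy_consensus, alphabet="HEC")
```

The same two functions serve both AAO (20 columns) and SSEO (3 columns); only the alphabet changes. Because raw counts are returned rather than fractions, the protein length is retained, as the text notes.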
where S_ij is the normalized probability of occurrence of the j-th secondary structure element at location i of the protein sequence in the SPINE-M. It was shown that the semi-composition method provides more discriminatory information than the composition of the amino acids extracted from the original protein sequence. This feature group also provides important global discriminatory information about the substitution probabilities of the amino acids as well as the normalized frequencies of the secondary structure elements.
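As a rough illustration of a semi-composition feature group, the sketch below averages each column of a normalized profile over the protein length. The exact formula is not shown in this text, so the column-mean form and the function name `semi_composition` are assumptions made for illustration only.

```python
def semi_composition(matrix):
    """Return one value per column: the column sum divided by the length L.

    Assumption: semi-composition is the per-column mean of the normalized
    profile (PSSM or SPINE-M); the source formula is not reproduced here.
    """
    length = len(matrix)
    n_columns = len(matrix[0])
    return [sum(row[j] for row in matrix) / length for j in range(n_columns)]

# Toy profile with two positions and two columns.
toy_profile = [[0.2, 0.8], [0.4, 0.6]]
toy_semi = semi_composition(toy_profile)
```

Applied to a 20-column PSSM this yields 20 features, and applied to the 3-column SPINE-M it yields 3 features, which agrees with the feature counts quoted later in the text.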
This method is proposed to add more local sequence order information about how the amino acids are distributed along the protein sequence, based on their substitution probabilities (extracted from the PSSM) and on their tendency to take part in one of the secondary structure elements (extracted from the SPINE-M). In this segmentation method, the segments of a protein sequence have unequal lengths, and each segment is represented by a distribution feature computed as follows. First, for the PSSM, to extract the segmented distribution feature group (PSSM-SD), we compute the total sum of the substitution probabilities in the j-th column of the PSSM, $T_j = \sum_{i=1}^{L} P_{ij}$. Then, starting from the first row of the PSSM, we compute the partial sum of the substitution probabilities of amino acid j over the first i amino acids, $S_1 = \sum_{m=1}^{i} P_{mj}$. Using the distribution factor F_P (a parameter investigated in this study), we find the maximum value of the index i such that the partial sum S_1 is less than or equal to F_P% of the total sum T_j, and denote it by $I_j^1$. Thus the first $I_j^1$ substitution probabilities contribute F_P% of T_j. We use $I_j^1$ as the ending location of the first segment, while its beginning is taken to be 1 (the first row of the PSSM). The distribution feature of this segment is given by $I_j^1$. In a similar manner, we find the number of leading amino acids of the protein sequence that contribute 2F_P%, 3F_P%, ..., 50% of T_j (50% of T_j starting from the first row of the PSSM). The indices $I_j^2, I_j^3, \ldots, I_j^{50/F_P}$ define the ending locations of segments 2, 3, ..., 50/F_P, respectively, while the beginning location of all these segments remains 1. Hence, the distribution features for these segments are $I_j^2, I_j^3, \ldots, I_j^{50/F_P}$.
Note that we have thus computed 50/F_P distribution features by processing the protein sequence from the first row of the PSSM in the downward direction. We repeat this process from the last row of the PSSM in the upward direction to obtain another set of 50/F_P features (covering the remaining 50% of T_j, starting from the end of the protein sequence, which corresponds to the last row of the PSSM). Thus, a total of 2 × (50/F_P) = 100/F_P distribution features are computed for each column of the PSSM.
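The forward and backward scans described above can be sketched for a single column as follows. This is a minimal sketch under stated assumptions: the column values are non-negative (for example after min-max normalization), each feature is taken to be the raw ending index of its segment, and the function name `segmented_distribution` and parameter name `fp` are hypothetical.

```python
def segmented_distribution(column, fp=25):
    """Return 100/fp distribution features for one profile column.

    For k = 1 .. 50/fp, find the largest number of leading entries whose
    partial sum stays at or below k*fp percent of the column total, first
    scanning from the top of the column and then from the bottom.
    Assumes non-negative values, so partial sums never decrease.
    """
    total = sum(column)
    steps = 50 // fp  # thresholds per scan direction

    def scan(values):
        features = []
        for k in range(1, steps + 1):
            threshold = k * fp / 100.0 * total
            partial, index = 0.0, 0
            for value in values:
                if partial + value > threshold:
                    break
                partial += value
                index += 1
            features.append(index)
        return features

    return scan(column) + scan(column[::-1])

uniform_features = segmented_distribution([1] * 8, fp=25)
skewed_features = segmented_distribution([3, 1, 1, 1, 1, 1], fp=25)
```

With `fp=25` the function returns four values per column, so a 20-column PSSM gives 80 features and the 3-column SPINE-M gives 12, which agrees with the counts quoted in the text. In the skewed example the first forward feature is 0 because the first entry alone already exceeds 25% of the total.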
Segmented auto covariance
Combining SPINE-seg and SPINE-AC, we build the SPINE-SAC feature group consisting of 3 × (2K_S + 2K_S + K_S) = 15K_S features in total (4K_S features in SPINE-seg and K_S features in SPINE-AC for each of the three columns of the SPINE-M).
Support Vector Machine (SVM)
where γ is the kernel parameter, and x_i and x_j are input feature vectors. In this study, γ and the cost parameter C (also called the soft margin parameter) of the SVM classifier are optimized using the grid search algorithm implemented in the LIBSVM package. The grid search algorithm tries various pairs of γ and C values and selects the pair with the best classification accuracy (using 10-fold cross validation). The ranges searched for γ and C are the defaults used in the LIBSVM toolbox (from 2^-5 to 2^15 for C and from 2^-15 to 2^3 for γ). It is a simple algorithm, as it has just two parameters to optimize (γ and C). Despite its simplicity, it has been shown to be an effective method for optimizing these parameters.
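The sketch below illustrates the two ingredients mentioned here: a kernel controlled by γ and the grid of (C, γ) pairs. The radial basis function form of the kernel is an assumption (the kernel equation is not shown in this text), as is the exponent step of 2 between grid points; the function names are hypothetical and no LIBSVM call is made.

```python
import math
from itertools import product

def rbf_kernel(x_i, x_j, gamma):
    """Assumed RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)

def parameter_grid(c_exponents=(-5, 15), gamma_exponents=(-15, 3), step=2):
    """Enumerate (C, gamma) pairs on a power-of-two grid.

    The exponent ranges follow the text (2^-5..2^15 for C, 2^-15..2^3 for
    gamma); the step of 2 between exponents is an assumption.
    """
    c_values = [2.0 ** e for e in range(c_exponents[0], c_exponents[1] + 1, step)]
    gamma_values = [
        2.0 ** e for e in range(gamma_exponents[0], gamma_exponents[1] + 1, step)
    ]
    return list(product(c_values, gamma_values))

grid = parameter_grid()
```

In a full grid search, each pair in `grid` would be scored by 10-fold cross validation and the best-scoring pair kept.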
Results and discussion
We first investigate the effectiveness of our proposed feature extraction methods in capturing local and global discriminatory information from the PSSM. We compare their performance with similar studies that relied solely on the PSSM for feature extraction. In this step, we also explore effective values for the distance factor (K_P) in the segmented auto covariance method and for the segmentation factor (F_P) in the segmented distribution method. For segmented auto covariance, we study K_P values between 1 and 10, as in previous work. For segmented distribution, we study three values of F_P (25, 10, and 5). In the second step, we conduct similar experiments using SPINE-X for feature extraction. We investigate the effectiveness of our proposed methods in extracting features from the SPINE-M, as well as effective values for K_S (between 1 and 10) and F_S (among 25, 10, and 5), in the same manner. In the final step, we add the structural features extracted from the SPINE-M to the features extracted from the PSSM and compare our results with the best results found in the literature for the protein structural class prediction problem [5, 6, 27].
To explore the impact of the distance factor on the segmented auto covariance method, we measure prediction performance with 10-fold cross validation, as it has been widely used in similar studies [26, 45]. We also provide the performance results of k-fold cross validation as a function of k, for k = 2, 3, 4, ..., 10, in Additional File 1. In 10-fold cross validation, the benchmark is divided into ten non-overlapping subsets called folds. In each iteration, nine folds are combined for training and the remaining fold is used for testing. This process is repeated until every fold has been used as the test set. We also use jackknife cross validation to report our overall prediction accuracy, as well as the prediction accuracy for each structural class individually, for comparison with previous studies. In this method, in each iteration, all but one sample are used for training while the remaining sample is used for testing. This process is repeated until every sample in the benchmark has been used as the test sample. Jackknife is a computationally expensive evaluation approach, and it has been shown to perform similarly to 10-fold cross validation. Since it has been widely used to evaluate protein structural class prediction accuracy, it is also adopted in this study so that we can directly compare our results with the state-of-the-art results found in the literature [5, 6, 26, 27]. We use the overall prediction accuracy (in percentage) as the main measurement so that our results can be compared directly with previously reported results. It is the percentage of test proteins whose structural class is predicted correctly.
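The two evaluation schemes can be sketched as index splits, with jackknife expressed as k-fold cross validation where k equals the number of samples. This is a minimal sketch: the function names are hypothetical, samples are assigned to folds in a fixed round-robin order, and no shuffling or class stratification is applied, since the text does not specify either.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for k non-overlapping folds."""
    indices = list(range(n_samples))
    # Round-robin assignment: sample i goes to fold i % k.
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test

def jackknife_splits(n_samples):
    """Leave-one-out splits: every sample is the test set exactly once."""
    return k_fold_splits(n_samples, k=n_samples)

ten_fold = list(k_fold_splits(25, k=10))
leave_one_out = list(jackknife_splits(5))
```

For a benchmark of 1673 proteins, jackknife means training 1673 models, which is why the text describes it as computationally expensive compared with the ten models of 10-fold cross validation.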
In addition to the overall accuracy, we report the sensitivity, specificity, and Matthews correlation coefficient (MCC) for all four structural classes for the best results reported in this study. More information about these three measurements for the protein structural class prediction problem can be found in the literature.
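The measurements named above can be computed from true and predicted class labels as sketched below. The formulas are the standard one-versus-rest definitions of sensitivity, specificity, and MCC; the text does not show its own formulas, so treating them as identical is an assumption, and the function names are hypothetical.

```python
import math

def overall_accuracy(y_true, y_pred):
    """Percentage of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

def per_class_metrics(y_true, y_pred, label):
    """Return (sensitivity %, specificity %, MCC) for one class versus the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    sensitivity = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    specificity = 100.0 * tn / (tn + fp) if tn + fp else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, specificity, mcc

# Toy labels for three classes.
toy_true = ["a", "a", "b", "b", "c", "c"]
toy_pred = ["a", "b", "b", "b", "c", "a"]
toy_accuracy = overall_accuracy(toy_true, toy_pred)
toy_metrics_a = per_class_metrics(toy_true, toy_pred, "a")
```

Undefined ratios (a class with no positives or no negatives) are returned as 0.0 here; that convention is a choice made for this sketch.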
Exploring the impact of our proposed methods relying only on PSSM for feature extraction
Note that we optimized γ and C for K_P = 1 and F_P = 25 using grid search on the 1189 benchmark (to avoid over-tuning) and used the resulting values (γ = 0.055 and C = 500) for the rest of this study. All parameters used in this study, for both feature extraction and classification, were determined on the 1189 benchmark; the 25PDB benchmark was not used at all for this purpose and was reserved to investigate the generality and effectiveness of our proposed model. However, our experiments showed no significant difference between the optimized parameters for the 25PDB and 1189 benchmarks for our extracted features.
As shown in Figure 2 and Figure 3, our extracted feature vector significantly outperforms the previously reported PSSM-based results for all values of K_P between 1 and 10. This shows the effectiveness of the proposed segmentation-based method in exploring the discriminatory information embedded in the PSSM, compared to using the whole protein sequence as a single entity. It also shows that the segmented auto covariance method achieves high prediction accuracy even with very low values of K_P, since it captures adequate local sequence order information (which also emphasizes the impact of the segmented distribution method). For the 25PDB benchmark, we report up to 89.6% prediction accuracy (using jackknife cross validation) by setting K_P to 4 (20 + 20 + 5 × K_P × 20 + 80 = 520 features in total for K_P = 4), which is 15.5% better than the 74.1% prediction accuracy achieved by reproducing the earlier experiment (using K_P = 9 in AAC_PSSM_AC) (Figure 2). Similarly, for the 1189 benchmark, we achieve up to 79.7% prediction accuracy by setting K_P to 4, which is 5.1% better than the 74.6% prediction accuracy achieved by reproducing the earlier experiment (using K_P = 6 in AAC_PSSM_AC) (Figure 3). The results do not differ significantly across the values of K_P between 1 and 10, which indicates that the gain comes from the segmentation technique rather than from the distance factor. Since the best results for both benchmarks are achieved with K_P = 4, this value is adopted as the distance factor for extracting segmented auto covariance features from the PSSM for the rest of this study.
We also repeat this experiment to explore the impact of the segmentation factor F_P in the segmented distribution method. Setting the segmentation factor to 10 or 5 does not improve the prediction accuracy over setting it to 25, and the accuracy even decreases as K_P increases. This highlights the sufficiency and effectiveness of adopting F_P = 25 rather than 10 or 5. In other words, four segments provide adequate discriminatory information for this task, and increasing the number of segments to 10 or 20 does not help.
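The feature count quoted above can be checked with a few lines of arithmetic. The sketch assumes the four terms correspond to the occurrence group (20), the semi-composition group (20), segmented auto covariance (5 × K_P × 20), and segmented distribution (100/F_P per column × 20); the function name `pssm_feature_count` is hypothetical.

```python
def pssm_feature_count(k_p, f_p=25, n_columns=20):
    """Total PSSM-based features for a given distance and segmentation factor."""
    occurrence_features = n_columns                   # one count per amino acid
    semi_composition_features = n_columns             # one value per PSSM column
    sac_features = 5 * k_p * n_columns                # segmented auto covariance
    sd_features = (100 // f_p) * n_columns            # segmented distribution
    return (
        occurrence_features
        + semi_composition_features
        + sac_features
        + sd_features
    )

counts_by_k = {k: pssm_feature_count(k) for k in range(1, 11)}
```

Under these assumptions the vector grows by 100 features for each unit increase in K_P, from 220 features at K_P = 1 to 1120 at K_P = 10.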
[Table: The impact of the feature groups proposed in this study (using the PSSM for feature extraction) on protein structural class prediction accuracy (in %). Feature combinations compared: PSSM-AAC + PSSM-SAC; PSSM-AAC + PSSM-SAC + PSSM-SD; PSSM-AAC + PSSM-SAC + PSSM-SD + AAO.]
Exploring the impact of our proposed methods relying only on SPINE-X for feature extraction
In this step, we investigate the impact of our proposed feature extraction methods when applied to SPINE-X. We build a feature vector based on the methods proposed in this study, relying solely on the SPINE-M for feature extraction. We extract the following feature groups: SSEO (occurrence of the secondary structure elements in the predicted secondary structure, 3 features), SPINE-SSEC (semi-composition from the SPINE-M, 3 features), SPINE-SAC (segmented auto covariance, where K_S is set to each value from 1 to 10 in 10 different experiments, K_S × 5 × 3 features), and SPINE-SD (segmented distribution, where the segmentation factor is set to 25, 4 × 3 = 12 features). The combination of these feature groups is referred to as SPINE-S (SSEO + SPINE-SSEC + SPINE-SD + SPINE-SAC = SPINE-S). The protein structural class prediction results in this subsection are obtained using the jackknife cross validation method.
[Table: The impact of the feature groups proposed in this study (using the SPINE-M for feature extraction) on protein structural class prediction accuracy (in %). Feature combinations compared: SPINE-AAC + SPINE-SAC; SPINE-AAC + SPINE-SAC + SPINE-SD; SPINE-AAC + SPINE-SAC + SPINE-SD + SSEO.]
Exploring the impact of our proposed method using both PSSM and SPINE-X for feature extraction
Comparison of the results reported for the 25PDB benchmark (in percentage %)
Comparison of the results reported for the 1189 benchmark (in percentage %)
Adding structural features to the evolutionary features extracted in our experiments improves the results by up to 2.4% and 6.6% over relying solely on evolutionary features for the 25PDB and 1189 benchmarks, respectively. This emphasizes the impact of the structural information extracted using SPINE-X on the protein structural class prediction problem.
The specificity (in percentage) and MCC measurements for the best results: (a) for the 25PDB benchmark; (b) for the 1189 benchmark
In this study, we proposed novel segmented distribution and segmented auto covariance feature extraction methods to capture more local and global discriminatory information from the evolutionary profile and predicted secondary structure of proteins. We first extracted the corresponding features from the PSSM, in addition to the occurrence of the amino acids extracted from the evolutionary consensus sequence and the semi-composition extracted from the PSSM. Then, by applying SVM to the extracted features, we enhanced the protein structural class prediction accuracy for low-homology protein sequences (twilight zone) by up to 15.5% for the 25PDB benchmark and 5.1% for the 1189 benchmark over similar studies that relied solely on the PSSM for feature extraction. Our results support the idea that the sequence order information embedded in the PSSM has not been adequately explored in the literature.
Next, we added similar features extracted from the secondary structure predicted using SPINE-X (segmented distribution and segmented auto covariance of the normalized probability of secondary structure elements, occurrence of secondary structure elements extracted from the structural consensus sequence, and semi-composition of the secondary structure elements extracted from the SPINE-M) to the features previously extracted from the PSSM. By incorporating structural information, we achieved up to 92.2% and 86.3% for the 25PDB and 1189 benchmarks, which were respectively up to 7.9% and 2.8% better than previously reported results found in the literature for these two benchmarks, both of which have been widely used for the protein structural class prediction problem [5, 6, 27].
We are currently investigating the effectiveness of the techniques proposed in this study for protein fold recognition. We aim to develop a protein structural class and fold prediction server, which will be made publicly available in the near future. We also aim to explore state-of-the-art feature reduction techniques on our extracted features to investigate the possibility of further feature reduction for these tasks.
Publication of this article was funded by Griffith University and National ICT Australia (NICTA).
NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program.
This article has been published as part of BMC Genomics Volume 15 Supplement 1, 2014: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S1.
- Chothia C: The nature of the accessible and buried surfaces in proteins. Journal of Molecular Biology. 1976, 105 (1): 1-12. 10.1016/0022-2836(76)90191-1.PubMedView Article
- Chou KC: Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics. 2005, 21 (1): 10-19. 10.1093/bioinformatics/bth466.PubMedView Article
- Chou KC, Zhang CT: Prediction of protein structural classes. Critical Reviews in Biochemistry and Molecular Biology. 1995, 30: 275-349. 10.3109/10409239509083488.PubMedView Article
- Chou KC: Some remarks on protein attribute prediction and pseudo amino acid composition. Journal of Theoretical Biology. 2011, 273 (1): 236-247. 10.1016/j.jtbi.2010.12.024.PubMedView Article
- Zhang S, Ding S, Wang T: High-accuracy prediction of protein structural class for low-similarity sequences based on predicted secondary structure. Biochimie. 2011, 93 (4): 710-714. 10.1016/j.biochi.2011.01.001.PubMedView Article
- Ding S, Zhang S, Li Y, Wang T: A novel protein structural classes prediction method based on predicted secondary structure. Biochimie. 2012, 94 (5): 1166-1171. 10.1016/j.biochi.2012.01.022.PubMedView Article
- Li ZC, Zhou XB, Dai Z, Zou XY: Prediction of protein structural classes by chou's pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis. Amino Acids. 2009, 37: 415-425. 10.1007/s00726-008-0170-2.PubMedView Article
- Wang ZX, Yuan Z: How good is prediction of protein structural class by the component-coupled method?. Proteins: Structure, Function, and Bioinformatics. 2000, 38 (2): 165-175. 10.1002/(SICI)1097-0134(20000201)38:2<165::AID-PROT5>3.0.CO;2-V.View Article
- Cai YD, Feng K, Lu W, Chou K: Using logitboost classifier to predict protein structural classes. Theoretical Biollogy. 2006, 238: 172-176.View Article
- Feng KY, Cai YD, Chou KC: Boosting classifier for predicting protein domain structural class. Biochemical and Biophysical Research Communications. 2005, 334 (1): 213-217. 10.1016/j.bbrc.2005.06.075.PubMedView Article
- Niu B, Cai YD, Lu WC, Li GZ, Chou KC: Predicting protein structural class with adaboost learner. Protein and Peptide Letters. 2006, 13 (5): 489-492. 10.2174/092986606776819619.PubMedView Article
- Dehzangi A, Karamizadeh S: Solving protein fold prediction problem using fusion of heterogeneous classifiers. INFORMATION, An International Interdisciplinary Journal. 2011, 14 (11): 3611-3622.
- Dehzangi A, Phon-Amnuaisuk S, Manafi M, Safa S: Using rotation forest for protein fold prediction problem: An empirical study. Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. 2010, 217-227.View Article
- Anand A, Pugalenthi G, Suganthan PN: Predicting protein structural class by svm with class-wise optimized features and decision probabilities. Journal of Theoretical Biology. 2008, 253 (2): 375-380. 10.1016/j.jtbi.2008.02.031.PubMedView Article
- Li ZC, Zhou XB, Lin YR, Zou XY: Prediction of protein structure class by coupling improved genetic algorithm and support vector machine. Amino Acids. 2008, 35 (3): 581-590. 10.1007/s00726-008-0084-z.PubMedView Article
- Liu T, Zheng X, Wang J: Prediction of protein structural class for low-similarity sequences using support vector machine and psi-blast profile. Biochimie. 2010, 92 (10): 1330-1334. 10.1016/j.biochi.2010.06.013.PubMedView Article
- Dehzangi A, Sattar A: Protein fold recognition using segmentation-based feature extraction model. Proceedings of the 5th Asian Conference on Intelligent Information and Database Systems. 2013, ACIIDS05 Springer ???, 345-354.View Article
- Cai YD, Zhou GP: Prediction of protein structural classes by neural network. Biochimie. 2000, 82 (8): 783-785. 10.1016/S0300-9084(00)01161-5.PubMedView Article
- Jahandideh S, Abdolmaleki P, Jahandideh M, Asadabadi EB: Novel two-stage hybrid neural discriminant model for predicting proteins structural classes. Biophysical Chemistry. 2007, 128 (1): 87-93. 10.1016/j.bpc.2007.03.006.PubMedView Article
- Jahandideh S, Abdolmaleki P, Jahandideh M, Hayatshahi SHS: Novel hybrid method for the evaluation of parameters contributing in determination of protein structural classes. Journal of Theoretical Biology. 2007, 244 (2): 275-281. 10.1016/j.jtbi.2006.08.011.PubMedView Article
- Chen K, Kurgan LA, Ruan J: Prediction of protein structural class using novel evolutionary collocation-based sequence representation. Journal of Computational Chemistry. 2008, 29 (10): 1596-1604. 10.1002/jcc.20918.PubMedView Article
- Dehzangi A, Paliwal KK, Sharma A, Dehzangi O, Sattar A: A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem. IEEE Transaction on Computational Biology and Bioinformatics (TCBB). 2013
- Kedarisetti KD, Kurgan LA, Dick S: Classifier ensembles for protein structural class prediction with varying homology. Biochemical and Biophysical Research Communications. 2006, 348 (3): 981-988. 10.1016/j.bbrc.2006.07.141.PubMedView Article
- Yang JY, Peng ZL, Chen X: Prediction of protein structural classes for low-homology sequences based on predicted secondary structure. BMC Bioinformatics. 2010, 11 (Suppl 1): 9-10.1186/1471-2105-11-S1-S9.View Article
- Dehzangi A, Phon-Amnuaisuk S, Dehzangi O: Enhancing protein fold prediction accuracy by using ensemble of different classifiers. Australian Journal of Intelligent Information Processing Systems. 2010, 26 (4): 32-40.
- Liu T, Geng X, Zheng X, Li R, Wang J: Accurate prediction of protein structural class using auto covariance transformation of PSI-BLAST profiles. Amino Acids. 2012, 42: 2243-2249. 10.1007/s00726-011-0964-5.
- Mizianty M, Kurgan LA: Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences. BMC Bioinformatics. 2009, 10 (1): 414. 10.1186/1471-2105-10-414.
- Cai YD, Liu XJ, Xu XB, Zhou GP: Support vector machines for predicting protein structural class. BMC Bioinformatics. 2001, 2 (1): 3. 10.1186/1471-2105-2-3.
- Deschavanne P, Tuffery P: Exploring an alignment free approach for protein classification and structural class prediction. Biochimie. 2008, 90 (4): 615-625. 10.1016/j.biochi.2007.11.004.
- Zhou GP: An intriguing controversy over protein structural class prediction. Journal of Protein Chemistry. 1998, 17: 729-738. 10.1023/A:1020713915365.
- Chou KC: Prediction of protein structural classes and subcellular locations. Current Protein and Peptide Science. 2000, 1: 171-208. 10.2174/1389203003381379.
- Ding YS, Zhang TL, Chou KC: Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. Protein and Peptide Letters. 2007, 14 (8): 811-815. 10.2174/092986607781483778.
- Kurgan LA, Chen K: Prediction of protein structural class for the twilight zone sequences. Biochemical and Biophysical Research Communications. 2007, 357 (2): 453-460. 10.1016/j.bbrc.2007.03.164.
- Cao YF, Liu S, Zhang L, Qin J, Wang J, Tang K: Prediction of protein structural class with rough sets. BMC Bioinformatics. 2006, 7 (1): 20. 10.1186/1471-2105-7-20.
- Sharma A, Paliwal KK, Dehzangi A, Lyons J, Imoto S, Miyano S: A strategy to select suitable physicochemical attributes of amino acids for protein fold recognition. BMC Bioinformatics. 2013, 14 (233): 11-
- Dehzangi A, Phon-Amnuaisuk S: Fold prediction problem: The application of new physical and physicochemical-based features. Protein and Peptide Letters. 2011, 18 (2): 174-185. 10.2174/092986611794475101.
- Kurgan LA, Zhang T, Zhang H, Shen S, Ruan J: Secondary structure-based assignment of the protein structural classes. Amino Acids. 2008, 35: 551-564. 10.1007/s00726-008-0080-3.
- Yang JY, Peng ZL, Yu ZG, Zhang RJ, Anh V, Wang D: Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation. Journal of Theoretical Biology. 2009, 257 (4): 618-626. 10.1016/j.jtbi.2008.12.027.
- Dehzangi A, Paliwal KK, Lyons J, Sharma A, Sattar A: Enhancing protein fold prediction accuracy using evolutionary and structural features. Proceedings of the Eighth IAPR International Conference on Pattern Recognition in Bioinformatics. PRIB. 2013, 196-207.
- Dehzangi A, Paliwal KK, Lyons J, Sharma A, Sattar A: Exploring potential discriminatory information embedded in PSSM to enhance protein structural class prediction accuracy. Proceedings of the Eighth IAPR International Conference on Pattern Recognition in Bioinformatics. PRIB. 2013, 208-219.
- Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997, 25 (17): 3389-3402.
- Jones DT: Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology. 1999, 292 (2): 195-202. 10.1006/jmbi.1999.3091.
- Shen HB, Song JN, Chou KC: Prediction of protein folding rates from primary sequence by fusing multiple sequential features. Journal of Biomedical Science and Engineering. 2009, 2: 136-143. 10.4236/jbise.2009.23024.
- Faraggi E, Zhang T, Yang Y, Kurgan L, Zhou Y: SPINE X: Improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles. Journal of Computational Chemistry. 2012, 33 (3): 259-267. 10.1002/jcc.21968.
- Kurgan LA, Homaeian L: Prediction of structural classes for protein sequences and domains - impact of prediction algorithms, sequence representation and homology, and test procedures on accuracy. Pattern Recognition. 2006, 39: 2323-2343. 10.1016/j.patcog.2006.02.014.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The protein data bank. Nucleic Acids Research. 2000, 28 (1): 235-242. 10.1093/nar/28.1.235.
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology. 1995, 247 (4): 536-540.
- Liu T, Jia C: A high-accuracy protein structural class prediction algorithm using predicted secondary structural information. Journal of Theoretical Biology. 2010, 267 (3): 272-275. 10.1016/j.jtbi.2010.09.007.
- Sharma A, Lyons J, Dehzangi A, Paliwal KK: A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition. Journal of Theoretical Biology. 2013, 320 (0): 41-46.
- Chou KC: Progress in protein structural class prediction and its impact to bioinformatics and proteomics. Current Protein and Peptide Science. 2005, 6: 423-436. 10.2174/138920305774329368.
- Vapnik VN: The Nature of Statistical Learning Theory. 1995, Springer.
- Chang CC, Lin CJ: LIBSVM: a library for support vector machines. 2001.
- Costantini S, Facchiano AM: Prediction of the protein structural class by specific peptide frequencies. Biochimie. 2009, 91 (2): 226-229. 10.1016/j.biochi.2008.09.005.
- Zhang S, Ye F, Yuan X: Using principal component analysis and support vector machine to predict protein structural class for low-similarity sequences via PSSM. Journal of Biomolecular Structure and Dynamics. 2012, 29 (6): 1138-1146. 10.1080/07391102.2011.672627.
- Kurgan LA, Cios KJ, Chen K: SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences. BMC Bioinformatics. 2008, 9: 226. 10.1186/1471-2105-9-226.
- Zhang TL, Ding YS, Chou KC: Prediction protein structural classes with pseudo amino acid composition: approximate entropy and hydrophobicity pattern. Journal of Theoretical Biology. 2008, 250: 186-193. 10.1016/j.jtbi.2007.09.014.
- Qiu JD, Luo SH, Huang JH, Liang RP: Using support vector machines for prediction of protein structural classes based on discrete wavelet transform. Journal of Computational Chemistry. 2009, 30 (8): 1344-1350. 10.1002/jcc.21115.
- Chen C, Zhou X, Tian Y, Zou X, Cai P: Predicting protein structural class with pseudo-amino acid composition and support vector machine fusion network. Analytical Biochemistry. 2006, 357 (1): 116-121. 10.1016/j.ab.2006.07.022.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.