Mechanistic insights into mode of action of novel natural cathepsin L inhibitors
© Tyagi et al.; licensee BioMed Central Ltd. 2013
Published: 9 December 2013
Volume 14 Supplement 8
A cell becomes cancerous when it ceases to respond to growth-inhibiting signals and multiplies uncontrollably; such cells can detach and move to other parts of the body, a process called metastasis. A particular set of cysteine proteases is very active during cancer metastasis, cathepsins being among them. They are involved in tumor growth and malignancy and have also been reported to be overexpressed in tumor cell lines. In the present study, a combinatorial approach comprising three-dimensional quantitative structure-activity relationship (3D QSAR) modelling, ligand-based pharmacophore modelling and search, followed by cathepsin L structure-based high throughput screening, was carried out using an initial set of 28 congeneric thiosemicarbazone derivatives as cathepsin L inhibitors. A 3D QSAR model was derived using the alignment of a common thiosemicarbazone substructure. Essential structural features responsible for biological activity were taken into account for development of a pharmacophore model based on 29 congeneric thiosemicarbazone derivatives. This model was used to carry out an exhaustive search on a large dataset of natural compounds. A further cathepsin L structure-based screen identified two top scoring compounds as potent anti-cancer leads.
The generated 3D QSAR model showed statistically significant results with an r2 value of 0.8267, a cross-validated correlation coefficient q2 of 0.7232, and a pred_r2 (r2 value for the test set) of 0.7460. In addition, a high F-test value of 30.2078 suggested a low probability of the model's failure. The pharmacophoric hypothesis chosen for searching the natural compound libraries was identified as DDHRR, where the two Ds denote two hydrogen donors, H represents a hydrophobic group and the two Rs represent aromatic rings, all of which are essential for the biological activity. We report two potential drug leads, ZINC08764437 (NFP) and ZINC03846634 (APQ), obtained after a combined approach of pharmacophore-based search and structure-based virtual screening. These two compounds displayed extra precision docking scores of -7.972908 and -7.575686, respectively, suggesting considerable binding affinity for cathepsin L. High activity values of 5.72 and 5.75 predicted using the 3D QSAR model further substantiated the inhibitory potential of these identified leads.
The present study attempts to correlate the structural features of the thiosemicarbazone group with biological activity by developing a robust 3D QSAR model. Being statistically valid, this model provides near accurate values of the activities predicted for the congeneric set on which it is based. The predicted activities are also good for the test set compounds, which supports the statistical soundness of the 3D QSAR model. The identified pharmacophore model DDHRR.8 comprises all the essential features required to interact with the catalytic triad of cathepsin L. A search for natural compounds based on this pharmacophore, followed by docking studies, screened out two top scoring candidates: NFP and APQ. The high binding affinity and presence of essential structural features in these two compounds make them suitable for consideration as natural anti-tumoral agents. Activity prediction using the 3D QSAR model further validated their potential as drug candidates against cathepsin L for treatment of cancer.
Cancer is a condition characterized by unregulated growth and division of cells that have become abnormal and can invade adjoining parts of the body. Cancerous cells arise as a consequence of mutations in critical genes. According to the world cancer report, an estimated 7.6 million fatalities were recorded in 2008 and 12.7 million new cases were diagnosed. This number is expected to rise to 21 million by 2030. A series of proteolytic enzymes is a pre-requisite for tumor cells to undergo metastasis, in which tumor cells travel to distant organs and form new tumors [2–6]. Cysteine proteases are a group of such proteolytic enzymes that are characterized by a cysteine residue in their active site region [7–13]. Cathepsins are a subfamily of 11 human lysosomal cysteine proteases included in the papain family. Most of them have been found to be involved in tumor growth and malignancy. Cathepsin L is a globular endopeptidase which plays an important role in vital physiological processes and is reported to be overexpressed in various human tumors [15–17]. Knowledge of this family of proteases and their inhibitors could prove to be a major breakthrough in cancer management and is thus the subject of interest for the present study. Various inhibitors have been characterized and studied extensively against cathepsins, e.g. nitriles, azepanone analogues and disulfides, among others. In the present study we focus on the thiosemicarbazone moiety, which has been utilized previously in the development of anticancer agents by inhibition of cathepsin L.
Thiosemicarbazones constitute an important class of N, S-donor ligands, and are essentially Schiff bases obtained by condensation of thiosemicarbazides with an aldehyde or ketone. They first appeared in the 1950s as drugs against tuberculosis and leprosy [24, 25]. Later, their antiviral properties were reported, which led to extensive research in this area and resulted in the commercialization of methisazone, also named Marboran, to treat smallpox. Benzophenone thiosemicarbazone derivatives have earlier been reported as potential therapeutics against malaria, sleeping sickness and Chagas' disease [27–30]. Recently, the antitumor activity of KGP94, a functionalized benzophenone thiosemicarbazone derivative, was evaluated for breast cancer against cathepsin L. Triapine (3-aminopyridine-2-carboxaldehyde thiosemicarbazone) has already been evaluated as a ribonucleotide reductase inhibitor for anticancer therapy. Apart from these, various other derivatives of thiosemicarbazones such as thiophene, pyridine and fluorene have also been tested as inhibitors of cathepsin L and their IC50 values have been reported [33, 34].
A fast and accurate approach to search for novel therapeutics against various cancers is the need of the hour. In silico methods involving ligand-based drug design are viable approaches to speed up the drug discovery process. 3D QSAR has emerged as a robust technique in rational drug design to predict the biological activities of prospective inhibitors using knowledge of the three-dimensional properties of the ligands through a chemometric approach. It develops statistically significant models to guide synthesis of novel inhibitors on the assumption that the extent of receptor binding of a ligand directly relates to its biological activity [35, 36]. In 3D QSAR, molecular structures are represented by a set of numbers called descriptors. For QSAR model development, the receptor binding site is considered to be rigid and the ligand molecules should belong to a congeneric series. From a pool of molecular descriptors, optimal variables are chosen using a stochastic method. Molecular fields, which are essentially steric and electrostatic interaction energies, are calculated and a molecular field analysis model is built. The model thus generated is evaluated for its robustness by determining its capacity to predict the activity of compounds not belonging to the training set. This validation is based on the calculation of statistical parameters. A pharmacophore, on the other hand, is a molecular framework that carries the essential features responsible for a drug's biological response. Features like aromatic rings, hydrogen donors and acceptors, hydrophobes and positively and negatively ionisable chemical groups are marked and the resulting pharmacophoric hypothesis is scored for its validity. Natural compounds in good alignment with such a hypothesis can be taken as potent drug leads.
In this study, a congeneric dataset comprising 28 thiosemicarbazone derivatives was first chosen to build a 3D QSAR model that evaluates the activity of the ligands against cathepsin L. We also identify the molecular features essential for their activity using the pharmacophore model. Despite continuous efforts towards finding novel cathepsin L inhibitors, no such agents are available in human clinical trials yet. This study supports the use of thiosemicarbazone derivatives by contributing to an understanding of their essential characteristics as potent anti-cancer candidates, and thus paves the way for an accelerated evaluation of novel thiosemicarbazone-based lead candidates using the derived QSAR model.
In this study, a congeneric series of thiosemicarbazone derivatives with inhibitory properties against human cathepsin L was selected for 3D-QSAR model development [33, 34]. The 2D structures of the template molecule and 61 derivatives were drawn using Chemsketch and then aligned with the most active molecule (the reference molecule). A total of 28 molecules were selected on alignment with the thiosemicarbazone template based on lower RMSD values, which indicate optimal alignment. These 2D structures were converted to 3D using the Vlife Engine platform of VLifeMDS and later energy minimized using the force field batch minimization utility with default parameters. These optimized compounds were finally used for 3D-QSAR model development.
The 28 aligned compounds along with their pIC50 values were given as input for force field calculation. For 3D QSAR, a force field was computed keeping the default grid dimensions and including steric, electrostatic and hydrophobic descriptors, with the dielectric constant kept at the default value (1.0). The charge type chosen for computation was Gasteiger-Marsili. The values calculated for the descriptors along with their grid points were arrayed on the worksheet and the invariable columns were removed using QSAR tools.
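The activities used throughout are expressed as pIC50, the negative base-10 logarithm of the IC50. As a minimal sketch of that transform, assuming the IC50 is supplied in mol/L (the units used for the source data are not stated here, and the function names are illustrative only):

```python
import math

def pic50_from_ic50(ic50_molar: float) -> float:
    """Return pIC50 = -log10(IC50), with IC50 given in mol/L (assumed unit)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)

def ic50_from_pic50(pic50: float) -> float:
    """Invert the transform and return the IC50 in mol/L."""
    return 10.0 ** (-pic50)

# Hypothetical example value: an IC50 of 1 micromolar.
example_pic50 = pic50_from_ic50(1e-6)
print(example_pic50)
```

A higher pIC50 therefore corresponds to a lower IC50, that is, a more potent inhibitor.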
Using the advanced data selection wizard, the column containing the activity values (pIC50) of the compounds was selected as the dependent variable and the rest as independent variables. After manual selection of the test set, the unicolumn statistics of both the test and the training sets were calculated. This analysis provided validation of the chosen training and test sets. A critical step in QSAR model development is the selection of optimal variables from the available set of descriptors, which establish a statistically significant correlation between the structure of compounds and their biological activity. Using the variable selection and model building wizard, the model was built by the stepwise-forward method. All values were kept at their defaults except the number of descriptors in the final equation, which was changed to 4, and the variance cut-off, which was set to 0.1. This variable selection method can be combined with a number of different regression analysis techniques such as partial least squares (PLS), principal component regression and k nearest neighbour, among others, by selecting the appropriate combination. In the present study, we report a 3D QSAR model built using PLS.
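The idea of stepwise-forward variable selection can be illustrated with a small sketch. This is not the VLifeMDS implementation: it substitutes ordinary least squares for PLS, uses synthetic data, and the function name `forward_stepwise` is hypothetical. Only the limit of 4 descriptors and the variance cut-off of 0.1 are taken from the text.

```python
import numpy as np

def forward_stepwise(X, y, max_terms=4, variance_cutoff=0.1):
    """Greedy forward selection of descriptor columns.

    Columns with variance below variance_cutoff are never considered.
    At each step the column that gives the lowest residual sum of squares
    (ordinary least squares with intercept) is added to the model.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    candidates = [j for j in range(X.shape[1]) if X[:, j].var() >= variance_cutoff]
    selected = []
    while candidates and len(selected) < max_terms:
        best_j, best_rss = None, np.inf
        for j in candidates:
            A = np.column_stack([np.ones(len(y)), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        candidates.remove(best_j)
    return selected

# Synthetic descriptors: 23 "molecules", 10 columns, one of them constant.
rng = np.random.default_rng(0)
X = rng.normal(size=(23, 10))
X[:, 9] = 1.0  # invariant column, removed by the variance cut-off
y = 2.0 * X[:, 2] - 1.5 * X[:, 5] + rng.normal(scale=0.01, size=23)

chosen = forward_stepwise(X, y)
print(chosen)
```

In this toy data the two informative columns (2 and 5) are picked up early, while the constant column is never selected.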
Many statistical parameters, such as n (number of compounds in regression), k (number of variables), degrees of freedom, optimum component (number of optimum PLS components in the model), r2 (squared correlation coefficient), F-test (Fisher's value), q2 (cross-validated correlation coefficient), pred_r2 (r2 for the external test set), Z score (randomisation test), best_ran_q2 (highest q2 value in the randomisation test) and best_ran_r2 (highest r2 value in the randomisation test), need to be taken into account to consider the model robust. For a model to be statistically significant, the following conditions should be met: r2, q2 > 0.6 and pred_r2 > 0.5 [1, 2]. Since the F-test gives an idea of the chances of failure of the model, a value greater than 30 is considered good. Low standard error values, on the other hand, establish the absolute quality of the model.
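The acceptance thresholds listed above are simple to apply mechanically. The sketch below checks them against the values reported for this model in the abstract (r2 = 0.8267, q2 = 0.7232, pred_r2 = 0.7460, F-test = 30.2078); the function name `check_qsar_model` is illustrative and not part of any QSAR package.

```python
def check_qsar_model(r2, q2, pred_r2, f_test):
    """Return a pass/fail flag for each acceptance criterion named in the text."""
    return {
        "r2 > 0.6": r2 > 0.6,
        "q2 > 0.6": q2 > 0.6,
        "pred_r2 > 0.5": pred_r2 > 0.5,
        "F-test > 30": f_test > 30,
    }

# Values reported for the 3D QSAR model in this article.
reported = check_qsar_model(r2=0.8267, q2=0.7232, pred_r2=0.7460, f_test=30.2078)
model_ok = all(reported.values())
print(reported, model_ok)
```

All four criteria are satisfied by the reported values, although the F-test value clears its threshold only narrowly.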
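The cross-validated correlation coefficient q2 was calculated as

$$q^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - y_{mean})^2}$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted activities of the $i$th molecule (i = 1-24 except 9: refer Additional file 1) in the training set, respectively, and $y_{mean}$ is the average activity of all the molecules in the training set.

The predictive correlation coefficient for the external test set, pred_r2, was calculated as

$$pred\_r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - y_{mean})^2}$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted activities of the $i$th molecule (i = 25-29: refer Additional file 1) in the test set, respectively, and $y_{mean}$ is the average activity of all the molecules in the training set.

The Z score of the randomisation test was calculated as

$$Z_{score} = \frac{h - \mu}{\sigma}$$

where $h$ is the q2 value calculated for the actual data set, $\mu$ is the average q2 and $\sigma$ is the standard deviation calculated for various models built on different random data sets.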
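The three validation statistics described here (leave-one-out q2 on the training set, pred_r2 on the external test set, and the randomisation Z score) can be sketched numerically. This is an illustration under stated assumptions rather than the procedure run in VLifeMDS: a plain least-squares linear model stands in for PLS, the data are synthetic, and all function names are hypothetical. The 23/5 split mirrors the training and test set sizes implied by the text.

```python
import numpy as np

def fit_predict(X_train, y_train, X_new):
    """Least-squares linear fit with intercept; returns predictions for X_new."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def q2_loo(X, y):
    """Leave-one-out q2: each molecule is predicted by a model built without it."""
    idx = np.arange(len(y))
    preds = np.empty(len(y))
    for i in idx:
        mask = idx != i
        preds[i] = fit_predict(X[mask], y[mask], X[i:i + 1])[0]
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)

def pred_r2(y_test, y_test_pred, y_train_mean):
    """External r2; the reference mean is that of the training set."""
    y_test = np.asarray(y_test, dtype=float)
    y_test_pred = np.asarray(y_test_pred, dtype=float)
    return 1.0 - np.sum((y_test - y_test_pred) ** 2) / np.sum((y_test - y_train_mean) ** 2)

def z_score(q2_actual, q2_random):
    """(h - mu) / sigma over q2 values from models built on shuffled activities."""
    q2_random = np.asarray(q2_random, dtype=float)
    return (q2_actual - q2_random.mean()) / q2_random.std()

# Synthetic data: 23 training and 5 test "molecules", two descriptors.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(23, 2))
y_train = 5.0 + 0.8 * X_train[:, 0] - 0.5 * X_train[:, 1] + rng.normal(scale=0.05, size=23)
X_test = rng.normal(size=(5, 2))
y_test = 5.0 + 0.8 * X_test[:, 0] - 0.5 * X_test[:, 1] + rng.normal(scale=0.05, size=5)

q2 = q2_loo(X_train, y_train)
p_r2 = pred_r2(y_test, fit_predict(X_train, y_train, X_test), y_train.mean())
q2_rand = [q2_loo(X_train, rng.permutation(y_train)) for _ in range(50)]
z = z_score(q2, q2_rand)
print(q2, p_r2, z)
```

A large positive Z score indicates that the q2 of the real model lies far above what is obtained when activities are randomly reassigned, which argues against a chance correlation.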
Using the same set of compounds as taken for the 3D QSAR model development, we searched for similar anti-cancer natural compounds. The essential features responsible for a molecule's biological activity are represented through a pharmacophoric hypothesis, which is then used for a rigorous search for compounds possessing the same features. The pharmacophore model was created using the Phase module of Schrodinger. This is a 5-step procedure which is carried out by selecting the 3D optimized molecules, prepared using Ligprep, and manually entering their activity values (pIC50). A number of hypotheses were generated along with their respective sets of aligned conformations. After selecting the best hypothesis amongst them, an exhaustive Phase search for lead molecules matching the pharmacophore was carried out.
The compounds retrieved by the pharmacophore-based search were further evaluated for their inhibitory potency against cathepsin L using Schrodinger's Glide docking platform [49, 50]. Glide works by creating a cubic grid (10 Å per side) around the user-specified critical residues and directing the approaching ligand to that specific site. Extra precision (XP) docking was carried out to screen the 7409 compounds obtained after pharmacophore-based screening, of which those lying above the specified threshold were chosen. XP docking serves the purpose of correlating good poses with good scores and discarding false positives.
3D-QSAR model development seeks a statistical correlation between the structures and activities of chemical compounds by calculating 3D molecular descriptors, which include steric, electrostatic and hydrophobic points marked on the 3D spatial grid. After selecting Gasteiger-Marsili charges for computing the force field grid, the invariable columns were removed, which reduced the number of descriptors from 2971 to 2944. pIC50 was selected as the dependent variable while the calculated 3D descriptors were chosen as independent variables. The test set comprising the compounds A3, A5, A9, A19 and A34 (Additional file 1) was selected manually, after which the unicolumn statistics were calculated for both the training and test set compounds.
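Removing invariable columns simply means discarding descriptors that take the same value for every molecule, since they cannot explain any variation in activity. A minimal sketch with a synthetic descriptor matrix follows (the function name and data are hypothetical; in the study this step was performed with the QSAR tools of the modelling software):

```python
import numpy as np

def drop_invariant_columns(X, tol=0.0):
    """Drop columns whose range (max - min) does not exceed tol.

    Returns the reduced matrix and the indices of the columns that were kept.
    """
    X = np.asarray(X, dtype=float)
    column_range = X.max(axis=0) - X.min(axis=0)
    keep = column_range > tol
    return X[:, keep], np.flatnonzero(keep)

# Synthetic example: 28 molecules, 6 descriptor columns, two of them constant.
rng = np.random.default_rng(2)
X = rng.normal(size=(28, 6))
X[:, 1] = 0.0    # invariant column
X[:, 4] = 30.0   # invariant column

X_reduced, kept = drop_invariant_columns(X)
print(X.shape, "->", X_reduced.shape, "kept columns:", kept.tolist())
```

The same operation applied to the real descriptor grid is what reduced the descriptor count from 2971 to 2944.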
Unicolumn statistics for training and test set for Cathepsin L inhibiting compounds (table values not recovered)
The statistical parameters calculated for developed 3D-QSAR model (table values not recovered; surviving row labels: best Rand r2, best Rand q2, alpha Rand r2, alpha Rand q2, Z score pred r2, best Rand pred r2, alpha Rand pred r2)
Statistical values of all the pharmacophore hypotheses generated for virtual screening
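Each hypothesis was ranked by its survival score, a weighted combination of component scores in which the W terms are the weights and the S terms are the scores.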
We selected a common pharmacophore hypothesis comprising the common chemical features of the aligned active molecules from the congeneric set. The final hypothesis, DDHRR.8, was chosen based on its high selectivity as well as its survival score, which reflects the best alignment of the active set ligands. Taken together with its site score (0.318008), vector score (0.908012) and volume score (0.581835), DDHRR.8 was the best choice for searching a compound library.
Top scoring compounds screened using the selected pharmacophore hypothesis (table values not recovered; surviving column label: predicted activity using the 3D QSAR model)
We used a combined approach to screen for potent cathepsin L inhibitors, which are promising leads in cancer research owing to the role that cathepsin L plays during tumor development and metastasis. A congeneric set belonging to the thiosemicarbazone class of molecules, which are known to inhibit human cathepsin L, was chosen to build a 3D-QSAR model and a pharmacophore model. The former quantitatively related the structure of a molecule to its activity and validated the relationship using statistical parameters, whereas the latter identified the minimal structural features critical for a molecule's biological activity and also provided an insight into the mode of binding with the target. Using these two approaches of ligand-based drug design, we screened a chemical library based on the pharmacophoric hypothesis and then predicted the activity of the hits using the 3D QSAR model. The compounds obtained after the pharmacophore-based search were docked at the active site (catalytic triad) of cathepsin L to further substantiate their role as cathepsin L inhibitors. The two top scoring compounds, NFP and APQ, show good binding affinity with cathepsin L. This study presents a comprehensive view of the correlation between the structure and activity of these molecules along with their mode of binding with the target protein. It advances the use of the thiosemicarbazone moiety in anti-tumoral agents and suggests further investigation into the role of human cathepsin L in the propagation of metastasis. The results of this study will also guide the design of potent anti-tumorals based on cathepsin L inhibition, adding to the drugs already available against cancer.
AG is thankful to Jawaharlal Nehru University for use of all computational facilities. AG acknowledges support from the Department of Science and Technology, Government of India.
AG is also thankful to University Grants Commission, India for the Faculty Recharge Position.
AG would like to acknowledge financial support from Department of Science and Technology, Government of India towards publication of this article.
This article has been published as part of BMC Genomics Volume 14 Supplement 8, 2013: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM 2013): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/14/S8.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.