Structural insights into mode of actions of novel natural Mycobacterium protein tyrosine phosphatase B inhibitors
© Dhanjal et al.; licensee BioMed Central Ltd. 2014
Published: 24 January 2014
Tuberculosis is a major health problem and the second leading cause of death from an infectious disease worldwide. Mycobacterium tuberculosis secretes a virulence factor, protein tyrosine phosphatase B (mPTPB), into the cytoplasm of the host macrophage, where it suppresses the innate immune response and helps the pathogen survive and proliferate in the phagosome. The present study aims to identify potent inhibitors of mPTPB using the computational approaches of ligand-based molecular modeling and docking studies.
A 3D QSAR model was developed using a set of benzofuran salicylic acid based mPTPB inhibitors with experimentally known IC50 values. The model was generated using the statistical method of principal component regression analysis in combination with a stepwise forward variable selection algorithm. Steric and hydrophobic descriptors were observed to contribute positively to the inhibitory activity of the ligands. The developed model had robust internal and external predictive power, as indicated by the q2 value of 0.8920 and the predicted r2 value of 0.8006, respectively. The model was therefore used to screen a large set of naturally occurring chemical compounds and predict their biological activity, in order to identify more potent natural compounds targeting mPTPB. The two top potential hits (with pIC50 values of 1.459 and 1.677, respectively) had an interaction pattern similar to that of the most potent compound (pIC50 = 1.42) of the congeneric series.
The contour plot provided a better understanding of the relationship between the structural features of substituted benzofuran salicylic acid derivatives and their activities, which should facilitate the design of novel mPTPB inhibitors. The QSAR modeling yielded an equation correlating the important steric and hydrophobic descriptors with the pIC50 value. Thus, we report two natural compounds with inhibitory activity against the mPTPB enzyme of Mycobacterium tuberculosis. These inhibitors have the potential to evolve into lead molecules in the development of drugs for the treatment of tuberculosis.
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb). It is a major health problem and the second leading cause of death from an infectious disease worldwide, after human immunodeficiency virus (HIV). According to the Global Tuberculosis Report 2012 of the World Health Organization, an estimated 8.7 million new cases of TB (13% of them co-infected with HIV) and 1.4 million deaths from TB occurred in 2011. TB is most prevalent in Asia and Africa, with India and China alone accounting for about 40% of the global cases.
Mtb survives as an intracellular pathogen and replicates in the macrophages of its host organism. It disrupts the normal biochemical pathways of the phagosome involved in defense against intracellular pathogens by phosphorylating or dephosphorylating the host's proteins. A variety of cellular functions, such as proliferation, migration, apoptosis and immune response, require post-translational modification of proteins by tyrosine phosphorylation. Under normal physiological conditions a balance is maintained between the activity of protein tyrosine kinases (PTKs) and protein tyrosine phosphatases (PTPs). Impairment of this controlled regulation may lead to anomalous tyrosine phosphorylation, which is believed to be responsible for many human diseases, including cancer, diabetes and autoimmune disorders. Thus, PTPs and PTKs are important therapeutic targets for many diseases [2–5]. Mtb secretes a virulence factor, protein tyrosine phosphatase B (mPTPB), into the cytoplasm of the host macrophage, where it suppresses the innate immune response against the TB infection by blocking ERK1/2- and p38-mediated IL-6 production and prevents host cell apoptosis by activating the Akt pathway [6, 7]. This prevents the phagosome from maturing into a phagolysosome that would destroy the invading pathogen. To investigate the role of mPTPB in the pathogenesis of Mtb, a mutant strain with a disrupted mPTPB gene was created, and the ability of the parent and mutant strains to survive in host macrophages was compared. Disruption of the mPTPB gene resulted in a 70-fold reduction in the bacterial load in the spleen of guinea pigs. The complemented strain, obtained by reintroducing the gene into the mutant strain, regained the ability to infect guinea pigs at rates comparable to the parent strain. Beresford et al. also studied the growth of mycobacteria in resting macrophages in order to mimic infection in a susceptible host (where IFNγ activation may be impaired). Their study showed that in the absence of mPTPB inhibitors, intracellular growth of mycobacteria increased, whereas treatment with a potent inhibitor decreased intracellular mycobacterial growth substantially. All these studies suggest that mPTPB is a potential target against which inhibitors can be designed to develop new and effective anti-tuberculosis agents.
Many drugs are available today for the clinical treatment of TB, but the current treatment lasts six to nine months. During the course of treatment, the pathogen can develop resistance against these drugs, which results in multi-drug resistant tuberculosis (MDR TB) and eventually leads to untreatable extensively drug resistant tuberculosis (XDR TB). To overcome the problem of growing drug resistance, the identification of new targets that are essential for the survival and replication of the pathogen has become an urgent need. Finding drugs against novel targets requires fast and reliable computational techniques for the cost-effective evaluation of large virtual databases of chemical compounds, in order to identify a limited set of candidates that can be synthesized and examined experimentally for their biological activity. Quantitative structure activity relationship (QSAR) modeling is a powerful approach used to establish a correlation between the physicochemical properties of chemical compounds and their biological activity, yielding a reliable statistical model. This model serves as a valuable tool for designing new chemical entities and predicting their activity. The QSAR model so developed facilitates the identification of promising lead candidates, thus decreasing the number of compounds that need to be synthesized and tested in vitro.
Zhou et al. reported a benzofuran salicylic acid-based mPTPB inhibitor (I-A09) which showed modest potency and selectivity, but the inhibitor was not effective enough for clinical use. A chemistry-oriented approach was then used to modify the core structure of I-A09 to obtain a highly potent and selective mPTPB inhibitor which also showed considerably good in vivo efficacy. Additional file 1 lists the benzofuran salicylic acid derived compound series so developed along with their IC50 values. We used this series of 18 compounds to build the 3D-QSAR model and to identify the molecular features essential for effective interaction between the inhibitors and the active cleft of the mPTPB enzyme. The model generated from this series of representative inhibitors was then used to predict the activity of a large dataset of natural compounds. The compounds whose predicted biological activity was greater than that of the most potent inhibitor of the congeneric series were then analyzed using in silico docking studies to elucidate their mode of interaction with the mycobacterium phosphatase.
A data set consisting of 18 novel inhibitors of mPTPB derived from the 6-hydroxy-benzofuran-5-carboxylic acid scaffold was taken from a previously reported study. These inhibitors were highly selective for mPTPB over all other PTPs examined. The reported biological activity data (IC50 values in µM) for these inhibitors were converted to a logarithmic scale (pIC50) for use in the QSAR study.
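The conversion to the logarithmic scale can be illustrated with a minimal sketch. The text does not state the exact convention, so the code assumes pIC50 = -log10(IC50) with IC50 expressed in µM; the function names are hypothetical. Under that assumption, a pIC50 of 1.42 corresponds to an IC50 of roughly 0.038 µM.

```python
import math


def pic50_from_ic50_um(ic50_um: float) -> float:
    """Return -log10 of an IC50 given in micromolar (assumed convention)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um)


def ic50_um_from_pic50(pic50: float) -> float:
    """Invert the conversion: IC50 in micromolar for a given pIC50."""
    return 10.0 ** (-pic50)


# Round trip for an illustrative value (not taken from the data set).
example_pic50 = pic50_from_ic50_um(0.038)
example_ic50 = ic50_um_from_pic50(example_pic50)
```

If the IC50 values were instead expressed in molar units, the same formula would shift every pIC50 by a constant of 6, which would not change the fitted descriptor coefficients, only the intercept.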
A molecular field was computed for a grid of points in space around the aligned molecules using the Merck molecular force field. Descriptors representing the hydrophobic, electrostatic and steric energies between the atoms of the aligned molecules and a methyl probe with +1 charge placed at each lattice point of the grid were computed. These molecular descriptors describe how each of the inhibitory molecules binds to the target in its active site. For external validation of the model, the data set was divided into a training set and a test set by random selection to avoid any kind of bias. The training set (75% of the molecules in the data set), with known biological activity, was used to generate the 3D QSAR model. The test set, whose compounds were not used to build the model, was used to challenge the generated model and assess its predictive effectiveness.
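A random 75/25 division of this kind can be sketched as follows. This is only an illustration: the seed, the rounding rule and the function name are assumptions, and the sketch does not reproduce the actual split used in the study. For 18 compounds, 75% is 13.5, so the training set size depends on the rounding choice.

```python
import random


def split_train_test(compound_ids, train_fraction=0.75, seed=0):
    """Randomly divide compound identifiers into training and test sets."""
    ids = list(compound_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))  # rounding rule is an assumption
    return sorted(ids[:n_train]), sorted(ids[n_train:])


# Compounds are simply numbered 1..18 here for illustration.
train_ids, test_ids = split_train_test(range(1, 19))
```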
The model was generated using the statistical method of principal component regression analysis (PCA) in conjunction with a stepwise forward variable selection algorithm. The pIC50 value was used as the dependent variable and the descriptors as independent variables. The software generates a large number of molecular descriptors that can be used for the QSAR study. Because of this large volume of data, selecting the descriptors that play a considerable role in governing the biological activity of interest becomes difficult. Thus, the success of a QSAR model depends greatly on the statistical method employed for model generation. The PCA method is used when the number of molecular descriptors is much larger than the number of observations. It excludes groups of variables with high internal correlation and reduces the number of independent variables used in the QSAR model by removing redundancy and limiting the descriptor values to a smaller set of uncorrelated variables. Various parameters were set for the execution of stepwise principal component regression analysis: the cross-correlation limit was set to 0.5, the maximum number of variables in the final equation to 2 (n/5, where n is the number of compounds in the training set), the term selection criterion to r2, the variance cut-off to 0 and the scaling to autoscaling.
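The role of the cross-correlation limit and the maximum number of variables can be illustrated with a simplified sketch of stepwise forward selection. Note the substitution: the sketch fits ordinary least squares rather than principal component regression, omits autoscaling, and uses made-up descriptor names and values; it is not the procedure implemented in the software used in the study.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_r2(columns, y):
    """Least-squares fit of y on the columns plus an intercept; return r2."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(descriptors, y, max_vars=2, corr_limit=0.5):
    """Greedily add the descriptor that raises r2 most, skipping constant
    columns and columns too correlated with an already selected one."""
    selected, best_r2 = [], 0.0
    while len(selected) < max_vars:
        candidates = []
        for name, col in descriptors.items():
            if name in selected or len(set(col)) == 1:
                continue
            if any(abs(pearson(col, descriptors[s])) > corr_limit for s in selected):
                continue
            trial = [descriptors[s] for s in selected] + [col]
            candidates.append((fit_r2(trial, y), name))
        if not candidates or max(candidates)[0] <= best_r2:
            break
        best_r2, chosen = max(candidates)
        selected.append(chosen)
    return selected, best_r2


# Toy data: S_a and S_b are nearly collinear, H_c is weakly correlated with both.
descriptors = {
    "S_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "S_b": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "H_c": [0.5, -0.5, 0.4, -0.3, 0.2, -0.1],
}
activity = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]
selected, r2 = forward_select(descriptors, activity)
```

In this toy example only one of the two collinear steric-like columns can enter the equation, because the other exceeds the 0.5 cross-correlation limit once the first is selected.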
To establish a QSAR model, two types of validation are required: internal and external. For internal validation, the leave-one-out cross-validation method was used. In this method one observation is held out as validation data and the remaining observations make up the training set. The coefficients of the QSAR model are estimated using this reduced training set and are then used to predict the activity of the held-out compound. The procedure is repeated until every compound has served once as the held-out compound. The predictive ability of the model was then assessed using the cross-validated r2 (q2). External validation was done by predicting the activities of the compounds of the test set, which were not used for model generation.
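The leave-one-out procedure can be sketched for the simplest case of a single descriptor. The sketch assumes the common definition q2 = 1 - PRESS / SS, where PRESS is the sum of squared leave-one-out prediction errors and SS is the sum of squared deviations of each held-out activity from the mean of its training fold; some definitions use the overall mean instead. The data and function names are illustrative only.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def loo_q2(x, y):
    """Leave-one-out cross-validated r2 (q2) for a one-descriptor model."""
    press = 0.0   # sum of squared prediction errors for held-out compounds
    ss_tot = 0.0  # squared deviations from the training-fold mean
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        predicted = slope * x[i] + intercept
        mean_train = sum(y_train) / len(y_train)
        press += (y[i] - predicted) ** 2
        ss_tot += (y[i] - mean_train) ** 2
    return 1.0 - press / ss_tot


# Made-up descriptor values and activities.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.0, 4.5, 5.5, 8.2, 9.8]
q2 = loo_q2(descriptor, activity)
```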
A data set consisting of 169,109 natural compounds from 10 different suppliers was obtained from the ZINC database in SMILES format. The pIC50 values were predicted for these natural compounds using the generic prediction platform of VlifeMDS. The prediction was based on the QSAR model generated using the congeneric series of 18 mPTPB inhibitors. The most potent compound in this series had a pIC50 value of 1.42, so the natural compounds with predicted activity above this threshold were selected for further analysis, as they could prove to be more potent and selective novel candidates for use as mPTPB inhibitors.
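The threshold-based selection amounts to a simple filter over the predicted values. In the sketch below the compound identifiers are placeholders rather than real ZINC IDs, and only the values 1.459, 1.677 and the 1.42 threshold come from the text; the other predictions are invented for illustration.

```python
def select_hits(predicted_pic50, threshold=1.42):
    """Keep compounds predicted to be more active than the threshold,
    ordered from highest to lowest predicted pIC50."""
    hits = [(cid, value) for cid, value in predicted_pic50.items() if value > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Placeholder identifiers; cand_c and cand_d are invented values.
predictions = {"cand_a": 1.459, "cand_b": 1.677, "cand_c": 1.10, "cand_d": 1.42}
hits = select_hits(predictions)
```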
The crystal structure of protein tyrosine phosphatase B from Mtb was obtained from the Protein Data Bank [PDB ID: 1YWF]. The protein structure was pre-processed by removing water molecules and all non-bonded heteroatoms using Accelrys ViewerLite 5.0. The processed protein was further prepared using Schrödinger's protein preparation wizard. Hydrogens were added to the structure and optimized. In further preparation steps bad contacts were removed, bond lengths were optimized, disulfide bonds were created, protein termini were capped and selenomethionine residues were converted to methionine. The missing residues were fixed manually. The natural compounds with predicted pIC50 values above 1.42 were prepared for docking studies to examine their mode of interaction with mPTPB. LigPrep's ligand preparation protocol was used to prepare these natural compounds. It generated different tautomeric, stereochemical and ionization variants of the small molecules, along with energy minimization and flexible filtering.
A grid was generated at the active site of the prepared protein structure using the Glide docking module of Schrödinger. The active site of PTPs lies in the P loop motif, whose consensus sequence is CysX5Arg. Catalytic Arg acts as a general acid in the reaction mechanism. The presence of histidine just before the active site cysteine makes the cysteine a better nucleophile. Therefore, residues His 160-Arg 166 constitute the active site of mPTPB. The prepared natural compounds were docked using Glide's extra precision docking protocol. The two top scoring compounds were investigated to study their molecular interactions with the protein. The hydrophobic interactions and H-bonds were calculated using the Ligplot program. H-bonds were counted when the distance between the acceptor and donor atoms was less than 3.3 Å, with a maximum hydrogen-acceptor distance of 2.7 Å and an acceptor-H-donor angle greater than 90°.
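The geometric H-bond criteria stated above can be written as a small check on atomic coordinates. This is a minimal sketch of the stated thresholds only, not a reimplementation of Ligplot; the function names and the example coordinates (in Å) are made up.

```python
import math


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees for three 3D points."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(b, vertex)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))


def is_hbond(donor, hydrogen, acceptor, max_da=3.3, max_ha=2.7, min_angle=90.0):
    """Apply the three criteria: donor-acceptor distance below 3.3 A,
    hydrogen-acceptor distance at most 2.7 A, and acceptor-H-donor
    angle (measured at the hydrogen) above 90 degrees."""
    return (
        math.dist(donor, acceptor) < max_da
        and math.dist(hydrogen, acceptor) <= max_ha
        and angle_deg(acceptor, hydrogen, donor) > min_angle
    )


# A linear donor-H...acceptor arrangement with a 2.9 A donor-acceptor distance.
linear_case = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
```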
Unicolumn statistics for the training and the test set.
Statistics of the significant model generated using SW-PCA.
Training Set Size (n)
Test Set Size
Degree of freedom
Statistical measures with their minimum recommended values.
Measure: minimum recommended value
Number of descriptors in a model: statistically n/5 descriptors in a model
Degree of freedom (n-k-1): higher is better
Cross-validated r2: >0.5
Error term for q2
r2 for external test set: >0.5
Error term for pred_r2
The positive coefficient of S_1214 indicated that positive steric potential is preferred in that region and hence substitution of bulky groups will result in increased activity of the compounds.
The hydrophobic field descriptor (H_1071) also had a positive coefficient, which suggests that the presence of more hydrophobic groups in this region would enhance the activity of the inhibitors. The presence of charged or polar groups around that grid point is not preferred for effective inhibitor design. The model provided a 3D fingerprint of the compounds, which helped relate physicochemical parameters to structure and biological activity and made the model capable of predicting the activities of novel compounds. Thus, the generated 3D QSAR model can be used to fish out novel natural compounds with inhibitory activity against mPTPB.
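The general form of a two-descriptor model of this kind can be written as below. The coefficient symbols are placeholders: the fitted values are not given in this text, and the only property stated is that both descriptor coefficients are positive.

```latex
\mathrm{pIC}_{50} = c_0 + c_S \, S_{1214} + c_H \, H_{1071},
\qquad c_S > 0, \quad c_H > 0
```

Here $c_0$ is the intercept, and $S_{1214}$ and $H_{1071}$ are the steric and hydrophobic field values at the corresponding grid points.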
List of natural chemical compounds with their pIC50 value predicted on the basis of the generated 3D QSAR model.
ZINC IDs of natural compounds
The second compound, S-((3S,10R,13R)-10,13-dimethyl-17-octyl-2,3,4,7,8,9,10,11,12,13,14,15,16,17-tetradecahydro-1H-cyclopenta[a]phenanthren-3-yl) nonanethioate, also showed good binding affinity for mPTPB. It had a predicted activity value of 1.677. Arg63 was involved in hydrogen bond formation; the residues participating in hydrophobic interactions were Phe80, Pro81, Leu83, Phe98, Tyr125, Met126, Phe133, Arg136, Phe161, Met206, Val231 and Leu227, and those involved in van der Waals interactions were Ser57, Glu60, His94, Lys164, Asp165 and Arg166 (Figure 7c). For ease of writing, these two screened compounds are henceforth referred to as ESA and DTP. All three compounds had a similar orientation or docking conformation, with the ligands docked at the same position and interacting with the residues of the P loop motif that forms the active site of mPTPB (Figure 7d). However, the interactions of ESA (XP docking score = -7.62 kcal/mol) and DTP (XP docking score = -7.59 kcal/mol) with the mycobacterium phosphatase were stronger than those of comp10 (XP docking score = -6.75 kcal/mol). ESA occupied more space in the cavity and was involved in more hydrophobic interactions, indicating stronger binding. DTP also showed strong binding through the formation of a hydrogen bond and multiple hydrophobic and van der Waals interactions with the residues of the same cavity in which comp10 fits. Hence, we suggest that these two compounds can potentially inhibit mPTPB enzymatic activity.
A 3D QSAR model was generated for a congeneric series of 6-hydroxy-benzofuran-5-carboxylic acid derivatives having inhibitory activity against mPTPB. The model was generated using the statistical method of principal component regression analysis in conjunction with a stepwise variable selection method. The statistical measures r2, q2, F-test and standard error for the training set and the pred_r2 for the test set fulfilled the conditions for a model to be considered robust and predictive. The developed model was used to predict the activity values for a large set of natural compounds. The top scoring compounds were analyzed to find their mode of interaction with the mycobacterium phosphatase. We report two natural compounds, ESA and DTP, with high predicted activity values of 1.459 and 1.677, respectively. They had a better affinity for mPTPB than the most potent compound of the congeneric series (pIC50 of 1.42), as observed from the docking scores and the interaction patterns between these compounds and the mycobacterium protein. The present study provides evidence for considering these natural compounds as prospective leads against tuberculosis having enhanced mycobacterium phosphatase inhibitory activity and low toxicity to human cells. Thus, 3D QSAR not only provides graphical results but can also forecast the activity or potency of compounds being considered for inhibition of a target protein. As the QSAR approach already plays an important role in lead structure optimization, it is anticipated that it will soon become essential for handling the large amounts of data generated using combinatorial chemistry.
AG is thankful to Jawaharlal Nehru University for usage of all computational facilities. AG is grateful to Department of Science and Technology, Government of India. AG is also thankful to University Grants Commission, India for the Faculty Recharge Position.
AG would like to acknowledge financial support from Department of Science and Technology, Government of India towards publication of this article.
This article has been published as part of BMC Genomics Volume 15 Supplement 1, 2014: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S1.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.