Fragment based group QSAR and molecular dynamics mechanistic studies on arylthioindole derivatives targeting the α-β interfacial site of human tubulin

Background A number of microtubule disassembly blocking agents and inhibitors of tubulin polymerization have been elements of great interest in anti-cancer therapy, some of them even entering into the clinical trials. One such class of tubulin assembly inhibitors is of arylthioindole derivatives which results in effective microtubule disorganization responsible for cell apoptosis by interacting with the colchicine binding site of the β-unit of tubulin close to the interface with the α unit. We modelled the human tubulin β unit (chain D) protein and performed docking studies to elucidate the detailed binding mode of actions associated with their inhibition. The activity enhancing structural aspects were evaluated using a fragment-based Group QSAR (G-QSAR) model and was validated statistically to determine its robustness. A combinatorial library was generated keeping the arylthioindole moiety as the template and their activities were predicted. Results The G-QSAR model obtained was statistically significant with r2 value of 0.85, cross validated correlation coefficient q2 value of 0.71 and pred_r2 (r2 value for test set) value of 0.89. A high F test value of 65.76 suggests robustness of the model. Screening of the combinatorial library on the basis of predicted activity values yielded two compounds HPI (predicted pIC50 = 6.042) and MSI (predicted pIC50 = 6.001) whose interactions with the D chain of modelled human tubulin protein were evaluated in detail. A toxicity evaluation resulted in MSI being less toxic in comparison to HPI. Conclusions The study provides an insight into the crucial structural requirements and the necessary chemical substitutions required for the arylthioindole moiety to exhibit enhanced inhibitory activity against human tubulin. The two reported compounds HPI and MSI showed promising anti cancer activities and thus can be considered as potent leads against cancer. The toxicity evaluation of these compounds suggests that MSI is a promising therapeutic candidate. This study provided another stepping stone in the direction of evaluating tubulin inhibition and microtubule disassembly degeneration as viable targets for development of novel therapeutics against cancer.


Background
Tubulin inhibition has been considered a viable avenue for drug development in cancer management for a very long time [1][2][3][4]. It plays a major role in the formation of microtubule assembly. Microtubules are polar cytoskeletal filaments that either takes part in the formation of mitotic spindle and interphase networks or more complex formations like ciliary axoneme, centrioles and basal bodies [5][6][7]. Their disorientation, either through inhibition of tubulin polymerization or by blocking microtubule assembly, leads to metaphase arrest of cell division [8][9][10]. Colchicine, the most commonly used metaphase arrest agent, has a similar mode of action [11]. Apart from these, combretastatin A-4 and the Catharanthus alkaloids vincristine and vinblastine are also tubulin assembly inhibitors [12,13]. Combretastatin A-4 phosphate was found to block blood flow in capillaries supporting cancerous cells by completely disrupting the endothelial cell integrity leading to rapid tumor cell death [14,15]. These inhibitors lead to microtubule disorganization and renders cell apoptosis [16]. However, the use of these previously identified tubulin inhibitors were restricted on account of excessive toxicity, drug resistance and scarce bioavailability [17][18][19][20]. Thus, development of novel tubulin assembly inhibitors is the need of the hour.
A variety of chemical compounds are known to bind to the α-β interfacial site of tubulin. These chemical compounds can be grouped under following classes (a) group targeting the lumenal site of α subunit i.e. taxol (b) group stabilizing microtubule polymerization i.e. vinblastine and (c) group targeting α-β interfacial site of tubulin that leads to cell apoptosis i.e. colchicine [21][22][23]. We are interested in the colchicine-like group of compounds for cancer management. The present study is based on a class of tubulin-inhibitors known as arylthioindoles that bind to the colchicine-binding site of β-tubulin close to the interface with α-tubulin. All the previous studies undertaken in this respect involved Bos taurus tubulin protein assembly comprising of chains A, B, C, D and E of which A and C belong to the α unit and B and D belong to the β unit [24,25]. Many previously known tubulin inhibitors consisted of the indole nucleus in the core structure and hence are touted to be one of the most potent compounds against tubulin polymerization [17,26]. Arylthioindoles were also found to be potent inhibitors of the growth of MCF-7 human breast carcinoma cells [19].
Development of accurate and time effective drug discovery techniques is the need of the hour to propagate search for novel anti-tumorals. Exploiting one of the recent and innovative approaches known as fragment based group quantitative structure activity relationship (G-QSAR) [27], the relationship between different molecular fragments and their biological activity can be correlated and studied in detail giving site-specific clues for modification [28]. Such modifications in terms of substituents added or removed lead to activity enhancement. The knowledge of such modifications is based on various molecular descriptors calculated and used for G QSAR model construction. Various such studies have been reported and have proved to be very useful [29][30][31], many of them to discover cancer therapeutics [32]. These descriptors are calculated for various fragments defined by the user. The optimal subset of descriptors is chosen by any one of the variable selection methods which are most likely to describe all the physicochemical properties of the congeneric series required for their biological activity. Thus, it gives a better idea about which substitution site should be populated with which particular substituent for activity enhancement [33].
In this study, we search for tubulin inhibitors having a similar binding mode as that of colchicine at the α-β interfacial site. Arylthioindole moiety is known to be a potent anti-tubulin agent and has been studied very often for its anti-cancer properties but drug toxicity and less bioavailability were the problems encountered [34]. In order to exploit this avenue further, we created a robust, accurate and predictive G-QSAR model to enhance our understanding of arylthioindole derivatives as anti-cancer compounds in terms of structural requirements needed for drug development. Based on the G-QSAR model, we identified novel therapeutic compounds with improved tubulin assembly inhibition and potent anticancer activities. The compounds were validated for their interactive properties with the colchicine binding site of tubulin by docking analysis. The resultant top two compounds were also evaluated for their absorption, distribution, metabolism, excretion and toxicity (ADMET) properties.

Compound dataset for model development
In this study, a congeneric series of 42 tubulin inhibitors belonging to the arylthioindole class of compounds [20,35] were selected for G-QSAR model development. Due to higher root-mean-square-deviation (RMSD) values, 6 compounds (6b, 15, 20b, 24, 28 and 41b in Additional file 1) were rejected and the model was built using 36 arylthioindole derivatives. The 2D structures were drawn using Marwin Sketch [36]. They were converted to 3D by Vlife Engine platform of VLifeMDS and later energy minimized using the force field batch minimization utility with default parameters [37]. These optimized compounds were finally used for G-QSAR model development. The template is used as a structural moiety common to all the compounds of the congeneric dataset with the substitution sites marked by unknown (dummy) atoms. In this case there were six substitution sites, depicted by R1 to R6 in the template.

Computation of Molecular Descriptors
After selecting the congeneric set of molecules and the template for model development, physicochemical descriptors were calculated using the calculate descriptors dialog of the G-QSAR module in VLife MDS. The software provides a large number of molecular descriptors belonging to the 2-dimensional and 3-dimensional classes. In case of Group QSAR, 2D descriptors are selected. All descriptors were chosen except dipole moment, electrostatic, semi-empirical and hydrophobicity as they are 3D descriptors and Information Theory-based descriptors. In G-QSAR, a variety of descriptors along with alignment independent descriptors for the fragments were calculated. Of the total 1027 descriptors calculated for all the six substitution sites, 298 were selected and the invariable descriptors were removed. Invariable descriptors are those which have same quantitative value for each data-point and thus should be discarded for their inefficacy in G-QSAR model development.

G-QSAR model development
For data selection using the advanced data selection wizard, the training and test set compounds were chosen after selecting the activity pIC50 as dependent variable and all the calculated descriptors as independent variables. The test set including the compounds 10, 14, 18, 21, 22, 27b, 29, 30, 31, 35b (available in Additional file 1) was selected using random selection method with 80% compounds in the training set. After calculating the unicolumn statistics for the selected training and test set molecules, the stepwise-forward variable selection method along with PLS (Partial Least Square) [38] as the regression method for building the model was chosen through the variable selection and model building wizard. Keeping the crosscorrelation limit set at 0.5, Ftest In at 4.0, Ftest Out at 3.0, term selection criteria being r 2 , variance cut-off at 0.1 with auto-scaling, the model was built.

Model validation
Many statistical parameters like n (number of compounds in regression), k (number of variables), degree of freedom, optimum component (number of optimum PLS components in the model), r 2 (squared correlation coefficient), F-test (Fischer's value), q 2 (cross-validated correlation coefficient), pred_r 2 (r 2 for external test set), Z score (randomisation test), best_ran_q 2 (highest r 2 value in the randomisation test) and best_ran_r 2 (highest r 2 value in the randomisation test) need to be taken into account to consider the model as a robust one. For a model to be statistically significant, the following conditions should be satisfied: r 2 , q 2 > 0.6 and pred_r 2 > 0.5. Since, F-test gives an idea of the chances of failure of the model, a value greater than 30 is considered to be good. On the other hand, low standard error values establish absolute quality of the model.

Internal and external validation
For internal validation using leave-one-out method, the cross-validated coefficient, q 2 is calculated using the given equation: where y i and y i are the actual and predicted activities of the ith (i = 1-26 in Additional File 1) molecule in the training set, respectively, and y mean is the average activity of all the molecules in the training set.
For external validation, the pred_r 2 value that gives an account of the statistical correlation between predicted and actual activities of the test set compounds was calculated as follows: where y i and y i are the actual and predicted activities of the ith molecule (i = 27-36 in Additional File 1) in the test set, respectively, and y mean is the average activity of all the molecules in the training set.
To avoid the risk of chance correlation, Y randomisation test was carried out by comparing the resultant linear model with those derived from random data set [39]. Various models were built on random datasets generated by rearranging the molecules in the training set so as to compare them with the obtained G-QSAR model on the basis of Z-score. A Z-score value is calculated by the following formula: where h is the q 2 value calculated for the actual data set, μ is the average q 2 and σ is the standard deviation calculated for various models built on different random data sets.

Combinatorial library generation
A combinatorial library was generated using the LeadGrow module of VLife MDS using the template used before and a list of substituent chemical groups to be added to only three substitution sites R4, R5 and R6 based on the calculated descriptors. A library of 4257 compounds was created by substituting electronegative and electropositive atoms, alkyl groups, cyclic groups, aromatic rings and other bulky groups. Their biological activities were predicted using the G-QSAR model obtained.

Protein modelling and ligand docking
In order to investigate the extent of interactions involved between the most active compounds obtained from the combinatorial library, these compounds were docked to the homology modelled B-chain of tubulin protein using Schrodinger's Glide Module [40][41][42]. The homology model was obtained using Modeller 9.11 [43][44][45][46] taking Bos taurus tubulin B-chain [PDB ID: 1SA0] as the template [12]. The resultant homology model was validated using online PSVS suite [47] which comprises a number of software namely, DSSP [48,49], pdbStat (5.9), AutoAssign [50], RPF Analysis [51,52], PDB validation server, Verify3D (1.0) [53], ProsaII (2003) [54], PROCHECK (3.5.4) [55]. Along with these, Molprobity [56] programs and other software are also present. The tubulin homology model was optimized for docking using Schrodinger's protein preparation wizard [57]. A total of 48 compounds were docked targeting the α-β interfacial site of tubulin heterodimer. Glide works by creating a cubic grid (20 Å side) around the user-specified critical residues and directing the approaching ligand at the specific site. Extra-precision (XP) docking was used to screen compounds with high binding affinity for tubulin. XP docking serves the purpose of correlating good poses with good scores and discarding the false positives [58]. Screening of compounds by docking for potential candidature as therapeutics has been a popular avenue in computational drug design [59][60][61]. The various interactions involved between the highly active compounds were evaluated using Ligplot [62].

Molecular Dynamics simulations of the modelled protein and docked complexes
To obtain energetically stable conformation of the modelled protein target and to get an insight into the stability of protein-ligand complexes, molecular dynamic simulations were carried out using Desmond Molecular Dynamics module [63][64][65] of Schrodinger Maestro by applying optimized Potentials for Liquid Simulations (OPLS) all-atom force field 2005. Prepared protein-ligand complexes were solvated with TI4P water model in a triclinic periodic boundary box for MD simulations. To prevent the direct interaction of protein complex with its own periodic image a boundary box is created and distance between protein complex and box wall is kept at 10 Å. Steepest descent method was used to minimize the energy of the prepared structures for a maximum of 5000 steps till a threshold of 25 kcal/mol/Å is achieved, which was then followed by Low-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newtonian minimizer until a convergence threshold of 1 kcal/mol/Å was met. Other parameters were kept as default for system equilibration. MD simulations were carried out for 10 ns at a constant temperature of 300 K, pressure 1 atm and at time step of 2 femtoseconds (fs). Long range electrostatic interactions were calculated using smooth particle mesh Ewald method [66] which was occurring during the MD simulations and coulombic short range interaction was calculated using a cut-off scheme, with a cut-off radius of 9 Å for calculation. The protein ligand complex was prepared for MD simulations using the above mentioned parameters. MD Simulation was then carried for a time period of 10 ns.
The root mean square deviation values (RMSD) for two top scoring ligands were calculated for the entire simulations trajectory with reference to their respective first frames. Radius of gyration (ROG) analyses was carried out for all the frames MD simulation of IkB kinase beta and ligand complex.

Calculation of ADMET properties
The toxicity of final two compounds was evaluated using the Quikprop module [67]of Schrodinger suite which predicts various molecular properties and also provides ranges for comparing these properties with 95% of already available drugs. The analysis was followed by a toxicity analysis using an online webserver, admetSAR [68] which reads the smiles format and results in a number of ADMET values. AdmetSAR is a knowledge based tool comprising of ADMET related properties taken from large literatures which are further used to predict properties of unknown compounds.

G-QSAR model developed for arylthioindole derivatives targeting a-b interfacial site of tubulin
We report a fragment based group QSAR model based on a congeneric series of arylthioindole moiety as the template targeting the α-β interfacial site of tubulin. Colchicine is also known to bind at the same site, though arylthioindoles were found to have much higher activity than colchicine against tubulin polymerization. Inhibition of tubulin polymerization can be a potent deterrent to cell division and hence can be seen as an avenue in developing cancer therapeutics. The tubulin D chain (β unit) structure was modelled based on the close homologous protein belonging to Bos taurus.
The G-QSAR model obtained can be represented by the following linear equation: where pIC50 is the negative logarithm of IC 50 activity values and was taken as the dependent variable in the model development. A total of 6 substitution sites were marked on the arylthioindole moiety which became the basis of fragmenting the derivatives for model development, of which 3 sites namely, R4, R5 and R6 were taken into account. All 2D molecular descriptors are calculated for each fragment generated. The five molecular descriptors selected on which the model is based are H Acceptor Count for the substitution site R4, slogp for the site R5 and chi2, NitrogensCount and Molecular Weight for the site R6. The multiplied numerical terms associated with the descriptors are the respective coefficients and the last numerical term is the regression constant. Using the model generated, if the values of these descriptors are known for novel compounds, the biological activity in terms of pIC50 can be determined. Out of these, only R4-H Acceptor Count and R6-Molecular Weight are positively contributing to the biological activity and the rest are negatively contributing. To determine the crucial structural features required for good activity, these descriptors are needed.
R4-H Acceptor Count: This descriptor signifies the number of hydrogen bond acceptor atom and corresponds to the R4 site which was originally substituted by either hydrogen atom or oxymethyl (-OCH3) group in the series taken for model development. As it is a positive contributor, the presence of hydrogen bond acceptor atoms renders high biological activity to the molecule. The percentage of its contribution in enhancing the biological activity was 33.877 %.
R6-chi2: In theoretical terms, this descriptor stands for a retention index of second order which is derived directly from gradient retention times. The retention index of a chemical compound is the normalised retention time which is done to convert these into system independent constants in chromatographic analysis. In chromatography, retention time is the time required for a solute to migrate or elute from the column and this property depends upon the physical properties and behaviour of the molecule. The descriptor was calculated for the substitution site R6 present on the indole core and was mainly substituted by groups like hydrogen atom, halogens like Cl, I, Br, F and other groups like NH2, NO2, CH3, OCH3, OCH2CH3, OCH(CH) 3 etc. This descriptor contributed negatively to the biological activity to an extent of 23.018 %.
R5-slogp: This descriptor also contributed negatively to the biological activity up to an extent of 17.542% and corresponds to the substitution site R5. Slogp describes the value of log of octanol/water partition coefficient (including implicit hydrogens). It is an atomic property model and calculates logP value from the correct protonation state structure. The R5 site was mainly substituted with hydrogen atoms or oxymethyl (-OCH3) groups.
R6-NitrogensCount: As the name suggests, this descriptor stands for the number of nitrogen atoms present in the compound. This physicochemical descriptor contributes negatively to the activity of arylthioindole derivatives by 15.952%. This suggests that a highly electronegative group, in this case, specifically nitrogen is deterrent for the molecule's activity if present at the R6 position. It should avoid a substitution involving nitrogen atoms.
R6-Molecular Weight: This descriptor signifies molecular weight of the compound and is positively contributing in the biological activities by 9.611%. This indicates the importance of substituents with higher molecular weight present at the R6 substitution site.

Model evaluation and validation of G-QSAR model developed for arylthioindole derived compounds Unicolumn statistics for the chosen training and test set
For model evaluation and validation the complete dataset is divided into training set and test set. The developed model is validated by predicting the inhibitory activity (in terms of pIC 50 ) of the training set (known as internal validation) and the test set (known as external validation). The test set chosen can always be evaluated beforehand using the unicolumn statistics ( Table 1). The unicolumn statistics can be interpreted in terms of the maximum and minimum of training and test set. The min of test set should be equal or more than the min of training set and the max of the test set should be equal or less than the max of training set. This data is in absolute compliance with the conditions mentioned and shows that the test set is interpolative (derived within the min-max range of the training set). The relative difference of mean and point density distribution (along mean) of the two sets can be derived from the average and standard deviation. In this case, as the average value in the test set is slightly higher than the training set, the presence of relatively more active molecules as compared to the inactive ones is indicated. Also, a higher standard deviation for the training set indicates that training set constitutes widely distributed activity of the molecules as compared to the test set.

Validation of the final G-QSAR model
The G-QSAR model is evaluated on the basis of certain statistical parameters for both internal and external validation. The number of compounds in the training set was specified by N which is 26. Considering the correlation coefficient, r 2 (0.8512), cross-validated correlation coefficient q 2 (0.7175), pred_r 2 (0.8968), low standard error value, r 2 _se (0.1363), q 2 _se (0.1878) and pred_r 2 _se (0.1127), the model can be stated to be a robust one. Along with this, a high F-test value (65.7699) implied that the model is 99.999 % statistically valid with less than 1 in 10000 chance of failure.
Also with this, the randomization test shows confidence of 100% (Alpha Rand r 2 =0.00000) that the generated model is not random and hence is chosen as the G-QSAR model. Other important statistical parameters have been determined and Z-score is highlighted to emphasize its importance in QSAR model validation ( Table 2). The values of selected descriptors for each compound in the dataset have been provided (Additional file 2). The Z score gives an idea about how far away is the observed value from the mean. A Zscore_r 2 of 6.48008, Zscore_q 2 of 4.73992 and Zscore_pred_r 2 of 1.70900 statistically validates the significance of the obtained G-QSAR model. In order to get a better idea of the validity of the model, p values were determined for each correlation coefficient using the corresponding Z scores. The P value_r2 and P value_q2 of less than 0.0001 tells that the hypothesis is statistically significant. The P value_pred_r2 of 0.0875 is although, not quite statistically significant.
The robustness of the model is better understood through the linear graphical representation between actual and predicted activities of the 36 compounds along with the contribution plot for each descriptor (Figure 1). The linear graphical representation shows the extent of variation between the actual and predicted activities of the congeneric set. The larger the distance of training and test set points from the regression line, more is the difference between the actual and the predicted activity values. The contribution of each descriptor specifies the properties that should be present in the drug lead for enhancing its inhibitory activity. Presence of descriptors with positive contribution increases its inhibitory activity while descriptors with negative contribution decrease the same. Radar plots for training and test sets are given (Figure 2). The radar graphs depict the difference in the actual and predicted activities for the training and the test sets separately by the extent of overlap between blue (actual activity) and red (predicted activity) lines. The radar plot for training set represents a good r 2 value if the two lines show a good overlap while for the test set a good overlap represents high pred_r 2 value.
Activity prediction of combinatorial library generated using arylthioindole moiety as template A combinatorial library was constructed resulting in 4257 compounds after substituting the R4, R5 and R6 sites with various chemical groups for which the descriptors were calculated. The three sites R4, R5 and R6 were populated by single atoms like N, C, O and halogens like Cl, F, Br and I, by alkyl groups and aromatic rings like phenoxy, phenyl, furan, pyrrole, pyridine, imidazole, 2-thiophene etc along with alkenes, acids, aromatic rings, aliphatic rings and other groups such as -O-CH3, -O-C2H5, amide, cyanide, cyanate, isocyanate, -C=N, -N=C, azo, hydrazo, benz etc were also added. The biological activities of these compounds were predicted using the G-QSAR model. 48 arylthioindole derivatives were further selected to obtain mechanistic insights into their inhibitory properties based on high predicted activity scores and extrapolation values. Ideally, the compounds with extrapolation values being zero or close to zero are considered good for further analysis. The compound with the highest predicted activity score of 6.047, temp0592 was substituted with an electronegative H-bond acceptor NH2 group at the R4 site, a hydrogen atom at the R5 site and a sulphur atom at the R6 site which has a molecular weight of 32.065. It is heavier than CH2OH having a molecular weight of 31.021 which was substituted at the same site in temp0598 and has lower activity value of 6.042 as R6-Molecular Weight is a positively contributing descriptor. Similarly, temp0596 goes further down on activity value as R6 site is substituted by the ethyl group and hence having lesser molecular weight. The presence of H bond Acceptor at the R4 site enhances the inhibitory activity of arylthioindole derivatives as in temp0592, temp1656, temp2454 and many more. All these observations can be summed up as the presence of an electronegative H bond acceptor at the R4 site and a high molecular weight group  at the R6 site enhances the activity of arylthioindoles. But an increased slogp value at the R5 site, increased retention index chi2 and nitrogen atom count at the R6 site decreases the biological activity. These observations establish the importance of such substituents at these sites to enhance the tubulin inhibitory activity of arylthioindole derivatives. The predicted activity values with respect to the three substitution sites R4, R5 and R6 have been provided for 20 top scoring compounds ( Table 3).

Evaluation of screened compounds docked against a-b interfacial site of tubulin heterodimer 1. Validation of homology based model of human tubulin
The homology based protein structure of human tubulin was validated using PSVS validation suite. The conformational shifts in the protein model encountered after molecular dynamic simulations has been depicted ( Figure 3A). After obtaining the MD simulated structure, Ramchandran plot analysis was carried out to find the percentage of amino acid residues falling in the sterically allowed and disallowed regions. According to PROCHECK analysis, 90.4 % residues fall under the most favoured regions and no residues in the disallowed regions. Of the 431 residues, 34 were glycine and 20 were proline and were analysed separately. A model having over 90% residues falling in the favoured region can be considered to be good one on account of the analysis of 118 structures of resolution 2.0 Å. Thus, the human tubulin model is a good one and can be considered for further analysis. The Ramchandran plot analysis using PROCHECK has been depicted ( Figure 3B). According Richardson's Lab molprobity analysis, 97 % of the residues fall under the favoured regions and 0.2 % under the disallowed regions. Just one outlier ASN347 was encountered. As Molprobity analyses Glycine, Proline and pre-proline separately owing to basic physicochemical differences in comparison to other amino acids, 4 different Ramchandran plots are given ( Figure 3C). The average PROCHECK G factor value for phi-psi angles was -0.15 and for all dihedral angles was -0.23. The G score is the log odds score of observed distribution of torsion angles and covalent geometries and quantifies the goodness of the structure. The high the value of G score, better is that dihedral angle in the structure in terms of falling under the favourable regions in a Ramchandran plot. As the average G score is negative, high number of residues was seen having distortions in dihedral angles.

Screening of combinatorial library through docking and evaluation of the top scoring compounds
The 48 compounds having high predicted activity values chosen from the combinatorial library were docked against the tubulin assembly targeting the α-β interfacial  site of the homology based protein structure of tubulin chain B. The top two compounds showing maximum affinity for the colchicine binding site were selected. Both the target-ligand complexes obtained were MD simulated and their interactions observed. The first compound with high affinity for tubulin, temp1662 named 5-(hydroxymethyl)-3-{[3-(hydroxymethyl) phenyl] sulfanyl}-H-indol-2-ol (HPI) ( Figure 4A) was involved in both hydrogen and hydrophobic interactions. The O atom of Pro243 formed a 2.72 Å long hydrogen atom with O1 atom of HPI and the rest others like Phe242, Gly244, Gln245, Leu246, Met323, Ala352, Val353, Cys354 and Thr351 were found to be involved in hydrophobic interactions ( Figure 5A). After performing MD simulations in two slots of 10ns each, the stable average structure was obtained from 10 to 20ns. New interaction patterns could be seen post simulation with a 3.18 Å long H bond formed between the S atom of HPI and N atom of Gln245 which was involved in hydrophobic interaction before MD simulations. HPI seemed to have shifted from the earlier position owing to new hydrophobic interactions being formed including, Asn247, Pro243, Thr238 and Cys239. Rest all interactions were conserved with Leu246, gly244, Met323, Thr351, Ala352, Val353 and Cys354 ( Figure 5B). The second compound temp3789 named 3-[(3-methoxyphenyl) sulfanyl]-5-methyl-1H-indol-2-ol (MSI) ( Figure 4B) was also found to be involved in both hydrogen bonds as well as hydrophobic interactions with the colchicine binding site in the modelled human tubulin protein. Leu246 and Thr351 were involved in forming hydrogen bonds of distances 2.97 and 3.29 Å respectively with MSI and Gly244, Gln245, Met323, Lys350, Ala352 and Val353 were involved in hydrophobic bonding. The hydrogen bonds were formed between Nitrogen atom of Lys246 with the O1 atom and Nitrogen atom of Thr351 with O2 atom of MSI ( Figure 5C). After MD simulations of 10 ns, the interactions were seen to be reduced to hydrophobic interaction with Met323, Gln245 and Val353 ( Figure 5D). The RMSD graphs for depicting the course of three MD simulations i.e. tubulin homology model ( Figure 6A), ligand complexes with HPI ( Figure 6B) and MSI ( Figure 6C) have been given. When compared to the interactions involved between the most active arylthioindole derivative and tubulin, similar mode of action was discovered forming a 2.86Å long H-bond between N atom of Lys246 with  O1 atom of the reference compound. Hydrophobic interactions were found to exist involving a few but same residues as those involved with HPI and MSI including Gly244, Gln245, Lys350, Ala352 and Val353 (Table 4). Thus, the two reported compounds HPI and MSI bind to the same colchicine binding site at the α-β interfacial cavity of the tubulin assembly and show good binding affinity (Figure 7). The predicted activities of HPI and MSI were 6.042 and 6.001 based on the G-QSAR model and thus, are in absolute concurrence with the docking results in terms of high binding affinity with the tubulin assembly having similar mode of action as that of colchicine but with much higher biological activity than the most active compound of the congeneric series taken for this study.

ADMET analysis of the two top compounds, HPI & MSI
A total of 50 ADME properties were calculated using the Quikprop module. The first property #star depicts the number of properties that fall out of the similarity criteria of 95% of known drugs. The values for HPI and MSI were 0 and 1 respectively and signify that no molecular property of HPI falls out of the similarity criteria while only one criterion falls out for MSI. Another property CNS (central nervous system activity) ranging from -2 (inactive) to 2 (active) resulted in a value of -2 for HPI and zero for MSI. QPlogBB is the predicted brain/blood partition coefficient which had a value of -1.416 and -0.218 for HPI and MSI respectively. Descriptors including SASA (solvent accessible surface area), its hydrophobic component FOSA (saturated carbon and attached hydrogen) and hydrophilic component FISA (N, O and H on heteroatoms) were also predicted within the recommended ranges for both compounds. The compound HPI had a value of 81.27% and MSI had a value of 100% for percent human oral absorption descriptor. The predicted skin permeability factor QPlogkp resulted in a value of -3.342 and -1.635 for the two compounds HPI and MSI respectively. Another set of descriptors including QPlogPC16 (Free energy of solvation in hexadecane), QPlogPoct (Free energy of solvation in octanol), QPlogPw (Free energy of solvation in water) give the distribution of compounds in the body. QPlogPo/w (Predicted octanol/water partition coefficient) is the partition coefficient that gives an idea of the hydrophobic nature of the chemical compounds and resulted in high values of 2.244 and 4.197 for MSI and HPI respectively suggesting their easy absorption through the lipid bilayer. It also finds out the number of various chemical groups present in the test compounds for e.g. amide, acid etc. The ionization potential and electron affinities (in electron Volts) for both the compounds were also obtained being 8.489 and 0.349 for HPI and 8.504 and 0.392 for MSI respectively. Lastly, it also evaluated the two compounds on the basis of Lipinski Rule-of-five and Jorgensen Ruleof-three violations which resulted in zero violations for both, HPI and MSI.
Certain other ADMET properties calculated by the online server admetSAR have also been evaluated. The absorption factors for HPI can be summarised in terms of  The graphical depiction of RMSD trajectory of (a) homology based tubulin model (from 0-10 ns) (b) target-HPI complex (from 10-20 ns) and (c) target-MSI complex (from 0-10 ns). blood brain barrier (BBB), human intestinal absorption (HIA) and CaCo2 permeability and resulted in a positive value (value above the prescribed threshold suggesting good permeability) with high probabilities. A compound with >30% HIA is absorbed easily. Similarly, CaCo2 permeability is taken as the in vitro model of human small intestinal mucosa as it provides a physical and biochemical barrier to ions and small molecules and hence must be tested for orally administered drugs. Under the metabolism category, HPI was found to be a non-substrate for CYP450 2C9 and 2D6, substrate for CYP450 3A4, inhibitor for 1A2 and 2C9 and non-inhibitor for 2C19 and 3A4 and thus, having a high cytochrome inhibitory promiscuity. Under the toxicity category, it was found to be non-carcinogenic but toxic for Fish, Tetrahymena and Honey Bee.
Similarly, MSI was also evaluated which resulted in positive BBB and HIA permeability but no CaCo2 permeability. Under the metabolism category, it was found to be a non-substrate for CYP450 2C9, 2D6 and 3A4, an inhibitor of 1A2, 2C9 and 2C19 and non-inhibitor of CYP450 2D6 and 3A4 and thus, showed high cytochrome inhibitory promiscuity like HPI. Under the toxicity category, it was found to be non toxic for AMES test thus a non-carcinogen while having high toxicity for fish and honey bee but no toxicity for Tetrahymena. Thus, MSI can be selected for further evaluation as the drug candidate having low toxicity values as predicted.

Conclusions
In this study, we reported a novel fragment-based group QSAR approach exploiting the disintegration of tubulin assembly leading to interruption in cell division and eventually apoptosis as a potent avenue in cancer management. Tubulin heterodimers when assembled form the major building blocks of microtubules and hence have important functional role in maintaining cell structural integrity, motility and division in terms of spindle fibre formation. Inhibition of the tubulin formation is seen as a viable option for deterring the growth of cancerous cells. The arylthioindole moiety has been considered to be a potential anti-tubulin compound having similar mode of action as other agents involved in cell division arrest like colchicine and combretastatin. A G-QSAR model can provide significant results in terms of the crucial chemical fragments required for enhancing the activity of an already present template compound known to be inhibitory to our target of interest. A statistically robust G-QSAR model with the test set comprising 10 compounds out of a total of 36 compounds gave an insight into the contribution of various substitutions at the three sites R4, R5 and R6 for which 5 descriptors were calculated namely, R4-H AcceptorCount, R6-chi2, R5-slogp, R6-NitrogensCount, R6-Molecular Weight. A combinatorial library was created by varying substituents at these sites and their activity was predicted based on the model. All compounds having an electronegative H-bond acceptor at the R4 site and a substituent with high molecular weight at the R6 site were found to have high activities (pIC50) while the presence of nitrogen  atom at R6 site resulted in decrease of biological activity of the arylthioindole derivatives. All highly active compounds were docked at the colchicine binding α-β interfacial site of the homology-based human tubulin chain B. After final analysis of the binding affinities and interactions involved, we reported two compounds, HPI and MSI with predicted activities of 6.042 and 6.001respectively in terms of pIC50. Out of the two, the second compound MSI promises to be the first choice for further evaluation as the drug candidate, being less toxic in comparison to HPI. The present study elucidated the crucial structural requirements to enhance the anti-tubulin activity of arylthioindole derivatives. The study provides a better understanding of the inhibitory roles of the identified compounds in microtubule disassembly and paves way for consideration of these compounds as potent anticancer leads.